Reorganized the vault folder structure. Every search index immediately became useless — they were all pointing at folders that no longer existed.
The Problem
The vault search system uses collections, each mapped to a folder path. I’d restructured the vault’s top-level organization: renamed folders, merged some, split others. The search tool didn’t know any of this. Every query hit stale paths and returned nothing.
The search infrastructure was intact — the indexes, the embeddings, the query engine all worked fine. They just had nowhere to look.
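A check like the following would have flagged the failure before any query ran. It is a minimal sketch, assuming the collection-to-folder mapping can be exported as a plain dictionary; the function name and the dictionary shape are illustrative, not the search tool’s actual API.

```python
from pathlib import Path


def find_stale_collections(collections: dict[str, str]) -> list[str]:
    """Return the names of collections whose mapped folder no longer exists.

    `collections` maps a collection name to its folder path. This shape is an
    assumption for illustration; the real tool may store the mapping differently.
    """
    stale = []
    for name, folder in collections.items():
        # expanduser() lets the mapping use "~/..." style paths.
        if not Path(folder).expanduser().is_dir():
            stale.append(name)
    return sorted(stale)
```

An empty result means every collection still points at a real folder; anything else lists the collections that need remapping.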
The Rebuild
Removed the 6 old collections and created 9 new ones, each mapped 1:1 to the current vault structure. Then indexed the development directory as a separate collection.
| Item | Value |
|---|---|
| Old collections removed | 6 |
| New collections created | 9 |
| Dev folder files indexed | 3,857 markdown files across 116+ repos |
| Total chunks generated | ~21K+ |
The development directory was the largest single collection — READMEs, documentation, inline comments, changelogs, all searchable from one place.
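The 1:1 mapping can be derived from the folder tree itself rather than maintained by hand. The sketch below assumes one collection per top-level folder, named after that folder, with hidden folders and binary asset folders skipped. The names in `SKIP_FOLDERS` are hypothetical placeholders, not the folders actually excluded here.

```python
from pathlib import Path

# Hypothetical names for binary asset folders; adjust to the real vault layout.
SKIP_FOLDERS = frozenset({"assets", "attachments", "media"})


def plan_collections(vault_root: Path,
                     skip: frozenset[str] = SKIP_FOLDERS) -> dict[str, Path]:
    """Map each top-level folder to a collection of the same name."""
    plan = {}
    for entry in sorted(vault_root.iterdir()):
        if not entry.is_dir():
            continue  # loose files at the vault root get no collection
        if entry.name.startswith(".") or entry.name.lower() in skip:
            continue  # hidden folders and binary asset folders are excluded
        plan[entry.name] = entry
    return plan


def count_markdown(folder: Path) -> int:
    """Count markdown files under a folder, recursively."""
    return sum(1 for _ in folder.rglob("*.md"))
```

Running `count_markdown` over each planned collection gives a quick sanity check on file counts before indexing.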
Vector Search Didn’t Work
Pulled embeddinggemma to the local model runner for vector embeddings. That’s where the plan started to break down.
Vector search worked for smaller collections but hung on the larger indexes — the vector table hit 12K+ entries and the lookup step stalled. Full-text search (BM25) worked fine as a fallback and was fast enough for most queries.
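For context on why the fallback holds up: BM25 ranks documents by term frequency, weighted by how rare each term is and normalized by document length, so there is no vector lookup to stall. Below is a minimal sketch of the standard Okapi BM25 formula. It is a generic textbook version with common default parameters (`k1=1.5`, `b=0.75`), not the search tool’s own implementation, and the tokenizer is a deliberately crude assumption.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumerics (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query: str, docs: dict[str, str],
              k1: float = 1.5, b: float = 0.75) -> list[tuple[str, float]]:
    """Rank documents against a query with Okapi BM25, best match first."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avg_len = sum(len(tokens) for tokens in tokenized.values()) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))

    query_terms = set(tokenize(query))
    scores = {}
    for doc_id, tokens in tokenized.items():
        term_freq = Counter(tokens)
        score = 0.0
        for term in query_terms:
            tf = term_freq[term]
            if tf == 0:
                continue
            df = doc_freq[term]
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
            # Length normalization: long documents need more hits to score as high.
            denom = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / denom
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Documents that contain none of the query terms are dropped from the result rather than returned with a zero score.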
The Automation
The vault changes daily, and manual reindexing realistically never happens. A stale index is worse than no index: it gives confident wrong answers.
Created an update script and a launchd agent to run it every 4 hours:
| Component | Purpose |
|---|---|
| Update script (~/bin/) | Reindexes all collections, regenerates embeddings, logs results |
| Launchd agent | Fires every 4 hours, macOS-native scheduling |
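The update script boils down to “run each step in order, log the result, stop on failure.” Here is a minimal sketch of that shape, assuming the search tool is driven through command-line invocations. The helper name `run_steps` and the example commands in the comment are hypothetical; the real script and its commands are not shown in this post.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def run_steps(steps: list[list[str]], log_path: Path) -> bool:
    """Run each command in order, append one log line per step, stop on first failure.

    Each step is an argv list, e.g. ["search-tool", "update"]. That command name
    is a placeholder, not the real tool.
    """
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as log:
        for cmd in steps:
            stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
            try:
                result = subprocess.run(cmd, capture_output=True, text=True)
            except OSError as exc:  # e.g. the executable does not exist
                log.write(f"{stamp} failed to start {cmd!r}: {exc}\n")
                return False
            log.write(f"{stamp} exit={result.returncode} cmd={' '.join(cmd)}\n")
            if result.returncode != 0:
                log.write(result.stderr)  # keep the error output next to the failure
                return False
    return True
```

Stopping at the first failure means embeddings are never regenerated on top of a reindex that did not finish.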
Why launchd over cron
macOS. launchd is the native scheduler. It handles power management correctly (won’t fire while sleeping, catches up on wake) and avoids the permission dance cron now needs, which macOS has been tightening for years.
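A launchd agent is just a property list with a label, a command, and a schedule. The sketch below builds one with Python’s standard `plistlib`, using the real launchd keys `Label`, `ProgramArguments`, and `StartInterval` (in seconds). The label and script path passed in are up to the caller; any values shown in examples are placeholders, not the ones actually used here.

```python
import plistlib

FOUR_HOURS = 4 * 60 * 60  # StartInterval is expressed in seconds


def build_agent_plist(label: str, program_args: list[str],
                      interval_seconds: int = FOUR_HOURS) -> bytes:
    """Serialize a minimal launchd agent definition as XML plist bytes."""
    job = {
        "Label": label,                     # unique job identifier
        "ProgramArguments": program_args,   # argv of the update script
        "StartInterval": interval_seconds,  # run every N seconds
    }
    return plistlib.dumps(job)
```

The resulting bytes would typically be written to a `.plist` file under `~/Library/LaunchAgents/` and registered with `launchctl`.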
Decisions
| Decision | Rationale |
|---|---|
| 1:1 collection-to-folder mapping | Keeps the mental model simple — the collection name matches the folder name |
| Skip binary asset folders | Media files aren’t useful for text search |
| 4-hour refresh interval | Frequent enough for freshness, not wasteful |
| BM25 as primary search | Vector search hangs on large indexes; full-text is fast and reliable |
The Damage Report
| Metric | Value |
|---|---|
| Session duration | ~90 minutes |
| Old collections removed | 6 |
| New collections created | 9 |
| Files indexed | 3,857+ markdown files |
| Chunks generated | ~21K+ |
| Automation | 4-hour launchd refresh |
| Known issue | Vector search hangs at 12K+ vectors; BM25 fallback works |
The rebuild took ~90 minutes. The automation runs every four hours. The indexes stay current without intervention.