Second Brain Chronicles

Vault Reorganization Broke Every Search Index

Reorganized the vault folder structure. Every search index immediately became useless — they were all pointing at folders that no longer existed.

The Problem

The vault search system uses collections, each mapped to a folder path. I’d restructured the vault’s top-level organization: renamed folders, merged some, split others. The search tool didn’t know any of this. Every query hit stale paths and returned nothing.

The search infrastructure was intact — the indexes, the embeddings, the query engine all worked fine. They just had nowhere to look.
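This failure mode can be spotted by checking each collection's mapped path against the filesystem. A minimal sketch, assuming the collection-to-folder mapping can be loaded as a plain dictionary; `find_stale_collections` and the example collection names are hypothetical and not part of the search tool:

```python
from pathlib import Path


def find_stale_collections(collections: dict[str, Path]) -> list[str]:
    """Return the names of collections whose mapped folder no longer exists."""
    return sorted(
        name for name, folder in collections.items() if not folder.is_dir()
    )


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        vault = Path(tmp)
        (vault / "notes").mkdir()
        # Hypothetical mapping: "archive" points at a folder that was renamed away.
        mapping = {"notes": vault / "notes", "archive": vault / "old-archive"}
        print(find_stale_collections(mapping))  # prints ['archive']
```

Running a check like this right after a reorganization would flag every collection that needs to be remapped before any query is trusted.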

The Rebuild

Removed the 6 old collections and created 9 new ones, each mapped 1:1 to the current vault structure. Then indexed the development directory as a separate collection.

| Metric | Count |
| --- | --- |
| Old collections removed | 6 |
| New collections created | 9 |
| Dev folder files indexed | 3,857 markdown files across 116+ repos |
| Total chunks generated | ~21K+ |

The development directory was the largest single collection — READMEs, documentation, inline comments, changelogs, all searchable from one place.
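The 1:1 mapping can be derived from the folder listing rather than maintained by hand. A rough sketch, assuming collections are named after top-level folders; the `SKIP_FOLDERS` names, the hidden-folder rule, and both function names are illustrative assumptions, not the actual setup:

```python
from pathlib import Path

# Hypothetical names for binary-asset folders that text search should skip.
SKIP_FOLDERS = frozenset({"attachments", "media"})


def plan_collections(
    vault_root: Path, skip: frozenset[str] = SKIP_FOLDERS
) -> dict[str, Path]:
    """Map each top-level vault folder to a collection with the same name."""
    return {
        entry.name: entry
        for entry in sorted(vault_root.iterdir())
        # Assumption: hidden folders (names starting with ".") are not indexed.
        if entry.is_dir()
        and not entry.name.startswith(".")
        and entry.name not in skip
    }


def count_markdown(folder: Path) -> int:
    """Count markdown files under a folder, recursively."""
    return sum(1 for path in folder.rglob("*.md") if path.is_file())


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        vault = Path(tmp)
        for name in ("projects", "areas", "media"):
            (vault / name).mkdir()
        (vault / "projects" / "plan.md").write_text("# Plan\n", encoding="utf-8")
        for name, folder in plan_collections(vault).items():
            print(name, count_markdown(folder))
```

Because the plan is computed from the current folder listing, a later rename shows up as a changed collection name instead of a silently stale path.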

Vector Search Didn’t Work

Pulled embeddinggemma to the local model runner for vector embeddings. That’s where the plan started to break down.

Vector search worked for smaller collections but hung on the larger indexes — the vector table hit 12K+ entries and the lookup step stalled. Full-text search (BM25) worked fine as a fallback and was fast enough for most queries.
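One way to make that fallback automatic is to give the vector lookup a time budget and switch to BM25 when it runs out. This is a sketch of the pattern, not the search tool's behavior: `vector_search` and `bm25_search` are hypothetical callables standing in for whatever the tool exposes, and the timeout value is arbitrary.

```python
import threading
from typing import Callable

SearchFn = Callable[[str], list]


def search_with_fallback(
    query: str,
    vector_search: SearchFn,
    bm25_search: SearchFn,
    timeout_s: float = 5.0,
) -> tuple[str, list]:
    """Try vector search within a time budget; otherwise use BM25.

    Returns (engine_name, hits) so the caller can tell which path answered.
    """
    outcome: dict[str, list] = {}

    def worker() -> None:
        try:
            outcome["hits"] = vector_search(query)
        except Exception:
            # A failed vector lookup is treated the same as a stalled one.
            pass

    # Daemon thread: a lookup that never returns will not block interpreter
    # exit. It cannot be cancelled, only abandoned.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)

    if thread.is_alive() or "hits" not in outcome:
        return "bm25", bm25_search(query)
    return "vector", outcome["hits"]
```

An abandoned thread keeps consuming resources until the process ends, so for a lookup that hangs reliably, routing large collections straight to BM25 is the simpler choice.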

The Automation

The vault changes daily. Manual reindexing doesn’t happen. A stale index is worse than no index — it gives confident wrong answers.

Created an update script and a launchd agent to run it every 4 hours:

| Component | Purpose |
| --- | --- |
| Update script (~/bin/) | Reindexes all collections, regenerates embeddings, logs results |
| Launchd agent | Fires every 4 hours, macOS-native scheduling |
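The update script's shape (run the reindex steps in order, log what happened) can be sketched as below. The real commands depend on the search tool and are not given here, so the demo uses a stand-in command; `run_and_log` and the log format are assumptions for illustration.

```python
import datetime
import shlex
import subprocess
import sys
import tempfile
from pathlib import Path


def run_and_log(commands: list[list[str]], log_path: Path) -> bool:
    """Run each command in order and append one log line per command.

    Later commands still run after a failure. Returns True only if every
    command exited with status 0.
    """
    all_ok = True
    with log_path.open("a", encoding="utf-8") as log:
        for cmd in commands:
            stamp = datetime.datetime.now().isoformat(timespec="seconds")
            try:
                code = subprocess.run(cmd, capture_output=True, text=True).returncode
            except OSError:
                # The executable is missing or cannot be started.
                code = -1
            log.write(f"{stamp} exit={code} {shlex.join(cmd)}\n")
            all_ok = all_ok and code == 0
    return all_ok


if __name__ == "__main__":
    # Stand-in for the real reindex and embedding commands (placeholder only).
    steps = [[sys.executable, "-c", "print('reindex placeholder')"]]
    with tempfile.TemporaryDirectory() as tmp:
        log_file = Path(tmp) / "reindex.log"
        print(run_and_log(steps, log_file))
        print(log_file.read_text(encoding="utf-8"), end="")
```

Logging the exit code per step matters for an unattended job: a scheduled run that fails quietly produces exactly the stale index the automation is meant to prevent.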

Why launchd over cron

macOS. Launchd is the native scheduler, handles sleep sensibly (it won’t fire while the machine sleeps, and jobs scheduled with StartCalendarInterval run once on wake to catch up), and doesn’t require the cron permission dance that macOS has been tightening for years.
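For reference, a launchd agent is a property list placed in ~/Library/LaunchAgents. Below is a sketch of generating one with Python's `plistlib`. The label and script path are hypothetical, since neither is given here. It uses `StartCalendarInterval` with one entry per four-hour slot, because that is the key launchd documents as coalescing missed runs into one on wake; which key the actual agent uses is not stated.

```python
import plistlib


def build_agent_plist(label: str, script_path: str, every_hours: int = 4) -> bytes:
    """Build an XML launchd agent plist that runs a script every N hours."""
    if every_hours < 1 or 24 % every_hours != 0:
        raise ValueError("every_hours must divide 24 evenly")
    job = {
        "Label": label,
        "ProgramArguments": [script_path],
        # One calendar entry per slot: 00:00, 04:00, 08:00, ... for N = 4.
        "StartCalendarInterval": [
            {"Hour": hour, "Minute": 0} for hour in range(0, 24, every_hours)
        ],
    }
    return plistlib.dumps(job)


if __name__ == "__main__":
    # Hypothetical label and path, for illustration only.
    xml = build_agent_plist("com.example.vault-reindex", "/path/to/update-script")
    print(xml.decode("utf-8"))
```

launchd does not expand `~` in `ProgramArguments`, so the script path should be absolute.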

Decisions

| Decision | Rationale |
| --- | --- |
| 1:1 collection-to-folder mapping | Keeps the mental model simple: the collection name matches the folder name |
| Skip binary asset folders | Media files aren’t useful for text search |
| 4-hour refresh interval | Frequent enough for freshness, not wasteful |
| BM25 as primary search | Vector search hangs on large indexes; full-text is fast and reliable |

The Damage Report

| Metric | Value |
| --- | --- |
| Session duration | ~90 minutes |
| Old collections removed | 6 |
| New collections created | 9 |
| Files indexed | 3,857+ markdown files |
| Chunks generated | ~21K+ |
| Automation | 4-hour launchd refresh |
| Known issue | Vector search hangs at 12K+ vectors; BM25 fallback works |

The rebuild took ~90 minutes. The automation runs every four hours. The indexes stay current without intervention.

