Vault Reorganization Broke Every Search Index

Reorganized the vault folder structure. Every search index immediately became useless: each one pointed at a folder that no longer existed.

The Problem

The vault search system uses collections, each mapped to a folder path. I’d restructured the vault’s top-level organization: renamed folders, merged some, split others. The search tool didn’t know any of this. Every query hit stale paths and returned nothing.

The search infrastructure was intact — the indexes, the embeddings, the query engine all worked fine. They just had nowhere to look.
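A check like the following would have caught the problem right after the reorganization. It is a minimal sketch, not the search tool's own API: it assumes the collection-to-folder mapping can be exported as a plain dictionary, and the function name and example paths are hypothetical.

```python
from pathlib import Path


def find_stale_collections(collections):
    """Return the names of collections whose mapped folder no longer exists.

    `collections` maps a collection name to a folder path (assumed format).
    """
    return sorted(
        name
        for name, folder in collections.items()
        if not Path(folder).expanduser().is_dir()
    )


if __name__ == "__main__":
    # Hypothetical mapping, for illustration only.
    mapping = {"notes": "~/vault/notes", "projects": "~/vault/projects"}
    print(find_stale_collections(mapping))
```

Any name it returns is a collection that will answer every query with nothing.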

The Rebuild

Removed the 6 old collections and created 9 new ones, each mapped 1:1 to the current vault structure. Then indexed the development directory as a separate collection.

Item	Count
Old collections removed	6
New collections created	9
Dev folder files indexed	3,857 markdown files across 116+ repos
Total chunks generated	21K+

The development directory was the largest single collection — READMEs, documentation, inline comments, changelogs, all searchable from one place.
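Counts like the ones in the table can be reproduced with a short walk over the directory. This is a sketch under one assumption that the post does not state: every immediate subdirectory of the development directory is treated as one repo. The function name is hypothetical.

```python
from collections import Counter
from pathlib import Path


def markdown_counts(dev_root):
    """Count .md files under each immediate subdirectory of dev_root.

    Each immediate subdirectory is treated as one repo (an assumption).
    """
    root = Path(dev_root)
    counts = Counter()
    for path in root.rglob("*.md"):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        if len(rel.parts) < 2:
            continue  # a file sitting directly in dev_root, not inside a repo
        counts[rel.parts[0]] += 1
    return counts
```

`sum(counts.values())` gives the total file count and `len(counts)` the number of repos that contain at least one markdown file. The walk does not exclude vendored or dependency folders, so a real run may want a filter for those.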

Vector Search Didn’t Work

Pulled embeddinggemma to the local model runner for vector embeddings. That’s where the plan started to break down.

Vector search worked for smaller collections but hung on the larger indexes — the vector table hit 12K+ entries and the lookup step stalled. Full-text search (BM25) worked fine as a fallback and was fast enough for most queries.
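For context on why the full-text path stays fast: BM25 needs only term counts and document lengths, with no embedding model or nearest-neighbor lookup involved. The sketch below is the textbook Okapi BM25 formula with a naive whitespace tokenizer. It is not the search tool's implementation, and the `k1` and `b` defaults are common conventions rather than values taken from the tool.

```python
import math
from collections import Counter


def tokenize(text):
    """Naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per document for the given query."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs if n_docs else 0.0

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    query_terms = tokenize(query)
    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in term_freq:
                continue
            n_t = doc_freq[term]
            idf = math.log((n_docs - n_t + 0.5) / (n_t + 0.5) + 1)
            # Length normalization: long documents are penalized via b.
            norm = term_freq[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Documents that contain none of the query terms score exactly zero, and repeated terms help with diminishing returns, which is the saturation behavior `k1` controls.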

The Automation

The vault changes daily. Manual reindexing doesn’t happen. A stale index is worse than no index — it gives confident wrong answers.

Created an update script and a launchd agent to run it every 4 hours:

Component	Purpose
Update script (`~/bin/`)	Reindexes all collections, regenerates embeddings, logs results
Launchd agent	Fires every 4 hours, macOS-native scheduling
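The update script's job (run each step, log the outcome, keep going when one fails) can be sketched as below. The actual script and the search tool's commands are not shown in this post, so the steps are passed in as plain argument lists; `run_steps` and its parameters are hypothetical names.

```python
import logging
import subprocess


def run_steps(steps, log, timeout_s=1800):
    """Run (name, argv) steps in order, log each outcome, return failed names.

    A failure in one step does not stop the remaining steps.
    """
    failed = []
    for name, argv in steps:
        try:
            result = subprocess.run(
                argv, capture_output=True, text=True, timeout=timeout_s
            )
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Command missing, not executable, or still running at the timeout.
            log.error("%s: could not complete (%s)", name, exc)
            failed.append(name)
            continue
        if result.returncode == 0:
            log.info("%s: ok", name)
        else:
            log.error(
                "%s: exit %d: %s", name, result.returncode, result.stderr.strip()
            )
            failed.append(name)
    return failed
```

The timeout matters here given the vector-search hang: a step that stalls gets logged as a failure instead of blocking every later run.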

Why launchd over cron

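macOS. launchd is the native scheduler and handles sleep sensibly: a job never fires while the machine is asleep, and calendar-based schedules (StartCalendarInterval) run once on wake to catch up, although a plain StartInterval timer just skips the firing it missed. It also avoids the cron permission dance that macOS has been tightening for years.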
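A launchd agent is a property list, and Python's standard `plistlib` can generate one. The sketch below assumes the agent uses the `StartInterval` key for the 4-hour cadence; the post does not say which scheduling key the real agent uses. The label, script path, and log path are placeholders, not the real values.

```python
import plistlib


def build_agent_plist(label, script_path, log_path, interval_s=4 * 60 * 60):
    """Return launchd agent plist bytes that run script_path every interval_s seconds."""
    job = {
        "Label": label,
        "ProgramArguments": [script_path],
        "StartInterval": interval_s,  # assumed key; 14400 seconds = 4 hours
        "StandardOutPath": log_path,
        "StandardErrorPath": log_path,
    }
    return plistlib.dumps(job)


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    xml = build_agent_plist(
        "com.example.vault-reindex",
        "/path/to/update-script",
        "/tmp/vault-reindex.log",
    )
    print(xml.decode("utf-8"))
```

Paths should be absolute, since launchd does not expand `~`. The resulting file goes in `~/Library/LaunchAgents/` and is then loaded with `launchctl`.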

Decisions

Decision	Rationale
1:1 collection-to-folder mapping	Keeps the mental model simple — the collection name matches the folder name
Skip binary asset folders	Media files aren’t useful for text search
4-hour refresh interval	Frequent enough for freshness, not wasteful
BM25 as primary search	Vector search hangs on large indexes; full-text is fast and reliable
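The first two decisions amount to deriving the collection list from the folder listing instead of maintaining it by hand. This is a minimal sketch with hypothetical names: the skip list is passed in explicitly, because the post does not name the binary asset folders.

```python
from pathlib import Path


def collections_for(vault_root, skip=frozenset()):
    """Map each top-level vault folder to a collection of the same name.

    Hidden folders and any folder named in `skip` are left out.
    """
    root = Path(vault_root)
    return {
        child.name: child
        for child in sorted(root.iterdir())
        if child.is_dir()
        and not child.name.startswith(".")
        and child.name not in skip
    }
```

Because the mapping is recomputed from the folder listing, a later rename shows up as one collection disappearing and a new one appearing, not as a stale path.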

The Damage Report

Metric	Value
Session duration	~90 minutes
Old collections removed	6
New collections created	9
Files indexed	3,857 markdown files
Chunks generated	21K+
Automation	4-hour launchd refresh
Known issue	Vector search hangs at 12K+ vectors; BM25 fallback works

The rebuild took ~90 minutes. The automation runs every four hours. The indexes stay current without intervention.