Second Brain Chronicles

Twelve Thousand Laws in Fifty Minutes


Government data is public. Getting at it is not.

Spain’s national statistics institute (INE) has an API. The official gazette (BOE) also has an API. Both return XML, both have documentation that assumes you already know the schema, and both require enough setup that most people just Google the number they need and move on.

This week’s experiment: what if an AI assistant could query both directly?


What I Tried

I built two MCP servers in a single session — one wrapping the INE statistics API (70+ datasets: CPI, employment, demographics, housing, tourism), another wrapping the BOE legislation API plus a local corpus of 12,052 laws across 18 jurisdictions.

The corpus came from legalize-es, an existing open-source collection. I pulled it in as a Git submodule rather than forking or copying. The two servers share a base package with caching, retry logic, an XML parser, and Spanish-specific type validation.
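The shared base is the part most likely to transfer to other API wrappers. As a minimal sketch of the caching and retry layers (function names and signatures are mine, not the project's actual API), all it takes is a Map and a timer:

```typescript
// Illustrative sketch of a shared base package's retry + cache helpers.
// Names and signatures are invented, not the real project's API.
type Fetcher<T> = () => Promise<T>;

// Retry an async call with exponential backoff.
async function withRetry<T>(fn: Fetcher<T>, attempts = 3, delayMs = 200): Promise<T> {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      if (i < attempts - 1) {
        await new Promise((r) => setTimeout(r, delayMs * 2 ** i));
      }
    }
  }
  throw lastErr;
}

// Tiny in-memory TTL cache in front of the retrying fetcher.
const cache = new Map<string, { value: unknown; expires: number }>();

async function cached<T>(key: string, ttlMs: number, fn: Fetcher<T>): Promise<T> {
  const hit = cache.get(key);
  if (hit && hit.expires > Date.now()) return hit.value as T;
  const value = await withRetry(fn);
  cache.set(key, { value, expires: Date.now() + ttlMs });
  return value;
}
```

Both servers can then wrap their upstream calls in `cached(url, ttl, () => fetch(...))` without duplicating any of this logic.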

The whole thing took about fifty minutes.

Grep Won

The INE server worked on the first try — live API calls returned real CPI data without drama.

The BOE server was trickier. The legislation API returns XML with nested structures that don’t map cleanly to what you’d want an AI to reason about. Parsing decisions ate more time than the API integration itself.
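What "parsing decisions" means in practice is flattening. A hedged illustration, with a nested shape loosely modeled on BOE-style metadata (the real schema differs), of reducing a nested document to a flat record an assistant can reason about:

```typescript
// Illustrative only: this is NOT the real BOE response schema,
// just the shape of the flattening problem.
interface BoeRaw {
  documento: {
    metadatos: { identificador: string; titulo: string };
    analisis?: { materias?: { materia: string[] } };
  };
}

interface LawRecord {
  id: string;
  title: string;
  subjects: string[];
}

// Collapse the nesting, defaulting optional branches to empty values.
function flatten(raw: BoeRaw): LawRecord {
  const m = raw.documento.metadatos;
  return {
    id: m.identificador,
    title: m.titulo,
    subjects: raw.documento.analisis?.materias?.materia ?? [],
  };
}
```

The time sink is not the code; it is deciding, field by field, which of the nested branches are worth surfacing at all.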

The corpus search was where it got interesting. My first instinct was to reach for vector embeddings — semantic search, the whole setup. I stopped myself. Twelve thousand laws is not that many documents, and a grep-based approach with an in-memory index needed zero infrastructure, built in seconds, and returned results fast enough that the latency was indistinguishable from a network call. No vector database, no embedding model, no index to maintain.
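The "dumbest thing that works" here is literally a case-insensitive substring scan over records held in memory. A sketch (the `Law` fields are illustrative, not the real corpus schema):

```typescript
// Minimal in-memory "grep" over a small corpus — zero infrastructure,
// no index to maintain. Field names are illustrative.
interface Law {
  id: string;
  title: string;
  text: string;
}

function searchCorpus(corpus: Law[], query: string, limit = 20): Law[] {
  const needle = query.toLowerCase();
  const hits: Law[] = [];
  for (const law of corpus) {
    if (
      law.title.toLowerCase().includes(needle) ||
      law.text.toLowerCase().includes(needle)
    ) {
      hits.push(law);
      if (hits.length >= limit) break;
    }
  }
  return hits;
}
```

A linear scan over ~12,000 documents that already fit in memory finishes in milliseconds, which is why the latency disappears behind the network round trip.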

One technical snag: the MCP SDK expects Zod v3 types internally. My project used a newer version. A compatibility import fixed it, but it cost ten minutes of confusion before I found the mismatch.
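I won't vouch for the exact line, but the shape of that fix is a compatibility subpath import. Assuming the project is on Zod 4, the package still ships its v3 API under `zod/v3`, so schemas handed to an SDK that expects v3 types can be built from that entry point:

```typescript
// Sketch of the compatibility fix (assumes zod@4 is installed):
// import { z } from "zod";      // v4 API — types the MCP SDK rejects
import { z } from "zod/v3";      // v3-compatible API from the same package

// Schemas built this way satisfy an SDK that was compiled against Zod v3.
const searchParams = z.object({
  query: z.string(),
  limit: z.number().int().optional(),
});
```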

The Deeper Thing

The instinct to over-engineer search is strong. “Twelve thousand documents” sounds like a semantic search problem — it isn’t. The threshold where you actually need vector embeddings is higher than most people assume, and every layer of infrastructure you add is a layer you maintain.

The transferable version: before adding complexity to a search problem, check whether the corpus fits in memory. If it does, start with the dumbest thing that works.


The Numbers

| Metric | Value | Why It Matters |
|---|---|---|
| MCP servers built | 2 | INE (statistics) + BOE (legislation) |
| Total session time | ~50 min | Architecture through verified live calls |
| Laws in corpus | 12,052 | Across 18 Spanish jurisdictions |
| INE datasets accessible | 70+ | CPI, employment, demographics, housing, tourism |
| BOE tools | 10 | 8 live API + 2 corpus search |
| Vector databases needed | 0 | Grep won |


