Published an open source tool. Included specific stats in the README to establish credibility — how many components the system has, what scale it runs at, how it’s structured. Standard social proof for a tool’s README.
Another user’s AI assistant read those stats and surfaced my setup details as comparison context.
What Happened
The README contained ecosystem statistics: exact counts of skills, agents, and other components. The numbers were there to say “this tool is battle-tested in a real production environment.” What they actually said was “here’s a detailed fingerprint of the author’s personal infrastructure.”
An AI assistant doesn’t skim a README the way a human does. It reads every word, correlates it with everything else it knows, and uses it. The stats I put there to prove the tool works became the stats that identified my setup.
What Was Exposed
| Detail | Risk |
|---|---|
| Full name | Identity linkage |
| Personal site URL | Infrastructure mapping |
| Project name | System identification |
| Exact component counts | Setup fingerprinting |
None of this was a security breach. It was all information I’d voluntarily published. The problem was aggregation — each detail is harmless alone, but together they create a profile that an AI can correlate and surface.
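A toy sketch of how that aggregation plays out, if you model each published detail as the set of people it could describe. Every name and field below is invented for illustration:

```python
# Each detail alone matches many candidates; their intersection can match one.
# All names and fields here are invented for illustration.
candidates_by_detail = {
    "uses this framework": {"alice", "bob", "carol", "dave"},
    "runs a personal site": {"bob", "carol", "dave"},
    "publishes exactly 87 skills": {"dave"},
}

profile = set.intersection(*candidates_by_detail.values())
print(profile)  # {'dave'} -- harmless details jointly identify one person
```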
The Fix
Scrubbed the README. Replaced specific stats with generic credibility language — “a large production setup” conveys scale without fingerprinting. Kept the GitHub handle (that’s the minimum for open source attribution) but removed the full name, personal site, and project name.
Updated the pre-publish scanning tool with new detection categories:
- Project name added to the names list
- New “ecosystem metadata” category for stats that fingerprint the author
- New pattern matching for component counts (`N skills`, `N agents`, etc.); see the sketch after this list
- Author sections flagged for review: a GitHub handle alone is sufficient
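A minimal sketch of what the stat-fingerprint check could look like, assuming the scanner is a Python script. The regex, category label, and function name are illustrative assumptions, not the tool's actual implementation:

```python
import re

# Hypothetical pattern for the "ecosystem metadata" category: an exact count
# followed by a component noun, e.g. "87 skills" or "23 agents".
STAT_FINGERPRINT = re.compile(
    r"\b\d+\s+(skills?|agents?|hooks?|plugins?|components?)\b",
    re.IGNORECASE,
)

def scan_for_fingerprints(text: str) -> list[str]:
    """Flag lines whose exact component counts could fingerprint the author."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in STAT_FINGERPRINT.finditer(line):
            findings.append(f"line {lineno}: ecosystem metadata: {match.group(0)!r}")
    return findings

if __name__ == "__main__":
    # Invented README line for demonstration.
    sample = "Battle-tested: 87 skills, 23 agents, and 14 hooks in daily use."
    for finding in scan_for_fingerprints(sample):
        print(finding)
```

A generic phrase like "a large production setup" passes this kind of check because it carries no exact count to match.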
Committed the sanitized README and pushed.
Why LLMs Change the Calculus
A human reads a component count and thinks “that’s a lot” and moves on. An AI reads the same number and correlates it with every other piece of information it has — author name, site URL, project name, other repos. The information density that’s useful for human credibility becomes a detailed identifier for AI correlation.
The old calculus: “is this information I’d be comfortable with someone reading?” The new calculus: “is this information I’d be comfortable with something reading, correlating, and repeating?”
The Damage Report
| Metric | Value |
|---|---|
| Session duration | ~25 minutes |
| Details scrubbed | Full name, personal site, project name, ecosystem stats |
| Detection categories added | 2 (ecosystem metadata, stat-fingerprint pattern) |
| Actual data breach | None — all voluntarily published information |
| Root cause | Aggregation risk from AI correlation, not unauthorized access |
| Commit | Sanitized README pushed |
The information I put there to prove the tool works became the information that identified me. The stats were doing two jobs at once and I only intended one of them.