Published an open source tool. Included specific stats in the README to establish credibility — how many components the system has, what scale it runs at, how it’s structured. Standard social proof for a tool’s README.
Another user’s AI assistant read those stats and surfaced my setup details as comparison context.
What Happened
The README contained ecosystem statistics: exact counts of skills, agents, and other components. The numbers were there to say “this tool is battle-tested in a real production environment.” What they actually said was “here’s a detailed fingerprint of the author’s personal infrastructure.”
An AI assistant doesn’t skim a README the way a human does. It reads every word, correlates it with everything else it knows, and uses it. The stats I put there to prove the tool works became the stats that identified my setup.
What Was Exposed
| Detail | Risk |
|---|---|
| Full name | Identity linkage |
| Personal site URL | Infrastructure mapping |
| Project name | System identification |
| Exact component counts | Setup fingerprinting |
None of this was a security breach. It was all information I’d voluntarily published. The problem was aggregation — each detail is harmless alone, but together they create a profile that an AI can correlate and surface.
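A toy sketch of how that aggregation plays out, if you model each published detail as the set of people it could describe. Every name and field below is invented for illustration:

```python
# Each detail alone matches many candidates; their intersection can match one.
# All names and fields here are invented for illustration.
candidates_by_detail = {
    "uses this framework": {"alice", "bob", "carol", "dave"},
    "runs a personal site": {"bob", "carol", "dave"},
    "publishes exactly 87 skills": {"dave"},
}

profile = set.intersection(*candidates_by_detail.values())
print(profile)  # {'dave'} -- harmless details jointly identify one person
```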
The Fix
Scrubbed the README. Replaced specific stats with generic credibility language — “a large production setup” conveys scale without fingerprinting. Kept the GitHub handle (that’s the minimum for open source attribution) but removed the full name, personal site, and project name.
Updated the pre-publish scanning tool with new detection categories:
- Project name added to the names list
- New “ecosystem metadata” category for stats that fingerprint the author
- New pattern matching for component counts (`N skills`, `N agents`, etc.); see the sketch after this list
- Author sections flagged for review: a GitHub handle alone is sufficient
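A minimal sketch of what the stat-fingerprint check could look like, assuming the scanner is a Python script. The regex, category label, and function name are illustrative assumptions, not the tool's actual implementation:

```python
import re

# Hypothetical pattern for the "ecosystem metadata" category: an exact count
# followed by a component noun, e.g. "87 skills" or "23 agents".
STAT_FINGERPRINT = re.compile(
    r"\b\d+\s+(skills?|agents?|hooks?|plugins?|components?)\b",
    re.IGNORECASE,
)

def scan_for_fingerprints(text: str) -> list[str]:
    """Flag lines whose exact component counts could fingerprint the author."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in STAT_FINGERPRINT.finditer(line):
            findings.append(f"line {lineno}: ecosystem metadata: {match.group(0)!r}")
    return findings

if __name__ == "__main__":
    # Invented README line for demonstration.
    sample = "Battle-tested: 87 skills, 23 agents, and 14 hooks in daily use."
    for finding in scan_for_fingerprints(sample):
        print(finding)
```

A generic phrase like "a large production setup" passes this kind of check because it carries no exact count to match.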
Committed the sanitized README and pushed.
Why LLMs Change the Calculus
A human reads a component count and thinks “that’s a lot” and moves on. An AI reads the same number and correlates it with every other piece of information it has — author name, site URL, project name, other repos. The information density that’s useful for human credibility becomes a detailed identifier for AI correlation.
The old calculus: “is this information I’d be comfortable with someone reading?” The new calculus: “is this information I’d be comfortable with something reading, correlating, and repeating?”
The Damage Report
| Metric | Value |
|---|---|
| Session duration | ~25 minutes |
| Details scrubbed | Full name, personal site, project name, ecosystem stats |
| Detection categories added | 2 (ecosystem metadata, stat-fingerprint pattern) |
| Actual data breach | None — all voluntarily published information |
| Root cause | Aggregation risk from AI correlation, not unauthorized access |
| Commit | Sanitized README pushed |
The information I put there to prove the tool works became the information that identified me. The stats were doing two jobs at once and I only intended one of them.