
The Evidence Lake: Why Coquina Now Has Two Postgres Databases


A few months ago I wrote about Coquina as a data lake with a nervous system — curated memories, auto-linked, semantically searchable, cross-tool. That post described what I called an "auto-data lake": a memory store that does work between write and read so AI agents can build shared context.

That post was about the curated layer — the part where opinions get formed.

This post is about what sits underneath it now: a second lake, on a different physical disk, holding every single thing the agents have ever done.

The Question That Started It

I was poking at a Cortex MCP connection issue when I asked Claude a different question entirely: "If you can extract topics from past conversation threads, shouldn't that be part of the auto books we have created?" And then a follow-up: "I'd like to see every bit of data that I can ever see from you, including token count, what you think about, I want to store everything. And by everything I mean every tool call and what it was and what returned etc."

Up to that point, Coquina stored memories — the things I'd explicitly asked Claude to remember. Decisions. Learnings. Facts. Curated stuff. If I made an architectural call at 2 AM and stored a memory about it, Coquina kept that memory. But the raw conversation that produced the decision? Gone the moment the terminal closed. The token usage, the tool calls, the model's reasoning — all sitting in ~/.claude/projects/*.jsonl files on disk, never indexed, never searchable, never connected to anything.

That's a strange state of affairs. The summary is preserved. The receipts are not.

Knowledge ≠ Evidence

The tension I kept hitting is that curated memory and raw telemetry want different homes.

Curated memory is small, hot, opinionated. A few thousand carefully graded facts and decisions. You want it indexed for fast graph traversal, weighted by recency and access count, and tightly bound to your auto-linking heuristics. It's the opinions of an organization frozen in writing.

Raw telemetry is the opposite. Big. Unstructured-ish. Boring 99% of the time. But the 1% you actually want is buried in tool outputs and assistant reasoning that nobody flagged as worth keeping. A 1,595-turn debugging session is mostly noise. The exact sequence of which Bash commands produced what error in what order — that's not noise. That's evidence. And you can't know which lines you'll need until you need them.

Mixing them in the same Postgres database would be a category error. The hot graph would have its autovacuum disrupted by gigabytes of write churn. Any heavy ingest would back-pressure curated writes. And worst of all, the search ranking would be fighting itself: a curated memory scored on reasoning about why it matters would compete in the same score space as a tool_result scored on token frequency. They mean different things.

So I built two lakes.

The Architecture (Boring Version)

The curated database stays where it is: PostgreSQL on the local SSD, owned by the cortex user, schema-frozen by years of opinions about what "memory" should be.

The new database is coquina_lake. Same Postgres instance, separate database, placed on a tablespace rooted at /Volumes/Node02/coquina_lake/pgdata. That puts the bulk of the data on the cold-tier external drive that already gets mirrored nightly to a second drive — so the lake rides for free on infrastructure I'd already built.
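
In SQL terms, the bootstrap boils down to two statements. A sketch of roughly what the migration does — the tablespace name below is mine, the location is the real path, and the directory has to exist and be owned by the postgres OS user before either statement runs:

```sql
-- Roughly what the bootstrap migration does (tablespace name illustrative).
CREATE TABLESPACE coquina_lake_ts
  LOCATION '/Volumes/Node02/coquina_lake/pgdata';

CREATE DATABASE coquina_lake
  TABLESPACE coquina_lake_ts;
```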

A second connection pool handles writes and reads against the lake. It's gated on a single environment variable, LAKE_DATABASE_URL. If that variable is unset, federation is invisible: the codebase behaves exactly as before. If it's set, queries against search_unified() reach into both databases in parallel and merge the results, ranked together, returned as one list.
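The gate itself is about as small as it sounds. A sketch, with asyncpg standing in for whatever pool library the codebase actually uses and the function name being mine:

```python
import os

import asyncpg  # illustrative; any pool library works the same way

_lake_pool: asyncpg.Pool | None = None

async def get_lake_pool() -> asyncpg.Pool | None:
    """Return a pool for coquina_lake, or None when federation is switched off."""
    global _lake_pool
    url = os.environ.get("LAKE_DATABASE_URL")
    if not url:
        return None  # unset: the code degrades silently to single-store behavior
    if _lake_pool is None:
        _lake_pool = await asyncpg.create_pool(url, min_size=1, max_size=4)
    return _lake_pool
```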

The merge layer was already built. The original search system already federated PostgreSQL full-text with ChromaDB semantic search and tagged each result with a source field. Adding a third source — telemetry_fts — meant slotting it into a pattern that already existed. From the API surface, there's now one query and three backends. The caller doesn't know or care.
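
The merged query, in sketch form. The backend callables are hypothetical names; the only real contract is that each backend tags its rows with a source and a comparable score, so the caller gets one ranked list back:

```python
import asyncio
from typing import Awaitable, Callable

# A backend takes a query string and returns rows shaped like
# {"source": "curated_fts" | "chroma" | "telemetry_fts", "score": float, ...}.
Backend = Callable[[str], Awaitable[list[dict]]]

async def search_unified(query: str, backends: list[Backend], limit: int = 20) -> list[dict]:
    """Fan the query out to every backend in parallel, then rank the results together."""
    result_sets = await asyncio.gather(*(backend(query) for backend in backends))
    merged = [row for rows in result_sets for row in rows]
    merged.sort(key=lambda row: row["score"], reverse=True)
    return merged[:limit]

# Usage (backend names hypothetical; the lake joins only when the env var is set):
#   backends = [curated_fts_search, chroma_search]
#   if os.environ.get("LAKE_DATABASE_URL"):
#       backends.append(telemetry_fts_search)
#   results = await search_unified("backfill scaffold telemetry", backends)
```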

The Numbers

I ran the backfill on every JSONL transcript in ~/.claude/projects/. Just the ones I had on this Mac. Just my own work, just the last few months.
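
Mechanically, the backfill is a loop over those transcript files. A rough sketch; the field names reflect my reading of the JSONL format rather than any documented schema, and redaction plus the actual inserts are left out:

```python
import glob
import json
import os

def iter_turns(root: str = "~/.claude/projects"):
    """Yield (path, parsed_turn) for every line of every session transcript."""
    pattern = os.path.join(os.path.expanduser(root), "**", "*.jsonl")
    for path in glob.glob(pattern, recursive=True):
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue
                try:
                    yield path, json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip a corrupt line rather than abort the session

totals = {"turns": 0, "tool_calls": 0, "cache_read_tokens": 0}
for _, turn in iter_turns():
    totals["turns"] += 1
    message = turn.get("message") or {}
    usage = message.get("usage") or {}
    totals["cache_read_tokens"] += usage.get("cache_read_input_tokens", 0)
    for block in message.get("content") or []:
        if isinstance(block, dict) and block.get("type") == "tool_use":
            totals["tool_calls"] += 1
print(totals)
```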

The lake ingested 38,925 turns across 60 sessions, indexed and searchable in 85 megabytes. That covers 13,711 tool calls, including 6,543 Bash invocations, 1,836 file reads, 1,635 file edits, 721 memory writes, and 628 grep operations.

Token totals from the runs: 5.6 billion cache_read tokens, 191 million cache_creation tokens, 11.3 million output tokens, 173 thousand fresh input tokens.

Five-point-six billion. From one developer running AI tools on a single Mac mini for a few months. The cache is doing roughly 30× the work of the cold path (5.6 billion cache reads against about 191 million tokens of cache creation and fresh input), which is exactly what you'd hope for and previously had no way to actually measure.

That number alone — being able to measure it — justifies the whole exercise.

Reversibility Was the Whole Point

Every migration ships in pairs. 005_create_lake_db.up.sql is matched by 005_create_lake_db.down.sql. 006_session_telemetry.up.sql is matched by 006_session_telemetry.down.sql. The down migrations are dead simple — DROP DATABASE coquina_lake WITH (FORCE) for the bootstrap, DROP TABLE for the schema — and they're tested by the up migrations being applied inside transactions and rolled back before any real run.
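
In full, the down side of the pair really is this small. A sketch (the telemetry table name is illustrative; the DROP DATABASE line is the one quoted above, and the tablespace drop assumes the name from the bootstrap sketch earlier):

```sql
-- 006_session_telemetry.down.sql  (table name illustrative)
DROP TABLE IF EXISTS session_telemetry;

-- 005_create_lake_db.down.sql
DROP DATABASE coquina_lake WITH (FORCE);
DROP TABLESPACE IF EXISTS coquina_lake_ts;
```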

This wasn't an accident. The original design proposal got pushback the moment I described the schema landing in the existing Cortex database. "What if the migration was not irreversible?" The answer was: of course it shouldn't be. So I redesigned. Separate database means tear-down is one statement. Separate tablespace means the data files are physically separate. Separate connection pool means the application path is reversible — turn off the env var, the code degrades silently to single-store.

Reversibility isn't a virtue you bolt on. It's a property you bake in. You can tell which has happened by how nervous people are about touching the thing.

The Privacy Moment

I almost shipped a privacy bug.

The backfill script redacts secrets from tool inputs and outputs at parse time — Authorization: Bearer ..., sk-... tokens, JWT tokens, password fields, and so on. I wrote the regex patterns in what felt like a sensible order. Code review caught the bug in five minutes.

The Authorization header pattern stopped at the first whitespace. So Authorization: Bearer sk-1234567890abcdefghij got rewritten to Authorization: [REDACTED] sk-1234567890abcdefghij — the literal token, intact, in the output.

If that had shipped, I'd have spent today writing 38,925 of my own session turns into a database, and a small but non-zero number of those rows would have contained live API keys and bearer tokens. In cleartext. In a database I built specifically to make raw transcripts searchable from one query.

Two changes fixed it: reorder the patterns so token-shape matches (bearer\s+..., sk-..., JWT, etc.) run first, before the header pattern can catch them. And change the header pattern to consume the entire rest of the line ([^\r\n]+) instead of one word.
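
A sketch of the fixed ordering, assuming Python's re module; the real script's patterns are broader, and the exact token shapes here are illustrative:

```python
import re

# Token-shape patterns run first so a live value can't survive inside a header;
# the Authorization pattern runs last and now eats the whole rest of the line.
REDACTIONS = [
    (re.compile(r"\bBearer\s+[A-Za-z0-9._\-]+", re.IGNORECASE), "Bearer [REDACTED]"),
    (re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"), "[REDACTED]"),
    (re.compile(r"\beyJ[\w\-]+\.[\w\-]+\.[\w\-]+"), "[REDACTED]"),  # JWTs
    (re.compile(r"(Authorization:\s*)[^\r\n]+", re.IGNORECASE), r"\g<1>[REDACTED]"),
    (re.compile(r'("password"\s*:\s*")[^"]*(")'), r"\g<1>[REDACTED]\g<2>"),
]

def redact(text: str) -> str:
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text

# The case from above now comes out clean:
assert redact("Authorization: Bearer sk-1234567890abcdefghij") == "Authorization: [REDACTED]"
```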

I tested it with seven canonical cases including the actual MCP bearer token I'm using right now. All redacted. Git SHAs and content hashes preserved (since they're useful for grepping). Then I committed.

Code review on every PR is a discipline I picked up from earlier mistakes. It paid for itself in one read.

What This Buys Me

Now, when I ask a question of my Coquina layer — whether through MCP from Claude Code, through the REST API from a worker, or through the dashboard from a browser — the answer can come from curated knowledge, from raw conversation history, or from both.

Hidden inside the new federation: when I search "backfill scaffold telemetry," the system returns my curated memory about today's deploy and the literal turn from earlier in this exact thread where I wrote "## Done — scaffold landed and verified end-to-end." Both rank. Both surface. Both inform.

For an AI agent stack, that's a different kind of memory than what most people mean by "agent memory." It's not just "remember this fact for later." It's "remember the context in which the fact was established, the failed approaches you tried first, the exact sequence of tool calls that worked, and the token cost of finding it." It's evidence as well as opinion.

A 5.6-billion-token lake is just storage. The interesting thing is that it's searchable from the same place agents already search for what they know. The boundary between "what I think" and "what I've seen" gets thinner.

I think that's the part I'll keep coming back to.


This is part 7 of a series on building autonomous AI infrastructure on consumer hardware. The two-lake pattern is the one I'd recommend as soon as your raw transcript volume exceeds your curated memory volume — which, if you're using AI tools daily, is sooner than you think.