From Side Project to Product
I didn't set out to build a product. I set out to make my AI tools stop forgetting things.
Months later, I have a working auto-data lake with 14,000+ auto-generated knowledge graph edges, an autonomous overnight pipeline, an MCP-native protocol layer, and a reinforcement learning system that improves the graph every night.
The Origin
Every developer using AI tools hits the same wall. You have a great session with Claude Code. You make architecture decisions, debug a tricky issue, settle on a naming convention. Then you close the terminal. Next session, all of that context is gone.
I tried the usual workarounds. CLAUDE.md files. Copy-pasting decisions into prompts. Wiki pages that went stale within a week. None of it scaled. None of it worked across tools.
So I built Cortex. A place to store memories and get them back.
The Accidental Architecture
The first version was a FastAPI endpoint with PostgreSQL. Store a memory, search for it later. Basic.
Then I added ChromaDB for semantic search. Then auto-linking — every new memory triggers a neighbor analysis. Suddenly I had a knowledge graph that built itself.
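That self-building graph comes from one rule: on insert, compare the new memory's embedding against its neighbors and link anything similar enough. Here's a minimal in-memory sketch of that rule — the threshold, edge shape, and class names are illustrative assumptions, not Cortex's actual implementation:

```python
import math
from dataclasses import dataclass

@dataclass
class Memory:
    id: int
    text: str
    embedding: list[float]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class AutoLinkingStore:
    """Every new memory triggers a neighbor analysis: compare
    against existing memories, create an edge above a similarity
    threshold. The graph builds itself as a side effect of writes."""

    def __init__(self, threshold: float = 0.8):
        self.memories: list[Memory] = []
        self.edges: list[tuple[int, int, float]] = []
        self.threshold = threshold

    def add(self, mem: Memory) -> None:
        for other in self.memories:
            sim = cosine(mem.embedding, other.embedding)
            if sim >= self.threshold:
                self.edges.append((other.id, mem.id, sim))
        self.memories.append(mem)
```

In the real system the neighbor search runs against a vector store like ChromaDB rather than a linear scan, but the write-triggered linking is the same idea.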
Then I added MCP support. Now any tool that speaks MCP could connect. Then I mapped the whole thing to neuroscience. Cortex became the hippocampus. The event router became the thalamus. The security scanner became the amygdala.
At some point I looked up and realized I'd built infrastructure. Not a project. Not a toy. Infrastructure.
The Dixon Connection
James Dixon coined the term "data lake" in 2010 at Pentaho. Store data raw. Impose structure at read time.
What I built is the next step: an auto-data lake. Same philosophy — store raw, structure on read. But the lake does work between write and read. It auto-embeds, auto-links, and auto-clusters. By read time, the structure has already emerged.
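The write/between/read split can be sketched in a few lines. This is a toy illustration, assuming stand-in `embed` and `cluster` functions — not the actual pipeline stages:

```python
import time

def write(lake: list[dict], raw_text: str) -> dict:
    # Write path: store raw. No schema imposed at write time.
    record = {"text": raw_text, "ts": time.time(),
              "embedding": None, "cluster": None}
    lake.append(record)
    return record

def between(lake: list[dict], embed, cluster) -> None:
    # The work between write and read: embed anything new,
    # then re-cluster the whole lake.
    for rec in lake:
        if rec["embedding"] is None:
            rec["embedding"] = embed(rec["text"])
    labels = cluster([r["embedding"] for r in lake])
    for rec, label in zip(lake, labels):
        rec["cluster"] = label

def read(lake: list[dict], cluster_id: int) -> list[str]:
    # Read path: structure has already emerged; reading is a filter.
    return [r["text"] for r in lake if r["cluster"] == cluster_id]
```

In a classic Dixon-style lake, the `between` step doesn't exist — structure is imposed entirely at `read`. Moving it into an autonomous background stage is the whole difference.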
What Grain Studios Is
Grain Studios is the company name. The product is Cortex. The positioning is infrastructure for AI agent memory.
Right now it's a single-tenant system running on my hardware. The architecture already supports multi-tenancy — project isolation, API key scoping, per-agent attribution. The product question is: how do you take a working auto-data lake with real performance numbers and turn it into something every team running AI agents can deploy?
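The shape of that multi-tenancy support is simple: an API key resolves to a scope, and every query is filtered by it. A hypothetical sketch — the key names, scope fields, and store layout here are invented for illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class KeyScope:
    project_id: str   # project isolation
    agent_id: str     # per-agent attribution on writes

# Illustrative key table; in practice this would live in the database.
KEYS = {
    "key-abc": KeyScope(project_id="proj-1", agent_id="claude-code"),
}

def resolve_scope(api_key: str) -> KeyScope:
    scope = KEYS.get(api_key)
    if scope is None:
        raise PermissionError("unknown API key")
    return scope

def search(store: list[dict], api_key: str, query: str) -> list[str]:
    scope = resolve_scope(api_key)
    # Isolation: results can only come from the key's own project.
    return [m["text"] for m in store
            if m["project"] == scope.project_id and query in m["text"]]
```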
The Competitive Landscape
The memory problem is real and the market is forming:
- Anthropic shipped AutoDream — memory consolidation for Claude Code. Validates the problem but it's local-only, single-user, single-tool.
- Google adopted MCP for Colab GPU runtimes, confirming MCP as the interop standard.
- LangChain launched Deep Agents with AGENTS.md and agent-to-agent protocols.
- Open-source attempts (COGWRAP, Exocomp) are solving individual pieces.
Cortex is infrastructure-shaped. Raw memory, auto-linking, reinforcement learning, schema-on-read, with the orchestration layer already running autonomously. Different depth.
Lessons
Build for yourself first. I built Cortex because I needed it. Every feature exists because I hit a wall without it.
Let the architecture emerge. The neuroscience mapping emerged from constraints, not planning.
Run it in production before pitching it. 14,000+ auto-generated edges at sub-3ms search is a stronger statement than projections.
The infrastructure layer is underserved. Everyone is building AI agents. Almost nobody is building the infrastructure that agents need. Memory, orchestration, integrity enforcement, GPU coordination — the boring-but-critical layers that every agent stack eventually needs.
If you're running AI agents and feeling the memory problem, I'd love to hear from you: reed@grainlabs.io