The Architecture
A multi-node infrastructure running local LLMs, semantic memory, agent orchestration, and home automation. Everything here is in active use—not experiments.
Systems
Tools and infrastructure I design, build, and operate daily.
Coquina
Self-Healing Memory Platform for AI Agents (formerly Cortex)
Dual-path search merges Postgres full-text results with ChromaDB semantic vectors and ranks them with a recency bonus. A-MAC admission control quality-gates every write using five scoring signals. An auto-linker builds a relationship graph without manual curation.
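To make the dual-path merge concrete, here is a minimal sketch. It is not Coquina's actual ranking code: it assumes each search path returns scores already normalized to [0, 1], and the weights, the 30-day half-life, and the name `merge_results` are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

def merge_results(fulltext, semantic, created_at, now,
                  w_text=0.4, w_vec=0.6, recency_weight=0.2, half_life_days=30.0):
    """Merge two {memory_id: score} maps into one ranked list.

    Scores are assumed to be normalized to [0, 1] by each search path.
    All weights and the half-life are illustrative, not tuned values.
    """
    ranked = []
    for memory_id in set(fulltext) | set(semantic):
        # A memory found by only one path contributes 0.0 for the other.
        base = w_text * fulltext.get(memory_id, 0.0) + w_vec * semantic.get(memory_id, 0.0)
        # Recency bonus halves every `half_life_days`.
        age_days = (now - created_at[memory_id]).total_seconds() / 86400
        bonus = recency_weight * 0.5 ** (age_days / half_life_days)
        ranked.append((memory_id, base + bonus))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

now = datetime(2025, 1, 31, tzinfo=timezone.utc)
created_at = {"a": now - timedelta(days=60), "b": now - timedelta(days=1), "c": now}
fulltext = {"a": 0.9, "b": 0.5}   # hypothetical Postgres full-text scores
semantic = {"b": 0.7, "c": 0.4}   # hypothetical ChromaDB similarity scores
ranking = merge_results(fulltext, semantic, created_at, now)
```

In this example the memory found by both paths ranks first, and a fresh semantic-only hit edges out an older memory with a strong full-text match.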
Four self-evolution subsystems monitor embedding drift, score memory decay, refine retrieval through diffusion, and build graph consensus. Runs natively on Apple Silicon with embedded ChromaDB — no Docker, no zombie containers, self-healing at every tier.
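One way an embedding-drift monitor could work, sketched under assumptions: re-embed a sample of stored memories with the current model and flag any whose fresh vector has moved away from the stored one. The 0.95 threshold and the function names are hypothetical, and Coquina's own drift metric may differ.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def find_drifted(stored, fresh, threshold=0.95):
    """Return ids whose stored vector no longer matches a fresh re-embedding."""
    return sorted(
        memory_id for memory_id, vec in stored.items()
        if cosine_similarity(vec, fresh[memory_id]) < threshold
    )

# Toy 2-dimensional vectors; real embeddings have hundreds of dimensions.
stored = {"m1": [1.0, 0.0], "m2": [0.6, 0.8]}
fresh = {"m1": [1.0, 0.05], "m2": [0.0, 1.0]}
drifted = find_drifted(stored, fresh)
```

Memories flagged this way would be candidates for re-embedding.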
Forge
Queue-Based Autonomous Worker System
I built this to run while I sleep. Event-driven worker system for orchestrating AI tasks on Apple M4 Silicon. Redis-backed queues coordinate GPU access, health monitoring, and overnight chain orchestration — all running autonomously via 41 macOS LaunchAgents.
Nightly 20-step DAG runs security review, code review, research scanning, ecosystem monitoring, knowledge consolidation, and morning briefings. Self-healing daemon scans every 60 seconds and kickstarts crashed services.
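A nightly chain like this can be sketched with Python's standard-library `graphlib`. The step names come from the list above, but the dependencies between them and the skip-on-failure policy are assumptions for illustration, not Forge's actual DAG.

```python
from graphlib import TopologicalSorter

# Map each step to the steps it depends on (these dependencies are hypothetical).
chain = {
    "security_review": set(),
    "code_review": set(),
    "research_scan": set(),
    "ecosystem_monitor": set(),
    "knowledge_consolidation": {
        "security_review", "code_review", "research_scan", "ecosystem_monitor",
    },
    "morning_briefing": {"knowledge_consolidation"},
}

def run_chain(dag, run_step):
    """Run steps in dependency order; skip any step whose dependency did not succeed."""
    status = {}
    for step in TopologicalSorter(dag).static_order():
        if any(status[dep] != "ok" for dep in dag[step]):
            status[step] = "skipped"
            continue
        status[step] = "ok" if run_step(step) else "failed"
    return status

# Simulate one failing step to see how the failure propagates downstream.
result = run_chain(chain, run_step=lambda step: step != "research_scan")
```

Skipping downstream steps is only one possible policy. A briefing step might instead be designed to run regardless and report what failed overnight.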
Sandbox builder generates complete iOS/macOS apps autonomously — Ollama writes Swift, xcodegen builds, error loop self-corrects.
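The self-correcting error loop can be sketched as a generate, build, retry cycle. The `generate` and `build` callables below are hypothetical stand-ins for the Ollama call and the xcodegen/compiler step, and the retry limit is an assumption.

```python
def build_with_retries(spec, generate, build, max_attempts=3):
    """Generate code, build it, and feed compiler errors back until it builds."""
    prompt = spec
    for attempt in range(1, max_attempts + 1):
        source = generate(prompt)
        ok, errors = build(source)
        if ok:
            return source, attempt
        # Include the build errors in the next prompt so the model can fix them.
        prompt = (
            f"{spec}\n\nThe previous attempt failed to build:\n{errors}\n"
            "Fix these errors."
        )
    raise RuntimeError(f"build still failing after {max_attempts} attempts")

# Stand-ins for the model and the compiler, for demonstration only.
def fake_generate(prompt):
    return "let x = 1" if "failed to build" in prompt else "let x ="

def fake_build(source):
    if source == "let x = 1":
        return True, ""
    return False, "error: expected expression"

source, attempts = build_with_retries("Declare x", fake_generate, fake_build)
```

Capping the attempts keeps an unattended overnight run from looping forever on code that never compiles.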
Thalamus
MQTT Gateway to AI Memory
Bridges IoT and Home Assistant events into Coquina via MQTT. Sensor readings, automation triggers, and system heartbeats flow through the bus into persistent memory where agents can query them.
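A minimal sketch of the bridging step: turning one MQTT message into a record an agent could later query. The topic layout, the field names, and `to_memory_record` are assumptions for illustration. The MQTT client and the actual write to Coquina are left out.

```python
import json
from datetime import datetime, timezone

def to_memory_record(topic, payload, received_at):
    """Turn one MQTT message (topic string, payload bytes) into a memory dict."""
    try:
        value = json.loads(payload)  # JSON payloads become structured values
    except ValueError:
        # Non-JSON payloads such as b"on" are kept as plain text.
        value = payload.decode("utf-8", errors="replace")
    source, _, entity = topic.partition("/")
    return {
        "source": source,
        "entity": entity,
        "value": value,
        "timestamp": received_at.isoformat(),
        "tags": topic.split("/"),
    }

record = to_memory_record(
    "homeassistant/sensor/kitchen_temp",       # hypothetical topic
    b'{"temperature": 21.5}',
    datetime(2025, 1, 1, tzinfo=timezone.utc),
)
```

Keeping the topic segments as tags lets an agent filter by device or room without parsing the payload.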
Nerve
MCP Server Registry & Guardian
Maintains a canonical registry of MCP server configurations. Detects config drift, verifies connectivity, and auto-restores missing servers. The structural safeguard that keeps the multi-agent ecosystem intact.
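Drift detection against a canonical registry reduces to a comparison of two mappings. This sketch assumes both registry and live config are dictionaries keyed by server name, with `command`/`args` entries in the shape commonly used in MCP client configs. The server names and function names are hypothetical, and connectivity checks are out of scope.

```python
def diff_registry(canonical, live):
    """Compare the canonical registry with the live config, keyed by server name."""
    shared = set(canonical) & set(live)
    return {
        "missing": sorted(set(canonical) - set(live)),
        "drifted": sorted(name for name in shared if canonical[name] != live[name]),
        "unexpected": sorted(set(live) - set(canonical)),
    }

def restore_missing(canonical, live):
    """Return a copy of the live config with missing servers added back."""
    repaired = dict(live)
    for name in diff_registry(canonical, live)["missing"]:
        repaired[name] = canonical[name]
    return repaired

canonical = {
    "memory": {"command": "memory-server", "args": ["--port", "8001"]},
    "files": {"command": "files-server", "args": []},
}
live = {
    "memory": {"command": "memory-server", "args": ["--port", "9999"]},  # drifted
    "scratch": {"command": "scratch-server", "args": []},                # unexpected
}
report = diff_registry(canonical, live)
repaired = restore_missing(canonical, live)
```

Here only missing servers are restored automatically. Drifted and unexpected entries are reported, so a deliberate local change is not silently overwritten.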
Forge Command Center
Real-Time System Monitoring
Live worker status, queue depths, chain execution history, and system health. React frontend backed by the Forge API with auto-refresh and task dispatch controls.
Hook Pipeline
6-Hook Auto-Persist System
Six hooks fire during every Claude Code session: context recall, pre-tool safety gate, pre-tool injection, Slack commit alerts, pre-compaction memory extraction, and session-end persistence. Ollama extracts structured memories from transcripts and stores them in Coquina automatically.
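As an illustration of the pre-tool safety gate, here is a minimal sketch. It assumes the hook receives one JSON event on stdin with `tool_name` and `tool_input` fields, and that exit code 2 blocks the tool call, which is how Claude Code's hook documentation describes pre-tool hooks. Check the current docs before relying on either detail. The deny-list is hypothetical and deliberately coarse.

```python
import io
import json
import re

# Hypothetical deny-list; a real gate would be tuned to the operator's own risks.
BLOCKED_PATTERNS = [r"\brm\s+-rf\s+/", r"\bgit\s+push\s+.*--force\b", r"\bmkfs\b"]

def evaluate(event):
    """Return (allowed, reason) for one pre-tool event."""
    if event.get("tool_name") != "Bash":
        return True, ""
    command = event.get("tool_input", {}).get("command", "")
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, command):
            return False, f"blocked by safety gate: matches {pattern!r}"
    return True, ""

def main(stdin, stderr):
    """Read one JSON event and return the process exit code (2 means block)."""
    allowed, reason = evaluate(json.load(stdin))
    if not allowed:
        stderr.write(reason + "\n")
        return 2
    return 0

# As a hook script, the entry point would be: sys.exit(main(sys.stdin, sys.stderr))
event = {"tool_name": "Bash", "tool_input": {"command": "rm -rf /tmp/build"}}
errors = io.StringIO()
exit_code = main(io.StringIO(json.dumps(event)), errors)
```

Passing the streams in as arguments keeps the decision logic testable without launching a session.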
Local LLM Fleet
27 Models on M4 Metal
27 models via Ollama on Apple M4 Silicon. Three-model strategy: gemma4:e4b (27 tok/s, native function calling, voice assistant), qwen3.5:9b (deep reasoning, code generation, overnight workers), phi4-mini:3.8b (fast triage, 30 tok/s). 5 custom LoRA adapters. Powers everything from Home Assistant voice control to autonomous code generation — zero cloud API calls.
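The three-model strategy amounts to a small routing table. The sketch below maps task types to the model tags named above and builds a request body for Ollama's `/api/generate` endpoint. The task-type labels and the choice of `phi4-mini:3.8b` as the fallback are assumptions.

```python
# Task-type labels are hypothetical; the model tags are the ones named above.
MODEL_FOR_TASK = {
    "voice": "gemma4:e4b",
    "function_call": "gemma4:e4b",
    "reasoning": "qwen3.5:9b",
    "codegen": "qwen3.5:9b",
    "triage": "phi4-mini:3.8b",
}

def pick_model(task_type, default="phi4-mini:3.8b"):
    """Pick a model tag for a task type, falling back to the fast triage model."""
    return MODEL_FOR_TASK.get(task_type, default)

def build_request(task_type, prompt):
    """Build the JSON body for a non-streaming call to Ollama's /api/generate."""
    return {"model": pick_model(task_type), "prompt": prompt, "stream": False}

request_body = build_request("codegen", "Write a Swift function that reverses a string.")
```

The body would be sent as a POST to a local Ollama instance, by default `http://localhost:11434/api/generate`.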
System Architecture
The homelab maps to brain structures—each system mirrors a cognitive function.
Philosophy
Everything here exists because I use it. Coquina runs in every coding session. Forge orchestrates overnight chains while I sleep. The homelab hosts real services my household depends on.
Most “AI-powered” features are demos pretending to be products. This is where I test what actually works under real constraints: latency, failure modes, cost, and whether anyone would use it twice.
I build infrastructure so I can recommend it honestly. Every system I suggest to a client is one I've operated myself.
Interested in this kind of infrastructure?
I build these systems for clients too. Let's talk about what you need.