The Architecture
A multi-node infrastructure running local LLMs, semantic memory, agent orchestration, and home automation. Everything here is in active use—not experiments.
Systems
Tools and infrastructure I design, build, and operate daily.
Cortex
Self-Healing Memory Platform for AI Agents
Persistent, searchable, cross-session memory for AI agents. Dual-path search — Postgres full-text and ChromaDB semantic vectors — merged and ranked with a recency bonus. A-MAC admission control quality-gates every write. An auto-linker builds a relationship graph without manual curation. Adaptive retrieval (Agentic RAG via LangGraph) reformulates queries when relevance is low. Runs natively on Apple Silicon with embedded ChromaDB — no Docker, no zombie containers, self-healing at the process, application, and infrastructure tiers. Three-node architecture with warm standby.
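The merge-and-rank step can be sketched in a few lines. The half-life, bonus weight, and `Hit` shape here are illustrative, not Cortex's actual tuning:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Hit:
    memory_id: str
    score: float          # backend-native relevance, normalized to 0..1
    created_at: datetime

def merge_ranked(fts_hits, vector_hits, half_life_days=30.0, now=None):
    """Merge full-text and semantic hits: keep the best score per memory,
    then rank with an exponentially decaying recency bonus."""
    now = now or datetime.now(timezone.utc)
    best = {}
    for hit in [*fts_hits, *vector_hits]:
        if hit.memory_id not in best or hit.score > best[hit.memory_id].score:
            best[hit.memory_id] = hit

    def ranked_score(hit):
        age_days = (now - hit.created_at).total_seconds() / 86400
        recency_bonus = 0.5 ** (age_days / half_life_days)  # halves every half-life
        return hit.score + 0.2 * recency_bonus

    return sorted(best.values(), key=ranked_score, reverse=True)
```

A fresh-but-weaker hit can outrank a stale-but-stronger one, which is the point: recent context usually matters more in a coding session.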
Forge
Queue-Based Autonomous Worker System
Event-driven worker system for orchestrating AI tasks on local hardware. Redis-backed queues coordinate GPU access, LoRA adapter training, health monitoring, and overnight chain orchestration — all running autonomously via macOS LaunchAgents. Nightly chain runs security review, code review, PR digest, news curation, arXiv scanning, and morning briefings. Self-healing daemon scans every 60 seconds and kickstarts crashed services. Sandbox builder generates complete iOS/macOS apps autonomously — Ollama writes Swift, xcodegen builds projects, xcodebuild compiles, and an error loop self-corrects failed builds.
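The core worker loop is simple once the queue protocol is fixed. This is a minimal sketch with the Redis call abstracted behind a `pop_task` callable (in production that would wrap a blocking pop on the task queue); the task schema shown is illustrative:

```python
import json

def run_worker(pop_task, handlers, max_tasks=None):
    """Drain tasks until the queue is empty or max_tasks is reached.

    pop_task  -- returns the next JSON task payload, or None when empty;
                 in production this would wrap redis.blpop on the queue.
    handlers  -- maps task type -> callable taking the task's payload dict.
    """
    done, failed = [], []
    while max_tasks is None or len(done) + len(failed) < max_tasks:
        raw = pop_task()
        if raw is None:
            break
        task = json.loads(raw)
        handler = handlers.get(task["type"])
        try:
            if handler is None:
                raise KeyError(f"no handler for {task['type']}")
            handler(task.get("payload", {}))
            done.append(task["type"])
        except Exception:
            failed.append(task["type"])  # a failed task never kills the worker
    return done, failed
```

Keeping failures inside the loop is what lets a LaunchAgent-supervised worker survive a bad overnight task and keep draining the chain.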
Thalamus
MQTT Gateway to AI Memory
Bridges IoT and Home Assistant events into Cortex via MQTT. Sensor readings, automation triggers, and system heartbeats flow through the bus into persistent memory where agents can query them.
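The bridge's translation step is pure: an MQTT message in, a memory record out. A sketch, with an assumed topic scheme (`homeassistant/<domain>/<entity>/state`) and an illustrative record shape — not Thalamus's actual wire format:

```python
import json

def mqtt_to_memory(topic, payload, received_at):
    """Translate an MQTT state message into a Cortex-style memory record.

    Assumed topic scheme: homeassistant/<domain>/<entity>/state.
    Returns None for traffic that should not be persisted.
    """
    parts = topic.split("/")
    if len(parts) < 4 or parts[-1] != "state":
        return None  # heartbeats and non-state traffic are handled elsewhere
    domain, entity = parts[1], parts[2]
    return {
        "kind": "sensor_reading",
        "source": f"{domain}.{entity}",
        "content": json.loads(payload),
        "observed_at": received_at,
    }
```

In production this would sit inside a paho-mqtt `on_message` callback, with the returned record written to Cortex.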
Nerve
MCP Server Registry & Guardian
Maintains a canonical registry of MCP server configurations. Detects config drift, verifies connectivity, and auto-restores missing servers. The structural safeguard that keeps the multi-agent ecosystem intact.
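Drift detection reduces to a three-way diff between the canonical registry and a live client config. A minimal sketch (the config shapes are illustrative):

```python
def detect_drift(canonical, live):
    """Diff the canonical MCP server registry against a live client config.

    Both arguments map server name -> config dict (command, args, env, ...).
    """
    missing = sorted(set(canonical) - set(live))    # candidates for auto-restore
    unknown = sorted(set(live) - set(canonical))    # servers not in the registry
    changed = sorted(
        name for name in set(canonical) & set(live)
        if canonical[name] != live[name]            # command/args/env drifted
    )
    return {"missing": missing, "unknown": unknown, "changed": changed}
```

Auto-restore then just writes the canonical entry back for everything in `missing` and `changed`, after a connectivity check.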
Forge Command Center
Real-Time System Monitoring
Live worker status, queue depths, chain execution history, and system health. React frontend backed by the Forge API with auto-refresh and task dispatch controls.
Hook Pipeline
5-Stage Auto-Persist System
Five hooks fire during every Claude Code session: context recall, pre-tool injection, Slack commit alerts, pre-compaction memory extraction, and session-end persistence. Ollama extracts structured memories from transcripts and stores them in Cortex automatically.
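The session-end extraction step looks roughly like this. The prompt wording and memory shape are illustrative, and the model call is injected as a plain callable so the hook logic stays testable — in production it would wrap a local Ollama generate call:

```python
import json

EXTRACT_PROMPT = (
    "Extract durable facts, decisions, and preferences from this session "
    "transcript as a JSON list of {\"text\": ..., \"tags\": [...]} objects.\n\n"
)

def extract_memories(transcript, generate):
    """Run the transcript through a local model and parse structured memories.

    generate -- any text-completion callable (production: a local Ollama call).
    Returns [] on malformed model output rather than crashing the hook.
    """
    raw = generate(EXTRACT_PROMPT + transcript)
    try:
        memories = json.loads(raw)
    except json.JSONDecodeError:
        return []
    return [m for m in memories if isinstance(m, dict) and m.get("text")]
```

Failing closed matters here: a hook that throws at session end would block persistence for the whole session, so bad model output degrades to "nothing saved" instead.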
Local LLM Fleet
24 Models on M4 Metal
24 models served via Ollama on Apple Silicon (M4). Three-model strategy: gemma4:e4b (27 tok/s, native function calling, voice assistant), qwen3.5:9b (deep reasoning, code generation, overnight workers), phi4-mini:3.8b (fast triage, 30 tok/s). Five custom LoRA adapters. Powers everything from Home Assistant voice control to autonomous code generation — zero cloud API calls.
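The three-model strategy is just a routing table. A sketch using the tiers above (the task-kind labels are illustrative):

```python
ROUTES = {
    "voice": "gemma4:e4b",       # native function calling, ~27 tok/s
    "reasoning": "qwen3.5:9b",   # deep reasoning, code gen, overnight workers
    "triage": "phi4-mini:3.8b",  # fast classification, ~30 tok/s
}

def pick_model(task_kind, fallback="phi4-mini:3.8b"):
    """Route a task to its model tier; unknown kinds fall back to fast triage."""
    return ROUTES.get(task_kind, fallback)
```

Defaulting unknown work to the smallest model keeps latency bounded; anything that genuinely needs depth gets routed explicitly.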
System Architecture
The homelab maps to brain structures—each system mirrors a cognitive function.
Philosophy
Everything here exists because I use it. Cortex runs in every coding session. Forge orchestrates overnight chains while I sleep. The homelab hosts real services my household depends on.
Most “AI-powered” features are demos pretending to be products. This is where I test what actually works under real constraints: latency, failure modes, cost, and whether anyone would use it twice.
I build infrastructure so I can recommend it honestly. Every system I suggest to a client is one I've operated myself.
Interested in this kind of infrastructure?
I build these systems for clients too. Let's talk about what you need.
Start a Conversation