Why I Quit the Cloud (For AI Development)
I spend $0/month on AI API calls. Zero. And my AI tools do more than most teams running five-figure monthly bills.
This isn't a frugality flex. It's an architecture decision with real technical advantages.
The Math
A Mac Mini M4 with 24GB of unified memory for AI workloads. A 2018 Mac Mini (Intel i7, 64GB RAM) for Docker services. A MacBook Air for portable work. A UGREEN NAS for storage. Several terabytes of SSD for model weights and training artifacts. All consumer hardware: no enterprise pricing, no cloud contracts.
The machines pay for themselves in months against equivalent cloud spend. After that, the marginal cost is electricity. No vendor lock-in. No rate limits. No surprise billing.
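The payback claim is simple division. A minimal sketch, with illustrative figures (the hardware cost and cloud bill below are assumptions, not the author's actual numbers):

```python
# Back-of-envelope payback calculation. The dollar amounts are
# illustrative assumptions, not figures from the article.
def payback_months(hardware_cost: float, monthly_cloud_spend: float) -> float:
    """Months until owned hardware costs less than equivalent cloud spend."""
    return hardware_cost / monthly_cloud_spend

# e.g. ~$2,000 of consumer hardware vs a $500/month API bill
months = payback_months(2000, 500)
print(months)  # → 4.0
```

Against a five-figure monthly bill, the same arithmetic puts payback at well under a month.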
The Performance Argument
The number that changed my mind: under 3 milliseconds.
That's full-text search latency on local PostgreSQL. The same query against a cloud-hosted database adds 50-200ms of network latency. For interactive agent use — where an AI tool queries memory mid-conversation — that's the difference between seamless and sluggish.
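The per-query gap compounds because an agent turn can trigger several memory lookups. A rough latency budget, using the numbers above (the query count per turn is my assumption):

```python
# Rough latency budget for an agent querying memory mid-conversation.
# The per-query figures come from the article; 5 queries/turn is assumed.
LOCAL_FTS_MS = 3        # local PostgreSQL full-text search
CLOUD_EXTRA_MS = 120    # added network round trip (somewhere in the 50-200 ms range)

def added_latency_ms(memory_queries: int, per_query_ms: float) -> float:
    """Total latency the user waits on across one agent turn."""
    return memory_queries * per_query_ms

local = added_latency_ms(5, LOCAL_FTS_MS)                    # 15 ms: imperceptible
cloud = added_latency_ms(5, LOCAL_FTS_MS + CLOUD_EXTRA_MS)   # 615 ms: visible lag
```

Five local lookups stay under the threshold of human perception; the same five over a cloud round trip cost more than half a second per turn.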
The auto-linking pipeline runs at write time. If it added 500ms of cloud round-trip on every memory store, I'd be choosing between fresh links and responsive writes. Running locally, I get both.
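The write-time trade-off can be sketched in a few lines. This is a toy stand-in, not the author's pipeline: the keyword-overlap heuristic and all names here are hypothetical, and only the shape matters, since links are computed synchronously in the store path, so reads never see a stale-link window:

```python
# Minimal sketch of write-time auto-linking. Links are computed
# synchronously on store, so there is no async job and no stale window.
# The keyword-overlap heuristic is a stand-in for the real linker.
from dataclasses import dataclass, field

@dataclass
class Memory:
    id: int
    text: str
    links: set[int] = field(default_factory=set)

class MemoryStore:
    def __init__(self) -> None:
        self.memories: dict[int, Memory] = {}
        self._next_id = 1

    def store(self, text: str) -> Memory:
        memory = Memory(self._next_id, text)
        self._next_id += 1
        # Auto-link at write time: only fast local queries make this viable.
        words = set(text.lower().split())
        for other in self.memories.values():
            if words & set(other.text.lower().split()):
                memory.links.add(other.id)
                other.links.add(memory.id)
        self.memories[memory.id] = memory
        return memory

store = MemoryStore()
a = store.store("postgres full text search")
b = store.store("tuning postgres indexes")
# a and b link to each other via the shared word "postgres"
```

If each overlap check required a cloud round trip, the loop inside `store` would be exactly the 500ms-per-write cost the article describes.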
The Model Strategy
Running local doesn't mean running worse. My current lineup:
- Gemma 4 (9.6GB): Voice assistant, function calling, general tasks. 27+ tokens/sec on Apple Silicon with MLX.
- Qwen 3.5 (6.6GB): Overnight workers for code review and reasoning.
- Phi-4 Mini (2.5GB): Fast triage and lightweight classification. 30 tokens/sec.
These aren't as capable as Claude Opus for complex reasoning. I know that. I use Claude Code for complex development work. But for overnight batch processing, security reviews, article scanning, and voice commands? Local models are more than sufficient.
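The lineup above implies a routing layer: cheap tasks go to small local models, and only complex interactive work escalates to the cloud. A sketch of that rule, where the task-to-model mapping is my reading of the article rather than a published config:

```python
# Task-to-model routing implied by the lineup above. The mapping is an
# assumption drawn from the article's descriptions, not a real config file.
ROUTES = {
    "voice": "gemma",          # voice assistant, function calling, general tasks
    "code_review": "qwen",     # overnight reasoning workers
    "triage": "phi-4-mini",    # fast, lightweight classification
    "complex_dev": "claude",   # the one interactive cloud dependency
}

def pick_model(task_type: str) -> str:
    """Route a task to a model; unknown tasks fall back to cheap triage."""
    return ROUTES.get(task_type, "phi-4-mini")
```

Defaulting unknown work to the smallest model keeps the failure mode cheap: a misrouted task wastes 2.5GB of local compute, not API spend.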
The Privacy Argument
My AI workers process git commits, security audit logs, personal calendar events, home automation states, and camera feeds. None of that should leave my network. With local models, it doesn't.
What I Still Use the Cloud For
I'm not a purist. Claude Code for complex development. GitHub for version control. Vercel for my website. Tailscale for VPN. The distinction: batch processing and personal data stay local. Interactive development and public-facing services use whatever's best.
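That distinction reduces to a two-question decision rule. A minimal sketch, with field names of my own invention:

```python
# The local/cloud split above as a decision rule. The predicate names
# are mine, not the author's; the policy is the article's.
def must_run_locally(touches_personal_data: bool, is_batch: bool) -> bool:
    """True if a workload stays on the local network under this policy."""
    return touches_personal_data or is_batch

must_run_locally(touches_personal_data=True, is_batch=False)   # camera feeds: local
must_run_locally(touches_personal_data=False, is_batch=True)   # overnight scans: local
must_run_locally(touches_personal_data=False, is_batch=False)  # interactive dev: cloud OK
```

Only workloads that are both interactive and free of personal data are candidates for "whatever's best."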
The Local-First Future
Apple Silicon changed the economics. A consumer Mac Mini runs a 9B parameter model at interactive speeds. Ollama's MLX backend (0.20.5) doubled prefill speeds. The tooling is mature enough to be boring — which is exactly where infrastructure should be.
Local-first isn't about avoiding the cloud on principle. It's about building infrastructure where the physics of latency, privacy, and cost all point the same direction. For AI agent workloads — where memory queries happen mid-conversation and personal data flows through every pipeline — local is simply the better architecture. If you're paying five figures a month for AI APIs and wondering if there's another way, there is. I'm living in it.