I've been watching AI write code for two years now. And I've noticed something that nobody talks about enough: the model is almost never the bottleneck. The context is.
Claude Opus can reason through algorithms that would take me an hour. It can refactor a 500-line function into clean, testable modules in seconds. But ask it to add a feature to a real codebase — one with 200+ files, custom conventions, internal APIs, and six months of accumulated decisions — and it sometimes struggles. Not because it's dumb (hallucination rates have dropped dramatically in the last few years). Because it's blind.
What is TaskAI?
TaskAI is something I've been building for the past year — an AI-native project management system designed from the ground up around one idea: AI agents should be first-class participants in your development workflow, not afterthoughts.
It has the things you'd expect — kanban boards, sprint planning, task management. But the pieces that make it different are the ones I built specifically for AI:
A collaborative wiki — real-time markdown editing (powered by Yjs CRDTs) where teams document architecture decisions, API contracts, deployment guides, and the kind of tribal knowledge that usually lives in someone's head or a forgotten Slack thread.
An MCP server — so AI agents like Claude Code can create tasks, search documentation, read wiki pages, and update project state through tool calls. Not a chat integration. Actual tool-use over a structured protocol.
A built-in drawing canvas — for architecture diagrams, flow charts, and system designs that live alongside the documentation, not in a separate tool.
GitHub two-way sync — issues and PRs flow between GitHub and TaskAI automatically, so the AI sees the full picture.
I've been using it daily with Claude Code on multiple projects. And the thing I kept running into wasn't a TaskAI problem or a Claude problem — it was a knowledge retrieval problem. The documentation was there. The AI just couldn't find the right parts at the right time.
That's the story behind what I'm calling the Knowledge Spine, and why I think it matters more than any model upgrade.
The Needle Nobody Can Find
Here's a scenario every developer using AI has experienced. You're working on a Go backend. You ask Claude to "add rate limiting to the login endpoint." The model generates clean, functional code. It uses the golang.org/x/time/rate package, creates a middleware, applies it to the route.
One problem: your project already has a rate limiter. It lives in internal/middleware/ratelimit.go. It uses a custom token bucket backed by Redis. It has tests. It follows your team's error response format. The AI didn't know any of this.
This isn't a context window problem. Your codebase might fit comfortably within Claude's 1M token window. But dumping a million tokens into a prompt is like giving someone a library card and asking them to write a book report in 30 seconds. The information is there. It's just not findable.
Greg Kamradt's Needle in a Haystack tests demonstrated this empirically, and Liu et al. measured the same effect: even models that claim 200K+ context windows degrade badly when the relevant information sits in the middle of a long input. The "lost in the middle" phenomenon is real, it's measured, and it gets worse as context grows.[1]
What Actually Determines Code Quality
After months of using Claude Code on the TaskAI codebase daily, I noticed a pattern:
The quality of AI-generated code is almost entirely determined by the quality of context it receives.
Not the model size. Not the temperature. Not the system prompt. The context.
When I manually paste the right files into the conversation — the handler I'm extending, the middleware it depends on, the test patterns we use — the output is indistinguishable from what I'd write myself. When I don't, it's a coin flip.
This is what I call the Knowledge Spine problem. Every codebase has one — a structural backbone of conventions, patterns, APIs, and decisions that make code "fit." Humans absorb it over weeks of onboarding. AI needs it served on a platter, in exactly the right moment, for exactly the right query.
The Approach Everyone Tries (And Why It Fails)
The obvious solution is to stuff more code into the context window. Cursor does this by indexing your workspace. Sourcegraph Cody embeds entire repositories. Augment Code reports processing 400K+ files through semantic dependency analysis.[2]
These are all doing the right thing. But they're doing it at the wrong layer.
Think about how you work on a large codebase. You don't re-read the entire repo every time you make a change. You have a mental model — built from architecture docs, team wikis, PR review comments, and that one Slack message from three months ago where someone explained why the auth middleware works the way it does. That mental model is your personal Knowledge Spine.
Developer tools are trying to build this from code alone. Parse the AST, follow imports, rank by file proximity. It works for small projects. It fails spectacularly on anything with real architecture, because architecture decisions aren't in the code. They're in the heads of the people who wrote it. Or, if you're lucky, in documentation.
Building a Knowledge Spine (Not a Bigger Brain)
Here's what I did instead.
TaskAI's wiki already had sixty-five pages of accumulated project knowledge — architecture decisions, API contracts, deployment pipelines, design rationale. The stuff that normally lives in Notion, Confluence, or someone's memory. And the MCP server already had a search_wiki tool so AI agents could search it.
But the search was ILIKE — PostgreSQL's case-insensitive substring matching. No ranking, no semantic understanding. Searching "how does deployment work" returned nothing because no wiki page contains that literal string.
So I built a Knowledge Spine. Three components, all running on a single server, all local, zero external API calls:
1. Ollama sidecar — A tiny embedding model (all-MiniLM-L6-v2, 45MB) running as a Docker container. It converts text into 384-dimensional vectors. No GPU needed. About 150MB of RAM and 10 milliseconds per embedding.
2. pgvector — PostgreSQL extension for vector similarity search. An HNSW index on the embedding column gives us approximate nearest-neighbor search in sub-millisecond time.
3. The embedding pipeline — Every two minutes, a background worker scans for updated wiki pages, splits them into blocks by heading, and embeds each block. When an AI agent searches, the query gets embedded the same way, and cosine similarity finds the most relevant blocks — regardless of wording.
The entire system cost me about a day to build. The migration is 6 lines of SQL. The embedding client is 130 lines of Go. The search handler supports four modes — keyword, full-text, semantic, and hybrid — selectable via a single mode parameter on the MCP tool.
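The post doesn't reproduce the migration itself. Using pgvector's documented DDL, a six-line version might look like this (table and column names are illustrative, not TaskAI's actual schema):

```sql
-- Hypothetical sketch of a pgvector migration; names are illustrative.
CREATE EXTENSION IF NOT EXISTS vector;

ALTER TABLE wiki_blocks ADD COLUMN embedding vector(384);

CREATE INDEX wiki_blocks_embedding_idx
  ON wiki_blocks USING hnsw (embedding vector_cosine_ops);
```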
Why Hybrid Search Changes Everything
Here's the thing about vector search that most articles get wrong: semantic search alone isn't enough. If you search "OAuth2" and the document says "OAuth2", you want an exact match, not a semantic approximation. But if you search "how do we handle user login" and the document is titled "Authentication & Authorization Flow", you need semantic understanding.
The answer is hybrid search. We run both full-text (PostgreSQL's built-in tsvector with GIN indexing) and vector search in parallel, then merge the results using Reciprocal Rank Fusion.[3]
RRF is elegant because it works on rank positions, not raw scores. No normalization needed. A document ranked #1 by both systems gets a higher fused score than one ranked #1 by one system and #50 by the other:
$$\text{score}(d) = \frac{1}{k + r_{\text{fts}}(d)} + \frac{1}{k + r_{\text{vec}}(d)}, \quad k = 60$$

where $r_{\text{fts}}(d)$ and $r_{\text{vec}}(d)$ are document $d$'s rank positions in the full-text and vector result lists respectively. The constant $k = 60$ dampens the contribution of low-ranked results.
This is the same approach used by Elasticsearch, Weaviate, and Vespa in their hybrid search implementations. It works remarkably well.
The MCP Connection
All of this is exposed through Model Context Protocol — Anthropic's open standard for connecting AI applications to external tools and data.[4] When Claude Code is connected to TaskAI's MCP server, it gets a single tool: search_wiki. One parameter for the query, one for the search mode.
The AI doesn't know it's querying a vector database. It doesn't need to. It asks "how does the deployment pipeline work?" and gets back the three most relevant wiki sections, ranked by semantic relevance, in under 50 milliseconds. Then it writes code that actually follows your deployment patterns.
This is why I care about MCP more than I care about model improvements. A smarter model with bad context produces bad code confidently. A decent model with great context produces code that fits your codebase like it was written by someone who's been on the team for six months.
What I Learned from MemPalace
I want to credit MemPalace, an open-source local-first AI memory system, for several ideas that influenced this design.[5]
Their hierarchical scoping concept — organizing memory into Wings, Rooms, and Drawers instead of flat-searching an entire corpus — directly inspired our project-scoped search. An AI agent searching the TaskAI wiki only sees pages from projects it has access to, not the entire knowledge base.
Their commitment to verbatim preservation (storing original content, not lossy summaries) validated our approach of embedding full text blocks rather than generated summaries.
And their zero-API-dependency architecture (96.6% recall@5 on the LongMemEval benchmark, entirely local) pushed us toward Ollama instead of calling OpenAI's embedding API. Everything runs on a single AMD EPYC server. No API keys. No network dependency. No usage bills.
Where we diverge: MemPalace is designed for conversation memory. Our Knowledge Spine is designed for structured project knowledge — architecture docs, API references, deployment guides. Different problem, same principle.
The Real Numbers
Here's what's running in production right now on taskai.cc:
| Component | Detail |
|---|---|
| Wiki pages | 65 |
| Indexed blocks | 1,052 |
| Embedded blocks | 1,034 (98.3%) |
| Embedding model | all-MiniLM-L6-v2 (384 dim) |
| Model size | 45MB on disk |
| Ollama RAM | ~150MB |
| Embedding latency | ~10ms per block |
| Search latency | <50ms end-to-end |
| Server | Single AMD EPYC, 31GB RAM |
| Infrastructure cost | Zero additional (runs on existing server) |
The Ollama container uses 0.5% of the server's available RAM. The pgvector HNSW index adds negligible storage. The embedding pipeline runs in the background every two minutes and is invisible to users.
Why This Matters Beyond Opus 4.7
Models will keep getting smarter. Context windows will keep growing. But the fundamental problem doesn't change: an LLM can only write good code if it understands the context that code lives in.
A 10M token context window doesn't solve this. You can't index a large codebase by dumping it all into a prompt, and even if you could, the "lost in the middle" problem means the model would still miss the relevant parts. What you need is a system that understands what's relevant to the current question and retrieves exactly that.
That's what a Knowledge Spine does. It's not a replacement for bigger models or longer contexts. It's the layer that makes them actually useful.
The knowledge management market is projected to reach 32 billion dollars by 2030, driven largely by AI integration.[6] But most of that is about human knowledge management. What I'm arguing is that the same infrastructure — wikis, documentation, structured knowledge — is now AI infrastructure too.
Your knowledge base is no longer "nice to have" documentation. It's the difference between an AI that writes broken code and one that writes production-quality code. Every wiki page you maintain, every architecture decision you document, every API contract you write down — it's all training data for the next time an AI agent touches your codebase.
Try It Yourself
TaskAI is live. The MCP server is at mcp.taskai.cc. Connect Claude Code, ask it to search your project wiki, and watch what happens when the AI actually knows your codebase.
The Knowledge Spine is open and inspectable — you can see the source code, the migration, and the embedding client. The entire vector search implementation is about 500 lines of Go.
Because the next frontier of AI-assisted development isn't smarter models. It's smarter context.
This is Part 1 of a series on building AI-native development tools. Next: how to build feedback loops that let LLMs improve their own Knowledge Spine over time.
References
- [1] Liu et al. — Lost in the Middle: How Language Models Use Long Contexts (2023). Demonstrated that LLM performance degrades significantly when relevant information sits in the middle of the input, even when the context window technically supports the length.
- [2] Augment Code — Context Engine Architecture (2025). Reports 70.6% on SWE-bench vs GitHub Copilot's 54%, attributing the gap primarily to context quality rather than model capability.
- [3] Cormack, Clarke & Buettcher — Reciprocal Rank Fusion Outperforms Condorcet and Individual Rank Learning Methods (SIGIR 2009). The foundational paper on RRF, showing that simple rank-based fusion consistently outperforms more sophisticated methods.
- [4] Anthropic — Introducing the Model Context Protocol (2024). The open standard for connecting AI applications to external tools and data sources, now maintained by the Linux Foundation.
- [5] MemPalace — Local-first AI memory system (2025). Achieves 96.6% recall@5 on LongMemEval entirely locally; hierarchical scoping (Wings/Rooms/Drawers) and verbatim preservation are key design principles.
- [6] TTMS — AI Tools for Knowledge Management (2025). Knowledge management software market valued at $13.70B in 2025, projected to reach $32.15B by 2030 (18.6% CAGR).