If you've spent any time building with LLMs over the last year, you know The Context Problem.
You open Cursor or an AI chat, find a bug in auth_middleware.py, and start pasting
code. Then the AI needs database.py. Then it needs user_models.py. Before
long, you've pasted 10,000 lines, the AI is hallucinating because of the noise, and you're paying a
small fortune in API tokens to send the same boilerplate back and forth.
Today, I'm releasing CodeCortex — a standalone AI Project Indexer and MCP Server designed to give LLMs instant, structured knowledge of your codebase without the token tax.
Why Standard IDE Indexers Fail ("Lost in the Middle")
While built-in IDE indexers (like Cursor or VS Code) are fantastic for simple web apps, they usually rely on "naive chunking" for vector search. They blindly break your codebase into disjointed 500-line blocks.
When you ask a complex architectural question, the standard vector search returns 20 unorganized raw code snippets. The AI gets overwhelmed by the noise, leading to the "Lost in the Middle" phenomenon. It forgets the overarching structure and hallucinates.
IDE Naive Chunking vs CodeCortex Hierarchical Dual Indexing.
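To make "naive chunking" concrete, here is a toy sketch of the strategy described above (an illustration of the general approach, not any vendor's actual code):

```python
# Hypothetical illustration of fixed-size "naive chunking":
# split a file into disjoint line blocks, ignoring function
# and class boundaries entirely.

def naive_chunks(source: str, lines_per_chunk: int = 500) -> list[str]:
    """Blindly break source code into fixed-size line blocks."""
    lines = source.splitlines()
    return [
        "\n".join(lines[i:i + lines_per_chunk])
        for i in range(0, len(lines), lines_per_chunk)
    ]

# A 1,200-line file becomes three disjoint blocks. A class that
# begins at line 480 gets sliced across chunks 1 and 2, so no
# single retrieved snippet ever contains the whole definition.
```

That mid-definition slicing is exactly why retrieval returns fragments instead of coherent units.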
The Solution: Hierarchical Summarization
CodeCortex doesn't just copy-paste code. It thinks like a Senior Engineer reading a new repository. When you run the indexer, it uses Ollama (LLaMA 3.2) locally to analyze your files. It doesn't just read syntax; it generates abstract, architectural summaries of your large classes and functions.
The CodeCortex Pipeline: Hierarchical summarization meets vector search.
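Under the hood, a summarization pass like this boils down to a prompt plus a call to Ollama's local REST API. Here is a minimal sketch, assuming a default Ollama install on port 11434; the function names and prompt wording are my own illustrative choices, not CodeCortex's exact implementation:

```python
# Sketch of an architectural-summary pass against a local Ollama
# instance. Requires `ollama serve` running with llama3.2 pulled.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_summary_prompt(path: str, code: str) -> str:
    """Ask for an architectural summary, not a line-by-line paraphrase."""
    return (
        f"You are a senior engineer reading {path}.\n"
        "In 2-3 sentences, summarize the architectural role of this "
        "code: its responsibilities and what it depends on.\n\n"
        f"{code}"
    )

def summarize(path: str, code: str, model: str = "llama3.2") -> str:
    """Send one non-streaming generation request to Ollama."""
    payload = json.dumps({
        "model": model,
        "prompt": build_summary_prompt(path, code),
        "stream": False,
    }).encode()
    req = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Because the model name is just a string in the payload, swapping LLaMA 3.2 for another Ollama model is a one-argument change.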
It then embeds both your raw code chunks and these intelligent summaries into a lightning-fast FAISS Vector Database. This creates a two-tier retrieval system: high-level architectural understanding and low-level code precision.
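One plausible shape for that two-tier lookup is: search the summary index first for architectural relevance, then search raw chunks within the winning regions. The sketch below uses brute-force numpy cosine similarity in place of FAISS so it stays self-contained; the tier structure, not the index backend, is the point:

```python
# Toy two-tier retrieval: tier 1 over summary embeddings, tier 2 over
# raw chunk embeddings. Plain numpy stands in for FAISS here; the
# mapping from summaries to chunks is an illustrative assumption.
import numpy as np

def cosine_top_k(query, matrix, k=3):
    """Indices of the k nearest rows by cosine similarity."""
    query = np.asarray(query, dtype=float)
    matrix = np.asarray(matrix, dtype=float)
    q = query / np.linalg.norm(query)
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    return np.argsort(-(m @ q))[:k]

def two_tier_search(query_vec, summary_vecs, chunk_vecs, chunks_per_summary):
    # Tier 1: find the most relevant architectural summaries.
    top_summaries = cosine_top_k(query_vec, summary_vecs, k=2)
    # Tier 2: rank only the raw chunks belonging to those summaries.
    candidate_ids = [c for s in top_summaries for c in chunks_per_summary[s]]
    candidates = np.asarray(chunk_vecs, dtype=float)[candidate_ids]
    local = cosine_top_k(query_vec, candidates, k=3)
    return [candidate_ids[i] for i in local]
```

Tier 1 keeps the AI oriented at the architecture level; tier 2 supplies the precise lines of code.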
The Engine Room: Why These Models?
CodeCortex requires a balance of high-level reasoning and blazing-fast retrieval. It uses a dual-model approach optimized to run on local hardware:
The Architect (LLaMA 3.2) and The Indexer (MiniLM-L6-v2) working in tandem.
1. The Architect: LLaMA 3.2 (via Ollama)
- Role: Generates structural summaries and interprets code logic.
- Why LLaMA 3.2? It offers an incredible reasoning-to-VRAM ratio. It's smart enough to understand complex architecture but small enough (~2.2GB - 5GB) to run quietly in the background on most dedicated GPUs without bottlenecking your IDE.
- Can I use others? Yes. Because CodeCortex uses Ollama, you can seamlessly swap to qwen2.5-coder, mistral, or gemma2 simply by passing a different model string to the indexer script.
2. The Indexer: all-MiniLM-L6-v2 (via HuggingFace)
- Role: Converts the raw code and the LLaMA summaries into dense vector embeddings for semantic search.
- Why MiniLM? It is absurdly fast and lightweight (~90MB). It runs completely on your CPU and can generate thousands of 384-dimensional semantic vectors in milliseconds. It excels at measuring semantic similarity, making it perfect for rapid code retrieval.
IDE Superpowers via MCP
CodeCortex includes a built-in Model Context Protocol (MCP) server. By pointing Cursor or Claude at the server, your AI assistant gains a search_code tool. It can silently query your local database to pull exactly the relevant context before it writes a single line of code.
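Hooking the server into an IDE is a single config entry. A sketch of what a Cursor `.cursor/mcp.json` registration might look like (the command and script path are illustrative assumptions, not CodeCortex's documented invocation):

```json
{
  "mcpServers": {
    "codecortex": {
      "command": "python",
      "args": ["path/to/codecortex_server.py"]
    }
  }
}
```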
Features at a Glance
- 100% Local & Private: Your code never leaves your machine. No API keys required.
- Incremental Scanning: Only processes files that have changed, using fast BLAKE3 hashing.
- RAG-First Design: Built specifically to fight "context window bloat" in modern IDEs.
- Portable: A single Python script and a Batch file. Drop it in, run it, done.
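The incremental-scanning idea from the feature list is simple to sketch: hash every file, compare against a saved manifest, and re-index only what changed. CodeCortex uses BLAKE3; the sketch below substitutes hashlib.blake2b so it runs on a stock Python install, and the manifest format and function names are my own illustrative choices:

```python
# Sketch of hash-based incremental scanning. blake2b stands in for
# BLAKE3 (which needs the third-party `blake3` package).
import hashlib
import json
from pathlib import Path

def file_digest(path: Path) -> str:
    return hashlib.blake2b(path.read_bytes()).hexdigest()

def changed_files(root: Path, manifest_path: Path) -> list[Path]:
    """Return files whose hash differs from the saved manifest,
    then persist the updated manifest for the next run."""
    try:
        manifest = json.loads(manifest_path.read_text())
    except FileNotFoundError:
        manifest = {}  # first run: everything counts as changed
    changed = []
    for path in sorted(root.rglob("*.py")):
        digest = file_digest(path)
        if manifest.get(str(path)) != digest:
            changed.append(path)
            manifest[str(path)] = digest
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return changed
```

On an unchanged repository the second run returns an empty list, so the expensive summarization and embedding steps are skipped entirely.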
Try it Today
CodeCortex is open source and ready for production. Stop pasting files and start building with an AI that actually understands your architecture.