Quick reference guide for understanding Qdrant collections and KiloCode workflow.
- Why Do I Need This? (RAG Explained)
- What's the difference between asking an LLM with and without RAG?
- Real Example Comparison
- What are the benefits of RAG?
- What are the downsides?
- Is this the same as "Custom GPTs" or "Claude Projects"?
RAG (Retrieval-Augmented Generation) is the difference between semantic search and keyword matching/prompting.
Without RAG (Normal LLM):
You: "How does error handling work in this project?"
LLM has to:
- 📝 Rely on your prompting to identify what files to search for
- 🔤 Use exact terminology or better prompting techniques to identify relevant files
- 📖 Read files based on keyword matching (like grep/Ctrl+F) - requires knowing the exact terms used
- 🐌 Potentially inefficient - might read many files to find relevant code, wasting tokens
- 🤷 Might fall back to generic advice if it can't identify the right files
LLM says: "Typically in Node.js projects, you use try/catch blocks or error middleware. Common patterns include..." (generic fallback, or reads 5+ files hoping to find relevant code)
The limitation: Both grep/keyword search and file-by-file reading are fast, but require you to know the exact terminology used in your codebase, or craft detailed prompts to guide the search.
With RAG (This Setup):
You: "How does error handling work in this project?"
- Semantic search FIRST - Finds code by meaning, not just keywords
- KiloCode searches the entire codebase in milliseconds using vector similarity
- Finds relevant code - Understands "error handling" relates to `ErrorHandler`, `try/catch`, `error middleware`, even if you use different words
- Top 50 matches sent to LLM (only relevant snippets, not entire files)
- LLM answers with context from YOUR actual code
LLM says: "Your project uses a custom ErrorHandler middleware in src/middleware/error.ts:23-45 that catches all Express errors. It logs to Winston and sends formatted JSON responses. Database errors are handled separately in src/db/connection.ts:89-102 with retry logic."
Result: Accurate, specific, fast. Semantic search understands meaning across different naming conventions.
The key difference: RAG's semantic search finds code by meaning (not just keywords), enabling natural language queries across different naming conventions. Without RAG, you need exact terminology or better prompting to identify what to search for.
Analogy: Like Google's semantic search vs. Ctrl+F/grep keyword matching - both find things fast, but semantic search understands meaning (finds "authentication" when you search "user verification"), while keyword search requires knowing the exact terms used in the code.
| Question | Without RAG | With RAG |
|---|---|---|
| "What database are we using?" | "Common options are PostgreSQL, MongoDB, MySQL, or SQLite..." | "PostgreSQL 14, configured in config/database.js:12 with connection pooling (max 20 connections)" |
| "Where's the login logic?" | "Usually in an auth controller or service. Look for files named auth, login, or user..." | "Login is in routes/auth.ts:34-67 using JWT tokens. It validates against User model in models/user.ts:89 and returns tokens from services/jwt.ts:23" |
| "How do we handle API errors?" | "Best practice is to use middleware that catches errors and returns consistent responses..." | "API errors use custom middleware in middleware/apiError.ts:12-45 that formats errors as {error: string, code: number} and logs to CloudWatch" |
The difference: Hallucination vs. Truth.
✅ Accuracy - Answers based on YOUR code, not assumptions
✅ No Hallucination - LLM can't make up code that doesn't exist
✅ Context-Aware - Understands your patterns, architecture, naming conventions
✅ Cites Sources - Points to exact files and line numbers
✅ Efficient - No manual file hunting or copy-pasting code into prompts
✅ Works with smaller/cheaper models - Even basic LLMs give great answers with good context
✅ Semantic Search - Finds code by meaning (e.g., "auth" finds "login", "credentials", "permissions")
✅ Cross-File Understanding - Finds related code across your entire project
Worth it? If you regularly ask questions about large codebases, absolutely yes.
Yes, exactly! But 100% local and free.
- ChatGPT "Custom GPTs" - Upload docs, ChatGPT indexes them (cloud, costs money)
- Claude Projects - Add knowledge, Claude uses it (cloud, limited size)
- This Setup - Same RAG technology, but:
- ✅ Runs locally (complete privacy)
- ✅ No file size limits
- ✅ No API costs
- ✅ Works with ANY LLM (not locked to one vendor)
- ✅ Unlimited queries
Same AI magic, your hardware, your control.
A collection is like a database table, but for vectors (embeddings). It stores:
- The 4096-number vectors from your code embeddings
- Metadata (file path, line numbers, code snippet text)
You need it because Qdrant needs a structured place to store and search your code embeddings efficiently.
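If you're curious what's actually stored, you can pull a single point from an existing collection. A quick sketch, assuming KiloCode has already indexed into a `ws-{your-id}` collection and `jq` is installed:

```bash
# Fetch one stored point and show its metadata (payload),
# skipping the 4096-number vector itself for readability
curl -s http://localhost:6333/collections/ws-{your-id}/points/scroll \
  -H 'Content-Type: application/json' \
  -d '{"limit": 1, "with_payload": true, "with_vector": false}' \
  | jq '.result.points[0].payload'
```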
Best practice: ONE collection per codebase/project
Each project/repo should have its own collection:
- ✅ `my_webapp_codebase` - for your web app project
- ✅ `my_api_codebase` - for your API project
- ✅ `kilocode_codebase` - for testing/this project
Why separate collections?
- Keeps different projects isolated
- Prevents mixing unrelated code in search results
- Better performance (searching smaller, focused datasets)
- Cleaner organization
NOT recommended: One giant collection for all projects - that would return irrelevant results from unrelated codebases!
Think of it like resolution or detail level for your code embeddings.
- Each code snippet becomes a vector of numbers
- More dimensions = more detail = better accuracy
- Qwen3-Embedding-8B-FP16 via Ollama outputs 4096 dimensions
- Qdrant needs to know "expect 4096 numbers per vector" to store them correctly
Analogy: Like telling a photo app "all images will be 3840x2160" - Qdrant needs to know the vector "size" upfront.
For this project: Use 4096 (what Qwen3-Embedding-8B outputs via Ollama)
Note: While Qwen3 supports Matryoshka embeddings (configurable 32-4096 dimensions), when using this model through Ollama it outputs the full 4096 for maximum quality. This requires no configuration and provides 100% model performance.
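You can confirm the dimension yourself with one request. A quick sketch, assuming Ollama is running locally with the `qwen3-embedding:8b-fp16` tag used later in this guide:

```bash
# Ask Ollama for an embedding and count the numbers it returns
curl -s http://localhost:11434/api/embed \
  -H 'Content-Type: application/json' \
  -d '{"model": "qwen3-embedding:8b-fp16", "input": "hello world"}' \
  | jq '.embeddings[0] | length'
# Expected output: 4096
```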
Distance metric = "How do we measure similarity between two vectors?"
Cosine measures the angle between vectors:
- `1.0` = identical (0° angle, perfect match)
- `0.0` = completely different (90° angle, unrelated)
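Formally, cosine similarity is the dot product of the two vectors divided by the product of their lengths:

```math
\text{cosine}(A, B) = \frac{A \cdot B}{\|A\|\,\|B\|}
```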
Why Cosine for code search?
- Focuses on direction (meaning/semantics) not magnitude
- Works great for embeddings from language models
- Industry standard for semantic search
- Captures conceptual similarity, not just word matching
Example: "authentication code" and "user login logic" point in similar directions (similar meaning) even though they use different words.
For this project: Always use Cosine distance metric
Per-Project Workflow:
1. Create collection (one-time per project)
   - Collection name: `my_project_name`
   - Vector size: `4096`
   - Distance: `Cosine`
2. Configure KiloCode (one-time per project)
   - Point to the collection name
   - Set Qdrant URL: `http://localhost:6333`
3. Open project in VS Code
   - KiloCode automatically starts indexing
4. Work normally
   - KiloCode monitors file changes
   - Incremental updates happen automatically
5. Switch to a different project?
   - Create a new collection or reconfigure KiloCode to point at a different collection
Automatic! Once configured, KiloCode handles everything:
✅ Initial indexing - Runs when you first open the project
✅ File watching - Detects changes as you code
✅ Incremental updates - Only re-indexes what changed
✅ Hash-based caching - Skips unchanged files
✅ Git branch detection - Detects branch switches
You don't need to manually trigger re-indexing!
Smart Incremental Updates:
- File modified? Only that file's embeddings are updated
- File added? New embeddings created and added to collection
- File deleted? Corresponding embeddings removed
- Branch switch? KiloCode detects and re-indexes if needed
No full re-indexing required! KiloCode uses:
- File system watching (detects changes in real-time)
- Content hashing (knows what actually changed)
- Efficient partial updates (fast, ~seconds)
Performance:
- Initial index: Varies by codebase size (GPU-accelerated)
- Updates: Fast (only changed files)
Short answer: It replaces/overwrites old embeddings. You won't have duplicate or stale results.
How it works:
When you edit a file, KiloCode:
- Detects change - File watcher notices the save (GPU spins up briefly!)
- Deletes old vectors - Removes all embeddings for that file from Qdrant
- Re-parses file - Tree-sitter extracts updated code blocks
- Generates new embeddings - Ollama creates fresh vectors (this is the GPU activity)
- Inserts new vectors - Qdrant stores only the updated version
Example:
```
Original README.md: "Use 1024 dimensions"
→ Indexed with embeddings for that text

You edit README.md: "Use 4096 dimensions"
→ Old "1024 dimensions" vectors DELETED
→ New "4096 dimensions" vectors INSERTED

You search: "What dimensions should I use?"
→ Returns: "4096 dimensions" ✅ (current/correct)
→ NOT: Both "1024" and "4096" ❌ (would be wrong)
```
Vector tracking:
Each vector has metadata (payload) that identifies it:
```json
{
  "id": "unique-vector-id",
  "vector": [4096 numbers...],
  "payload": {
    "file": "README.md",
    "startLine": 42,
    "endLine": 58,
    "content": "actual code snippet",
    "hash": "abc123..."
  }
}
```

KiloCode finds all vectors with a matching file path, deletes them, and replaces them with the current content.
Result: Your index stays accurate and current automatically - no stale data!
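The delete step corresponds to Qdrant's delete-by-filter API. A minimal sketch, assuming the `file` payload key from the example above (KiloCode's actual payload schema may differ):

```bash
# Remove every vector whose payload "file" field matches the edited file
curl -X POST http://localhost:6333/collections/ws-{your-id}/points/delete \
  -H 'Content-Type: application/json' \
  -d '{
    "filter": {
      "must": [
        { "key": "file", "match": { "value": "README.md" } }
      ]
    }
  }'
```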
You can verify this:
```bash
# Check total point count
curl -s http://localhost:6333/collections/ws-{your-id} | jq .result.points_count

# Edit a file and save (watch GPU briefly spin up)

# Check count again - it won't keep growing with duplicates
curl -s http://localhost:6333/collections/ws-{your-id} | jq .result.points_count
```

The count will change based on how block splitting works, but you'll never see both old and new versions in search results.
Morning (Project A):
1. Open Project A in VS Code
2. KiloCode: "Using collection: project_a_codebase"
3. Work, search, code normally
4. Auto-updates happen in background
Afternoon (Project B):
1. Close Project A, open Project B
2. Reconfigure KiloCode collection to: project_b_codebase
3. KiloCode indexes if first time, or uses existing index
4. Work, search, code normally
Key Point: You're always working with ONE project/workspace at a time in KiloCode.
Good News: KiloCode auto-creates collections when you start indexing! You don't need to manually create them.
Auto-created collection settings:
| Setting | Value | Why |
|---|---|---|
| Collection Name | `ws-{workspace-id}` | Auto-generated based on workspace |
| Vector Size | `4096` | Matches Ollama's Qwen3-Embedding-8B output |
| Distance Metric | `Cosine` | Optimal for semantic similarity |
If manually creating (advanced users):
```bash
curl -X PUT http://localhost:6333/collections/your_project_name \
  -H 'Content-Type: application/json' \
  -d '{
    "vectors": {
      "size": 4096,
      "distance": "Cosine"
    }
  }'
```

Optional Advanced Settings:
- Indexing method: HNSW (default, fast search)
- Quantization: None (for maximum quality)
- On-disk storage: Enabled by default (handles large collections)
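If you do want to set these knobs explicitly, they map onto the collection-creation call like this (a sketch; `m: 16` and `ef_construct: 100` are Qdrant's HNSW defaults, shown for illustration):

```bash
curl -X PUT http://localhost:6333/collections/your_project_name \
  -H 'Content-Type: application/json' \
  -d '{
    "vectors": {
      "size": 4096,
      "distance": "Cosine",
      "on_disk": true
    },
    "hnsw_config": {
      "m": 16,
      "ef_construct": 100
    }
  }'
```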
For most users: Just let KiloCode auto-create - it handles everything!
Yes! Collections can be managed through:
- Qdrant Dashboard: http://localhost:6333/dashboard
- API calls: See Qdrant documentation
- KiloCode: Just change the collection name in settings
Deleting a collection:
- Permanently removes all vectors/embeddings
- Your source code is NOT affected (vectors are just indexes)
- You can always re-index by creating a new collection
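For example, listing and deleting via the REST API:

```bash
# List all collections
curl -s http://localhost:6333/collections | jq '.result.collections'

# Permanently delete one collection (your source code is untouched)
curl -X DELETE http://localhost:6333/collections/old_project_name
```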
Best practice: Use descriptive names like `my_ecommerce_app` instead of generic names like `codebase1`
Cause: KiloCode is pointing to the wrong collection
Fix:
- Open VS Code settings
- Navigate to KiloCode → Codebase Indexing
- Check "Collection Name" matches your current project
- Update if needed
Check KiloCode status:
- VS Code status bar shows indexing status
- Green = indexed and ready
- Yellow = currently indexing
- Gray = not running
Check Qdrant Dashboard:
- http://localhost:6333/dashboard
- View all collections and their sizes
- See which one has recent activity
Recommendation: No (for most cases)
Why:
- Collections are just indexes (can be rebuilt)
- Source code is the source of truth
- Re-indexing is reasonably fast with GPU acceleration
However, if you want to backup:
- Collections are stored in the Docker volume `qdrant_storage`
- Can create snapshots via the Qdrant API
- Useful for very large codebases (time-consuming to re-index)
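If you go the snapshot route, Qdrant exposes it directly (the collection name here is a placeholder):

```bash
# Create a snapshot of one collection (returns the snapshot file name)
curl -X POST http://localhost:6333/collections/your_project_name/snapshots

# List existing snapshots for that collection
curl -s http://localhost:6333/collections/your_project_name/snapshots | jq
```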
KiloCode auto-creates collections - just start indexing!
- Configure KiloCode settings
- Click "Start Indexing"
- Collection created automatically with correct settings
Only if you want custom collection names:
- Open http://localhost:6333/dashboard
- Click "Collections" → "New Collection"
- Enter:
  - Name: `your_project_name`
  - Vector size: `4096`
  - Distance: `Cosine`
- Click "Create"
Or create it via the API:

```bash
curl -X PUT http://localhost:6333/collections/your_project_name \
  -H 'Content-Type: application/json' \
  -d '{
    "vectors": {
      "size": 4096,
      "distance": "Cosine"
    }
  }'
```

VS Code Settings:
- Open Settings (Ctrl+,) or KiloCode → Codebase Indexing
- Configure:
  - ✓ Enable Codebase Indexing
  - Provider: `Ollama`
  - Ollama base URL: `http://localhost:11434/`
  - Model: `qwen3-embedding:8b-fp16`
  - Model dimension: `4096`
  - Qdrant URL: `http://localhost:6333`
  - Qdrant API key: (leave empty for local)
  - Max Results: `50` (adjustable based on your needs)
  - Search score threshold: `0.40` (default)
- Click Save
- Click Start Indexing (collection auto-created)
```
Your Codebase (VS Code)
    ↓ (File watching)
KiloCode Extension
    ↓ (Parse with Tree-sitter)
Code Blocks (functions, classes)
    ↓ (Ollama API)
Qwen3-Embedding-8B-FP16
    ↓ (4096-dim vectors)
Qdrant Collection
    ↓ (Cosine similarity search)
Search Results (relevant code snippets)
```
Data Flow:
- You write code → KiloCode detects changes
- KiloCode sends code to Ollama → Gets embeddings
- Embeddings stored in Qdrant collection → Ready for search
- You search "auth logic" → Qdrant finds similar vectors
- Results displayed in KiloCode → Jump to code
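Steps 4-5 are a plain Qdrant similarity search under the hood. A manual sketch (assuming `jq`, the model tag above, and the auto-created `ws-{your-id}` collection):

```bash
# 1. Embed the query text with Ollama
QUERY_VECTOR=$(curl -s http://localhost:11434/api/embed \
  -H 'Content-Type: application/json' \
  -d '{"model": "qwen3-embedding:8b-fp16", "input": "auth logic"}' \
  | jq -c '.embeddings[0]')

# 2. Search Qdrant for the nearest code snippets by cosine similarity
curl -s http://localhost:6333/collections/ws-{your-id}/points/search \
  -H 'Content-Type: application/json' \
  -d "{\"vector\": $QUERY_VECTOR, \"limit\": 5, \"with_payload\": true}" \
  | jq '.result[] | {score: .score, file: .payload.file}'
```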
- ✅ Qdrant running (check dashboard)
- ⏭️ Create your first collection (`kilocode_codebase` for testing)
- ⏭️ Configure KiloCode to use that collection
- ⏭️ Let it index your codebase
- ⏭️ Try semantic searches!
Need more help? Check the detailed guides:
- README.md - Project overview and quick start guide
- 1_CODEBASE_INDEXING_FEATURE.md - Codebase indexing feature documentation
- 2_EMBEDDING_MODEL_SELECTION.md - Embedding model selection and research
- 3_QWEN3_OLLAMA_GUIDE.md - Qwen3 with Ollama guide and FAQ
- 4_QDRANT_INSTALLATION_GUIDE.md - Step-by-step Qdrant deployment
- IMPLEMENTATION_NOTES.md - Lessons learned from actual setup