Semantic code search for local repositories.
- Zig CLI + HTTP API + MCP server
- Ollama embeddings (default: `bge-large`, override with `OLLAMA_MODEL`)
- sqlite-vec vector storage
- Hybrid search (vector + lexical)
- Symbol extraction: Zig, C/C++, TypeScript/JavaScript, Rust, Elixir, Bash, Lua, Nix, Nim, Lean, Idris, Haskell, Go, Ruby, Erlang, OCaml, Swift, LLVM IR, Clojure, Assembly
- LSP (references, rename): all of the above
- Markdown/text/log indexing with semantic chunking
# Run directly without installing
nix run github:pmarreck/codescan -- search "your query"
# Install to your profile
nix profile install github:pmarreck/codescan
# For faster downloads, add the garnix binary cache to /etc/nix/nix.conf:
# extra-substituters = https://cache.garnix.io
# extra-trusted-public-keys = cache.garnix.io:CTFPyKSLcx5RMJKfLo5EEPUObbA78b0YQ2DTCJXqr9g=

Pre-built binaries for Linux (x86_64, arm64) and macOS (arm64) are available as artifacts from the latest CI build:
- Click the most recent successful run
- Scroll to the Artifacts section at the bottom
- Download the archive for your platform
- Extract and place `codescan` somewhere on your `PATH`
Note: GitHub requires you to be signed in to download workflow artifacts.
nix develop -c zig build -Doptimize=ReleaseFast

./test
nix develop -c ./tests/cli/test-cli
nix develop -c ./tests/http/test-http
# requires Ollama running with bge-large pulled (or set OLLAMA_MODEL)
nix develop -c ./tests/integration/test-integration
# requires act (https://github.com/nektos/act)
./scripts/ci-local

# show or edit project config
codescan config
codescan config edit
# ReleaseFast builds are self-contained; no `nix develop` prefix needed to run.
# index
codescan index --root <path>
# update (full reindex)
codescan update --root <path>
# search
codescan search "hash functions" --root <path> --min-score 0.2
# default verb is search
codescan "hash functions" --root <path>
# show doc comments in human output
codescan search "hash functions" --root <path> --show-comments
# comment-only search (doc comments only)
codescan search "hash functions" --root <path> --comments
# include markdown/README when using default search scope
codescan search "design doc" --include-docs
# only markdown/README results
codescan search "design doc" --docs
# unified scope selector
codescan search "design doc" --scope docs
codescan search "hash functions" --scope comments
# restrict by extension/type/language
codescan search "checksum" --ext md,zig
codescan search "checksum" --type code,doc
codescan search "checksum" --lang zig
# filter by symbol kind (fn, struct, enum, const, var, test, mod, type, macro, ...)
codescan search "config" --kind struct
codescan search "init" --kind fn
codescan search "config" --kind const,var
# meta-kinds: declaration (const+var), definition (any defined symbol)
codescan search "config" --kind declaration
codescan search --kind definition --top 20
# browse mode: list symbols by kind without a text query
codescan search --kind fn --top 10
codescan search --kind struct
# filter by file path (glob) or exact file
codescan search "init" --path "src/storage*"
codescan search "hash" --file src/hash.zig
# regex search (PCRE2) with context lines
codescan search "pub fn \w+Init" --regex --context 5
codescan search "TODO|FIXME|HACK" --regex --top 20
codescan search "defer.*free" --regex --path "src/*.zig"
codescan search "fixme|todo" --regex -i # case-insensitive
codescan search "computeHash" --regex --include-body # show full symbol body containing match
# show uncommitted changes with hashlines (for safe editing from diff output)
codescan diff
codescan diff --staged
# index node_modules too
codescan index --include-node-modules
# show index and watcher status
codescan status
codescan status --json
# focused command help
codescan help search
codescan search --help
# stdin JSON request mode (auto-routed to CLI args, always emits JSON)
printf '{"action":"search","query":"checksum","mode":"lexical","db":".codescan/index.sqlite3"}\n' | codescan --json

If --root is omitted, codescan searches upward from the current directory for a .codescan/
directory and uses the directory containing it as the root; if none is found, it falls back to the current directory.
Search defaults to the primary code language by file count unless a filter is supplied.
Multi-word queries use OR semantics in lexical/hybrid search — results matching any term surface, with BM25 ranking results matching all terms higher.
--include-docs adds markdown/README; --docs/--only-docs restricts results to markdown/README only.
--comments/--only-comments restricts results to doc comments.
--scope <code|docs|comments|all> is a unified alias for common filter combinations.
Index/update defaults to code + docs unless --type/index_type is set.
Built-in ignores: .git/, .codescan/, .codescan-fixtures/, deps/, node_modules/ (opt-in), .zig-cache/, zig-cache/, .zig-out/, zig-out/ (see PROJECT_STATE for full list).
Human output uses ANSI colors by default; set NO_COLOR=1 to disable.
Interactive index/update shows a compact per-file progress counter on stderr (TTY only).
Set DEBUG=1 to emit verbose indexing progress to stderr.
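
The upward search for a .codescan/ directory described above can be sketched in a few lines. This is only an illustration of the lookup rule, not codescan's actual implementation; the function name `find_project_root` is hypothetical, and it assumes the root is the directory that contains `.codescan/`.

```python
from pathlib import Path


def find_project_root(start: Path) -> Path:
    """Return the nearest ancestor of `start` (inclusive) containing .codescan/.

    Falls back to `start` itself when no ancestor has a .codescan/ directory,
    mirroring the documented fallback to the current directory.
    """
    start = start.resolve()
    # Check the starting directory first, then each parent up to the filesystem root.
    for candidate in (start, *start.parents):
        if (candidate / ".codescan").is_dir():
            return candidate
    return start
```

Calling `find_project_root(Path.cwd())` from a nested source directory would return the project directory under these assumptions.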
codescan serve --root <path> --http-host 127.0.0.1 --http-port 8123

Endpoints:
| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check |
| `/help` | GET | List all endpoints |
| `/search` | POST | Semantic code search (`/query` is an alias) |
| `/index` | POST | Index/reindex repository |
| `/symbols` | POST | List or find symbols (`/find-symbol` is an alias) |
| `/replace-symbol` | POST | Replace a symbol's body |
| `/insert-after` | POST | Insert code after a symbol |
| `/insert-before` | POST | Insert code before a symbol |
| `/replace-lines` | POST | Replace hashline-validated line range |
| `/insert-at` | POST | Insert after hashline-validated line |
| `/replace-content` | POST | Find/replace text or regex |
| `/references` | POST | Find references via LSP |
| `/rename` | POST | Rename symbol via LSP |
| `/status` | GET | Index and watcher status |
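
For scripted access, the POST endpoints in the table can be called from any HTTP client. The sketch below uses only the Python standard library. The helper names `build_request` and `call` are hypothetical, the `Content-Type` header is an assumption, and the response is assumed (not confirmed by this README) to be JSON.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8123"  # matches the --http-host/--http-port shown above


def build_request(endpoint: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request carrying `payload` as a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{endpoint.lstrip('/')}",
        data=body,
        headers={"Content-Type": "application/json"},  # assumption: server accepts this header
        method="POST",
    )


def call(endpoint: str, payload: dict, base_url: str = BASE_URL):
    """Send the request and decode the reply, assuming the server answers with JSON."""
    with urllib.request.urlopen(build_request(endpoint, payload, base_url), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a server running, `call("/symbols", {"file": "src/main.zig", "pattern": "runSearch", "include_body": True})` would send a body with the same `file`, `pattern`, and `include_body` keys used in this README's curl examples.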
# examples
curl -s localhost:8123/symbols -d '{"file":"src/main.zig"}'
curl -s localhost:8123/symbols -d '{"file":"src/main.zig","pattern":"runSearch","include_body":true}'
curl -s localhost:8123/symbols -d '{"file":["src/main.zig","src/cli.zig"],"pattern":"parse"}'
curl -s localhost:8123/symbols -d '{"pattern":"init"}'
curl -s localhost:8123/replace-content -d '{"file":"src/lib.zig","needle":"old","body":"new","all":true}'

codescan includes an MCP server for direct LLM tool integration. It communicates via JSON-RPC 2.0 over stdio (newline-delimited).
codescan mcp-serve --root <path>

Add to your MCP settings:
{
  "mcpServers": {
    "codescan": {
      "command": "/path/to/codescan",
      "args": ["mcp-serve", "--root", "/path/to/your/project"]
    }
  }
}

Use an absolute binary path so startup does not depend on PATH:
codex mcp remove codescan
codex mcp add codescan -- /path/to/codescan mcp-serve --root /path/to/your/project
codex mcp get codescan

If you prefer command = "codescan" in ~/.codex/config.toml, ensure the app's
launch environment includes the directory that contains codescan.
- `MCP startup failed: No such file or directory (os error 2)` usually means the MCP command could not be resolved.
- Fix: configure an absolute binary path (recommended), or fix `PATH` for the app launch environment.
- Verify with `codex mcp list` / `codex mcp get codescan`.
| Tool | Description |
|---|---|
| `search` | Semantic code search (query is an alias). Params: query, kind, path, file, lang, top |
| `index` | Index/reindex repository |
| `symbols` | List or find symbols (optional file, pattern, include_body) |
| `replace_symbol` | Replace a symbol's body |
| `insert_after` | Insert code after a symbol |
| `insert_before` | Insert code before a symbol |
| `replace_lines` | Replace hashline-validated line range |
| `insert_at` | Insert after hashline-validated line |
| `replace_content` | Find/replace text or regex |
| `references` | Find references via LSP |
| `rename` | Rename symbol via LSP |
| `config` | Show configuration |
| `status` | Index and watcher status |
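
To make the newline-delimited JSON-RPC 2.0 framing concrete, here is a rough sketch of how a client might serialize a call to the `search` tool. The `tools/call` method and its `name`/`arguments` parameters come from the MCP specification. The helper names are hypothetical, and a real session would also need the MCP `initialize` handshake first, which is omitted here.

```python
import json


def jsonrpc_request(request_id: int, method: str, params: dict) -> str:
    """Serialize one JSON-RPC 2.0 request as a single newline-terminated line."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    # json.dumps escapes embedded newlines, so the payload always stays on one line.
    return json.dumps(message, separators=(",", ":")) + "\n"


def search_call(request_id: int, query: str, top: int = 10) -> str:
    """Frame a tools/call request for the `search` tool listed in the table above."""
    return jsonrpc_request(
        request_id,
        "tools/call",
        {"name": "search", "arguments": {"query": query, "top": top}},
    )
```

Each returned string would be written to the stdin of `codescan mcp-serve`, with replies read line by line from its stdout.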
codescan provides structural editing commands for AI agents and scripts. All editing commands read replacement text from stdin.
Every codescan command that outputs source lines annotates them with a 3-character base-62 content-chain hash:
44:k7m|fn init(self: *Self) void {
45:r2p| self.count = 0;
46:a9x| self.buffer = undefined;
47:3bw| self.ready = false;
48:npq|}
Each hash incorporates the previous line's hash, forming a chain. If any line above
changes, all subsequent hashes cascade — so a stale line:hash reference is always
detected. This lets AI agents and scripts target exact line ranges without the silent
corruption risk of bare line numbers.
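
The chaining idea can be illustrated with a small sketch. This is not codescan's actual hash function, which this README does not specify; SHA-256, the seed, and the way the previous hash is mixed in are all assumptions chosen only to show how an edit cascades through later lines.

```python
import hashlib

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def to_base62(value: int, width: int = 3) -> str:
    """Encode `value` as a fixed-width base-62 string."""
    digits = []
    for _ in range(width):
        value, rem = divmod(value, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def hashlines(lines: list[str]) -> list[str]:
    """Annotate each line as `N:hash|text`, where each hash depends on the previous one."""
    previous = ""  # assumed seed for the first line
    annotated = []
    for number, text in enumerate(lines, start=1):
        digest = hashlib.sha256((previous + "\n" + text).encode("utf-8")).digest()
        previous = to_base62(int.from_bytes(digest[:8], "big") % 62**3)
        annotated.append(f"{number}:{previous}|{text}")
    return annotated
```

Editing line 2 of a file leaves the hash of line 1 untouched but changes the hashes of line 2 and, through the chain, the lines after it, so a reference such as `3:abc` taken before the edit no longer matches.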
echo 'new_name' | codescan replace-content 'old_name' --file src/lib.zig
echo 'v2' | codescan replace-content 'v1' --file src/lib.zig --all
echo 'new impl' | codescan replace-content 'fn old\(.*?\)' --file src/lib.zig --regex

echo 'new body' | codescan replace-symbol MyStruct/init --file src/lib.zig
echo 'new code' | codescan insert-after MyStruct --file src/lib.zig
echo 'new code' | codescan insert-before MyStruct --file src/lib.zig

echo 'replacement' | codescan replace-lines --file src/lib.zig --from 45:r2p --to 47:3bw
echo 'new code' | codescan insert-at 42:abc --file src/lib.zig

codescan references MyFunc --file src/lib.zig
codescan rename MyFunc --file src/lib.zig --to newName [--dry-run]

Create <root>/.codescan/config to override defaults. Example:
# output=json|human
output=human
# search tuning
search_mode=hybrid
weight_vector=0.7
weight_lexical=0.3
min_score=0.0
max_file_size=2097152
include_docs=false
docs_only=false
comments_only=false
include_node_modules=false
primary_lang=zig
index_ext=zig,md
index_type=code,doc
search_ext=zig
search_type=code
search_lang=zig
# Ollama model override (CLI flag or OLLAMA_MODEL env var also supported)
ollama_model=bge-large
# ignores
ignore=**/.git/**, **/.codescan/**
ignore.zig=**/.zig-cache/**,**/zig-out/**
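
For scripts that need to read this file, the `key=value` layout with `#` comments is simple to parse. The sketch below is based only on the example above; the function names are hypothetical, and the exact rules codescan applies (whitespace handling, duplicate keys) are assumptions.

```python
def parse_config(text: str) -> dict[str, str]:
    """Parse key=value lines, skipping blank lines and # comments."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # ignore lines without '=' rather than failing
        settings[key.strip()] = value.strip()
    return settings


def split_list(value: str) -> list[str]:
    """Split a comma-separated value such as the ignore globs, trimming whitespace."""
    return [item.strip() for item in value.split(",") if item.strip()]
```

Values stay as strings here; a caller would convert entries such as `weight_vector` with `float(...)` as needed.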
Optional language-specific weight overrides live in <root>/.codescan/weights.toml:
[default]
weight_vector = 0.7
weight_lexical = 0.3
weight_symbol_kind = 0.0
weight_symbol_visibility = 0.0
weight_symbol_scope = 0.0
weight_symbol_arity = 0.0
[zig]
weight_vector = 0.55
weight_lexical = 0.45
weight_symbol_kind = 0.15
weight_symbol_visibility = 0.10

When both are present:
- explicit CLI/HTTP weights win
- otherwise `weights.toml` applies
- otherwise `.codescan/config` global `weight_*` applies
Metadata weights apply when the query includes metadata cues such as function, public, top-level, or arity 2.
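
The precedence order above can be expressed as a short lookup. This sketch takes already-parsed dictionaries as input; the function name `resolve_weight` is hypothetical, and the assumption that a language section in weights.toml is consulted before `[default]` is an inference from the example, not something this README states.

```python
from typing import Optional


def resolve_weight(
    name: str,
    explicit: dict[str, float],
    weights_toml: dict[str, dict[str, float]],
    config: dict[str, float],
    lang: str,
) -> Optional[float]:
    """Pick a weight by precedence: explicit, then weights.toml, then config."""
    # 1. Explicit CLI/HTTP weights win.
    if name in explicit:
        return explicit[name]
    # 2. weights.toml: language section first, then [default] (assumed order).
    for section in (lang, "default"):
        table = weights_toml.get(section, {})
        if name in table:
            return table[name]
    # 3. Global weight_* values from .codescan/config.
    return config.get(name)
```

On Python 3.11 or newer, the `weights_toml` argument could come from `tomllib.loads(...)` applied to the file's contents.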
codescan can replace Claude Code's built-in Read, Edit, Grep, and Glob tools with safer, hashline-validated alternatives. This is especially valuable for multi-agent workflows where concurrent file access can cause stale edits.
- Optimistic concurrency: Every `read-file` returns a 3-character version hash (the hashline of the last line, which acts as a content-addressed checksum of the entire file). Pass it back to any write tool via `--version`; if the file changed since your read, the edit fails cleanly instead of silently corrupting.
- Hashline validation: Line-level edits use content-chain hashes that detect if the target lines have shifted since your last read.
- Structured search: `codescan search --kind fn` returns semantically relevant results instead of raw text matches, using fewer tokens.
- Safe deletion: `destroy-file` moves files to the system trash (with undo support) instead of permanent `rm`.
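
The optimistic-concurrency check in the first bullet is a compare-before-write pattern. The in-memory model below is a sketch of that pattern only, not codescan's code: the class and exception names are hypothetical, and it uses an 8-character SHA-256 prefix as the version token rather than codescan's 3-character hashline.

```python
import hashlib


class StaleVersionError(Exception):
    """Raised when a write carries a version that no longer matches the content."""


class VersionedFile:
    def __init__(self, content: str) -> None:
        self._content = content

    @staticmethod
    def _version(content: str) -> str:
        # Assumption: any content-derived token works for illustrating the check.
        return hashlib.sha256(content.encode("utf-8")).hexdigest()[:8]

    def read(self) -> tuple[str, str]:
        """Return the content together with its current version token."""
        return self._content, self._version(self._content)

    def write(self, new_content: str, expected_version: str) -> str:
        """Replace the content only if the caller saw the current version."""
        current = self._version(self._content)
        if expected_version != current:
            raise StaleVersionError(f"expected version {expected_version}, current {current}")
        self._content = new_content
        return self._version(new_content)
```

Two agents that both read the same version can both attempt a write, but only the first succeeds; the second gets an error and has to re-read before retrying.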
Add a PreToolUse hook to ~/.claude/settings.json that nudges Claude toward codescan
tools in indexed projects:
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Grep|Glob|Agent|Read",
        "hooks": [
          {
            "type": "command",
            "command": "~/.claude/hooks/codescan-redirect/redirect.sh",
            "timeout": 5,
            "statusMessage": "Checking codescan availability..."
          }
        ]
      }
    ]
  }
}

The hook checks if .codescan/ exists in the project and injects a context reminder to
use codescan's tools instead. It's non-blocking: the built-in tool still runs, but the
model learns to prefer codescan over time.
Add to ~/.claude/CLAUDE.md:
## Code Navigation: Prefer codescan
At the start of every session, run `codescan status` to check if the project is indexed
and the watcher is running. A running watcher means the index stays up to date
automatically.
When a `.codescan/` directory exists:
- Use `codescan search` / `codescan symbols` instead of Grep/Glob
- Use `codescan read-file` instead of Read (returns version hash for safe edits)
- Use `codescan replace-content --version <hash>` instead of Edit
- Use `codescan create-file` instead of Write (for new files)
- Use `codescan destroy-file` instead of rm (moves to trash)

# 1. Read a file: get version hash
codescan read-file src/foo.zig --json
# → {"file":"src/foo.zig","version":"k7m","total_lines":50,...}
# 2. Edit with version check — prevents stale edits
codescan replace-content src/foo.zig "old text" --version k7m <<< "new text"
# → Replaced 1 occurrence (line 10:abc)
# → version: x9a
# 3. If another agent edited the file between steps 1 and 2:
# → error: file modified since last read (expected version k7m, current p3q) — re-read and retry

- SQLite vector extension is statically linked (no runtime extension loading).
- On macOS, fully static userland binaries are not supported by the OS; `libSystem` remains dynamic.
MIT. See LICENSE.