Skip to content

SqliteStore backend + annotation, audit, and query result cache tools#169

Merged
data-douser merged 12 commits intomainfrom
dd/sqlite-annotation-cache
Mar 30, 2026
Merged

SqliteStore backend + annotation, audit, and query result cache tools#169
data-douser merged 12 commits intomainfrom
dd/sqlite-annotation-cache

Conversation

@data-douser
Copy link
Copy Markdown
Collaborator

@data-douser data-douser commented Mar 25, 2026

Resolves #165.

Relates to:

Purpose

Replace lowdb (JSON file persistence) with sql.js (SQLite compiled to asm.js) as the unified storage backend, and introduce three new opt-in tool suites — annotation, audit, and query result cache — gated by ENABLE_ANNOTATION_TOOLS. The sql.js backend bundles inline with zero external native dependencies, eliminating the portability and corruption issues inherent to the previous JSON-file approach while enabling structured querying, full-text search, and transactional writes across all persisted state.

Additionally, extract performance-critical resolution and processing logic into dedicated modules (database-resolver, query-resolver, result-processor, codeql-version) to reduce the size of monolithic files, add in-memory caching with mtime-based invalidation, and track actual-vs-target CodeQL CLI version mismatches at startup.

Summary of Changes

Storage backend migration — New SqliteStore class (server/src/lib/sqlite-store.ts) backed by sql.js with sessions, annotations, and query_result_cache tables. SessionDataManager migrated from lowdb to SqliteStore; the lowdb dependency is removed and replaced with sql.js in server/package.json.

Annotation tools (6, opt-in)annotation_{create,get,list,update,delete,search} for general-purpose notes and bookmarks (server/src/tools/annotation-tools.ts).

Audit tools (4, opt-in)audit_{store_findings,list_findings,add_notes,clear_repo} for repo-keyed finding management during MRVA triage (server/src/tools/audit-tools.ts).

Query result cache tools (4, opt-in)query_results_cache_{lookup,retrieve,clear,compare} with subset retrieval and cross-database comparison (server/src/tools/cache-tools.ts). Results are auto-cached in processQueryRunResults after SARIF interpretation.

Performance improvements (always-on) — LRU cache for extractQueryMetadata with mtime invalidation; module-level Map cache for resolveDatabasePath; deduplicated resolveQueryPath calls by passing resolved paths through the pipeline.

Refactored modules — Extracted database-resolver.ts, query-resolver.ts, result-processor.ts, and codeql-version.ts from cli-tool-registry.ts and query-results-evaluator.ts, reducing cli-tool-registry.ts by ~375 lines.

Build configuration — Updated server/esbuild.config.js to handle sql.js bundling (wasm exclusion, asm.js inlining).

Tests — 500-line sqlite-store.test.ts, plus dedicated test suites for annotation-tools (307 lines), audit-tools (287 lines), and cache-tools (354 lines). Updated session-data-manager.test.ts and monitoring-tools.test.ts for the new backend. Integration test primitives added for all 14 new tools plus an MRVA triage workflow end-to-end test. VS Code extension e2e test coverage added (mcp-tool-e2e.integration.test.ts).

Outline of Changes

Core infrastructure improvements:

  • Added a new codeql-version module to track both the actual CodeQL CLI version detected at runtime and the target version from .codeql-version, with warnings logged on mismatches. This supports better diagnostics and cache key management. (server/src/lib/codeql-version.ts, server/src/lib/cli-executor.ts) [1] [2] [3] [4]
  • Implemented a new database-resolver utility for resolving database paths, including support for multi-language database roots and in-memory caching to avoid redundant filesystem scans. (server/src/lib/database-resolver.ts)
  • Added a query-resolver module that resolves query names and languages to file paths using the CodeQL CLI, with robust error handling and logging. (server/src/lib/query-resolver.ts)

Performance and caching:

  • Introduced an LRU in-memory cache for extracted query metadata in extractQueryMetadata, keyed by file path and file modification time, to avoid unnecessary file reads and improve performance. (server/src/lib/query-results-evaluator.ts) [1] [2]
  • Enhanced file reading for metadata extraction to use file descriptors and modification time checks, reducing race conditions and improving cache invalidation. (server/src/lib/query-results-evaluator.ts) [1] [2]

Server extensibility:

  • Registered new tool modules for annotation, audit, and cache management in the MCP server, enabling new features for notes, bookmarks, security audit state tracking, and query results caching. (server/src/codeql-development-mcp-server.ts) [1] [2]

Build and dependency updates:

  • Updated the build configuration to address issues with bundling sql.js and switched from lowdb to sql.js as a dependency for improved storage support. (server/esbuild.config.js, server/package.json) [1] [2] [3]

Copilot AI review requested due to automatic review settings March 25, 2026 11:53
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 25, 2026

Dependency Review

The following issues were found:
  • ✅ 0 vulnerable package(s)
  • ✅ 0 package(s) with incompatible licenses
  • ✅ 0 package(s) with invalid SPDX license definitions
  • ⚠️ 1 package(s) with unknown licenses.
See the Details below.

Snapshot Warnings

⚠️: No snapshots were found for the head SHA 2f9969a.
Ensure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice.

License Issues

server/package.json

PackageVersionLicenseIssue Type
sql.js^1.14.1NullUnknown License

OpenSSF Scorecard

PackageVersionScoreDetails
npm/sql.js 1.14.1 🟢 3.4
Details
CheckScoreReason
Code-Review⚠️ 2Found 6/30 approved changesets -- score normalized to 2
Dangerous-Workflow🟢 10no dangerous workflow patterns detected
Packaging⚠️ -1packaging workflow not detected
Binary-Artifacts🟢 10no binaries found in the repo
Maintained🟢 43 commit(s) and 2 issue activity found in the last 90 days -- score normalized to 4
Token-Permissions⚠️ 0detected GitHub workflow tokens with excessive permissions
Pinned-Dependencies🟢 6dependency not pinned by hash detected -- score normalized to 6
CII-Best-Practices⚠️ 0no effort to earn an OpenSSF best practices badge detected
Security-Policy⚠️ 0security policy file not detected
Fuzzing⚠️ 0project is not fuzzed
License🟢 9license file detected
Branch-Protection⚠️ 0branch protection not enabled on development/release branches
Signed-Releases⚠️ 0Project has not signed or included provenance with any releases.
SAST⚠️ 0SAST tool is not run on all commits -- score normalized to 0
npm/sql.js ^1.14.1 UnknownUnknown

Scanned Files

  • package-lock.json
  • server/package.json

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR migrates the server’s persistence layer from lowdb to a unified sql.js-backed SqliteStore, and introduces opt-in MCP tools for annotations, audit workflows, and query result caching while refactoring CLI-related logic into smaller modules.

Changes:

  • Replace lowdb session persistence with SqliteStore (sessions + annotations + query result cache tables).
  • Add new opt-in MCP tools: annotation_*, audit_*, and query_results_cache_*.
  • Refactor query/database resolution and query-result processing (interpretation + auto-caching) into dedicated modules.

Reviewed changes

Copilot reviewed 18 out of 20 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
server/test/src/tools/monitoring-tools.test.ts Updates monitoring-tool tests to account for async store initialization and new config flag.
server/test/src/lib/sqlite-store.test.ts Adds unit tests for SqliteStore sessions, annotations, persistence, and cache behaviors.
server/test/src/lib/session-data-manager.test.ts Updates tests to await async initialization after migrating persistence backend.
server/src/types/sql-js.d.ts Adds minimal typings for sql.js asm.js import + core DB surface.
server/src/types/monitoring.ts Adds enableAnnotationTools config flag (default false).
server/src/tools/cache-tools.ts Introduces query_results_cache_* tools (lookup/retrieve/clear/compare).
server/src/tools/audit-tools.ts Introduces audit_* tools layered on annotations.
server/src/tools/annotation-tools.ts Introduces annotation_* tools for CRUD + search.
server/src/lib/sqlite-store.ts Implements the new unified SQLite persistence backend + cache subset retrieval.
server/src/lib/session-data-manager.ts Migrates session persistence from lowdb to SqliteStore; adds getStore().
server/src/lib/result-processor.ts Extracts query result interpretation/evaluation and auto-cache pipeline.
server/src/lib/query-results-evaluator.ts Adds query metadata caching with mtime-based invalidation.
server/src/lib/query-resolver.ts Extracts query-path resolution via codeql resolve queries.
server/src/lib/database-resolver.ts Extracts database-path resolution and caches results in-memory.
server/src/lib/codeql-version.ts Adds target/actual CodeQL version tracking and mismatch warning.
server/src/lib/cli-tool-registry.ts Integrates extracted resolvers/result processor and fixes output propagation + predicate handling.
server/src/lib/cli-executor.ts Wires version detection into startup validation; re-exports version helpers.
server/src/codeql-development-mcp-server.ts Registers new annotation/audit/cache tools at startup and initializes session manager.
server/package.json Removes lowdb, adds sql.js.
package-lock.json Updates lockfile to reflect dependency swap (remove lowdb/steno, add sql.js).
Comments suppressed due to low confidence (2)

server/src/lib/sqlite-store.ts:425

  • updateAnnotation sets updated_at = datetime('now'), which produces a different string format than the ISO timestamps written in createAnnotation. This can cause incorrect ordering when sorting by updated_at. Consider updating this to use the same format as inserts.
    setClauses.push("updated_at = datetime('now')");

    const db = this.ensureDb();
    db.run(
      `UPDATE annotations SET ${setClauses.join(', ')} WHERE id = $id`,
      params as Record<string, string | number | null>,
    );

server/src/lib/session-data-manager.ts:49

  • Docstring says “sql.js WASM init is async”, but this store uses the asm.js build. Please adjust the wording so it’s accurate (async init is still true, but it’s not specifically WASM).
  /**
   * Initialize the database and ensure it's properly set up.
   * Must be awaited before any session operations (sql.js WASM init is async).
   */

data-douser added a commit that referenced this pull request Mar 27, 2026
- Fix TOCTOU in query-results-evaluator (openSync/fstatSync/readFileSync(fd))
- Use datetime('now') consistently for annotation timestamps
- Debounce flush() with 200ms coalescing via scheduleFlush()
- Fix resultIndices to inclusive [start, end] range with clamping
- Fix WASM→asm.js comments in session-data-manager
- Fix audit-tools header comment (no separate ENABLE_AUDIT_TOOLS flag)
- Add separate maxResults parameter for SARIF in cache-tools
- Use createProjectTempDir() in all test files
- Fix monitoring-tools test init order (mock before initialize)
- Add store.close() in session-data-manager test afterEach
Copilot AI review requested due to automatic review settings March 27, 2026 01:11
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 19 out of 23 changed files in this pull request and generated 6 comments.

data-douser added a commit that referenced this pull request Mar 27, 2026
- Bound metadataCache to 256 entries with oldest-first eviction
- Make SqliteStore.initialize() idempotent (close existing db first)
- Fix TOCTOU in initialize(): try readFileSync directly instead of existsSync
- Atomic flush: write to temp file + renameSync to prevent corruption
- Clarify annotation_search uses substring LIKE matching, not FTS
- Close store in monitoring-tools test afterEach to prevent timer leaks
@data-douser data-douser requested a review from Copilot March 27, 2026 02:32
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 19 out of 23 changed files in this pull request and generated 5 comments.

@data-douser
Copy link
Copy Markdown
Collaborator Author

@copilot apply changes based on the comments in this thread

Copilot AI added a commit that referenced this pull request Mar 27, 2026
…fallback, JSON.parse fallback

Agent-Logs-Url: https://github.com/advanced-security/codeql-development-mcp-server/sessions/7ff5a5d7-9ac1-4964-8b54-e233d7253301

Co-authored-by: data-douser <70299490+data-douser@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI commented Mar 27, 2026

@copilot apply changes based on the comments in this thread

All changes from that review thread have been applied in commit 5e57b03:

  • FD leakextractQueryMetadata now uses try/finally with closeSync(fd) to ensure the fd is always released.
  • Duplicate fs imports — merged into a single alphabetically-ordered import.
  • Windows renameSync — flush now catches rename errors and falls back to writeFileSync directly on the target.
  • FTS4 for annotations — full-text search replaces LIKE (see other reply).
  • JSON.parse fallbackquery_results_cache_retrieve wraps JSON.parse in try/catch and returns raw text when the content is not valid JSON.

Copilot AI review requested due to automatic review settings March 27, 2026 13:44
@data-douser data-douser self-assigned this Mar 27, 2026
@data-douser data-douser added the enhancement New feature or request label Mar 27, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 19 out of 23 changed files in this pull request and generated 2 comments.

@data-douser
Copy link
Copy Markdown
Collaborator Author

@copilot apply changes based on the comments in this thread

Copy link
Copy Markdown
Contributor

Copilot AI commented Mar 27, 2026

@copilot apply changes based on the comments in this thread

Applied in commit 6fbfa7e:

  • annotation_search: renamed querysearch, updated description to FTS semantics (token-based MATCH, * prefix matching, case-insensitive via unicode61 tokenizer).
  • query_results_cache_retrieve: removed grep entirely; SARIF format now always routes through getCacheSarifSubset (with maxResults) regardless of whether resultIndices/fileFilter are provided.

data-douser added a commit that referenced this pull request Mar 27, 2026
- Fix TOCTOU in query-results-evaluator (openSync/fstatSync/readFileSync(fd))
- Use datetime('now') consistently for annotation timestamps
- Debounce flush() with 200ms coalescing via scheduleFlush()
- Fix resultIndices to inclusive [start, end] range with clamping
- Fix WASM→asm.js comments in session-data-manager
- Fix audit-tools header comment (no separate ENABLE_AUDIT_TOOLS flag)
- Add separate maxResults parameter for SARIF in cache-tools
- Use createProjectTempDir() in all test files
- Fix monitoring-tools test init order (mock before initialize)
- Add store.close() in session-data-manager test afterEach
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 19 out of 23 changed files in this pull request and generated 10 comments.

Comments suppressed due to low confidence (1)

server/src/tools/annotation-tools.ts:123

  • annotation_update accepts any number for id, including non-integers and negatives, but annotations.id is an integer primary key. Tighten the schema to an integer (and typically positive) to avoid confusing “not found” behavior for invalid inputs.
      id: z.number().describe('The annotation ID to update.'),
      content: z.string().optional().describe('New content (replaces existing).'),
      label: z.string().optional().describe('New label (replaces existing).'),

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 22 out of 26 changed files in this pull request and generated 10 comments.

Comments suppressed due to low confidence (1)

server/test/src/tools/monitoring-tools.test.ts:227

  • This test mocks sessionDataManager.getConfig() to set storageLocation: testStorageDir, but the singleton SessionDataManager’s underlying SqliteStore is created from its constructor config and won’t follow the mocked storage location. That makes the test directory setup/cleanup ineffective and risks cross-test pollution. Prefer stubbing sessionDataManager.getStore() to a per-test SqliteStore(testStorageDir) instance (initialized in the test), or refactor SessionDataManager.initialize() to build the store from config.
      // Mock config BEFORE initializing so the store uses the test directory
      vi.spyOn(sessionDataManager, 'getConfig').mockReturnValue({
        storageLocation: testStorageDir,
        autoTrackSessions: true,
        retentionDays: 90,

data-douser added a commit that referenced this pull request Mar 29, 2026
- annotation_list/annotation_search limit: z.number().int().positive()
- annotation_list offset: z.number().int().nonnegative()
- audit_list_findings limit: z.number().int().positive().max(1000)
- query_results_cache_lookup: add limit param (default 50, max 500)
- listAnnotations: wrap FTS MATCH in try/catch; return [] on syntax error
- listAnnotations: emit LIMIT -1 when only offset is provided (SQLite requirement)
- listCacheEntries: add limit filter support
- SessionDataManager.initialize(): recreate SqliteStore when storageLocation
  changed since construction (fixes test isolation and runtime config updates)
- sqlite-store tests: 3 new cases for FTS error, offset-only, cache limit
Copilot AI review requested due to automatic review settings March 29, 2026 20:02
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 72 out of 76 changed files in this pull request and generated 5 comments.

data-douser and others added 12 commits March 29, 2026 23:57
Replace lowdb with sql.js (asm.js build) for zero-dependency SQLite persistence.
Bundle inline with esbuild — no native modules, no external deps at runtime.

SqliteStore provides three tables:
- sessions: session tracking (migrated from lowdb)
- annotations: key-value annotation store with categories and metadata
- query_result_cache: BQRS/SARIF result caching with subset retrieval

New tools (gated by ENABLE_ANNOTATION_TOOLS env var):
- annotation_create, annotation_list, annotation_search, annotation_delete
- audit_store_findings, audit_list_findings, audit_add_notes, audit_clear_repo
- query_results_cache_lookup, query_results_cache_retrieve,
  query_results_cache_clear, query_results_cache_compare

Code refactoring for maintainability:
- Extract database-resolver.ts from cli-tool-registry.ts
- Extract query-resolver.ts from cli-tool-registry.ts
- Extract result-processor.ts from cli-tool-registry.ts
- Extract codeql-version.ts from cli-executor.ts

Bug fixes:
- Fix params.output not propagated to proce- Fix params.output not propagated to proce- Fix params.output not propagated txternal predicate conditions for direct query paths

Closes #165
- Fix TOCTOU in query-results-evaluator (openSync/fstatSync/readFileSync(fd))
- Use datetime('now') consistently for annotation timestamps
- Debounce flush() with 200ms coalescing via scheduleFlush()
- Fix resultIndices to inclusive [start, end] range with clamping
- Fix WASM→asm.js comments in session-data-manager
- Fix audit-tools header comment (no separate ENABLE_AUDIT_TOOLS flag)
- Add separate maxResults parameter for SARIF in cache-tools
- Use createProjectTempDir() in all test files
- Fix monitoring-tools test init order (mock before initialize)
- Add store.close() in session-data-manager test afterEach
- Bound metadataCache to 256 entries with oldest-first eviction
- Make SqliteStore.initialize() idempotent (close existing db first)
- Fix TOCTOU in initialize(): try readFileSync directly instead of existsSync
- Atomic flush: write to temp file + renameSync to prevent corruption
- Clarify annotation_search uses substring LIKE matching, not FTS
- Close store in monitoring-tools test afterEach to prevent timer leaks
…fallback, JSON.parse fallback

Agent-Logs-Url: https://github.com/advanced-security/codeql-development-mcp-server/sessions/7ff5a5d7-9ac1-4964-8b54-e233d7253301

Co-authored-by: data-douser <70299490+data-douser@users.noreply.github.com>
…ics; always apply SARIF path for SARIF format

Agent-Logs-Url: https://github.com/advanced-security/codeql-development-mcp-server/sessions/219712ee-4c28-4b51-9da5-961020112e6e

Co-authored-by: data-douser <70299490+data-douser@users.noreply.github.com>
- annotation_list/annotation_search limit: z.number().int().positive()
- annotation_list offset: z.number().int().nonnegative()
- audit_list_findings limit: z.number().int().positive().max(1000)
- query_results_cache_lookup: add limit param (default 50, max 500)
- listAnnotations: wrap FTS MATCH in try/catch; return [] on syntax error
- listAnnotations: emit LIMIT -1 when only offset is provided (SQLite requirement)
- listCacheEntries: add limit filter support
- SessionDataManager.initialize(): recreate SqliteStore when storageLocation
  changed since construction (fixes test isolation and runtime config updates)
- sqlite-store tests: 3 new cases for FTS error, offset-only, cache limit
…ools (#170)

* feat: SqliteStore backend with annotation, audit, and cache tools

Replace lowdb with sql.js (asm.js build) for zero-dependency SQLite persistence.
Bundle inline with esbuild — no native modules, no external deps at runtime.

SqliteStore provides three tables:
- sessions: session tracking (migrated from lowdb)
- annotations: key-value annotation store with categories and metadata
- query_result_cache: BQRS/SARIF result caching with subset retrieval

New tools (gated by ENABLE_ANNOTATION_TOOLS env var):
- annotation_create, annotation_list, annotation_search, annotation_delete
- audit_store_findings, audit_list_findings, audit_add_notes, audit_clear_repo
- query_results_cache_lookup, query_results_cache_retrieve,
  query_results_cache_clear, query_results_cache_compare

Code refactoring for maintainability:
- Extract database-resolver.ts from cli-tool-registry.ts
- Extract query-resolver.ts from cli-tool-registry.ts
- Extract result-processor.ts from cli-tool-registry.ts
- Extract codeql-version.ts from cli-executor.ts

Bug fixes:
- Fix params.output not propagated to proce- Fix params.output not propagated to proce- Fix params.output not propagated txternal predicate conditions for direct query paths

Closes #165

* Add integration tests for annotation, audit, cache, and CallGraphFromTo tools

Client integration test fixtures:
- annotation_create, annotation_delete, annotation_list, annotation_search
- audit_store_findings, audit_list_findings, audit_add_notes, audit_clear_repo
- query_results_cache_lookup, query_results_cache_retrieve,
  query_results_cache_clear, query_results_cache_compare
- codeql_query_run CallGraphFromTo for cpp, javascript, python

Workflow integration test:
- mrva_finding_triage end-to-end workflow

Extension integration tests:
- mcp-tool-e2e: tool availability and MRVA workflow validation

Updated client/scripts/run-integration-tests.sh with annotation mode support.

Closes #166

* Fix server build

* Remove grep from cache tools; fix annotation_search API to FTS semantics; always apply SARIF path for SARIF format

Agent-Logs-Url: https://github.com/advanced-security/codeql-development-mcp-server/sessions/219712ee-4c28-4b51-9da5-961020112e6e

Co-authored-by: data-douser <70299490+data-douser@users.noreply.github.com>

* Sync server src/test/dist to dd/sqlite-annotation-cache tip

Brings in all review feedback fixes (FTS safety, offset-only LIMIT,
cache limit param, int/positive Zod schemas, store lifecycle fix)
and the rebuilt dist.

* Fix extension integration tests

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 30, 2026 06:00
@data-douser data-douser force-pushed the dd/sqlite-annotation-cache branch from 072f5ee to 2f9969a Compare March 30, 2026 06:00
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 72 out of 76 changed files in this pull request and generated no new comments.

@data-douser data-douser merged commit bac94dd into main Mar 30, 2026
22 checks passed
@data-douser data-douser deleted the dd/sqlite-annotation-cache branch March 30, 2026 11:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request server testing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SqliteStore backend + annotation, audit, and query result cache tools

3 participants