perf(core): Deep dive optimizations for hot path #355
Open
- Add internal SQL object cache for string statements
- Optimize SQL.copy to bypass initialization
- Implement micro-cache in SQLProcessor for repeated queries
- Optimize observability idle check
- Streamline parameter processing and result construction

- Remove unnecessary dict() copy in _unpack_parse_cache_entry
- Remove expression.copy() on parse cache store (only copy on retrieve when needed)
- Defer expression.copy() to _apply_ast_transformers when transformers active
- Fast type dispatch (type(x) is dict) vs ABC isinstance checks
- Remove sorted() for dict keys in structural fingerprinting (use insertion order)
- Cache is_idle check in ObservabilityRuntime (lifecycle/observers immutable)
- Use frozenset intersection for parameter char detection in validator
- Optimize ParameterProfile.styles computation for single-style case

Benchmark (10,000 INSERTs):
- Before: ~20x slowdown vs raw sqlite3
- After: ~15.5x slowdown (tuple params), ~18.8x (dict params)
- Function calls reduced: 1.33M → 1.18M (11% fewer)
- isinstance() calls reduced: 280k → 200k (28% fewer)

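Two of the micro-optimizations listed above (exact-type dispatch and frozenset intersection) can be sketched in isolation. This is only an illustration of the techniques: `PARAM_CHARS`, `may_contain_params`, and `classify_params` are hypothetical names, and the character set is an assumption, not the one the validator uses.

```python
from collections.abc import Mapping, Sequence

# Hypothetical marker set; the validator's real character set is not shown in this PR.
PARAM_CHARS = frozenset("?:@$%")


def may_contain_params(sql: str) -> bool:
    """Return True if the SQL text contains any candidate parameter character."""
    # One set intersection replaces a chain of `"?" in sql or ":" in sql ...` checks.
    return bool(PARAM_CHARS.intersection(sql))


def classify_params(params: object) -> str:
    """Classify parameters, checking exact builtin types before ABCs."""
    kind = type(params)
    # Identity checks on the exact type are cheaper than ABC isinstance checks.
    if kind is dict:
        return "named"
    if kind is tuple or kind is list:
        return "positional"
    # Slow path: subclasses and custom containers still get the ABC checks.
    if isinstance(params, Mapping):
        return "named"
    if isinstance(params, Sequence) and not isinstance(params, (str, bytes)):
        return "positional"
    return "scalar"
```

The exact-type checks only short-circuit the common cases; keeping the ABC fallback means dict or list subclasses are still classified the same way as before.
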
Add benchmark functions to isolate SQLGlot overhead:
- bench_sqlite_sqlglot: Cached SQL (minimal overhead)
- bench_sqlite_sqlglot_copy: expression.copy() per call
- bench_sqlite_sqlglot_nocache: .sql() regeneration per call

These help identify whether overhead comes from SQLGlot parsing/generation vs SQLSpec's own processing.

Key findings:
- SQLGlot cached parsing adds ~0% overhead
- expression.copy() per call: 16x overhead (synthetic)
- SQLSpec actual overhead: distributed across pipeline

- Updated type hints to use the new syntax for union types in driver.py, _async.py, and _common.py.
- Improved readability by formatting long lines and breaking them into multiple lines in driver.py and _common.py.
- Removed unnecessary comments and cleaned up import statements in config.py and typing.py.
- Enhanced exception handling in AsyncMigrationCommands to use async input for user confirmation.
- Refactored logic in CorrelationExtractor to simplify return statements.
- Updated the write_fixture_async function to use AsyncPath for resolving paths asynchronously.
- Improved test readability and consistency in test_sync_adapters.py and test_fast_path.py by formatting long lines.

- Create new sqlspec/driver/_query_cache.py module
- Move CachedQuery namedtuple and QueryCache class
- Rename _QueryCache to QueryCache (now public)
- Rename _FAST_PATH_QUERY_CACHE_SIZE to QC_MAX_SIZE
- Add clear() and __len__() methods to QueryCache
- Update test imports
- Remove unused OrderedDict import from _common.py

Part of driver-arch-cleanup PRD, Chapter 1: qc-extract

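For orientation, a bounded LRU cache with this public surface could look roughly like the sketch below. It is not the contents of `_query_cache.py`: the `CachedQuery` fields, the `get`/`set` method names, the `OrderedDict` backing store, and the value of `QC_MAX_SIZE` are all assumptions made for illustration.

```python
from collections import OrderedDict
from typing import NamedTuple, Optional

QC_MAX_SIZE = 128  # assumed value; the real constant is not shown in this PR


class CachedQuery(NamedTuple):
    """Hypothetical fields; the real namedtuple likely carries more."""

    compiled_sql: str
    param_count: int


class QueryCache:
    """Minimal LRU cache keyed by the raw SQL string."""

    def __init__(self, max_size: int = QC_MAX_SIZE) -> None:
        self._max_size = max_size
        self._entries: "OrderedDict[str, CachedQuery]" = OrderedDict()

    def get(self, sql: str) -> Optional[CachedQuery]:
        entry = self._entries.get(sql)
        if entry is not None:
            # Mark as most recently used.
            self._entries.move_to_end(sql)
        return entry

    def set(self, sql: str, entry: CachedQuery) -> None:
        self._entries[sql] = entry
        self._entries.move_to_end(sql)
        if len(self._entries) > self._max_size:
            # Evict the least recently used entry.
            self._entries.popitem(last=False)

    def clear(self) -> None:
        self._entries.clear()

    def __len__(self) -> int:
        return len(self._entries)
```

Exposing `clear()` and `__len__()` makes the cache straightforward to reset and inspect from tests.
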
Attribute renames:
- _fast_path_binder → _qc_binder
- _fast_path_enabled → _qc_enabled
- _query_cache → _qc

Method renames:
- _update_fast_path_flag → _update_qc_flag
- _fast_rebind → qc_rebind
- _build_fast_statement → qc_build
- _try_cached_compiled → qc_lookup
- _execute_compiled → qc_execute
- _maybe_cache_fast_path → qc_store
- _configure_fast_path_binder → _configure_qc_binder

Test file renamed: test_fast_path.py → test_query_cache.py

Part of driver-arch-cleanup PRD, Chapter 2: qc-rename

…ation

Move eligibility checks and preparation logic from qc_lookup into new qc_prepare method in _common.py. This eliminates ~15 lines of duplicated logic between sync and async implementations.

Before: qc_lookup in both _common.py and _async.py contained identical eligibility checking, cache lookup, rebinding, and statement building.

After: qc_prepare does all preparation work, qc_lookup becomes a thin wrapper that calls qc_prepare then qc_execute.

Chapter 3 of driver-arch-cleanup_20260203 PRD.

Move eligibility validation from qc_prepare (hot lookup path) to qc_store (store path, executed once per unique query).

Before: qc_prepare had 6 condition checks including needs_static_script_compilation and many-params guard.

After: qc_prepare has only 2 essential checks:
1. _qc_enabled flag
2. cache lookup + param count match

All detailed validation happens at store time, ensuring only valid queries enter the cache in the first place.

Chapter 4 of driver-arch-cleanup_20260203 PRD.

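The split between a validated store path and a minimal lookup path might be sketched as follows. The method bodies and signatures are assumptions: `DriverSketch` is a hypothetical stand-in for the driver base class, and the `is_script`/`is_many` flags stand in for the detailed eligibility checks, which are not shown in this PR.

```python
from typing import Dict, NamedTuple, Optional, Sequence, Tuple


class CachedQuery(NamedTuple):
    compiled_sql: str
    param_count: int


class DriverSketch:
    """Hypothetical stand-in for the driver base class."""

    def __init__(self) -> None:
        self._qc_enabled = True
        self._qc: Dict[str, CachedQuery] = {}

    def qc_store(self, sql: str, compiled_sql: str, param_count: int,
                 *, is_script: bool = False, is_many: bool = False) -> bool:
        """Store path: runs once per unique query, so validation lives here."""
        if not self._qc_enabled or is_script or is_many:
            return False  # ineligible queries never enter the cache
        self._qc[sql] = CachedQuery(compiled_sql, param_count)
        return True

    def qc_prepare(self, sql: str, params: Sequence[object]
                   ) -> Optional[Tuple[str, Tuple[object, ...]]]:
        """Hot path: flag check, then cache lookup plus param count match."""
        if not self._qc_enabled:
            return None
        cached = self._qc.get(sql)
        if cached is None or cached.param_count != len(params):
            return None  # caller falls back to the full pipeline
        return cached.compiled_sql, tuple(params)
```

Because every entry was validated when it was stored, the lookup path can trust whatever it finds and skip re-checking eligibility on each call.
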
The base class _qc_execute now handles the full fast-path execution:
- Removed SqliteDriver.qc_execute (redundant with base class)
- Removed AiosqliteDriver.qc_execute (redundant with base class)
- Renamed qc_lookup -> _qc_lookup (internal API)
- Added unreachable assertion to _qc_execute (all paths return/raise)
- Fixed return type cast in execute() fast-path

The `is_script`/`is_many` branches were dead code since _qc_store filters them out before caching.

Add comprehensive benchmark tooling originally contributed by euri10 in PR #354, with enhancements for testing query cache effectiveness.

Scenarios:
- initialization: Connection and table setup overhead
- write_heavy: Bulk insert performance (execute_many)
- read_heavy: Bulk read with fetchall
- repeated_queries: Single-row queries with varying params (tests _qc_*)

Compares: raw driver vs sqlspec vs SQLAlchemy

Drivers: sqlite (asyncpg requires PostgreSQL server)

Usage: uv run python scripts/bench.py --driver sqlite --rows 10000

Co-authored-by: euri10 <benoit.barthelet@gmail.com>

- Remove SQLSPEC_RS_INSTALLED flag and get_sqlspec_rs() from _typing.py
- Remove _configure_qc_binder() method and calls from config.py
- Remove _qc_binder attribute and fast_path_binder handling from driver
- Simplify qc_rebind() to use Python-only parameter binding
- Fix anyio.to_thread.run_sync pyright errors in migrations
- Fix _fast_path_enabled -> _qc_enabled rename in tests
- Remove test_cached_compiled_binder_override test (tested removed feature)

The query cache (_qc_*) optimizations remain fully functional; only the speculative Rust binder hook was removed until sqlspec_rs is ready.

Add aiosqlite scenarios to benchmark script:
- initialization, write_heavy, read_heavy
- iterative_inserts, repeated_queries
- raw aiosqlite, sqlspec, and sqlalchemy variants

Note: Revealed a bug in sqlspec aiosqlite pool: connections are not properly isolated between different database paths. See issue tracking.

- Fix "table already exists" errors by ensuring pools are closed before temp files are deleted
- Add leak detection helper `_check_pool_leak()` to detect connection leaks in benchmarks
- Use `delete=False` with NamedTemporaryFile and manually unlink after pool.close_pool() to ensure proper cleanup order
- Add DROP_TEST_TABLE to all aiosqlite scenarios for consistency

Closes #360

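The cleanup ordering described above can be illustrated with the standard library alone. In this sketch a plain `sqlite3` connection stands in for the sqlspec pool, and `run_with_temp_db` is a hypothetical helper, not a function from `scripts/bench.py`.

```python
import os
import sqlite3
import tempfile


def run_with_temp_db() -> str:
    """Create a temp database, use it, then clean up in a safe order."""
    # delete=False keeps the file alive after we close our own handle,
    # so the database layer controls when the path actually goes away.
    tmp = tempfile.NamedTemporaryFile(suffix=".db", delete=False)
    tmp.close()
    conn = sqlite3.connect(tmp.name)  # stand-in for the connection pool
    try:
        conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, value TEXT)")
        conn.execute("INSERT INTO test (value) VALUES (?)", ("x",))
        conn.commit()
    finally:
        # Close connections first (the role of pool.close_pool()),
        # and only then remove the file.
        conn.close()
        os.unlink(tmp.name)
    return tmp.name
```

Unlinking only after every connection is closed means no live connection is left pointing at a deleted or reused path.
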
- Add fast path for recently-used connections (skip full health check)
- Inline mark_as_in_use/mark_as_idle to reduce method call overhead
- Skip asyncio.wait_for wrapper on acquire when connection is available
- Skip timeout wrapper on release rollback (SQLite rollback is fast)
- Check pool capacity without lock first before acquiring lock
- Check closed state directly instead of through property

Also add --pool-size parameter to benchmark CLI for testing different pool configurations.

Results (repeated_queries with 1000 rows):
- Before: 95.7% slower than raw
- After: 43.9% slower than raw (2.2x improvement)

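Two of the pool changes above (checking capacity before taking the lock, and skipping the health check for recently used connections) follow a common pattern that can be sketched as below. `PoolSketch`, its attributes, and the 5-second window are hypothetical and do not reflect the aiosqlite pool's actual code.

```python
import asyncio
import time


class PoolSketch:
    """Hypothetical pool showing the two fast paths; not the real pool."""

    def __init__(self, max_size: int, recent_window: float = 5.0) -> None:
        self._max_size = max_size
        self._recent_window = recent_window  # assumed threshold, in seconds
        self._size = 0
        self._lock = asyncio.Lock()
        self.health_checks = 0

    async def try_grow(self) -> bool:
        """Reserve a slot for a new connection if capacity allows."""
        # Unlocked pre-check: a full pool is rejected without touching the lock.
        if self._size >= self._max_size:
            return False
        async with self._lock:
            # Re-check under the lock; another task may have taken the slot.
            if self._size >= self._max_size:
                return False
            self._size += 1
            return True

    def is_usable(self, last_used: float) -> bool:
        """Skip the full health check for recently used connections."""
        if time.monotonic() - last_used < self._recent_window:
            return True
        self.health_checks += 1  # stand-in for a real health check query
        return True
```

The unlocked check is only an optimization: correctness still comes from the second check made while holding the lock.
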
- Add raw, sqlspec, and sqlalchemy duckdb scenarios for all 5 benchmarks
- Fix temp file handling for duckdb (needs to create file itself)
- Add duckdb_engine lazy import for sqlalchemy compatibility
- Confirms duckdb pool is already efficient (thread-local design)

Results show duckdb sqlspec overhead is 3-12% vs raw driver, compared to 20-30% for aiosqlite after optimization. Thread-local pools (sqlite, duckdb) don't need the same hot-path optimization as queue-based pools (aiosqlite).

- Move duckdb-engine from dev to benchmarks group
- Add aiosqlite to benchmarks group for async benchmark scenarios
- dev group includes benchmarks via include-group

This PR implements deep dive optimizations identified in the core-hotpath-opt flow.
Key Changes
- Query cache (`_qc_*`): LRU cache for prepared statements; bypasses SQL parsing and parameter transformation on repeated queries
- Micro-cache in `SQLProcessor` to bypass dictionary lookups for repeated queries
- `prepare_statement` and `SQL.copy` optimized to fast-track parameter updates, with streamlined parameter fingerprinting
- Optimized `is_idle` check to bypass expensive instrumentation overhead when disabled
- Streamlined `ExecutionResult` creation and metadata handling

Benchmark Results (10k rows, sqlite)

How to interpret these results
- `execute_many`
- `fetchall`

Key insight: The `repeated_queries` scenario shows the query cache in action. When the same SQL statement is executed repeatedly with different parameters, sqlspec's overhead drops from ~1500% (iterative inserts) to just ~2% (repeated queries).

Why iterative_inserts is slow
Each call to `session.execute()` pays the full per-statement processing cost. For bulk operations, use `execute_many()`, which amortizes this cost across all rows.

Benchmark Tooling
Added `scripts/bench.py` (originally from @euri10's PR #354) with enhancements:

Scenarios:
- `initialization` - Connection and table setup overhead
- `write_heavy` - Bulk insert via `execute_many`
- `read_heavy` - Bulk insert + fetchall
- `iterative_inserts` - Individual execute calls in a loop
- `repeated_queries` - Single-row queries with varying params (tests query cache)
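The difference in call pattern between the `iterative_inserts` and `write_heavy` scenarios can be shown with the standard-library `sqlite3` module. This is a sketch at the raw-driver level, not sqlspec's API or the code in `scripts/bench.py`; the table layout and the helper names are assumptions.

```python
import sqlite3
from typing import Callable, List, Tuple

Row = Tuple[int, str]
INSERT_SQL = "INSERT INTO test (id, value) VALUES (?, ?)"


def insert_iteratively(conn: sqlite3.Connection, rows: List[Row]) -> None:
    # One execute() call per row: per-call overhead is paid len(rows) times.
    for row in rows:
        conn.execute(INSERT_SQL, row)


def insert_bulk(conn: sqlite3.Connection, rows: List[Row]) -> None:
    # One executemany() call: the statement is handled once for all rows.
    conn.executemany(INSERT_SQL, rows)


def run(inserter: Callable[[sqlite3.Connection, List[Row]], None],
        rows: List[Row]) -> int:
    """Insert rows into a fresh in-memory database and return the row count."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, value TEXT)")
        inserter(conn, rows)
        conn.commit()
        return conn.execute("SELECT COUNT(*) FROM test").fetchone()[0]
    finally:
        conn.close()
```

Both helpers store the same rows; only the number of calls crossing the driver boundary differs, which is where any fixed per-call overhead accumulates.
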