fix(evm): cache fallback decisions and narrow JIT suitability thresholds #357
starwarfan wants to merge 4 commits into DTVMStack:main
Conversation
Force-pushed 3b07de5 to 541f825
Pull request overview
This PR improves the EVM JIT precompile fallback mechanism by caching per-bytecode fallback decisions in the EVMC VM and refining the analyzer’s fallback criteria to reduce false positives while preserving protection against pathological compilation cases.
Changes:
- Add a per-module FallbackCache (CRC32 + revision key) to avoid re-running EVMAnalyzer on repeated executions.
- Adjust JIT suitability thresholds by removing MIR-estimate-based fallback and increasing the bytecode-size threshold; keep pattern-based RA-expensive heuristics.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/vm/dt_evmc_vm.cpp | Adds FallbackCache and uses it in execute() to avoid repeated analyzer work. |
| src/compiler/evm_frontend/evm_analyzer.h | Updates bytecode-size threshold and simplifies fallback verdict to pattern-based checks plus a size guard. |
```cpp
#ifdef ZEN_ENABLE_JIT_PRECOMPILE_FALLBACK
  std::unordered_map<uint64_t, bool> FallbackCache;
#endif
```
FallbackCache grows monotonically for the lifetime of the VM and is never pruned. In long-running processes that execute many distinct contracts (or untrusted inputs), this can become an unbounded memory growth vector. Consider bounding the cache (e.g., LRU / max entries) or clearing entries alongside module unload/retention policy.
Good point. In practice the cache is bounded by the number of distinct (CRC32, revision) pairs seen across EVMC execute() calls — for normal EVM usage (one VM per node), the set of deployed contract hashes is finite and modest. For long-running fuzz harnesses that synthesize many unique bytecode blobs this could matter, so adding a max-entries eviction policy is sensible. Will track as a follow-up optimization.
```cpp
/// Bytecode size limit: 64 KB is well above the EIP-170 deployed-code limit
/// (24 576) and the EIP-3860 initcode limit (49 152), so real-world contracts
/// are never affected, while pathological synthetic tests (400 KB+) are caught.
```
The comment claims only “pathological synthetic tests (400 KB+)” are caught by this bytecode-size guard, but the threshold is 0x10000 (64 KiB), so it will also catch any bytecode larger than 64 KiB. Please adjust the wording to match the actual cutoff (or adjust the cutoff if 400+ KiB was the intent).
Suggested change:

```diff
-/// Bytecode size limit: 64 KB is well above the EIP-170 deployed-code limit
-/// (24 576) and the EIP-3860 initcode limit (49 152), so real-world contracts
-/// are never affected, while pathological synthetic tests (400 KB+) are caught.
+/// Bytecode size limit: 64 KiB is well above the EIP-170 deployed-code limit
+/// (24 576) and the EIP-3860 initcode limit (49 152), so real-world contracts
+/// are never affected, while oversized synthetic tests above this threshold
+/// (64 KiB+) are caught.
```
You're right — the comment says "400 KB+" but the actual threshold is 0x10000 = 64 KiB. Accepted the suggested wording fix. Will update the comment to accurately reflect the 64 KiB cutoff.
```diff
 JITResult.ShouldFallback =
-    BytecodeSize > MAX_JIT_BYTECODE_SIZE ||
-    JITResult.MirEstimate > MAX_JIT_MIR_ESTIMATE ||
+    JITResult.BytecodeSize > MAX_JIT_BYTECODE_SIZE ||
     JITResult.MaxConsecutiveExpensive > MAX_CONSECUTIVE_RA_EXPENSIVE ||
     JITResult.MaxBlockExpensiveCount > MAX_BLOCK_RA_EXPENSIVE ||
```
The PR description says the bytecode-size guard was removed from the fallback condition, but ShouldFallback still checks JITResult.BytecodeSize > MAX_JIT_BYTECODE_SIZE. Either update the PR description to reflect the retained size guard, or remove the size guard here if that’s the intended behavior change.
Good catch. The bytecode size guard is intentionally retained — it was the MIR estimate check that was removed from the fallback condition (since the pattern-based thresholds are more targeted). Will clarify the PR description to accurately reflect that only the MIR-estimate threshold was dropped while the bytecode-size guard remains.
```cpp
/// (24 576) and the EIP-3860 initcode limit (49 152), so real-world contracts
/// are never affected, while pathological synthetic tests (400 KB+) are caught.
static constexpr size_t MAX_JIT_BYTECODE_SIZE = 0x10000;
static constexpr size_t MAX_JIT_MIR_ESTIMATE = 50000;
```
MAX_JIT_MIR_ESTIMATE remains defined (and MirEstimate is still computed) but it no longer influences ShouldFallback. If MIR estimate is still meant for diagnostics, consider documenting that it is informational-only; otherwise remove the unused threshold to avoid confusion and dead configuration.
Suggested change:

```diff
-static constexpr size_t MAX_JIT_MIR_ESTIMATE = 50000;
```
Correct — MAX_JIT_MIR_ESTIMATE is no longer used in the ShouldFallback decision. The MirEstimate field is still computed and available in JITSuitabilityResult for diagnostic logging, but the threshold constant itself is dead. Will either remove the constant and add a comment marking MirEstimate as informational, or drop both if no callers inspect MirEstimate.
```diff
 // Compute final fallback verdict.
 // Pattern-based thresholds target pathological LLVM register-allocator
 // cases (dense RA-expensive opcodes, DUP feedback loops).
 // The bytecode size guard (64 KB) catches extreme synthetic tests whose
 // sheer IR volume stalls LLVM, without affecting real contracts (bounded
 // by EIP-170 / EIP-3860).
 JITResult.ShouldFallback =
-    BytecodeSize > MAX_JIT_BYTECODE_SIZE ||
-    JITResult.MirEstimate > MAX_JIT_MIR_ESTIMATE ||
+    JITResult.BytecodeSize > MAX_JIT_BYTECODE_SIZE ||
     JITResult.MaxConsecutiveExpensive > MAX_CONSECUTIVE_RA_EXPENSIVE ||
     JITResult.MaxBlockExpensiveCount > MAX_BLOCK_RA_EXPENSIVE ||
     JITResult.DupFeedbackPatternCount > MAX_DUP_FEEDBACK_PATTERN;
```
The fallback decision logic changed materially (removed MIR-estimate check; adjusted bytecode-size threshold). There are existing GTest suites under src/tests, but none appear to cover EVMAnalyzer suitability decisions. Adding unit tests for representative bytecode patterns (e.g., high RA-expensive density vs. benign PUSH-heavy code) would prevent regressions in these thresholds.
Agreed that dedicated unit tests for EVMAnalyzer thresholds would be valuable. The existing evmFallbackExecutionTests cover end-to-end fallback behavior but don't test the suitability heuristic boundaries. Will add targeted tests with crafted bytecode patterns (high RA-expensive density, DUP feedback loops, benign PUSH-heavy code) as a follow-up.
src/vm/dt_evmc_vm.cpp (outdated)
```cpp
auto CacheIt = VM->FallbackCache.find(ModKey);
bool NeedFallback;
if (CacheIt != VM->FallbackCache.end()) {
  NeedFallback = CacheIt->second;
} else {
  COMPILER::EVMAnalyzer Analyzer(Rev);
  Analyzer.analyze(Code, CodeSize);
  NeedFallback = Analyzer.getJITSuitability().ShouldFallback;
  VM->FallbackCache[ModKey] = NeedFallback;
```
On cache miss, this does a find() and then inserts via FallbackCache[ModKey] = NeedFallback, which performs a second hash lookup and may allocate a default entry first. Since this code is on a hot path for first-seen contracts, consider using emplace/try_emplace with the existing lookup result to avoid the extra work.
Suggested change:

```diff
-auto CacheIt = VM->FallbackCache.find(ModKey);
-bool NeedFallback;
-if (CacheIt != VM->FallbackCache.end()) {
-  NeedFallback = CacheIt->second;
-} else {
-  COMPILER::EVMAnalyzer Analyzer(Rev);
-  Analyzer.analyze(Code, CodeSize);
-  NeedFallback = Analyzer.getJITSuitability().ShouldFallback;
-  VM->FallbackCache[ModKey] = NeedFallback;
+auto [CacheIt, Inserted] = VM->FallbackCache.try_emplace(ModKey, false);
+bool &NeedFallback = CacheIt->second;
+if (Inserted) {
+  COMPILER::EVMAnalyzer Analyzer(Rev);
+  Analyzer.analyze(Code, CodeSize);
+  NeedFallback = Analyzer.getJITSuitability().ShouldFallback;
```
Good optimization — the current find() + operator[] does two hash lookups. Accepted the try_emplace suggestion. Will update to use try_emplace to avoid the redundant lookup and default-construction.
⚡ Performance Regression Check Results

✅ Performance Check Passed (interpreter) — benchmark threshold: 20%
Summary: 194 benchmarks, 1 regression

✅ Performance Check Passed (multipass) — benchmark threshold: 20%
Summary: 194 benchmarks, 0 regressions
Two fixes in the ZEN_ENABLE_JIT_PRECOMPILE_FALLBACK mechanism:
1. Per-call analyzer overhead: EVMAnalyzer was constructed on every
execute() call, rebuilding a std::map over the full bytecode each
time. For benchmarks running thousands of iterations this added
~245us per call even when no fallback was needed. Fix: cache the
ShouldFallback boolean per module key (CRC32 + revision) in a new
FallbackCache map on the DTVM VM struct.
2. Tuned fallback thresholds to eliminate false positives while
covering all pathological cases:
- Removed MirEstimate condition (caused false positives on PUSH22-32
and other harmless synth benchmarks).
- Raised BytecodeSize limit from 0x6000 (24 576) to 0x10000 (64 KB).
The old limit overlapped the EIP-170 deployed-code ceiling and
caused false positives on real contracts (snailtracer, MUL/b1, etc).
The new limit is safely above both EIP-170 (24 576) and the EIP-3860
initcode limit (49 152), while still catching extreme synthetic test
bytecode (402 KB, 6 MB) that would stall LLVM.
- Pattern-based thresholds (MaxConsecutiveExpensive > 128,
MaxBlockExpensiveCount > 256, DupFeedbackPatternCount > 64) remain
and precisely target the 5 pathological benchmark cases.
Verified: all 223 evmonetestsuite multipass tests pass; large synthetic
bytecode cases (evmone_block_gas_cost_overflow_*) correctly fall back to
interpreter; 207/216 common benchmark cases remain within 20% of the
no-fallback JIT reference.
Co-authored-by: Cursor <cursoragent@cursor.com>
- Fix comment: 400 KB+ → 64 KiB to match actual MAX_JIT_BYTECODE_SIZE
- Remove unused MAX_JIT_MIR_ESTIMATE constant (MirEstimate kept as informational field for diagnostics)
- Use try_emplace instead of find + operator[] to avoid redundant hash lookup on cache miss

Made-with: Cursor
Force-pushed 07dc063 to a0623a1
- Add size limit to FallbackCache to prevent unbounded growth in long-running processes executing many distinct contracts.
- Add unit tests for EVMAnalyzer to verify fallback suitability heuristics.

Made-with: Cursor
Two bugs in the ZEN_ENABLE_JIT_PRECOMPILE_FALLBACK mechanism:
Per-call analyzer overhead: EVMAnalyzer was constructed on every execute() call, rebuilding a std::map over the full bytecode each time. For benchmarks running thousands of iterations this added ~245us per call even when no fallback was needed. Fix: cache the ShouldFallback boolean per module key (CRC32 + revision) in a new FallbackCache map on the DTVM VM struct.
Over-conservative fallback thresholds: the BytecodeSize and MirEstimate conditions caused false positives on real contracts (snailtracer regressed 22x) and harmless synth benchmarks (PUSH22-32, MUL/b1). The 5 actually-pathological cases (MUL/b0, SIGNEXTEND/b1, SHL/b1, SHR/b1, SAR/b1) are all caught precisely by MaxConsecutiveExpensive > 128 and MaxBlockExpensiveCount > 256 alone (they each contain ~2000 RA-expensive ops in a single basic block). Remove BytecodeSize and MirEstimate from the fallback condition; keep only pattern-based thresholds.
Result: 221 benchmarks complete with no hangs; 207/216 common cases remain within 20% of the no-fallback JIT reference.