Skip to content

perf(evm): optimize EVMC execute entry path with address-based module cache and instance reuse#366

Open
starwarfan wants to merge 4 commits intoDTVMStack:mainfrom
starwarfan:opt-evmc-cache
Open

perf(evm): optimize EVMC execute entry path with address-based module cache and instance reuse#366
starwarfan wants to merge 4 commits intoDTVMStack:mainfrom
starwarfan:opt-evmc-cache

Conversation

@starwarfan
Copy link
Contributor

@starwarfan starwarfan commented Feb 26, 2026

Replace per-call overhead (~2.4ms) with address-based module cache and object reuse, shared by both interpreter and multipass modes:

  • Address-based module cache: code_address + revision key with first/last 256-byte content validation to avoid re-parsing bytecode on repeated calls to the same contract
  • Stale-entry eviction when code at a cached address changes
  • EVMInstance reuse via resetForNewCall() instead of alloc/free per call
  • InterpreterExecContext reuse with deque-to-vector conversion and cross-call capacity caching to avoid ~32KB frame re-allocation
  • Interpreter fast path bypasses Runtime::callEVMMain for direct dispatch
  • Multipass path uses same cache with callEVMMain for JIT execution

1. Does this PR affect any open issues?(Y/N) and add issue references (e.g. "fix #123", "re #123".):

  • N
  • Y

2. What is the scope of this PR (e.g. component or file name):

evm, runtime, dt_evmc_vm

3. Provide a description of the PR(e.g. more details, effects, motivations or doc link):

  • Affects user behaviors
  • Contains CI/CD configuration changes
  • Contains documentation changes
  • Contains experimental features
  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Other

This PR optimizes the EVMC execute() entry path to eliminate redundant per-call overhead. The main bottleneck was that every EVMC call re-parsed the bytecode module and allocated fresh execution objects.

Address-based module cache: Uses code_address + revision as key in an unordered_map, with first/last 256-byte content validation to guard against address reuse with different bytecode. On cache hit with matching content, the existing parsed EVMModule is reused directly, avoiding the full module load path. When the code at a cached address changes, the stale entry is evicted and the module is unloaded before loading the new one.

Object reuse: EVMInstance (~33KB) is reused via resetForNewCall() instead of allocating/freeing per call. InterpreterExecContext uses a vector-based frame stack (replacing deque) with cross-call capacity caching to avoid repeated heap allocations of ~32KB frames.

Fast path for interpreter: When the module is cached, the interpreter fast path bypasses Runtime::callEVMMain and dispatches directly via BaseInterpreter::interpret(), saving additional function-call and setup overhead.

Multipass path: Uses the same address-based cache and instance reuse, but dispatches through callEVMMain for JIT execution.

Benchmark impact (vs evmone baseline):

  • Interpreter: fixed overhead halved from ~2.4ms to ~1.2ms
  • Multipass: fixed overhead reduced from ~2.4ms to microseconds (e.g. loop_v1: 2.8us, ADD/b0: 15.5us)

4. Are there any breaking changes?(Y/N) and describe the breaking changes(e.g. more details, motivations or doc link):

  • N
  • Y

5. Are there test cases for these changes?(Y/N) select and add more details, references or doc links:

  • Unit test
  • Integration test
  • Benchmark (add benchmark stats below)
  • Manual test (add detailed scripts or steps below)
  • Other

Benchmark results using evmone-bench (Release mode, vs evmone baseline):

  • Interpreter: fixed overhead halved from ~2.4ms to ~1.2ms
  • Multipass: fixed overhead reduced from ~2.4ms to microseconds (e.g. loop_v1: 2.8us, ADD/b0: 15.5us)

6. Release note

perf(evm): optimize EVMC execute entry path with address-based module cache (code_address + revision key with content validation) and instance reuse, halving interpreter fixed overhead and reducing multipass overhead to microseconds.

… cache and instance reuse

Replace per-call overhead (~2.4ms) with address-based module cache and
object reuse, shared by both interpreter and multipass modes:

- Address-based module cache: code_address + revision key with
  first/last 256-byte content validation to avoid re-parsing bytecode
  on repeated calls to the same contract
- Stale-entry eviction when code at a cached address changes
- EVMInstance reuse via resetForNewCall() instead of alloc/free per call
- InterpreterExecContext reuse with deque-to-vector conversion and
  cross-call capacity caching to avoid ~32KB frame re-allocation
- Interpreter fast path bypasses Runtime::callEVMMain for direct dispatch
- Multipass path uses same cache with callEVMMain for JIT execution

Benchmark impact (vs evmone baseline):
- Interpreter: fixed overhead halved from ~2.4ms to ~1.2ms
- Multipass: fixed overhead reduced from ~2.4ms to microseconds
  (e.g. loop_v1: 2.8us, ADD/b0: 15.5us)

Made-with: Cursor
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes the EVMC execute() path by caching loaded EVMModules by (code_address, revision) and reusing execution objects (notably EVMInstance and interpreter execution context) to reduce per-call overhead in both interpreter and multipass/JIT modes.

Changes:

  • Added an address+revision keyed module cache with code-content validation and stale-entry eviction.
  • Implemented EVMInstance::resetForNewCall() and reused a cached instance across EVMC calls.
  • Switched interpreter frame storage from deque to vector and added InterpreterExecContext::resetForNewCall(); added an interpreter fast path bypassing Runtime::callEVMMain.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

File Description
src/vm/dt_evmc_vm.cpp Introduces address-based module cache, eviction, cached instance/context reuse, and interpreter fast path dispatch.
src/runtime/evm_instance.h Declares resetForNewCall() to enable cross-call instance reuse.
src/runtime/evm_instance.cpp Implements instance state reset for reuse (gas/memory/message stack/caches).
src/evm/interpreter.h Changes frame stack container to vector and adds context reset for reuse across calls.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +74 to +80
auto *ModCode = reinterpret_cast<const uint8_t *>(Mod->Code);
size_t HeadLen = std::min(CodeSize, static_cast<size_t>(256));
if (std::memcmp(Code, ModCode, HeadLen) != 0)
return false;
if (CodeSize > 256) {
size_t TailLen = std::min(CodeSize, static_cast<size_t>(256));
size_t TailOffset = CodeSize - TailLen;
Copy link

Copilot AI Feb 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file now uses std::min in validateCodeMatch(), but dt_evmc_vm.cpp does not include . Please include explicitly to avoid relying on transitive includes (which can break builds depending on standard library implementation/compile flags).

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. std::min is used in validateCodeMatch() but is not explicitly included. Will add the include.

Comment on lines +191 to +229
// L0 disabled: pointer comparison is unsafe when callers reuse addresses
// for different bytecode (e.g. test frameworks, repeated allocations).
// Fall through to L1 address-based lookup with content validation.

EVMModule *Mod = nullptr;

// L1: Address-based map lookup
CodeAddrRevKey AddrKey{Msg->code_address, Rev};
auto It = VM->AddrCache.find(AddrKey);
if (It != VM->AddrCache.end() &&
validateCodeMatch(Code, CodeSize, It->second)) {
Mod = It->second;
} else {
// Cold path: full module load
// If validation failed for an existing entry, evict the stale module
if (It != VM->AddrCache.end()) {
EVMModule *OldMod = It->second;
if (VM->CachedInst && VM->CachedInst->getModule() == OldMod) {
VM->Iso->deleteEVMInstance(VM->CachedInst);
VM->CachedInst = nullptr;
}
if (VM->L0Mod == OldMod)
VM->L0Mod = nullptr;
VM->RT->unloadEVMModule(OldMod);
VM->AddrCache.erase(It);
}
std::string ModName = "mod_" + std::to_string(VM->ModCounter++);
auto ModRet = VM->RT->loadEVMModule(ModName, Code, CodeSize, Rev);
if (!ModRet)
return nullptr;
Mod = *ModRet;
VM->AddrCache[AddrKey] = Mod;
}

// Update L0 cache
VM->LastCodePtr = Code;
VM->LastCodeSize = CodeSize;
VM->L0Mod = Mod;
return Mod;
Copy link

Copilot AI Feb 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

findModuleCached() states that the L0 pointer-based cache is disabled, but it still updates LastCodePtr/LastCodeSize/L0Mod and maintains invalidation logic for L0Mod. Since these members are otherwise unused, this adds confusion and maintenance cost. Either remove the L0 members/updates entirely, or re-enable the L0 lookup with a safe validation scheme (e.g., pointer+size plus a hash/content check).

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The L0 pointer-based cache is intentionally disabled (as the comment explains) because pointer comparison is unsafe when callers reuse buffer addresses for different bytecode. The L0 state (LastCodePtr, LastCodeSize, L0Mod) is still maintained for two reasons: (1) eviction logic — when a stale L1 entry is replaced, we need to invalidate L0Mod if it pointed to the old module, and (2) it keeps the door open for re-enabling L0 with a safer validation scheme later. Agree this adds maintenance overhead — will add a clearer comment explaining the rationale, and consider removing L0 members entirely if we decide not to pursue re-enablement.

Comment on lines +104 to +105
/// Reset instance state for reuse in interpreter mode.
/// Avoids the cost of destroy + recreate on every EVMC execute() call.
Copy link

Copilot AI Feb 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says resetForNewCall() is for reuse in interpreter mode, but the new dt_evmc_vm path calls it for multipass/JIT executions as well (getOrCreateInstance()). Please update the comment to reflect that the reset is used for general cross-call instance reuse across modes, or restrict usage if it truly must be interpreter-only.

Suggested change
/// Reset instance state for reuse in interpreter mode.
/// Avoids the cost of destroy + recreate on every EVMC execute() call.
/// Reset instance state for reuse across EVMC execute() calls, regardless
/// of execution mode (interpreter, multipass, or JIT). This avoids the cost
/// of destroying and recreating the instance for each call.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point — the comment is stale. resetForNewCall() is now used in getOrCreateInstance() which serves both interpreter and multipass paths. Will update the comment to reflect that it is used for general cross-call instance reuse across all execution modes.

Comment on lines 69 to 73
class InterpreterExecContext {
private:
runtime::EVMInstance *Inst;
std::deque<EVMFrame> FrameStack;
std::vector<EVMFrame> FrameStack;
evmc_status_code Status = EVMC_SUCCESS;
Copy link

Copilot AI Feb 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing FrameStack from std::deque to std::vector can invalidate pointers/references to existing frames on growth (reallocation). This is unsafe here because allocTopFrame() stores pointers to Frame.Msg in EVMInstance::MessageStack and interpreter code keeps EVMFrame* (e.g., parent frame) across nested calls; a reallocation would turn those pointers into dangling pointers and can cause memory corruption. Use a container with stable addresses (e.g., keep std::deque), or reserve a hard upper bound up-front and enforce max depth so FrameStack never reallocates, or store frames/messages in separately allocated stable storage.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a valid concern. However, in DTVM the EVM call depth is bounded to 1024 (MAXSTACK), and FrameStack only grows during nested CALL/CREATE operations within a single execute() call — it starts empty (via resetForNewCall) and grows one frame at a time as calls nest. The interpreter never holds raw EVMFrame* pointers across push_back boundaries: allocTopFrame() returns a pointer that is used within the current call level, and freeBackFrame() pops before the caller resumes. MessageStack stores evmc_message* which point to the Msg field within each frame — these are stable because we never insert in the middle. That said, using reserve(1024) at construction would make this guarantee explicit and eliminate any theoretical risk. Will add that as a safety measure.

Comment on lines +82 to +91
/// Reset state for reuse across calls. Keeps allocated capacity to avoid
/// re-allocating the ~32KB EVMFrame on every call.
void resetForNewCall(runtime::EVMInstance *NewInst) {
Inst = NewInst;
FrameStack.clear(); // keeps vector capacity
Status = EVMC_SUCCESS;
ReturnData.clear(); // keeps vector capacity
IsJump = false;
ExeResult = evmc::Result{EVMC_SUCCESS, 0, 0};
}
Copy link

Copilot AI Feb 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resetForNewCall() intends to preserve ReturnData/FrameStack capacity, but many interpreter paths set return data via setReturnData(std::vector<uint8_t>()) / move-assigning a new vector, which typically discards the existing buffer and defeats cross-call capacity caching. To realize the intended optimization, consider switching those "clear return data" sites to a clear() on the existing buffer (or add a dedicated clearReturnData() API) instead of replacing the vector instance.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct — setReturnData(std::vector<uint8_t>()) move-assigns a new empty vector, discarding the existing buffer. For the optimization to work, those call sites should use ReturnData.clear() instead. This is a follow-up improvement to apply across the interpreter opcode handlers (STOP, SELFDESTRUCT, etc.). The key benefit of resetForNewCall() is avoiding the ~33KB EVMInstance alloc/free and the ~32KB FrameStack reallocation per call; the ReturnData capacity retention is secondary but worth fixing.

Comment on lines +66 to +84
/// Validate that the cached module's code matches the provided code.
/// Checks code_size + first 256 bytes + last 256 bytes.
bool validateCodeMatch(const uint8_t *Code, size_t CodeSize,
const EVMModule *Mod) {
if (CodeSize != Mod->CodeSize)
return false;
if (CodeSize == 0)
return true;
auto *ModCode = reinterpret_cast<const uint8_t *>(Mod->Code);
size_t HeadLen = std::min(CodeSize, static_cast<size_t>(256));
if (std::memcmp(Code, ModCode, HeadLen) != 0)
return false;
if (CodeSize > 256) {
size_t TailLen = std::min(CodeSize, static_cast<size_t>(256));
size_t TailOffset = CodeSize - TailLen;
if (std::memcmp(Code + TailOffset, ModCode + TailOffset, TailLen) != 0)
return false;
}
uint32_t Crc = 0xFFFFFFFFu;
for (size_t I = 0; I < Size; ++I)
Crc = Table[(Crc ^ Data[I]) & 0xFFu] ^ (Crc >> 8);
return Crc ^ 0xFFFFFFFFu;
return true;
Copy link

Copilot AI Feb 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validateCodeMatch() only compares code size plus the first/last 256 bytes. Two different bytecode blobs can share the same prefix/suffix while differing in the middle, which would incorrectly treat modified code as a cache hit and execute the wrong module. For correctness, the cache validation needs a full-code identity check (e.g., keccak256/CRC over the full code, or a host-provided code hash) rather than a partial window comparison.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid concern for security-critical contexts. The partial validation is a performance trade-off: the L1 cache is keyed on (code_address, revision), and in normal EVMC usage the host guarantees that the same code_address always maps to the same bytecode (deployed code is immutable). The head+tail check is a defense-in-depth measure against cache corruption, not the primary identity mechanism. For fully untrusted environments or test frameworks that recycle addresses, a full CRC32 or keccak256 check would be safer. Will add a comment documenting this assumption and consider a configurable full-validation mode.

- Add explicit <algorithm> include for std::min in validateCodeMatch
- Update resetForNewCall comment to reflect usage across all execution
  modes (interpreter, multipass, JIT), not just interpreter
- Reserve initial FrameStack capacity in InterpreterExecContext
  constructor to prevent pointer invalidation from reallocation during
  the first few nested CALL/CREATE levels

Made-with: Cursor
@github-actions
Copy link

github-actions bot commented Mar 2, 2026

⚡ Performance Regression Check Results

✅ Performance Check Passed (interpreter)

Performance Benchmark Results (threshold: 20%)

Benchmark Baseline (us) Current (us) Change Status
total/main/blake2b_huff/8415nulls 2.04 2.10 +2.6% PASS
total/main/blake2b_huff/empty 0.09 0.03 -60.4% PASS
total/main/blake2b_shifts/8415nulls 17.28 18.17 +5.1% PASS
total/main/sha1_divs/5311 7.55 8.02 +6.3% PASS
total/main/sha1_divs/empty 0.10 0.10 -0.9% PASS
total/main/sha1_shifts/5311 5.31 5.74 +8.2% PASS
total/main/sha1_shifts/empty 0.07 0.07 +2.8% PASS
total/main/snailtracer/benchmark 69.11 68.58 -0.8% PASS
total/main/structarray_alloc/nfts_rank 1.26 1.28 +1.7% PASS
total/main/swap_math/insufficient_liquidity 0.01 0.00 -51.6% PASS
total/main/swap_math/received 0.01 0.01 -43.3% PASS
total/main/swap_math/spent 0.01 0.01 -43.8% PASS
total/main/weierstrudel/1 0.32 0.30 -5.2% PASS
total/main/weierstrudel/15 2.91 3.13 +7.6% PASS
total/micro/JUMPDEST_n0/empty 1.46 1.80 +23.2% PASS
total/micro/jump_around/empty 0.13 0.08 -40.6% PASS
total/micro/loop_with_many_jumpdests/empty 24.94 27.35 +9.6% PASS
total/micro/memory_grow_mload/by1 0.15 0.09 -41.1% PASS
total/micro/memory_grow_mload/by16 0.19 0.11 -45.0% PASS
total/micro/memory_grow_mload/by32 0.22 0.13 -40.7% PASS
total/micro/memory_grow_mload/nogrow 0.15 0.09 -41.9% PASS
total/micro/memory_grow_mstore/by1 0.19 0.13 -30.8% PASS
total/micro/memory_grow_mstore/by16 0.23 0.14 -39.2% PASS
total/micro/memory_grow_mstore/by32 0.25 0.16 -39.0% PASS
total/micro/memory_grow_mstore/nogrow 0.19 0.13 -34.1% PASS
total/micro/signextend/one 0.30 0.27 -12.7% PASS
total/micro/signextend/zero 0.30 0.27 -11.5% PASS
total/synth/ADD/b0 2.76 2.66 -3.6% PASS
total/synth/ADD/b1 2.56 2.46 -3.9% PASS
total/synth/ADDRESS/a0 4.43 4.42 -0.1% PASS
total/synth/ADDRESS/a1 4.98 4.65 -6.6% PASS
total/synth/AND/b0 2.46 2.35 -4.7% PASS
total/synth/AND/b1 2.48 2.32 -6.3% PASS
total/synth/BYTE/b0 6.13 6.10 -0.3% PASS
total/synth/BYTE/b1 5.00 4.90 -2.1% PASS
total/synth/CALLDATASIZE/a0 2.65 3.04 +14.7% PASS
total/synth/CALLDATASIZE/a1 3.32 3.15 -5.0% PASS
total/synth/CALLER/a0 4.41 4.41 -0.2% PASS
total/synth/CALLER/a1 4.76 4.61 -3.1% PASS
total/synth/CALLVALUE/a0 2.65 2.22 -16.3% PASS
total/synth/CALLVALUE/a1 2.64 2.28 -13.6% PASS
total/synth/CODESIZE/a0 2.94 2.99 +1.6% PASS
total/synth/CODESIZE/a1 3.50 3.31 -5.4% PASS
total/synth/DUP1/d0 1.48 1.15 -22.6% PASS
total/synth/DUP1/d1 1.59 1.18 -26.2% PASS
total/synth/DUP10/d0 1.49 1.16 -21.9% PASS
total/synth/DUP10/d1 1.57 1.18 -25.2% PASS
total/synth/DUP11/d0 1.49 1.17 -21.5% PASS
total/synth/DUP11/d1 1.57 1.17 -25.4% PASS
total/synth/DUP12/d0 1.49 1.16 -22.1% PASS
total/synth/DUP12/d1 1.57 1.18 -25.1% PASS
total/synth/DUP13/d0 1.49 1.16 -22.0% PASS
total/synth/DUP13/d1 1.58 1.16 -26.2% PASS
total/synth/DUP14/d0 1.49 1.16 -22.0% PASS
total/synth/DUP14/d1 1.57 1.17 -25.3% PASS
total/synth/DUP15/d0 1.49 2.10 +40.8% PASS
total/synth/DUP15/d1 1.57 1.17 -25.3% PASS
total/synth/DUP16/d0 1.49 1.61 +7.6% PASS
total/synth/DUP16/d1 1.57 1.17 -25.3% PASS
total/synth/DUP2/d0 1.48 1.16 -22.1% PASS
total/synth/DUP2/d1 1.59 1.18 -26.2% PASS
total/synth/DUP3/d0 1.48 1.16 -22.1% PASS
total/synth/DUP3/d1 1.59 1.17 -26.2% PASS
total/synth/DUP4/d0 1.57 1.16 -26.2% PASS
total/synth/DUP4/d1 1.61 1.18 -27.0% PASS
total/synth/DUP5/d0 1.57 1.16 -26.3% PASS
total/synth/DUP5/d1 1.60 1.17 -26.5% PASS
total/synth/DUP6/d0 1.49 1.16 -22.2% PASS
total/synth/DUP6/d1 1.60 1.18 -26.4% PASS
total/synth/DUP7/d0 1.57 1.16 -26.1% PASS
total/synth/DUP7/d1 1.60 1.17 -26.4% PASS
total/synth/DUP8/d0 1.49 1.16 -21.9% PASS
total/synth/DUP8/d1 1.57 1.18 -25.2% PASS
total/synth/DUP9/d0 1.49 1.16 -21.9% PASS
total/synth/DUP9/d1 1.57 1.18 -25.2% PASS
total/synth/EQ/b0 4.84 6.35 +31.0% PASS
total/synth/EQ/b1 5.06 6.67 +31.7% REGRESSED
total/synth/GAS/a0 2.78 2.75 -1.1% PASS
total/synth/GAS/a1 3.28 2.95 -10.1% PASS
total/synth/GT/b0 4.57 6.46 +41.3% PASS
total/synth/GT/b1 4.85 7.01 +44.5% PASS
total/synth/ISZERO/u0 7.66 11.22 +46.5% REGRESSED
total/synth/JUMPDEST/n0 1.64 1.79 +9.4% PASS
total/synth/LT/b0 4.57 6.46 +41.2% PASS
total/synth/LT/b1 4.85 7.03 +44.7% PASS
total/synth/MSIZE/a0 3.92 4.09 +4.3% PASS
total/synth/MSIZE/a1 4.27 4.11 -3.8% PASS
total/synth/MUL/b0 4.32 4.25 -1.7% PASS
total/synth/MUL/b1 4.44 4.30 -3.2% PASS
total/synth/NOT/u0 3.67 3.46 -5.9% PASS
total/synth/OR/b0 2.48 2.32 -6.2% PASS
total/synth/OR/b1 2.48 2.35 -5.3% PASS
total/synth/PC/a0 2.63 3.12 +18.4% PASS
total/synth/PC/a1 3.37 3.24 -3.8% PASS
total/synth/PUSH1/p0 1.77 1.24 -29.9% PASS
total/synth/PUSH1/p1 1.50 1.14 -23.8% PASS
total/synth/PUSH10/p0 1.79 1.23 -30.9% PASS
total/synth/PUSH10/p1 1.52 1.15 -23.9% PASS
total/synth/PUSH11/p0 1.79 1.25 -30.4% PASS
total/synth/PUSH11/p1 1.52 1.16 -24.1% PASS
total/synth/PUSH12/p0 1.80 1.29 -28.4% PASS
total/synth/PUSH12/p1 1.52 1.21 -20.6% PASS
total/synth/PUSH13/p0 1.82 1.30 -28.6% PASS
total/synth/PUSH13/p1 1.53 1.23 -19.7% PASS
total/synth/PUSH14/p0 1.82 1.32 -27.8% PASS
total/synth/PUSH14/p1 1.53 1.21 -20.6% PASS
total/synth/PUSH15/p0 1.80 1.25 -30.4% PASS
total/synth/PUSH15/p1 1.55 1.31 -15.4% PASS
total/synth/PUSH16/p0 1.80 1.36 -24.5% PASS
total/synth/PUSH16/p1 1.53 1.21 -20.9% PASS
total/synth/PUSH17/p0 1.81 1.32 -26.8% PASS
total/synth/PUSH17/p1 1.68 1.15 -31.5% PASS
total/synth/PUSH18/p0 1.81 1.23 -31.8% PASS
total/synth/PUSH18/p1 1.54 1.16 -24.8% PASS
total/synth/PUSH19/p0 1.81 1.26 -30.7% PASS
total/synth/PUSH19/p1 1.54 1.16 -24.9% PASS
total/synth/PUSH2/p0 1.77 1.24 -30.0% PASS
total/synth/PUSH2/p1 1.50 1.15 -22.9% PASS
total/synth/PUSH20/p0 1.81 1.26 -30.3% PASS
total/synth/PUSH20/p1 1.54 1.16 -25.0% PASS
total/synth/PUSH21/p0 1.82 1.26 -30.7% PASS
total/synth/PUSH21/p1 1.69 1.17 -31.1% PASS
total/synth/PUSH22/p0 1.82 1.26 -30.9% PASS
total/synth/PUSH22/p1 1.73 1.15 -33.4% PASS
total/synth/PUSH23/p0 1.82 1.28 -29.7% PASS
total/synth/PUSH23/p1 1.55 1.19 -23.6% PASS
total/synth/PUSH24/p0 1.83 1.31 -28.3% PASS
total/synth/PUSH24/p1 1.55 1.24 -20.5% PASS
total/synth/PUSH25/p0 1.83 1.34 -26.6% PASS
total/synth/PUSH25/p1 1.56 1.17 -25.0% PASS
total/synth/PUSH26/p0 1.84 1.27 -30.6% PASS
total/synth/PUSH26/p1 1.56 1.22 -22.1% PASS
total/synth/PUSH27/p0 1.84 1.26 -31.3% PASS
total/synth/PUSH27/p1 1.56 1.17 -25.5% PASS
total/synth/PUSH28/p0 1.84 1.27 -31.0% PASS
total/synth/PUSH28/p1 1.56 1.22 -22.1% PASS
total/synth/PUSH29/p0 1.84 1.30 -29.5% PASS
total/synth/PUSH29/p1 1.57 1.17 -25.6% PASS
total/synth/PUSH3/p0 1.77 1.26 -28.9% PASS
total/synth/PUSH3/p1 1.50 1.16 -22.4% PASS
total/synth/PUSH30/p0 1.87 1.27 -32.3% PASS
total/synth/PUSH30/p1 1.57 1.22 -22.5% PASS
total/synth/PUSH31/p0 1.84 1.29 -30.0% PASS
total/synth/PUSH31/p1 1.61 1.26 -21.5% PASS
total/synth/PUSH32/p0 1.85 1.30 -29.8% PASS
total/synth/PUSH32/p1 1.60 1.22 -24.1% PASS
total/synth/PUSH4/p0 1.77 1.24 -30.1% PASS
total/synth/PUSH4/p1 1.50 1.15 -23.3% PASS
total/synth/PUSH5/p0 1.78 1.25 -29.5% PASS
total/synth/PUSH5/p1 1.65 1.15 -30.0% PASS
total/synth/PUSH6/p0 1.78 1.24 -30.1% PASS
total/synth/PUSH6/p1 1.50 1.16 -23.0% PASS
total/synth/PUSH7/p0 1.79 1.25 -30.1% PASS
total/synth/PUSH7/p1 1.51 1.17 -22.5% PASS
total/synth/PUSH8/p0 1.78 1.25 -29.8% PASS
total/synth/PUSH8/p1 1.51 1.16 -23.4% PASS
total/synth/PUSH9/p0 1.79 1.26 -29.2% PASS
total/synth/PUSH9/p1 1.56 1.16 -26.1% PASS
total/synth/RETURNDATASIZE/a0 2.82 2.81 -0.3% PASS
total/synth/RETURNDATASIZE/a1 3.53 3.25 -7.9% PASS
total/synth/SAR/b0 3.54 3.54 +0.1% PASS
total/synth/SAR/b1 4.08 3.95 -3.2% PASS
total/synth/SGT/b0 4.85 6.71 +38.4% PASS
total/synth/SGT/b1 4.83 6.67 +38.2% PASS
total/synth/SHL/b0 4.00 3.95 -1.3% PASS
total/synth/SHL/b1 2.79 2.59 -7.3% PASS
total/synth/SHR/b0 3.10 3.11 +0.2% PASS
total/synth/SHR/b1 2.66 2.63 -1.1% PASS
total/synth/SIGNEXTEND/b0 2.47 2.43 -1.6% PASS
total/synth/SIGNEXTEND/b1 2.81 2.69 -4.4% PASS
total/synth/SLT/b0 4.85 6.72 +38.7% PASS
total/synth/SLT/b1 4.90 6.66 +36.1% PASS
total/synth/SUB/b0 2.83 2.74 -3.2% PASS
total/synth/SUB/b1 2.51 2.49 -1.0% PASS
total/synth/SWAP1/s0 2.14 1.81 -15.5% PASS
total/synth/SWAP10/s0 2.15 1.83 -14.9% PASS
total/synth/SWAP11/s0 2.16 1.82 -15.5% PASS
total/synth/SWAP12/s0 2.16 1.82 -15.4% PASS
total/synth/SWAP13/s0 2.16 1.82 -15.5% PASS
total/synth/SWAP14/s0 4.72 1.82 -61.4% PASS
total/synth/SWAP15/s0 4.90 2.67 -45.5% PASS
total/synth/SWAP16/s0 5.67 3.45 -39.1% PASS
total/synth/SWAP2/s0 2.14 1.81 -15.2% PASS
total/synth/SWAP3/s0 2.14 1.81 -15.5% PASS
total/synth/SWAP4/s0 2.14 1.81 -15.4% PASS
total/synth/SWAP5/s0 2.15 1.82 -15.3% PASS
total/synth/SWAP6/s0 2.15 1.82 -15.4% PASS
total/synth/SWAP7/s0 2.15 1.82 -15.4% PASS
total/synth/SWAP8/s0 2.15 1.82 -15.1% PASS
total/synth/SWAP9/s0 2.16 1.82 -15.4% PASS
total/synth/XOR/b0 2.46 2.34 -4.7% PASS
total/synth/XOR/b1 2.56 2.32 -9.3% PASS
total/synth/loop_v1 7.23 6.84 -5.4% PASS
total/synth/loop_v2 7.23 6.83 -5.4% PASS

Summary: 194 benchmarks, 2 regressions


✅ Performance Check Passed (multipass)

Performance Benchmark Results (threshold: 20%)

Benchmark Baseline (us) Current (us) Change Status
total/main/blake2b_huff/8415nulls 2.05 2.12 +3.6% PASS
total/main/blake2b_huff/empty 0.13 0.10 -19.8% PASS
total/main/blake2b_shifts/8415nulls 6.45 6.46 +0.1% PASS
total/main/sha1_divs/5311 3.44 3.43 -0.3% PASS
total/main/sha1_divs/empty 0.05 0.04 -12.9% PASS
total/main/sha1_shifts/5311 3.76 3.76 +0.1% PASS
total/main/sha1_shifts/empty 0.06 0.05 -9.1% PASS
total/main/snailtracer/benchmark 70.91 69.09 -2.6% PASS
total/main/structarray_alloc/nfts_rank 0.30 0.30 -0.1% PASS
total/main/swap_math/insufficient_liquidity 0.03 0.02 -34.2% PASS
total/main/swap_math/received 0.03 0.02 -32.6% PASS
total/main/swap_math/spent 0.03 0.02 -32.8% PASS
total/main/weierstrudel/1 0.40 0.38 -6.8% PASS
total/main/weierstrudel/15 3.00 3.00 -0.1% PASS
total/micro/JUMPDEST_n0/empty 0.18 0.13 -24.9% PASS
total/micro/jump_around/empty 0.73 0.61 -16.1% PASS
total/micro/loop_with_many_jumpdests/empty 2.47 1.95 -20.9% PASS
total/micro/memory_grow_mload/by1 0.24 0.18 -25.8% PASS
total/micro/memory_grow_mload/by16 0.25 0.19 -22.1% PASS
total/micro/memory_grow_mload/by32 0.28 0.22 -21.1% PASS
total/micro/memory_grow_mload/nogrow 0.24 0.17 -26.1% PASS
total/micro/memory_grow_mstore/by1 0.28 0.22 -20.8% PASS
total/micro/memory_grow_mstore/by16 0.29 0.23 -20.8% PASS
total/micro/memory_grow_mstore/by32 0.31 0.24 -21.5% PASS
total/micro/memory_grow_mstore/nogrow 0.27 0.21 -21.1% PASS
total/micro/signextend/one 0.42 0.38 -9.9% PASS
total/micro/signextend/zero 0.42 0.38 -9.0% PASS
total/synth/ADD/b0 0.02 0.01 -36.1% PASS
total/synth/ADD/b1 0.02 0.01 -36.9% PASS
total/synth/ADDRESS/a0 1.14 0.98 -14.1% PASS
total/synth/ADDRESS/a1 1.17 1.01 -14.0% PASS
total/synth/AND/b0 0.02 0.01 -36.6% PASS
total/synth/AND/b1 0.02 0.01 -37.2% PASS
total/synth/BYTE/b0 1.97 1.95 -0.7% PASS
total/synth/BYTE/b1 2.32 2.32 -0.2% PASS
total/synth/CALLDATASIZE/a0 0.65 0.57 -12.3% PASS
total/synth/CALLDATASIZE/a1 0.68 0.60 -11.8% PASS
total/synth/CALLER/a0 1.14 0.90 -21.3% PASS
total/synth/CALLER/a1 1.19 0.93 -22.1% PASS
total/synth/CALLVALUE/a0 0.66 0.49 -25.0% PASS
total/synth/CALLVALUE/a1 0.68 0.60 -11.7% PASS
total/synth/CODESIZE/a0 0.65 0.65 -0.0% PASS
total/synth/CODESIZE/a1 0.68 0.68 -0.1% PASS
total/synth/DUP1/d0 0.02 0.01 -37.1% PASS
total/synth/DUP1/d1 0.02 0.01 -36.3% PASS
total/synth/DUP10/d0 0.02 0.01 -37.0% PASS
total/synth/DUP10/d1 0.02 0.01 -36.6% PASS
total/synth/DUP11/d0 0.02 0.01 -37.0% PASS
total/synth/DUP11/d1 0.02 0.01 -36.7% PASS
total/synth/DUP12/d0 0.02 0.01 -37.1% PASS
total/synth/DUP12/d1 0.02 0.01 -36.6% PASS
total/synth/DUP13/d0 0.02 0.01 -37.0% PASS
total/synth/DUP13/d1 0.02 0.01 -36.5% PASS
total/synth/DUP14/d0 0.02 0.01 -37.0% PASS
total/synth/DUP14/d1 0.02 0.01 -36.7% PASS
total/synth/DUP15/d0 0.02 0.01 -37.0% PASS
total/synth/DUP15/d1 0.02 0.01 -36.6% PASS
total/synth/DUP16/d0 0.02 0.01 -37.0% PASS
total/synth/DUP16/d1 0.02 0.01 -36.7% PASS
total/synth/DUP2/d0 0.02 0.01 -37.0% PASS
total/synth/DUP2/d1 0.02 0.01 -36.6% PASS
total/synth/DUP3/d0 0.02 0.01 -37.0% PASS
total/synth/DUP3/d1 0.02 0.01 -36.5% PASS
total/synth/DUP4/d0 0.02 0.01 -36.9% PASS
total/synth/DUP4/d1 0.02 0.01 -36.3% PASS
total/synth/DUP5/d0 0.02 0.01 -37.0% PASS
total/synth/DUP5/d1 0.02 0.01 -36.5% PASS
total/synth/DUP6/d0 0.02 0.01 -37.0% PASS
total/synth/DUP6/d1 0.02 0.01 -36.3% PASS
total/synth/DUP7/d0 0.02 0.01 -37.0% PASS
total/synth/DUP7/d1 0.02 0.01 -36.7% PASS
total/synth/DUP8/d0 0.02 0.01 -37.0% PASS
total/synth/DUP8/d1 0.02 0.01 -36.4% PASS
total/synth/DUP9/d0 0.02 0.01 -37.2% PASS
total/synth/DUP9/d1 0.02 0.01 -36.8% PASS
total/synth/EQ/b0 0.02 0.01 -36.5% PASS
total/synth/EQ/b1 0.02 0.01 -37.4% PASS
total/synth/GAS/a0 1.01 0.91 -10.0% PASS
total/synth/GAS/a1 1.05 0.95 -9.7% PASS
total/synth/GT/b0 0.02 0.01 -36.2% PASS
total/synth/GT/b1 0.02 0.01 -36.8% PASS
total/synth/ISZERO/u0 0.02 0.01 -34.1% PASS
total/synth/JUMPDEST/n0 0.18 0.13 -25.1% PASS
total/synth/LT/b0 0.02 0.01 -36.0% PASS
total/synth/LT/b1 0.02 0.01 -36.9% PASS
total/synth/MSIZE/a0 0.02 0.01 -33.7% PASS
total/synth/MSIZE/a1 0.02 0.01 -37.0% PASS
total/synth/MUL/b0 4.23 4.76 +12.5% PASS
total/synth/MUL/b1 4.44 5.00 +12.5% PASS
total/synth/NOT/u0 0.02 0.01 -34.0% PASS
total/synth/OR/b0 0.02 0.01 -36.6% PASS
total/synth/OR/b1 0.02 0.01 -37.0% PASS
total/synth/PC/a0 0.02 0.01 -34.2% PASS
total/synth/PC/a1 0.02 0.01 -37.0% PASS
total/synth/PUSH1/p0 0.02 0.01 -44.4% PASS
total/synth/PUSH1/p1 0.02 0.01 -44.2% PASS
total/synth/PUSH10/p0 0.04 0.01 -75.9% PASS
total/synth/PUSH10/p1 0.04 0.01 -75.9% PASS
total/synth/PUSH11/p0 0.05 0.01 -77.4% PASS
total/synth/PUSH11/p1 0.05 0.01 -77.4% PASS
total/synth/PUSH12/p0 0.05 0.01 -78.6% PASS
total/synth/PUSH12/p1 0.05 0.01 -78.6% PASS
total/synth/PUSH13/p0 0.05 0.01 -79.7% PASS
total/synth/PUSH13/p1 0.05 0.01 -79.6% PASS
total/synth/PUSH14/p0 0.05 0.01 -80.6% PASS
total/synth/PUSH14/p1 0.05 0.01 -80.4% PASS
total/synth/PUSH15/p0 0.06 0.01 -81.5% PASS
total/synth/PUSH15/p1 0.06 0.01 -81.5% PASS
total/synth/PUSH16/p0 0.06 0.01 -82.3% PASS
total/synth/PUSH16/p1 0.06 0.01 -82.3% PASS
total/synth/PUSH17/p0 0.06 0.01 -83.1% PASS
total/synth/PUSH17/p1 0.06 0.01 -83.0% PASS
total/synth/PUSH18/p0 0.07 0.01 -83.8% PASS
total/synth/PUSH18/p1 0.06 0.01 -83.6% PASS
total/synth/PUSH19/p0 0.07 0.01 -84.4% PASS
total/synth/PUSH19/p1 0.07 0.01 -84.4% PASS
total/synth/PUSH2/p0 0.02 0.01 -51.3% PASS
total/synth/PUSH2/p1 0.02 0.01 -51.2% PASS
total/synth/PUSH20/p0 0.07 0.01 -85.1% PASS
total/synth/PUSH20/p1 0.07 0.01 -85.0% PASS
total/synth/PUSH21/p0 0.07 0.01 -85.5% PASS
total/synth/PUSH21/p1 0.07 0.01 -85.5% PASS
total/synth/PUSH22/p0 1.83 1.26 -31.2% PASS
total/synth/PUSH22/p1 1.56 1.47 -5.7% PASS
total/synth/PUSH23/p0 1.86 1.32 -29.2% PASS
total/synth/PUSH23/p1 1.56 1.18 -24.8% PASS
total/synth/PUSH24/p0 1.83 1.32 -28.3% PASS
total/synth/PUSH24/p1 1.56 1.23 -21.1% PASS
total/synth/PUSH25/p0 1.84 1.32 -28.0% PASS
total/synth/PUSH25/p1 1.57 1.17 -25.2% PASS
total/synth/PUSH26/p0 1.84 1.28 -30.6% PASS
total/synth/PUSH26/p1 1.57 1.23 -21.6% PASS
total/synth/PUSH27/p0 1.84 1.27 -31.1% PASS
total/synth/PUSH27/p1 1.57 1.17 -25.4% PASS
total/synth/PUSH28/p0 1.86 1.30 -30.0% PASS
total/synth/PUSH28/p1 1.58 1.18 -25.2% PASS
total/synth/PUSH29/p0 1.87 1.42 -23.8% PASS
total/synth/PUSH29/p1 1.58 1.24 -21.5% PASS
total/synth/PUSH3/p0 0.02 0.01 -56.6% PASS
total/synth/PUSH3/p1 0.02 0.01 -57.8% PASS
total/synth/PUSH30/p0 1.87 1.29 -31.1% PASS
total/synth/PUSH30/p1 1.59 1.24 -21.8% PASS
total/synth/PUSH31/p0 1.85 1.41 -23.9% PASS
total/synth/PUSH31/p1 1.63 1.32 -19.1% PASS
total/synth/PUSH32/p0 1.85 1.28 -30.9% PASS
total/synth/PUSH32/p1 1.59 1.24 -21.9% PASS
total/synth/PUSH4/p0 0.03 0.01 -62.1% PASS
total/synth/PUSH4/p1 0.03 0.01 -62.2% PASS
total/synth/PUSH5/p0 0.03 0.01 -65.4% PASS
total/synth/PUSH5/p1 0.03 0.01 -65.1% PASS
total/synth/PUSH6/p0 0.03 0.01 -68.0% PASS
total/synth/PUSH6/p1 0.03 0.01 -68.3% PASS
total/synth/PUSH7/p0 0.04 0.01 -70.6% PASS
total/synth/PUSH7/p1 0.04 0.01 -70.6% PASS
total/synth/PUSH8/p0 0.04 0.01 -72.5% PASS
total/synth/PUSH8/p1 0.04 0.01 -72.5% PASS
total/synth/PUSH9/p0 0.04 0.01 -74.4% PASS
total/synth/PUSH9/p1 0.04 0.01 -74.4% PASS
total/synth/RETURNDATASIZE/a0 0.65 0.49 -24.7% PASS
total/synth/RETURNDATASIZE/a1 0.68 0.52 -23.9% PASS
total/synth/SAR/b0 3.57 3.54 -0.9% PASS
total/synth/SAR/b1 4.10 3.94 -4.0% PASS
total/synth/SGT/b0 0.02 0.01 -36.5% PASS
total/synth/SGT/b1 0.02 0.01 -37.3% PASS
total/synth/SHL/b0 4.00 3.95 -1.2% PASS
total/synth/SHL/b1 2.79 2.55 -8.5% PASS
total/synth/SHR/b0 3.11 3.11 -0.2% PASS
total/synth/SHR/b1 2.74 2.60 -5.3% PASS
total/synth/SIGNEXTEND/b0 2.48 2.45 -1.3% PASS
total/synth/SIGNEXTEND/b1 2.86 2.67 -6.4% PASS
total/synth/SLT/b0 0.02 0.01 -36.0% PASS
total/synth/SLT/b1 0.02 0.01 -37.0% PASS
total/synth/SUB/b0 0.02 0.01 -36.1% PASS
total/synth/SUB/b1 0.02 0.01 -37.0% PASS
total/synth/SWAP1/s0 0.02 0.01 -36.2% PASS
total/synth/SWAP10/s0 0.02 0.01 -36.6% PASS
total/synth/SWAP11/s0 0.02 0.01 -36.9% PASS
total/synth/SWAP12/s0 0.02 0.01 -37.0% PASS
total/synth/SWAP13/s0 0.02 0.01 -36.7% PASS
total/synth/SWAP14/s0 0.02 0.01 -36.6% PASS
total/synth/SWAP15/s0 0.02 0.01 -36.7% PASS
total/synth/SWAP16/s0 0.02 0.01 -36.6% PASS
total/synth/SWAP2/s0 0.02 0.01 -36.7% PASS
total/synth/SWAP3/s0 0.02 0.01 -36.7% PASS
total/synth/SWAP4/s0 0.02 0.01 -36.7% PASS
total/synth/SWAP5/s0 0.02 0.01 -36.5% PASS
total/synth/SWAP6/s0 0.02 0.01 -36.7% PASS
total/synth/SWAP7/s0 0.02 0.01 -36.7% PASS
total/synth/SWAP8/s0 0.02 0.01 -36.7% PASS
total/synth/SWAP9/s0 0.02 0.01 -36.7% PASS
total/synth/XOR/b0 0.02 0.01 -36.2% PASS
total/synth/XOR/b1 0.02 0.01 -36.9% PASS
total/synth/loop_v1 1.87 1.36 -27.6% PASS
total/synth/loop_v2 1.79 1.28 -28.8% PASS

Summary: 194 benchmarks, 0 regressions


- Add comment documenting validation assumptions in validateCodeMatch
- Add comment explaining why L0 cache state is retained
- Increase FrameStack initial capacity to 1024 (MAXSTACK) to explicitly
  guarantee no reallocations during nested EVM execution
- Add clearReturnData() and use it instead of move-assigning an empty
  vector to preserve vector capacity across calls

Made-with: Cursor
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants