DTVMStack · starwarfan · Feb 25, 2026 · Mar 2, 2026 · Mar 2, 2026 · Mar 3, 2026
diff --git a/.claude/skills/dtvm-debug-test-failure/SKILL.md b/.claude/skills/dtvm-debug-test-failure/SKILL.md
@@ -0,0 +1,111 @@
+---
+name: dtvm-debug-test-failure
+description: Comprehensive test failure debugging and analysis for DTVM. Use when tests fail in any execution mode (interpreter, singlepass JIT, multipass JIT) to analyze failure patterns, extract error logs, generate debugging reports, and provide a fix if possible. Triggers on test failures, ctest errors, or when investigating why specific test cases fail.
+allowed-tools: Bash, Read, Grep
+---
+
+# Debug Test Failure Skill
+
+This skill helps analyze and debug test failures in DTVM's multi-mode execution environment.
+
+## Quick Start
+
+When a test fails, use this skill to:
+1. Parse the test output and identify failure patterns
+2. Locate the related code and analyze the cause of the failure
+3. Add print statements to the related code or use gdb to verify the cause
+4. Provide a fix for the test failure
+
+## Failure Analysis Workflow
+
+### 1. Identify Test Failure
+
+Understand which test fails and how to run it. Ask the user for a command or script file that runs the test, then check the test output and search for failure patterns (such as `fail` or `failure`). If there is a failure, read its error information.
+
+### 2. Locate the code and understand the test case
+
+Find the code that causes the failure, using the test command or output to locate it. For example, a gtest failure often prints the file and line; analyze the code in that file and related files, and work out why the failure happens. Ask the user if you are not sure which file to look at.
+Read the test file to see what the test case does and what it expects. Make sure the expectation is reasonable. Ask the user if you are not sure where the test file is.
+Understand which code the test case exercises. For example, if a test covers the EVM signextend opcode in multipass mode, read the signextend implementation for multipass (JIT) mode. Ask the user if you are not sure where the execution code is.
+
+### 3. Analyze the cause and debug the failure
+
+After analyzing the code, we should know which input values cause the failure (ask the user to provide them if we don't). Then work out why these input values lead to the test failure. For JIT mode, assume at first that the backend is correct; failures mostly occur in the front end (bytecode visitor and MIR builder).
+We can add print statements or use gdb in non-interactive mode to set breakpoints and print values at certain lines to verify our assumption. However, it is better to suggest these steps to users and let them perform the operations themselves.
+
+#### GDB Debugging with Dynamic Libraries (e.g., libdtvmapi.so)
+
+When debugging tests that use dynamically loaded libraries (like fuzzer tests with `libdtvmapi.so`), breakpoints in the dynamic library cannot be set until the library is loaded. Use this workflow:
+
+**Interactive GDB Session:**
+```bash
+gdb --args ../build/bin/evmone-fuzzer crash-xxx
+```
+
+1. **First `r` (run)**: Let the program run so that `libdtvmapi.so` gets dynamically loaded
+2. **Set breakpoint `b EVMAnalyzer::analyze`**: Now that the dynamic library is loaded, the breakpoint resolves correctly
+3. **Second `r` (run)**: Restart the program; it stops at the breakpoint
+4. **Print variables `p Bytecode` and `p BytecodeSize`**: Inspect the actual bytecode content
+
+**Example commands in GDB:**
+```
+(gdb) r
+... program runs and loads libdtvmapi.so ...
+(gdb) b EVMAnalyzer::analyze
+Breakpoint 1 at 0x...
+(gdb) r
+... program restarts and stops at breakpoint ...
+(gdb) p BytecodeSize
+$1 = 3
+(gdb) x/3xb Bytecode
+0x...: 0x49 0x00 0x00
+```
+
+**Key Points:**
+- Dynamic libraries are loaded at runtime, not at program start
+- Breakpoints in dynamic library functions cannot be resolved before the library is loaded
+- Run once to load the library, then set breakpoints and run again (alternatively, `set breakpoint pending on` lets GDB defer the breakpoint until the library loads)
+- Use `x/Nxb Bytecode` to examine N bytes of bytecode in hex format
+
+#### Getting Correct Bytecode from Fuzzer Crash Files
+
+**Problem:** When analyzing fuzzer crash files, the raw file content is NOT the actual bytecode executed by the VM.
+
+Fuzzer input files (like `crash-xxx`) contain structured data that includes:
+- EVM revision information (first few bytes)
+- Message parameters (gas, sender, recipient, etc.)
+- Host context data
+- **The actual bytecode is only a portion of the file, extracted by `populate_input()`**
+
+For example, a crash file might be 27 bytes, but the actual bytecode executed could be only 3 bytes.
+
+**Two Correct Approaches:**
+
+1. **Analyze `populate_input()` in fuzzer.cpp**: Read the fuzzer's input parsing logic to understand how bytecode is extracted from the raw file. The bytecode typically starts at a specific offset after the header fields are parsed.
+
+2. **Use GDB to inspect at runtime**: Break at the point where bytecode is actually used (e.g., `EVMAnalyzer::analyze`) and print `Bytecode` and `BytecodeSize` to get the real values.
+
+**Wrong approach**: Directly parsing the crash file with `xxd` or `hexdump` and assuming all bytes are bytecode - this will lead to incorrect root cause analysis.
+
+**Example of the difference:**
+```
+# Wrong: Raw file content (27 bytes) - this is NOT the bytecode!
+$ xxd crash-xxx
+fbff600200000000000000f3fff702ad00f3ff0000000000490012
+
+# Correct: Actual bytecode (3 bytes) - obtained by analyzing populate_input() or using GDB
+BytecodeSize = 3
+Bytecode: 0x49 0x00 0x00   # BLOBHASH, STOP, STOP
+```
+
+**Lesson Learned:** Always understand how the test framework processes input data before analyzing. For evmone-fuzzer, read `populate_input()` in `fuzzer.cpp` to understand the input format, or use GDB to verify the actual values at runtime.
+
+### 4. Fix the test
+
+After understanding the cause, provide a fix and ask the user whether to apply it.
+
+## References
+
+- [DTVM Architecture](references/dtvm_architecture.md) - Understanding execution modes
+- [Fuzzer Testing](references/fuzzer_testing.md) - Fuzzer testing setup and debugging guide
+- [Examples](examples.md) - Examples of debugging test failures
diff --git a/.claude/skills/dtvm-debug-test-failure/examples.md b/.claude/skills/dtvm-debug-test-failure/examples.md
@@ -0,0 +1,160 @@
+# Debug Test Failure - Examples
+
+This document provides usage examples for the debug-test-failure skill.
+
+## Example 1: Multipass Mode evmone unittest failure
+
+### User Request
+```
+The multipass test "./build/bin/evmone-unittests --gtest_filter=*evm.undefined_instructions*external*" fails. Help me debug it.
+```
+
+### Test Command
+According to `.ci/run_test_suite.sh`, the evmone multipass test command is `./run_unittests.sh ../tests/evmone_unittests/EVMOneMultipassUnitTestsRunList.txt mode=multipass`. Reading `run_unittests.sh` shows that the underlying command is `./build/bin/evmone-unittests --gtest_filter="$FILTER_PARAM"`, which matches the user request; we also need to export the environment variable for it.
+
+```bash
+export EVMONE_OPTIONS=mode=multipass
+cd evmone
+# Quote the filter so the shell does not expand the wildcards
+./build/bin/evmone-unittests --gtest_filter='*evm.undefined_instructions*external*'
+```
+
+### Expected Output
+All [ PASSED ], no failures in output.
+
+### Analysis
+The output contains several failures:
+```
+evm_test.cpp:637: Failure
+Expected equality of these values:
+  res.status_code
+    Which is: stack underflow
+  EVMC_UNDEFINED_INSTRUCTION
+    Which is: undefined instruction
+ for opcode 1b on revision Frontier
+```
+
+First, read `evm_test.cpp` and the code related to it to understand how the test runs. `evm_test.cpp` is built in the evmone directory and loads the library `libdtvmapi.so`, which is built in DTVM. The test name tells us it concerns undefined instructions, and the result for an undefined instruction is incorrect. From `evm_test.cpp`, we know the EVM bytecode contains an undefined opcode. From the output, we see that the actual result is stack underflow rather than undefined instruction.
+Next, read the code in DTVM, whose entry point is `dt_evmc_vm.cpp`. Since we assume the backend is correct and the failure is in the front end, read `evm_bytecode_visitor.h`. The actual result is stack underflow, so it is probably caused by `Builder.handleTrap(common::ErrorCode::EVMStackUnderflow);`. In `EVMMirBuilder::createStackCheckBlock`, a stack underflow can also be raised by a MIR instruction when the stack size is less than MinSize. The expected output, however, is undefined instruction, which is produced by `Builder.handleUndefined();` in `evm_bytecode_visitor.h`. The probable reason is that the underflow check runs before the undefined instruction check.
+To verify, add a print statement before `Builder.createStackCheckBlock` in `evm_bytecode_visitor.h` to show the min size requirement and confirm that it is incorrect. Finally, we see that `evm_analyzer.h` does not take the revision into account: it uses the same opcode table for every revision, so the min size requirement is incorrect, and undefined instruction should be returned before the min size check.
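+
+The ordering problem can be illustrated in isolation. The following is a minimal sketch, not DTVM code: `OpInfo`, `lookup`, `checkRevisionAware`, and `checkRevisionAgnostic` are hypothetical names, and only ADD (0x01) and SHL (0x1b, introduced in Constantinople) are modeled.
+
+```cpp
+#include <cstdint>
+#include <iostream>
+
+enum class Revision { Frontier, Constantinople };
+enum class Status { Success, StackUnderflow, UndefinedInstruction };
+
+// Hypothetical per-opcode metadata for this sketch.
+struct OpInfo {
+  bool Defined;
+  int MinStack;
+};
+
+OpInfo lookup(uint8_t Op, Revision Rev) {
+  if (Op == 0x01) // ADD: defined in every revision
+    return {true, 2};
+  if (Op == 0x1b) // SHL: defined from Constantinople onward
+    return {Rev >= Revision::Constantinople, 2};
+  return {false, 0};
+}
+
+// Revision-aware: reject undefined opcodes before looking at the stack.
+Status checkRevisionAware(uint8_t Op, Revision Rev, int StackHeight) {
+  OpInfo Info = lookup(Op, Rev);
+  if (!Info.Defined)
+    return Status::UndefinedInstruction;
+  if (StackHeight < Info.MinStack)
+    return Status::StackUnderflow;
+  return Status::Success;
+}
+
+// Revision-agnostic: always consults the newest table, so SHL looks
+// defined even when the requested revision is Frontier.
+Status checkRevisionAgnostic(uint8_t Op, int StackHeight) {
+  return checkRevisionAware(Op, Revision::Constantinople, StackHeight);
+}
+
+int main() {
+  // Opcode 0x1b with too few stack items, requested revision Frontier.
+  bool Aware = checkRevisionAware(0x1b, Revision::Frontier, 0) ==
+               Status::UndefinedInstruction;
+  bool Agnostic = checkRevisionAgnostic(0x1b, 0) == Status::StackUnderflow;
+  std::cout << std::boolalpha << Aware << " " << Agnostic << "\n";
+  return 0;
+}
+```
+
+In this sketch the revision-agnostic check reports stack underflow for opcode 0x1b, which mirrors the failure message above, while the revision-aware check reports undefined instruction.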
+
+### Fix
+The undefined instruction check needs to be added in `evm_analyzer.h`. Since that is a big change, just tell the user the reason for the failure and ask whether to fix it.
+
+---
+
+## Example 2: Fuzzer Blockhash Recording Mismatch
+
+### User Request
+```
+Analyze fuzzer crash file crash-0be3bc84feec8e8e36c6d55f1ac44cfd11d2213c
+```
+
+### Error Pattern
+```
+ASSERTION FAILED: "ref_host.recorded_blockhashes.size() == host.recorded_blockhashes.size()"
+	with 2 != 1
+```
+
+### Root Cause Analysis
+
+**The Issue**: DTVM calls `get_block_hash()` 1 time, while the reference evmone calls it 2 times.
+
+**Key Insight**: Fuzzer tests compare host call counts (like `recorded_blockhashes`). When DTVM caches results internally but evmone does not, the counts mismatch even if execution is logically correct.
+
+### Debugging Steps
+
+#### 1. Extract Actual Bytecode from Crash File
+
+**Wrong approach**: Parsing raw file bytes directly.
+```bash
+# This is WRONG - includes 24-byte header
+$ xxd crash-0be3bc84
+80 e6 00 00 00 01 00 50 38 38 38 00 00 01 00 00...
+```
+
+**Correct approach**: Use GDB at `EVMAnalyzer::analyze` to get actual bytecode.
+```bash
+gdb --args evmone-fuzzer crash-0be3bc84
+(gdb) r              # Let libdtvmapi.so load
+(gdb) b EVMAnalyzer::analyze
+(gdb) r              # Restart and stop at breakpoint
+(gdb) p BytecodeSize
+$1 = 20
+(gdb) x/20xb Bytecode
+0x...: 0x38 0x38 0x40 0x38... 0x40 0x01 0x00
+```
+
+Actual bytecode: `38 38 40 38 38 38 38 38 38 38 38 38 38 38 38 38 38 40 01 00` (20 bytes)
+
+#### 2. Disassemble Bytecode
+```
+Offset  Opcode  Name        Stack Effect
+----------------------------------------
+ 0      0x38    CODESIZE    Push: 1
+ 1      0x38    CODESIZE    Push: 1
+ 2      0x40    BLOCKHASH   Pop: 1, Push: 1  <-- First BLOCKHASH
+ 3-16   0x38    CODESIZE    (14 times)
+17      0x40    BLOCKHASH   Pop: 1, Push: 1  <-- Second BLOCKHASH
+18      0x01    ADD         Pop: 2, Push: 1
+19      0x00    STOP
+```
+
+Two BLOCKHASH opcodes at offsets 2 and 17. Both pop a CODESIZE result (20, the bytecode length) as the block number, so both use the same block number.
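+
+The stack walk can be checked mechanically. The following is a minimal sketch, not part of DTVM or evmone: `blockhashArgs` is a hypothetical helper that models only the four opcodes in this bytecode and uses 64-bit values in place of 256-bit EVM words.
+
+```cpp
+#include <cstdint>
+#include <iostream>
+#include <vector>
+
+// Returns the block numbers passed to BLOCKHASH while walking the bytecode.
+// Any opcode other than CODESIZE, BLOCKHASH, and ADD ends the walk.
+std::vector<uint64_t> blockhashArgs(const std::vector<uint8_t> &Code) {
+  std::vector<uint64_t> Stack, Args;
+  for (uint8_t Op : Code) {
+    if (Op == 0x38) { // CODESIZE: push the bytecode length
+      Stack.push_back(Code.size());
+    } else if (Op == 0x40) { // BLOCKHASH: pop block number, push a hash
+      if (Stack.empty())
+        return Args;
+      Args.push_back(Stack.back());
+      Stack.back() = 0; // placeholder hash value
+    } else if (Op == 0x01) { // ADD: pop two values, push their sum
+      if (Stack.size() < 2)
+        return Args;
+      uint64_t Top = Stack.back();
+      Stack.pop_back();
+      Stack.back() += Top;
+    } else { // STOP (0x00) or an opcode this sketch does not model
+      return Args;
+    }
+  }
+  return Args;
+}
+
+int main() {
+  // The 20-byte bytecode extracted in step 1.
+  std::vector<uint8_t> Code = {0x38, 0x38, 0x40, 0x38, 0x38, 0x38, 0x38,
+                               0x38, 0x38, 0x38, 0x38, 0x38, 0x38, 0x38,
+                               0x38, 0x38, 0x38, 0x40, 0x01, 0x00};
+  for (uint64_t N : blockhashArgs(Code))
+    std::cout << "BLOCKHASH(" << N << ")\n";
+  return 0;
+}
+```
+
+Running the sketch prints `BLOCKHASH(20)` twice, one line per BLOCKHASH opcode.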
+
+#### 3. Check evmone Recording Behavior
+
+In `evmc/mocked_host.hpp`:
+```cpp
+bytes32 get_block_hash(int64_t block_number) const noexcept override
+{
+    recorded_blockhashes.emplace_back(block_number);  // ALWAYS records
+    return block_hash;
+}
+```
+
+evmone records every host call unconditionally.
+
+#### 4. Check DTVM Implementation
+
+In `src/compiler/evm_frontend/evm_imported.cpp`:
+```cpp
+const uint8_t *evmGetBlockHash(zen::runtime::EVMInstance *Instance,
+                               int64_t BlockNumber) {
+  auto &Cache = Instance->getMessageCache();
+  auto It = Cache.BlockHashes.find(BlockNumber);
+  if (It == Cache.BlockHashes.end()) {
+    // First call - calls host
+    evmc::bytes32 Hash = Module->Host->get_block_hash(BlockNumber);
+    Cache.BlockHashes[BlockNumber] = Hash;  // Cache result
+    return Cache.BlockHashes[BlockNumber].bytes;
+  }
+  return It->second.bytes;  // Second call - cached, NO host call!
+}
+```
+
+**Problem Identified**: DTVM caches blockhash results. The second BLOCKHASH with the same block number returns the cached value without calling the host. The fuzzer sees 1 host call but expects 2.
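+
+The mismatch can be reproduced without DTVM or evmone. The following is a minimal sketch under stated assumptions: `RecordingHost`, `UncachedVm`, and `CachedVm` are hypothetical stand-ins for the mocked host and the two lookup strategies, and the hash is a dummy 64-bit value.
+
+```cpp
+#include <cstdint>
+#include <iostream>
+#include <unordered_map>
+#include <vector>
+
+// Hypothetical stand-in for a mocked host: records every call it receives.
+struct RecordingHost {
+  std::vector<int64_t> RecordedBlockhashes;
+  uint64_t getBlockHash(int64_t BlockNumber) {
+    RecordedBlockhashes.push_back(BlockNumber);
+    return static_cast<uint64_t>(BlockNumber) * 31; // dummy hash value
+  }
+};
+
+// Calls the host on every lookup.
+struct UncachedVm {
+  RecordingHost &Host;
+  uint64_t blockHash(int64_t N) { return Host.getBlockHash(N); }
+};
+
+// Calls the host only on the first lookup of each block number.
+struct CachedVm {
+  RecordingHost &Host;
+  std::unordered_map<int64_t, uint64_t> Cache;
+  uint64_t blockHash(int64_t N) {
+    auto It = Cache.find(N);
+    if (It != Cache.end())
+      return It->second; // cache hit: the host never sees this call
+    uint64_t Hash = Host.getBlockHash(N);
+    Cache.emplace(N, Hash);
+    return Hash;
+  }
+};
+
+int main() {
+  RecordingHost RefHost, Host;
+  UncachedVm Ref{RefHost};
+  CachedVm Vm{Host, {}};
+  // Two lookups with the same block number, as in the crash input.
+  for (int I = 0; I < 2; ++I) {
+    Ref.blockHash(20);
+    Vm.blockHash(20);
+  }
+  // Prints "2 != 1": same results, different recorded call counts.
+  std::cout << RefHost.RecordedBlockhashes.size()
+            << " != " << Host.RecordedBlockhashes.size() << "\n";
+  return 0;
+}
+```
+
+Both strategies return identical hashes, so only a comparison of the recorded host calls exposes the difference.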
+
+### The Fix
+
+Remove caching from `evmGetBlockHash()` to match evmone behavior:
+```cpp
+const uint8_t *evmGetBlockHash(zen::runtime::EVMInstance *Instance,
+                               int64_t BlockNumber) {
+  // Always call host to match evmone recording behavior
+  evmc::bytes32 Hash = Module->Host->get_block_hash(BlockNumber);
+  return ...;  // Return without caching
+}
+```
+
+### Key Lessons
+
+1. **Fuzzer tests compare host call counts**, not just execution results. Any caching that skips host calls causes mismatches.
+
+2. **Never treat the raw crash file as bytecode**: it also contains header fields. Extract the real bytecode with GDB or by analyzing `populate_input()`.
+
+3. **For dynamic libraries**: Must run once to load `libdtvmapi.so`, then set breakpoints, then run again.
+
+4. **Cache behavior matters for fuzzer compatibility**: Even if caching is logically correct, it may break test expectations.
+
+---
+
diff --git a/.claude/skills/dtvm-debug-test-failure/references/dtvm_architecture.md b/.claude/skills/dtvm-debug-test-failure/references/dtvm_architecture.md
@@ -0,0 +1,119 @@
+# DTVM Architecture Reference
+
+## Execution Modes Overview
+
+DTVM provides three execution modes, each with distinct characteristics:
+
+### 1. Interpreter Mode
+- Direct bytecode interpretation
+- Highest compatibility, lowest performance
+- No compilation overhead
+- Best for debugging and verification
+
+### 2. Singlepass JIT Mode
+- Fast single-pass compilation
+- No LLVM dependency
+- Moderate performance improvement
+- Good for quick execution without heavy optimization
+- **Note: Not supported for EVM bytecode in DTVM**
+
+### 3. Multipass JIT Mode (LLVM-based)
+- Multiple optimization passes
+- Requires LLVM 15
+- Highest performance
+- Two sub-modes:
+  - **FLAT (Function Level fAst Transpile)**: Fast compilation
+  - **FLAS (Function Level Adaptive hot-Switching)**: Adaptive optimization
+
+## Compilation Pipeline
+
+```
+Input (Wasm/EVM) → Frontend → dMIR → Execution Modes → Native Code
+                              ↓
+                    Deterministic MIR
+```
+
+### Key Components
+
+**dMIR (Deterministic MIR)**
+- Middle Intermediate Representation
+- Ensures deterministic behavior
+- Common IR for all execution modes
+- Platform-independent
+
+**Frontend**
+- Wasm frontend: Parses WebAssembly binary/text format
+- EVM frontend: Parses EVM bytecode
+- Validation and normalization
+
+**Target Code Generation**
+- Interpreter: Direct dMIR execution
+- Singlepass JIT: Assembly code generation (Wasm only)
+- Multipass JIT: LLVM IR generation → Machine code (Wasm and EVM)
+
+## Memory Management
+
+### Memory Pool System
+- Uses mmap for large allocations
+- Reduces fragmentation
+- Supports deterministic allocation patterns
+- Configurable pool sizes
+
+### Deterministic Allocation
+- Same input always produces same memory layout
+- No reliance on malloc/free ordering
+- Predictable address space usage
+- Critical for consensus systems
+
+## JIT Implementation Details
+
+### Singlepass JIT
+- Linear scan register allocation
+- Minimal optimization passes
+- Direct dMIR to assembly translation
+- Fallback mechanism for unsupported operations
+
+### Multipass JIT
+- LLVM-based optimization pipeline
+- Typical passes:
+  - Dead code elimination
+  - Constant folding
+  - Common subexpression elimination
+  - Loop optimizations
+- Hotness detection for FLAS mode
+- Profiling-guided optimization
+
+## Build Configuration
+
+### Essential Options
+```bash
+# Interpreter only (minimum)
+cmake -B build
+
+# With Singlepass JIT
+cmake -B build -DZEN_ENABLE_SINGLEPASS_JIT=ON
+
+# With Multipass JIT (requires LLVM 15)
+cmake -B build -DZEN_ENABLE_MULTIPASS_JIT=ON \
+      -DLLVM_DIR=/path/to/llvm/lib/cmake/llvm
+
+# Other options (debugging, testing, EVM)
+cmake -B build -DZEN_ENABLE_ASAN=ON         # AddressSanitizer
+cmake -B build -DZEN_ENABLE_PROFILER=ON     # Performance profiling
+cmake -B build -DZEN_ENABLE_SPEC_TEST=ON    # WebAssembly tests
+cmake -B build -DZEN_ENABLE_EVM=ON          # EVM support
+cmake -B build -DZEN_ENABLE_LIBEVM=ON       # EVMC library support
+```
+
+## Testing Infrastructure
+
+### Test Scripts
+- `.ci/run_test_suite.sh`: Main test script
+- `.github/workflows/dtvm_wasm_test_x86.yml`: Test environment for wasm
+- `.github/workflows/dtvm_evm_test_x86.yml`: Test environment for evm
+
+### Test Categories
+1. **microsuite**: ctest suite for wasm
+2. **evmtestsuite**: ctest suite for evm
+3. **evmrealsuite**: test for real evm bytecode
+4. **evmonetestsuite**: run in evmone's unittest