111 changes: 111 additions & 0 deletions .claude/skills/dtvm-debug-test-failure/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,111 @@
---
name: dtvm-debug-test-failure
description: Comprehensive test failure debugging and analysis for DTVM. Use when tests fail in any execution mode (interpreter, singlepass JIT, multipass JIT) to analyze failure patterns, extract error logs, generate debugging reports, and provide a fix if possible. Triggers on test failures, ctest errors, or when investigating why specific test cases fail.
allowed-tools: Bash, Read, Grep
---

# Debug Test Failure Skill

This skill helps analyze and debug test failures in DTVM's multi-mode execution environment.

## Quick Start

When a test fails, use this skill to:
1. Parse the test output and identify failure patterns
2. Locate the related code and analyze the cause of the failure
3. Add print statements to the related code, or use gdb, to verify the cause
4. Provide a fix for the test failure

## Failure Analysis Workflow

### 1. Identify Test Failure

Understand which test fails and how to run it. Ask the user for a command or script file that runs the test, check the test output, and search for failure patterns (such as `fail` or `failure`). If there is a failure, read the error information for it.
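
The failure-pattern search can be sketched as follows, assuming the test is a gtest binary whose output was saved to a log file. `list_failed_tests` is a hypothetical helper name, not something DTVM ships; it relies only on the `[  FAILED  ]` marker that gtest prints.

```bash
#!/usr/bin/env bash
# Sketch: list the unique names of failed gtest cases found in a saved log.
# list_failed_tests is a hypothetical helper, not part of DTVM.
list_failed_tests() {
  local log="$1"
  # gtest prints "[  FAILED  ] Suite.Name" per failed test (the first
  # occurrence also carries " (N ms)"). The summary line
  # "[  FAILED  ] 1 test, listed below:" has no "Suite.Name" token and is skipped.
  grep -E '^\[  FAILED  \] [^ ]+\.[^ ]+' "$log" \
    | sed -E 's/^\[  FAILED  \] ([^ ,]+).*/\1/' \
    | sort -u
}

# Example:
#   <test command> 2>&1 | tee test.log
#   list_failed_tests test.log
```

For non-gtest output, a case-insensitive `grep -n -i 'fail' test.log` is a reasonable first pass before reading the surrounding error text.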

### 2. Locate the code and understand the test case

Find the code that causes the failure. Locate it from the test command or output. For example, a gtest failure often prints the file and line; analyze the code in that file and related files and work out why the failure happens. Ask the user if you are not sure which file to look at.
Read the test file to see what the test case does and what it expects. Make sure the test expectation is reasonable. Ask the user if you are not sure where the test file is.
Understand which code is related to the test case. For example, if a test is about the EVM signextend opcode in multipass mode, read the implementation of the signextend opcode in multipass (JIT) mode. Ask the user if you are not sure where the execution code is.

### 3. Analyze the cause and debug the failure

After analyzing the code, we should know which input values cause the failure (ask the user to provide them if we don't). Then work out why these input values lead to the test failure. For JIT mode, assume at first that the backend is correct; the failure most likely occurs in the front end (bytecode visitor and MIR builder).
We can add print statements to the code, or use gdb in non-interactive mode to set breakpoints and print values at specific lines, to verify our assumption. It is better, however, to suggest these steps to users and let them perform the operations themselves.

#### GDB Debugging with Dynamic Libraries (e.g., libdtvmapi.so)

When debugging tests that use dynamically loaded libraries (like fuzzer tests with `libdtvmapi.so`), breakpoints in the dynamic library cannot be set until the library is loaded. Use this workflow:

**Interactive GDB Session:**
```bash
gdb --args ../build/bin/evmone-fuzzer crash-xxx
```

1. **First `r` (run)**: Let the program run so that `libdtvmapi.so` gets dynamically loaded
2. **Set breakpoint `b EVMAnalyzer::analyze`**: Now that the dynamic library is loaded, the breakpoint can be set correctly
3. **Second `r` (run)**: Restart the program; it will stop at the breakpoint
4. **Print variables `p Bytecode` and `p BytecodeSize`**: Inspect the actual bytecode content

**Example commands in GDB:**
```
(gdb) r
... program runs and loads libdtvmapi.so ...
(gdb) b EVMAnalyzer::analyze
Breakpoint 1 at 0x...
(gdb) r
... program restarts and stops at breakpoint ...
(gdb) p BytecodeSize
$1 = 3
(gdb) x/3xb Bytecode
0x...: 0x49 0x00 0x00
```

**Key Points:**
- Dynamic libraries are loaded at runtime, not at program start
- Breakpoints in dynamic library functions cannot be set before the library is loaded
- Must run once to load the library, then set breakpoints and run again
- Use `x/Nxb Bytecode` to examine N bytes of bytecode in hex format
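
A non-interactive variant of this session might look like the sketch below. It relies on GDB's `set breakpoint pending on` setting, which keeps a breakpoint pending until the shared library that defines the symbol is loaded, so a single run may be enough. `write_gdb_script` is a hypothetical helper, and whether the pending breakpoint resolves as expected for `libdtvmapi.so` in your build is an assumption to verify.

```bash
#!/usr/bin/env bash
# Sketch: generate a GDB command file for a batch (non-interactive) session.
# write_gdb_script is a hypothetical helper, not part of DTVM.
write_gdb_script() {
  local out="$1" symbol="$2" nbytes="$3"
  cat > "$out" <<EOF
# Keep the breakpoint pending until the library that defines it is loaded
set breakpoint pending on
break $symbol
run
print BytecodeSize
x/${nbytes}xb Bytecode
EOF
}

# Example (paths as in the interactive session):
#   write_gdb_script bp.gdb EVMAnalyzer::analyze 3
#   gdb -batch -x bp.gdb --args ../build/bin/evmone-fuzzer crash-xxx
```

Keeping the commands in a file makes the session repeatable and easy to hand to the user, in line with the advice to let users run the debugger themselves.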

#### Getting Correct Bytecode from Fuzzer Crash Files

**Problem:** When analyzing fuzzer crash files, the raw file content is NOT the actual bytecode executed by the VM.

Fuzzer input files (like `crash-xxx`) contain structured data that includes:
- EVM revision information (first few bytes)
- Message parameters (gas, sender, recipient, etc.)
- Host context data
- **The actual bytecode is only a portion of the file, extracted by `populate_input()`**

For example, a crash file might be 27 bytes, but the actual bytecode executed could be only 3 bytes.

**Two Correct Approaches:**

1. **Analyze `populate_input()` in fuzzer.cpp**: Read the fuzzer's input parsing logic to understand how bytecode is extracted from the raw file. The bytecode typically starts at a specific offset after the header fields are parsed.

2. **Use GDB to inspect at runtime**: Break at the point where bytecode is actually used (e.g., `EVMAnalyzer::analyze`) and print `Bytecode` and `BytecodeSize` to get the real values.

**Wrong approach**: Directly parsing the crash file with `xxd` or `hexdump` and assuming all bytes are bytecode - this will lead to incorrect root cause analysis.

**Example of the difference:**
```
# Wrong: Raw file content (27 bytes) - this is NOT the bytecode!
$ xxd crash-xxx
fbff600200000000000000f3fff702ad00f3ff0000000000490012

# Correct: Actual bytecode (3 bytes) - obtained by analyzing populate_input() or using GDB
BytecodeSize = 3
Bytecode: 0x49 0x00 0x00 # BLOBHASH, STOP, STOP
```

**Lesson Learned:** Always understand how the test framework processes input data before analyzing. For evmone-fuzzer, read `populate_input()` in `fuzzer.cpp` to understand the input format, or use GDB to verify the actual values at runtime.

### 4. Fix the test

After understanding the cause, provide a fix and ask users whether they want to apply it.

## References

- [DTVM Architecture](references/dtvm_architecture.md) - Understanding execution modes
- [Fuzzer Testing](references/fuzzer_testing.md) - Fuzzer testing setup and debugging guide
- [Example](examples.md) - Example of debugging test failure
160 changes: 160 additions & 0 deletions .claude/skills/dtvm-debug-test-failure/examples.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,160 @@
# Debug Test Failure - Examples

This document provides usage examples for the debug-test-failure skill.

## Example 1: Multipass Mode evmone unittest failure

### User Request
```
The multipass test "./build/bin/evmone-unittests --gtest_filter=*evm.undefined_instructions*external*" fails. Help me debug it.
```

### Test Command
According to `.ci/run_test_suite.sh`, the evmone multipass test command is `./run_unittests.sh ../tests/evmone_unittests/EVMOneMultipassUnitTestsRunList.txt mode=multipass`. Reading `run_unittests.sh` shows that it runs `./build/bin/evmone-unittests --gtest_filter="$FILTER_PARAM"`, which matches the user request. We also need to export an environment variable for it.

```bash
export EVMONE_OPTIONS=mode=multipass
cd evmone
# Quote the filter so the shell does not expand the * wildcards
./build/bin/evmone-unittests --gtest_filter='*evm.undefined_instructions*external*'
```

### Expected Output
All tests are reported as `[  PASSED  ]`, with no failures in the output.

### Analysis
There are several failures in the output:
```
evm_test.cpp:637: Failure
Expected equality of these values:
res.status_code
Which is: stack underflow
EVMC_UNDEFINED_INSTRUCTION
Which is: undefined instruction
for opcode 1b on revision Frontier
```

First we read `evm_test.cpp` and the code related to it to understand how the test runs. `evm_test.cpp` is built in the evmone directory and loads the library `libdtvmapi.so`, which is built in DTVM. The test name tells us it concerns undefined instructions, so the result for an undefined instruction is incorrect. From `evm_test.cpp`, we know the EVM bytecode contains an undefined opcode. From the output, we see that the actual result is stack underflow rather than undefined instruction.
Then we read the code in DTVM, whose entry point is `dt_evmc_vm.cpp`. Since we assume the backend is correct, the failure should be in the front end, so we read `evm_bytecode_visitor.h`. The actual result is stack underflow, so it is probably caused by `Builder.handleTrap(common::ErrorCode::EVMStackUnderflow);`. In `EVMMirBuilder::createStackCheckBlock`, a stack underflow can also be raised by a MIR instruction if the stack size is less than `MinSize`. But the expected output is undefined instruction, which is produced by `Builder.handleUndefined();` in `evm_bytecode_visitor.h`. So the probable reason is that the underflow check runs before the undefined instruction check.
So we add a print statement before `Builder.createStackCheckBlock` in `evm_bytecode_visitor.h` to see the min size requirement and verify that it is incorrect. Finally we see that `evm_analyzer.h` does not take the revision into account: it uses the same opcode tables for all bytecode revisions, so the min size requirement is wrong, and undefined instruction should be returned before the min size check.
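
The ordering problem can be illustrated with a toy model. This is not DTVM code: `is_defined` and `status_for` are hypothetical names, only two revisions are modeled, and an empty stack is assumed. It uses the fact that opcode `0x1b` (SHL) was introduced in Constantinople and needs two stack items.

```bash
#!/usr/bin/env bash
# Toy model (not DTVM code) of the two checks discussed above.

# Succeeds if opcode $2 is defined in revision $1.
# Assumption: only opcode 1b (SHL, added in Constantinople) is modeled.
is_defined() {
  case "$1:$2" in
    Frontier:1b) return 1 ;;  # SHL does not exist on Frontier
    *) return 0 ;;
  esac
}

# $1 = revision, $2 = opcode, $3 = stack height,
# $4 = which check runs first ("stack" or "undefined")
status_for() {
  local rev="$1" op="$2" height="$3" first="$4"
  local min_size=2  # SHL pops two items; taken from a revision-blind table
  if [[ "$first" == "stack" ]]; then
    if (( height < min_size )); then echo "stack underflow"; return; fi
    if ! is_defined "$rev" "$op"; then echo "undefined instruction"; return; fi
  else
    if ! is_defined "$rev" "$op"; then echo "undefined instruction"; return; fi
    if (( height < min_size )); then echo "stack underflow"; return; fi
  fi
  echo "ok"
}

status_for Frontier 1b 0 stack      # the observed (wrong) result
status_for Frontier 1b 0 undefined  # the result the test expects
```

With the stack check first, the revision-blind minimum stack size for `0x1b` fires before the opcode is ever recognized as undefined on Frontier, which matches the failure message.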

### Fix
We need to check for undefined instructions in `evm_analyzer.h`. Since it's a big change, just tell the user the reason for the failure and ask whether we should fix it.

---

## Example 2: Fuzzer Blockhash Recording Mismatch

### User Request
```
Analyze fuzzer crash file crash-0be3bc84feec8e8e36c6d55f1ac44cfd11d2213c
```

### Error Pattern
```
ASSERTION FAILED: "ref_host.recorded_blockhashes.size() == host.recorded_blockhashes.size()"
with 2 != 1
```

### Root Cause Analysis

**The Issue**: DTVM calls `get_block_hash()` 1 time, while the reference evmone calls it 2 times.

**Key Insight**: Fuzzer tests compare host call counts (like `recorded_blockhashes`). When DTVM caches results internally but evmone does not, the counts mismatch even if execution is logically correct.

### Debugging Steps

#### 1. Extract Actual Bytecode from Crash File

**Wrong approach**: Parsing raw file bytes directly.
```bash
# This is WRONG - includes 24-byte header
$ xxd crash-0be3bc84
80 e6 00 00 00 01 00 50 38 38 38 00 00 01 00 00...
```

**Correct approach**: Use GDB at `EVMAnalyzer::analyze` to get actual bytecode.
```bash
gdb --args evmone-fuzzer crash-0be3bc84
(gdb) r # Let libdtvmapi.so load
(gdb) b EVMAnalyzer::analyze
(gdb) r # Restart and stop at breakpoint
(gdb) p BytecodeSize
$1 = 20
(gdb) x/20xb Bytecode
0x...: 0x38 0x38 0x40 0x38... 0x40 0x01 0x00
```

Actual bytecode: `38 38 40 38 38 38 38 38 38 38 38 38 38 38 38 38 38 40 01 00` (20 bytes)

#### 2. Disassemble Bytecode
```
Offset  Opcode  Name       Stack Effect
----------------------------------------
0       0x38    CODESIZE   Push: 1
1       0x38    CODESIZE   Push: 1
2       0x40    BLOCKHASH  Pop: 1, Push: 1   <-- First BLOCKHASH
3-16    0x38    CODESIZE   (14 times)
17      0x40    BLOCKHASH  Pop: 1, Push: 1   <-- Second BLOCKHASH
18      0x01    ADD        Pop: 2, Push: 1
19      0x00    STOP
```

There are two BLOCKHASH opcodes, at offsets 2 and 17. Each pops a value pushed by the preceding CODESIZE, so both should query the same block number.
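
Locating an opcode by offset can be done mechanically. The sketch below is a minimal scanner, not a full disassembler: `find_opcode_offsets` is a hypothetical helper that takes the target opcode and then the bytecode as space-separated hex bytes. It skips the immediate data of PUSH1 to PUSH32 (`0x60` to `0x7f`) so that data bytes are not mistaken for opcodes.

```bash
#!/usr/bin/env bash
# Sketch: print the offsets at which a given opcode appears in EVM bytecode.
# find_opcode_offsets is a hypothetical helper, not part of DTVM or evmone.
# Usage: find_opcode_offsets <opcode-hex> <byte-hex>...
find_opcode_offsets() {
  local target="$1"; shift
  local -a code=("$@")
  local pc=0 op
  while (( pc < ${#code[@]} )); do
    op=$(( 16#${code[pc]} ))
    if (( op == 16#$target )); then
      echo "$pc"
    fi
    # PUSH1..PUSH32 carry 1..32 bytes of immediate data; skip over them
    if (( op >= 0x60 && op <= 0x7f )); then
      pc=$(( pc + op - 0x5f ))
    fi
    pc=$(( pc + 1 ))
  done
}

# Example: in "PUSH1 0x40; BLOCKHASH" the first 0x40 is data, not an opcode
find_opcode_offsets 40 60 40 40
```

Passing `40` followed by the 20 bytes printed by GDB lists the offsets of every BLOCKHASH in the bytecode above.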

#### 3. Check evmone Recording Behavior

In `evmc/mocked_host.hpp`:
```cpp
bytes32 get_block_hash(int64_t block_number) const noexcept override
{
    recorded_blockhashes.emplace_back(block_number);  // ALWAYS records
    return block_hash;
}
```

evmone records every host call unconditionally.

#### 4. Check DTVM Implementation

In `src/compiler/evm_frontend/evm_imported.cpp`:
```cpp
const uint8_t *evmGetBlockHash(zen::runtime::EVMInstance *Instance,
                               int64_t BlockNumber) {
  auto &Cache = Instance->getMessageCache();
  auto It = Cache.BlockHashes.find(BlockNumber);
  if (It == Cache.BlockHashes.end()) {
    // First call - calls host
    evmc::bytes32 Hash = Module->Host->get_block_hash(BlockNumber);
    Cache.BlockHashes[BlockNumber] = Hash; // Cache result
    return Cache.BlockHashes[BlockNumber].bytes;
  }
  return It->second.bytes; // Second call - cached, NO host call!
}
```

**Problem Identified**: DTVM caches blockhash results. The second BLOCKHASH with the same block number returns the cached value without calling the host. The fuzzer sees 1 host call but expects 2.

### The Fix

Remove caching from `evmGetBlockHash()` to match evmone behavior:
```cpp
const uint8_t *evmGetBlockHash(zen::runtime::EVMInstance *Instance,
                               int64_t BlockNumber) {
  // Always call host to match evmone recording behavior
  evmc::bytes32 Hash = Module->Host->get_block_hash(BlockNumber);
  return ...; // Return without caching
}
```

### Key Lessons

1. **Fuzzer tests compare host call counts**, not just execution results. Any caching that skips host calls causes mismatches.

2. **Always use GDB to extract bytecode** from fuzzer crash files - raw file contains headers, not just bytecode.

3. **For dynamic libraries**: Must run once to load `libdtvmapi.so`, then set breakpoints, then run again.

4. **Cache behavior matters for fuzzer compatibility**: Even if caching is logically correct, it may break test expectations.

---

119 changes: 119 additions & 0 deletions .claude/skills/dtvm-debug-test-failure/references/dtvm_architecture.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,119 @@
# DTVM Architecture Reference

## Execution Modes Overview

DTVM provides three execution modes, each with distinct characteristics:

### 1. Interpreter Mode
- Direct bytecode interpretation
- Highest compatibility, lowest performance
- No compilation overhead
- Best for debugging and verification

### 2. Singlepass JIT Mode
- Fast single-pass compilation
- No LLVM dependency
- Moderate performance improvement
- Good for quick execution without heavy optimization
- **Note: Not supported for EVM bytecode in DTVM**

### 3. Multipass JIT Mode (LLVM-based)
- Multiple optimization passes
- Requires LLVM 15
- Highest performance
- Two sub-modes:
  - **FLAT (Function Level fAst Transpile)**: Fast compilation
  - **FLAS (Function Level Adaptive hot-Switching)**: Adaptive optimization

## Compilation Pipeline

```
Input (Wasm/EVM) → Frontend → dMIR (Deterministic MIR) → Execution Modes → Native Code
```

### Key Components

**dMIR (Deterministic MIR)**
- Middle Intermediate Representation
- Ensures deterministic behavior
- Common IR for all execution modes
- Platform-independent

**Frontend**
- Wasm frontend: Parses WebAssembly binary/text format
- EVM frontend: Parses EVM bytecode
- Validation and normalization

**Target Code Generation**
- Interpreter: Direct dMIR execution
- Singlepass JIT: Assembly code generation (Wasm only)
- Multipass JIT: LLVM IR generation → Machine code (Wasm and EVM)

## Memory Management

### Memory Pool System
- Uses mmap for large allocations
- Reduces fragmentation
- Supports deterministic allocation patterns
- Configurable pool sizes

### Deterministic Allocation
- Same input always produces same memory layout
- No reliance on malloc/free ordering
- Predictable address space usage
- Critical for consensus systems

## JIT Implementation Details

### Singlepass JIT
- Linear scan register allocation
- Minimal optimization passes
- Direct dMIR to assembly translation
- Fallback mechanism for unsupported operations

### Multipass JIT
- LLVM-based optimization pipeline
- Typical passes:
  - Dead code elimination
  - Constant folding
  - Common subexpression elimination
  - Loop optimizations
- Hotness detection for FLAS mode
- Profiling-guided optimization

## Build Configuration

### Essential Options
```bash
# Interpreter only (minimum)
cmake -B build

# With Singlepass JIT
cmake -B build -DZEN_ENABLE_SINGLEPASS_JIT=ON

# With Multipass JIT (requires LLVM 15)
cmake -B build -DZEN_ENABLE_MULTIPASS_JIT=ON \
      -DLLVM_DIR=/path/to/llvm/lib/cmake/llvm

# Debugging options
cmake -B build -DZEN_ENABLE_ASAN=ON # AddressSanitizer
cmake -B build -DZEN_ENABLE_PROFILER=ON # Performance profiling
cmake -B build -DZEN_ENABLE_SPEC_TEST=ON # WebAssembly tests
cmake -B build -DZEN_ENABLE_EVM=ON # EVM support
cmake -B build -DZEN_ENABLE_LIBEVM=ON # EVMC library support
```
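
When switching between modes while debugging, it can help to generate the configure command from the mode name. The sketch below only maps each mode to the options listed above; `dtvm_cmake_cmd` is a hypothetical helper, and the default LLVM path is the same placeholder used in the block above.

```bash
#!/usr/bin/env bash
# Sketch: print the cmake configure command for one execution mode.
# dtvm_cmake_cmd is a hypothetical helper; only the -D options come from
# the build options listed above.
dtvm_cmake_cmd() {
  local mode="$1" llvm_dir="${2:-/path/to/llvm/lib/cmake/llvm}"
  local -a args=(cmake -B build)
  case "$mode" in
    interpreter) ;;  # interpreter-only is the minimum build
    singlepass)  args+=(-DZEN_ENABLE_SINGLEPASS_JIT=ON) ;;
    multipass)   args+=(-DZEN_ENABLE_MULTIPASS_JIT=ON "-DLLVM_DIR=$llvm_dir") ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  echo "${args[*]}"
}

# Example:
#   dtvm_cmake_cmd multipass /opt/llvm/lib/cmake/llvm
```

The function only prints the command, so it can be reviewed before being run.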

## Testing Infrastructure

### Test Scripts
- `.ci/run_test_suite.sh`: Main test script
- `.github/workflows/dtvm_wasm_test_x86.yml`: Test environment for wasm
- `.github/workflows/dtvm_evm_test_x86.yml`: Test environment for evm

### Test Categories
1. **microsuite**: ctest suite for wasm
2. **evmtestsuite**: ctest suite for evm
3. **evmrealsuite**: test for real evm bytecode
4. **evmonetestsuite**: tests run through evmone's unit tests