Skip to content

feat: backward and forward LET testing + runbook + tool #1502

Draft
arnaubennassar wants to merge 9 commits intodevelopfrom
feat/back-and-for-let
Draft

feat: backward and forward LET testing + runbook + tool #1502
arnaubennassar wants to merge 9 commits intodevelopfrom
feat/back-and-for-let

Conversation

@arnaubennassar
Copy link
Collaborator

🔄 Changes Summary

  • [Brief description of what was added, fixed, or changed in this PR]

⚠️ Breaking Changes

  • 🛠️ Config: [Optional: Changes to TOML config]
  • 🔌 API/CLI: [Optional: Breaking interface changes]
  • 🗑️ Deprecated Features: [Optional: Removed features]

📋 Config Updates

  • 🧾 Diff/Config snippet: [Optional: Enumerate config changes/provide snippet of changes]

✅ Testing

  • 🤖 Automatic: [Optional: Enumerate E2E tests]
  • 🖱️ Manual: [Optional: Steps to verify]

🐞 Issues

  • Closes #[issue-number]

🔗 Related PRs

  • [Optional: Enumerate related pull requests]

📝 Notes

  • [Optional: design decisions, tradeoffs, or TODOs]

@arnaubennassar arnaubennassar marked this pull request as draft March 2, 2026 10:05
@arnaubennassar arnaubennassar self-assigned this Mar 2, 2026
@claude
Copy link

claude bot commented Mar 2, 2026

Claude finished @arnaubennassar's task in 2m 27s —— View job


PR Review Complete ✅

Reviewing PR #1502: Backward/Forward LET Tool


🚀 What's New

This PR introduces a comprehensive Backward/Forward LET (Local Exit Tree) recovery tool that automates the diagnosis and recovery process when the L1 settled LER (Local Exit Root) diverges from the L2 on-chain bridge state. The implementation includes:

  • New diagnostic tool (tools/backward_forward_let/) that automatically detects and classifies LET divergence into 4 recovery cases
  • Debug RPC endpoint (DebugSendCertificate) for testing/simulating certificate divergence scenarios (test-only, disabled by default)
  • New aggsender RPC method (GetCertificateBridgeExits) to retrieve bridge exits from stored certificates
  • Comprehensive E2E tests covering all 4 recovery cases plus edge cases
  • Detailed runbook documentation (984 lines) with step-by-step recovery procedures

🐛 Bug Fixes

None - this is a new feature implementation.

📋 Config Updates

Two new optional aggsender config fields (default disabled):

[AggSender]
EnableDebugSendCertificate = false  # Enable debug cert endpoint (test-only)
DebugSendCertificateAuthAddress = "0x..." # Auth address for debug endpoint

⚠️ Breaking Changes

None - all changes are additive with backward-compatible defaults.


Code Review Feedback

🔴 Required (Critical Issues)

1. Binary file committed to repository

Location: main (root directory)
Issue: A 46MB ELF executable binary is committed to the repository. This should never be committed.

Fix: Remove the binary and add it to .gitignore:

git rm main
echo "main" >> .gitignore

2. Missing validation in aggsender config

Location: aggsender/config/config.go:186-209
Issue: The Validate() function doesn't check the debug endpoint configuration. If EnableDebugSendCertificate is true but DebugSendCertificateAuthAddress is zero, the endpoint would allow anyone to send certificates (severe security issue).

Fix: Add validation:

func (c Config) Validate() error {
    // ... existing validation ...
    
    if c.EnableDebugSendCertificate {
        if c.DebugSendCertificateAuthAddress == (ethCommon.Address{}) {
            return fmt.Errorf("DebugSendCertificateAuthAddress must be set when EnableDebugSendCertificate is true")
        }
    }
    return nil
}

3. Signature recovery vulnerability

Location: aggsender/rpc/aggsender_rpc.go:159
Issue: The signature recovery doesn't use Ethereum's message prefix, making it susceptible to cross-protocol signature replay. An attacker could use a signature from another context (e.g., a valid Ethereum transaction signature) to authenticate.

Fix: Apply Ethereum's standard message hash:

func HashCertificateForDebugAuth(cert *agglayertypes.Certificate) (common.Hash, error) {
    data, err := json.Marshal(cert)
    if err != nil {
        return common.Hash{}, fmt.Errorf("HashCertificateForDebugAuth: marshal error: %w", err)
    }
    // Apply Ethereum signed message hash
    msg := fmt.Sprintf("\x19Ethereum Signed Message:\n%d%s", len(data), data)
    return crypto.Keccak256Hash([]byte(msg)), nil
}

Also update client code in aggsender/rpcclient/client.go to match.

🟡 Suggested (Important Improvements)

4. Potential integer overflow in Merkle computation

Location: tools/backward_forward_let/helpers.go:188
Issue: 2*j+1 could theoretically overflow on line 193 if nextLen is close to maxInt/2, though unlikely in practice.

Suggestion: Add a safety check or use checked arithmetic:

nextLen := (len(currentLevel) + 1) / 2
if nextLen > (1 << 30) { // Sanity check for tree size
    return [32]common.Hash{}, fmt.Errorf("merkle tree too large: %d nodes", nextLen)
}

5. Error handling in deferred emergency deactivation

Location: tools/backward_forward_let/recovery.go:53-57
Issue: If stepDeactivateEmergency fails in the defer, the error is only printed. The L2 bridge could remain in emergency state, blocking normal operations.

Suggestion: Return the deactivation error or provide clearer guidance:

defer func() {
    if deactivateErr := stepDeactivateEmergency(ctx, env, unpauserAuth, callOpts); deactivateErr != nil {
        // Log prominently and return error
        fmt.Printf("CRITICAL: Failed to deactivate emergency state: %v\n", deactivateErr)
        fmt.Printf("MANUAL ACTION REQUIRED: Bridge remains in emergency state. Run deactivation manually.\n")
        if err == nil {
            err = fmt.Errorf("recovery succeeded but failed to deactivate emergency state: %w", deactivateErr)
        }
    }
}()

6. Missing bounds check in diagnosis

Location: tools/backward_forward_let/diagnosis.go:226
Issue: The computeUndercollateralization function doesn't validate that leaf.TokenInfo fields are reasonable (e.g., amount is non-negative, origin network is valid).

Suggestion: Add validation:

for _, leaf := range leaves {
    if leaf.TokenInfo == nil {
        continue
    }
    amount := leaf.Amount
    if amount == nil {
        amount = big.NewInt(0)
    }
    if amount.Sign() < 0 {
        // Skip negative amounts (invalid)
        continue
    }
    // ... rest of logic
}

7. Race condition in aggsender startup check

Location: Plan indicates aggsender should be disabled when debug mode is enabled, but implementation unclear
Issue: The plan states "when EnableDebugSendCertificate is true, the aggsender MUST NOT run its normal certificate sending loop" but I don't see this enforcement in aggsender/aggsender.go.

Suggestion: Add startup validation and loop check:

// In New() or Start()
if cfg.EnableDebugSendCertificate {
    logger.Warn("Debug certificate endpoint enabled - normal certificate sending is DISABLED")
    // Set a flag or return early in the main sending loop
}

// In main sending loop
if a.cfg.EnableDebugSendCertificate {
    time.Sleep(5 * time.Second)
    continue // Skip certificate building/sending
}

8. JSON determinism in debug auth

Location: aggsender/rpc/debug_cert_hash.go:14
Issue: json.Marshal doesn't guarantee deterministic output order for maps, which could cause signature verification failures if different implementations serialize differently.

Suggestion: Use canonical JSON or a deterministic serialization:

import "github.com/gibson042/canonicaljson-go"

func HashCertificateForDebugAuth(cert *agglayertypes.Certificate) (common.Hash, error) {
    data, err := canonicaljson.Marshal(cert)
    if err != nil {
        return common.Hash{}, fmt.Errorf("HashCertificateForDebugAuth: marshal error: %w", err)
    }
    return crypto.Keccak256Hash(data), nil
}

🟢 Optional (Code Quality & Style)

9. Verbose error messages in diagnosis

Location: tools/backward_forward_let/diagnosis.go:273-303
Suggestion: The PrintDiagnosis output is very detailed (good for CLI) but could benefit from a structured format option (JSON) for programmatic parsing.

10. Missing godoc on exported functions

Location: Multiple files in tools/backward_forward_let/
Suggestion: Some exported functions lack doc comments. Add them for better code documentation:

// BridgeExitToLeafData converts an agglayer BridgeExit to a bridgesync.LeafData.
// It handles nil amounts and token info gracefully.
func BridgeExitToLeafData(be *agglayertypes.BridgeExit) bridgesync.LeafData {

11. Magic numbers in helpers

Location: tools/backward_forward_let/helpers.go:27
Suggestion: merkleZeroHashCount = 33 is explained in a comment but could use a more descriptive name like merkleTreeHeight or merkleMaxDepth.

12. Test coverage for edge cases

Location: tools/backward_forward_let/diagnosis_test.go
Observation: Test coverage looks good but missing test for:

  • DivergencePoint = 0 (all certificates diverge)
  • L2DepositCount = 0 (empty L2 tree)
  • Very large deposit counts (stress test)

Consider adding these edge case tests.


Security Assessment

Authentication: Debug endpoint uses Ethereum signature verification
⚠️ Signature Format: Missing message prefix (see Critical Issue #3)
Authorization: Single authorized address check
Default Security: Debug mode disabled by default
⚠️ Config Validation: Missing validation for zero auth address (see Critical Issue #2)
Emergency State: Proper use of emergency mode for recovery
Transaction Verification: All contract calls verified with receipts

Performance Considerations

  • Merkle computation: O(n log n) for n leaves - acceptable for expected use
  • Bridge service queries: O(n) sequential queries in fetchL2LeafHashesUpTo - could be optimized with batch API if available
  • No database indexes: Should verify indexes exist on certificate_info.height for efficient lookups

Testing Assessment

✅ Comprehensive E2E tests for all 4 recovery cases
✅ Unit tests for Merkle computations
✅ Unit tests for case classification
✅ Unit tests for undercollateralization
⚠️ Missing integration tests for aggsender disable behavior
⚠️ E2E tests rely on timing (sleep) - could be flaky


Recommendation

Approve with required changes: This is a well-designed and thoroughly documented feature. The implementation follows the detailed plan, includes comprehensive tests, and provides valuable tooling for LET divergence recovery.

Before merging:

  1. 🔴 Remove the main binary (Critical)
  2. 🔴 Add config validation for debug auth address (Critical)
  3. 🔴 Fix signature verification to use Ethereum message prefix (Critical)
  4. 🟡 Address defer error handling for emergency deactivation (Important)

The other suggestions can be addressed in follow-up PRs if needed.


@arnaubennassar arnaubennassar changed the title Feat/back and for let feat: backward and forward LET testing + runbook + tool Mar 2, 2026
@sonarqubecloud
Copy link

sonarqubecloud bot commented Mar 2, 2026

Quality Gate Failed Quality Gate failed

Failed conditions
2 Security Hotspots
76.5% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant