
feat(x/evm): migrate x/evm metrics to OpenTelemetry Meter API #3423

Open
amir-deris wants to merge 4 commits into amir/plt-323-migrate-app-metrics-to-otel-meter-api from amir/plt-329-migrate-x-evm-metrics-to-otel

@amir-deris (Contributor) commented May 12, 2026

Summary

Migrates telemetry in the x/evm/keeper and x/evm/ante packages from the legacy telemetry/utilmetrics helpers to the standardized OpenTelemetry Meter API, following the same pattern established in #3396 for app and app/ante.

  • Adds x/evm/keeper/metrics.go with a struct-based OTel instrument set registered via otel.Meter("evm_keeper"), initialized once via InitEvmKeeperMetrics() called from app.go after SetupOtelMetricsProvider()
  • Adds x/evm/ante/metrics.go with an evmAnteMetrics struct via otel.Meter("evm_ante"), initialized once via InitEvmAnteMetrics()
  • Adds OTel instruments alongside the existing telemetry.ModuleMeasureSince / utilmetrics / seimetrics.SafeTelemetry* calls in all target files (dual emission); the legacy calls remain, marked TODO(PLT-330), until dashboards are migrated
  • Threads ctx.Context() / goCtx through all .Record() and .Add() call sites
  • Disambiguates the evm_association_error_total counter that was registered by both keeper and ante into evm_keeper_association_error_total and evm_ante_association_error_total
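
The PR writes the legacy call and the new OTel call inline at each call site. A minimal, stdlib-only sketch of an alternative layout (an assumption, not what the PR does) wraps both behind one hypothetical `counter` interface, so the legacy side can later be dropped in one place. `memCounter` and `dualCounter` are illustrative stand-ins, not the OTel API.

```go
package main

import "sync"

// counter is a hypothetical interface standing in for both the legacy
// telemetry helpers and an OTel Int64Counter (not a real library type).
type counter interface {
	Add(delta int64, label string)
}

// memCounter is an in-memory stand-in for a real metrics backend.
type memCounter struct {
	mu     sync.Mutex
	totals map[string]int64
}

func newMemCounter() *memCounter {
	return &memCounter{totals: make(map[string]int64)}
}

func (c *memCounter) Add(delta int64, label string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.totals[label] += delta
}

func (c *memCounter) Get(label string) int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.totals[label]
}

// dualCounter (hypothetical) forwards every increment to both emitters.
type dualCounter struct {
	legacy counter // legacy emitter; its removal is what PLT-330 tracks
	otel   counter // replacement instrument
}

func (d dualCounter) Add(delta int64, label string) {
	d.legacy.Add(delta, label)
	d.otel.Add(delta, label)
}
```

The trade-off: inline calls keep each site explicit and grep-able for the TODO, while a wrapper shrinks the cleanup diff to a single type.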

New metrics (OTel naming convention, exported via the process-wide MeterProvider)

ABCI phase durations (histograms with explicit bucket boundaries from 25 µs to 10 s):

  • evm_abci_begin_blocker_duration_seconds
  • evm_abci_end_blocker_duration_seconds

Block fee:

  • evm_block_base_fee — synchronous gauge set each EndBlock

EVM transaction counters:

  • evm_panics_total
  • evm_errors_total — by type label (state_transition, stateDB_finalize, write_receipt, apply_message, vm_execution)
  • evm_receipt_status_total — by status label (success, failed)

Association errors:

  • evm_keeper_association_error_total — by scenario and type labels (emitted from keeper)
  • evm_ante_association_error_total — by scenario and type labels (emitted from ante)

Zero-storage cleanup:

  • evm_zero_storage_processed_keys_total
  • evm_zero_storage_pruned_keys_total
  • evm_zero_storage_pruned_bytes_total

Ante nonce/fee tracking:

  • evm_pending_nonce_total — by event label (added, expired, rejected, accepted)
  • evm_nonce_mismatch_total — by cause label (too_high, too_low)
  • evm_effective_gas_price — histogram of effective gas price per EVM transaction

Local observability tooling

Adds docker/docker-compose.monitoring.yml — a compose overlay that brings up sei-prometheus and sei-grafana on the existing localnet network alongside the four validator nodes. Prometheus scrapes the nodes directly by container hostname (port 1317) since it shares the network; Grafana is pre-provisioned with Prometheus as a data source and the existing dashboards directory.

Two new Makefile targets:

  • make docker-cluster-start-monitoring — builds and starts all six containers; Prometheus/Grafana logs are suppressed from the terminal output (--no-attach) but remain accessible via docker logs
  • make docker-cluster-stop-monitoring — tears everything down together via docker compose down

Note

Medium Risk
Touches EVM transaction/ABCI hot paths to add dual-emitted OpenTelemetry metrics alongside existing telemetry helpers; while intended to be non-functional, any context/label or timing changes could impact performance or panic behavior. Also adds optional localnet Prometheus/Grafana compose overlay and Makefile targets, which is low risk.

Overview
Migrates x/evm keeper and ante telemetry to OpenTelemetry Meter instruments, adding new metrics.go registrations plus tests, and wiring InitEvmKeeperMetrics() / InitEvmAnteMetrics() into app/app.go startup.

Updates EVM ante/keeper code paths to record OTel counters/histograms/gauges for pending nonce events, nonce mismatches, effective gas price, association errors, ABCI BeginBlock/EndBlock durations, base fee, tx processing errors/panics/receipt status, and zero-storage cleanup, while keeping legacy utilmetrics/seimetrics emissions with TODO(PLT-330) notes for transition.

Adds local observability tooling via docker-compose.monitoring.yml (Prometheus + Grafana provisioning files) and new Makefile targets docker-cluster-start-monitoring / docker-cluster-stop-monitoring to run the 4-node localnet with monitoring enabled.

Reviewed by Cursor Bugbot for commit 96669ca. Bugbot is set up for automated code reviews on this repo.

github-actions Bot commented May 12, 2026

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

Build | Format | Lint | Breaking | Updated (UTC)
✅ passed | ✅ passed | ✅ passed | ✅ passed | May 12, 2026, 10:47 PM

codecov Bot commented May 12, 2026

Codecov Report

❌ Patch coverage is 90.97744% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.39%. Comparing base (f2384dd) to head (96669ca).

Files with missing lines | Patch % | Lines
x/evm/keeper/msg_server.go | 62.50% | 6 Missing ⚠️
x/evm/ante/metrics.go | 91.66% | 1 Missing and 1 partial ⚠️
x/evm/ante/sig.go | 85.71% | 2 Missing ⚠️
x/evm/keeper/metrics.go | 96.15% | 1 Missing and 1 partial ⚠️
Additional details and impacted files

@@                                  Coverage Diff                                   @@
##           amir/plt-323-migrate-app-metrics-to-otel-meter-api    #3423      +/-   ##
======================================================================================
+ Coverage                                               59.37%   59.39%   +0.02%     
======================================================================================
  Files                                                    2114     2115       +1     
  Lines                                                  174919   174983      +64     
======================================================================================
+ Hits                                                   103859   103937      +78     
+ Misses                                                  62019    62004      -15     
- Partials                                                 9041     9042       +1     
Flag | Coverage Δ
sei-chain-pr | 61.63% <90.97%> (+10.82%) ⬆️
sei-db | 70.41% <ø> (ø)

Flags with carried forward coverage won't be shown.

Files with missing lines | Coverage Δ
app/app.go | 68.19% <100.00%> (+0.03%) ⬆️
x/evm/ante/fee.go | 73.43% <100.00%> (+0.42%) ⬆️
x/evm/ante/preprocess.go | 80.23% <100.00%> (+0.23%) ⬆️
x/evm/keeper/abci.go | 58.97% <100.00%> (+3.41%) ⬆️
x/evm/keeper/evm.go | 74.81% <100.00%> (+0.18%) ⬆️
x/evm/keeper/storage_cleanup.go | 89.39% <100.00%> (+0.50%) ⬆️
x/evm/ante/metrics.go | 91.66% <91.66%> (ø)
x/evm/ante/sig.go | 82.35% <85.71%> (+0.99%) ⬆️
x/evm/keeper/metrics.go | 96.15% <96.15%> (ø)
x/evm/keeper/msg_server.go | 80.36% <62.50%> (+1.94%) ⬆️

... and 2 files with indirect coverage changes


Comment thread x/evm/keeper/metrics.go
@amir-deris amir-deris changed the title Amir/plt 329 migrate x evm metrics to otel feat(x/evm): migrate x/evm metrics to OpenTelemetry Meter API May 12, 2026
@amir-deris amir-deris self-assigned this May 12, 2026
Comment thread Makefile
else \
DETACH_FLAG=""; \
fi; \
DOCKER_PLATFORM=$(DOCKER_PLATFORM) USERID=$(shell id -u) GROUPID=$(shell id -g) GOCACHE=$(shell go env GOCACHE) NUM_ACCOUNTS=10 INVARIANT_CHECK_INTERVAL=${INVARIANT_CHECK_INTERVAL} UPGRADE_VERSION_LIST=${UPGRADE_VERSION_LIST} MOCK_BALANCES=${MOCK_BALANCES} GIGA_EXECUTOR=${GIGA_EXECUTOR} GIGA_OCC=${GIGA_OCC} RECEIPT_BACKEND=${RECEIPT_BACKEND} AUTOBAHN=${AUTOBAHN} GIGA_STORAGE=${GIGA_STORAGE} docker compose -f docker-compose.yml -f docker-compose.monitoring.yml up --no-attach grafana --no-attach prometheus $$DETACH_FLAG
Contributor Author

Grafana and Prometheus were adding a lot of noise to the logs, so I added the --no-attach flag. Their logs can still be viewed with docker logs sei-grafana and docker logs sei-prometheus.

@amir-deris amir-deris requested a review from bdchatham May 12, 2026 22:32
Comment thread x/evm/keeper/metrics.go

var evmMillisecondBuckets = metric.WithExplicitBucketBoundaries(
0.000025, 0.000050, 0.0001, 0.0005, 0.001, 0.0025, 0.005, 0.010, 0.020, 0.050, 0.075, 0.1, 0.25, 0.5, 1, 10,
)

Identical histogram bucket definitions duplicated across packages

Low Severity

evmMillisecondBuckets in x/evm/keeper/metrics.go is byte-for-byte identical to millisecondBuckets in app/metrics.go. Duplicating bucket boundary arrays across packages creates a maintenance risk — if SLO thresholds change, both must be updated in lockstep, and a missed update leads to silently inconsistent histogram resolution. These could be consolidated into a shared metrics utility package.
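
A minimal sketch of that consolidation, assuming a hypothetical shared metrics utility package (`DurationBucketBoundaries` and `boundariesValid` are illustrative names, not from the PR). It keeps one definition and hands out a copy, so no caller can mutate the shared values. Each package could then pass `metric.WithExplicitBucketBoundaries(DurationBucketBoundaries()...)`, since that OTel option takes a variadic `...float64`.

```go
package main

// durationBucketBoundaries is the single source of truth (seconds) for the
// boundaries duplicated in app/metrics.go and x/evm/keeper/metrics.go.
// Hypothetical shared definition; the PR currently keeps two copies.
var durationBucketBoundaries = [...]float64{
	0.000025, 0.000050, 0.0001, 0.0005, 0.001, 0.0025, 0.005, 0.010,
	0.020, 0.050, 0.075, 0.1, 0.25, 0.5, 1, 10,
}

// DurationBucketBoundaries returns a fresh copy on every call.
func DurationBucketBoundaries() []float64 {
	out := make([]float64, len(durationBucketBoundaries))
	copy(out, durationBucketBoundaries[:])
	return out
}

// boundariesValid reports whether bounds are non-empty and strictly
// increasing, which explicit-bucket histograms expect.
func boundariesValid(bounds []float64) bool {
	for i := 1; i < len(bounds); i++ {
		if bounds[i] <= bounds[i-1] {
			return false
		}
	}
	return len(bounds) > 0
}
```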

Additional Locations (1)

Reviewed by Cursor Bugbot for commit 72e518f.

Contributor

👍🏻

Comment thread x/evm/ante/fee.go
if !ctx.IsCheckTx() && !ctx.IsReCheckTx() {
metrics.HistogramEvmEffectiveGasPrice(gp)
utilmetrics.HistogramEvmEffectiveGasPrice(gp) // TODO(PLT-330): remove once evm_effective_gas_price verified
evmAnteMetrics.effectiveGasPrice.Record(ctx.Context(), float64(gp.Uint64()))

Gas price Uint64 truncation silently corrupts OTel histogram

Low Severity

evmAnteMetrics.effectiveGasPrice.Record(ctx.Context(), float64(gp.Uint64())) calls Uint64() on a *big.Int. If the effective gas price ever exceeds math.MaxUint64, Uint64() silently returns the low 64 bits, recording a completely wrong value in the new OTel histogram. While the legacy metric had the same flaw, the new OTel instrument is intended to be the long-term replacement and could use a lossless *big.Int-to-float64 conversion instead.

Reviewed by Cursor Bugbot for commit 72e518f.

Contributor

👍🏻

@amir-deris amir-deris requested a review from masih May 12, 2026 22:32
@amir-deris amir-deris changed the base branch from main to amir/plt-323-migrate-app-metrics-to-otel-meter-api May 12, 2026 22:34
evaluation_interval: 15s

scrape_configs:
- job_name: 'cryptosim'
Contributor Author

I will roll back these changes to the existing Prometheus and Grafana YAML files and create new configs.

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 3 total unresolved issues (including 2 from previous reviews).

Reviewed by Cursor Bugbot for commit 96669ca.

Comment thread x/evm/keeper/metrics.go
panic(err)
}
return v
}

Duplicate generic helper functions across packages

Low Severity

mustMetric in x/evm/keeper/metrics.go and mustAnteMetric in x/evm/ante/metrics.go are functionally identical generic helper functions (both accept (V, error) and panic on error). This duplication increases maintenance burden — a shared utility (e.g. in utils/metrics) would avoid the redundancy.

Additional Locations (1)

Reviewed by Cursor Bugbot for commit 96669ca.

Comment on lines +1 to +11
apiVersion: 1
providers:
  - name: default
    orgId: 1
    folder: ""
    type: file
    disableDeletion: false
    updateIntervalSeconds: 30
    options:
      path: /var/lib/grafana/dashboards
      foldersFromFilesStructure: false
Contributor

I don't have context on these dashboards here. Are these to spin up dashboards for the docker compose personal stack? Seems like yes but want to confirm.

We're creating a mirror in the platform deployment, is that right?

@@ -0,0 +1,8 @@
apiVersion: 1
Contributor

Lots of docker compose pieces here. I'm slightly leaning towards us not creating these and just supporting them on the new platform Grafana. This seems like complexity and scope that isn't worth its weight to me, although I like that we are thinking about the tooling end of this.

Comment thread x/evm/ante/sig.go
// this nonce has already been mined, we cannot accept it again
metrics.IncrementPendingNonce("rejected")
utilmetrics.IncrementPendingNonce("rejected") // TODO(PLT-330): remove once evm_pending_nonce_total verified
evmAnteMetrics.pendingNonce.Add(ctx.Context(), 1, otelmetric.WithAttributes(attribute.String("event", "rejected")))
Contributor

nit: you could define the attributes once and reuse them, since they are the same each time. That avoids creating a new struct on every call and just reuses a single immutable instance.

Should apply to all of these that have a known set of possible values ahead of time.
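
In the OTel Go API this nit maps to building each option once, for example `metric.WithAttributeSet(attribute.NewSet(attribute.String("event", "rejected")))` held in a package-level variable, instead of calling `metric.WithAttributes(...)` on every `Add`. The stdlib-only sketch below shows the same shape with a hypothetical `attrSet` stand-in for `attribute.Set`: the label sets for the four known `event` values are built once at package init, and the hot path only indexes a table.

```go
package main

// attrSet is a hypothetical stand-in for an immutable OTel attribute.Set.
type attrSet struct {
	key, value string
}

// pendingNonceEvent enumerates the known values of the "event" label
// on evm_pending_nonce_total (names here are illustrative).
type pendingNonceEvent int

const (
	nonceAdded pendingNonceEvent = iota
	nonceExpired
	nonceRejected
	nonceAccepted
)

// pendingNonceAttrs is built once at package init and indexed by event.
var pendingNonceAttrs = [...]attrSet{
	nonceAdded:    {key: "event", value: "added"},
	nonceExpired:  {key: "event", value: "expired"},
	nonceRejected: {key: "event", value: "rejected"},
	nonceAccepted: {key: "event", value: "accepted"},
}

// attrsFor returns the precomputed label set for an event; no per-call
// construction happens on the hot path.
func attrsFor(ev pendingNonceEvent) attrSet {
	return pendingNonceAttrs[ev]
}
```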

Comment thread x/evm/ante/sig.go
Comment on lines +131 to +133
cause := "too_low"
if tooHigh {
cause = "too_high"
Contributor

nit: could simplify these to "lower" and "higher" to make them easier to use in dashboards or tools.
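
One way to keep such a rename cheap, sketched with hypothetical names: define the `cause` label values once as constants, so switching from too_high/too_low to higher/lower touches a single place. The snippet above only shows a `tooHigh` flag; how the PR computes it is not shown, so the comparison here (tx nonce greater than the expected next nonce) is an assumption.

```go
package main

// Label values for the "cause" attribute of evm_nonce_mismatch_total,
// defined once so a rename is a one-line change (hypothetical constants).
const (
	causeTooHigh = "too_high"
	causeTooLow  = "too_low"
)

// nonceMismatchCause (hypothetical) maps a nonce comparison to its label
// value. ok is false when the nonces match, i.e. nothing should be recorded.
func nonceMismatchCause(txNonce, expected uint64) (cause string, ok bool) {
	switch {
	case txNonce > expected:
		return causeTooHigh, true
	case txNonce < expected:
		return causeTooLow, true
	default:
		return "", false
	}
}
```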

Comment thread x/evm/ante/metrics.go
Comment on lines +26 to +30
func mustAnteMetric[V any](v V, err error) V {
if err != nil {
panic(err)
}
return v
Contributor

nit: you're doing this but in different ways across packages. Consider consolidating into a single place for all your metrics structs to reuse.

Comment thread Makefile
Comment on lines +321 to +338
# Start 4-node cluster with Prometheus and Grafana monitoring
docker-cluster-start-monitoring: docker-cluster-stop build-docker-node
	@rm -rf $(PROJECT_HOME)/build/generated
	@mkdir -p $(shell go env GOPATH)/pkg/mod
	@mkdir -p $(shell go env GOCACHE)
	@cd docker && \
	if [ "$${DOCKER_DETACH:-}" = "true" ]; then \
		DETACH_FLAG="-d"; \
	else \
		DETACH_FLAG=""; \
	fi; \
	DOCKER_PLATFORM=$(DOCKER_PLATFORM) USERID=$(shell id -u) GROUPID=$(shell id -g) GOCACHE=$(shell go env GOCACHE) NUM_ACCOUNTS=10 INVARIANT_CHECK_INTERVAL=${INVARIANT_CHECK_INTERVAL} UPGRADE_VERSION_LIST=${UPGRADE_VERSION_LIST} MOCK_BALANCES=${MOCK_BALANCES} GIGA_EXECUTOR=${GIGA_EXECUTOR} GIGA_OCC=${GIGA_OCC} RECEIPT_BACKEND=${RECEIPT_BACKEND} AUTOBAHN=${AUTOBAHN} GIGA_STORAGE=${GIGA_STORAGE} docker compose -f docker-compose.yml -f docker-compose.monitoring.yml up --no-attach grafana --no-attach prometheus $$DETACH_FLAG
.PHONY: docker-cluster-start-monitoring

# Stop monitoring containers (Prometheus and Grafana) and cluster
docker-cluster-stop-monitoring:
	@cd docker && DOCKER_PLATFORM=$(DOCKER_PLATFORM) USERID=$(shell id -u) GROUPID=$(shell id -g) GOCACHE=$(shell go env GOCACHE) docker compose -f docker-compose.yml -f docker-compose.monitoring.yml down
.PHONY: docker-cluster-stop-monitoring
Contributor

Any issues with testing this in your Harbour personal stack? Let me know if I can help. Ideally it's easy enough that you can use that to test your changes instead of rolling net-new infra like this.
