feat(x/evm): migrate x/evm metrics to OpenTelemetry Meter API#3423
Conversation
|
The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## amir/plt-323-migrate-app-metrics-to-otel-meter-api #3423 +/- ##
======================================================================================
+ Coverage 59.37% 59.39% +0.02%
======================================================================================
Files 2114 2115 +1
Lines 174919 174983 +64
======================================================================================
+ Hits 103859 103937 +78
+ Misses 62019 62004 -15
- Partials 9041 9042 +1
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
…ainers with docker compose
| else \ | ||
| DETACH_FLAG=""; \ | ||
| fi; \ | ||
| DOCKER_PLATFORM=$(DOCKER_PLATFORM) USERID=$(shell id -u) GROUPID=$(shell id -g) GOCACHE=$(shell go env GOCACHE) NUM_ACCOUNTS=10 INVARIANT_CHECK_INTERVAL=${INVARIANT_CHECK_INTERVAL} UPGRADE_VERSION_LIST=${UPGRADE_VERSION_LIST} MOCK_BALANCES=${MOCK_BALANCES} GIGA_EXECUTOR=${GIGA_EXECUTOR} GIGA_OCC=${GIGA_OCC} RECEIPT_BACKEND=${RECEIPT_BACKEND} AUTOBAHN=${AUTOBAHN} GIGA_STORAGE=${GIGA_STORAGE} docker compose -f docker-compose.yml -f docker-compose.monitoring.yml up --no-attach grafana --no-attach prometheus $$DETACH_FLAG |
There was a problem hiding this comment.
Grafana and Prometheus were adding a lot of noise to the logs so added --no-attach flag. Their logs can still be viewed with docker logs sei-grafana and docker logs sei-prometheus
|
|
||
| var evmMillisecondBuckets = metric.WithExplicitBucketBoundaries( | ||
| 0.000025, 0.000050, 0.0001, 0.0005, 0.001, 0.0025, 0.005, 0.010, 0.020, 0.050, 0.075, 0.1, 0.25, 0.5, 1, 10, | ||
| ) |
There was a problem hiding this comment.
Identical histogram bucket definitions duplicated across packages
Low Severity
evmMillisecondBuckets in x/evm/keeper/metrics.go is byte-for-byte identical to millisecondBuckets in app/metrics.go. Duplicating bucket boundary arrays across packages creates a maintenance risk — if SLO thresholds change, both must be updated in lockstep, and a missed update leads to silently inconsistent histogram resolution. These could be consolidated into a shared metrics utility package.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 72e518f. Configure here.
| if !ctx.IsCheckTx() && !ctx.IsReCheckTx() { | ||
| metrics.HistogramEvmEffectiveGasPrice(gp) | ||
| utilmetrics.HistogramEvmEffectiveGasPrice(gp) // TODO(PLT-330): remove once evm_effective_gas_price verified | ||
| evmAnteMetrics.effectiveGasPrice.Record(ctx.Context(), float64(gp.Uint64())) |
There was a problem hiding this comment.
Gas price Uint64 truncation silently corrupts OTel histogram
Low Severity
evmAnteMetrics.effectiveGasPrice.Record(ctx.Context(), float64(gp.Uint64())) calls Uint64() on a *big.Int. If the effective gas price ever exceeds math.MaxUint64, Uint64() silently returns the low 64 bits, recording a completely wrong value in the new OTel histogram. While the legacy metric had the same flaw, the new OTel instrument is intended to be the long-term replacement and could use a lossless *big.Int-to-float64 conversion instead.
Reviewed by Cursor Bugbot for commit 72e518f. Configure here.
| evaluation_interval: 15s | ||
|
|
||
| scrape_configs: | ||
| - job_name: 'cryptosim' |
There was a problem hiding this comment.
I will rollback these changes to existing prometheus and grafana yaml files and create new configs.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 3 total unresolved issues (including 2 from previous reviews).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 96669ca. Configure here.
| panic(err) | ||
| } | ||
| return v | ||
| } |
There was a problem hiding this comment.
Duplicate generic helper functions across packages
Low Severity
mustMetric in x/evm/keeper/metrics.go and mustAnteMetric in x/evm/ante/metrics.go are functionally identical generic helper functions (both accept (V, error) and panic on error). This duplication increases maintenance burden — a shared utility (e.g. in utils/metrics) would avoid the redundancy.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 96669ca. Configure here.
| apiVersion: 1 | ||
| providers: | ||
| - name: default | ||
| orgId: 1 | ||
| folder: "" | ||
| type: file | ||
| disableDeletion: false | ||
| updateIntervalSeconds: 30 | ||
| options: | ||
| path: /var/lib/grafana/dashboards | ||
| foldersFromFilesStructure: false |
There was a problem hiding this comment.
I don't have context on these dashboards here. Are these to spin up dashboards for the docker compose personal stack? Seems like yes but want to confirm.
We're creating a mirror in the platform deployment, is that right?
| @@ -0,0 +1,8 @@ | |||
| apiVersion: 1 | |||
There was a problem hiding this comment.
Lots of docker compose pieces here. I'm slightly leaning towards us not creating these and just supporting them on the new platform Grafana. Seems like complexity & scope that isn't worth it's weight to me although I like that we are thinking about the tooling end of this.
| if !ctx.IsCheckTx() && !ctx.IsReCheckTx() { | ||
| metrics.HistogramEvmEffectiveGasPrice(gp) | ||
| utilmetrics.HistogramEvmEffectiveGasPrice(gp) // TODO(PLT-330): remove once evm_effective_gas_price verified | ||
| evmAnteMetrics.effectiveGasPrice.Record(ctx.Context(), float64(gp.Uint64())) |
| // this nonce has already been mined, we cannot accept it again | ||
| metrics.IncrementPendingNonce("rejected") | ||
| utilmetrics.IncrementPendingNonce("rejected") // TODO(PLT-330): remove once evm_pending_nonce_total verified | ||
| evmAnteMetrics.pendingNonce.Add(ctx.Context(), 1, otelmetric.WithAttributes(attribute.String("event", "rejected"))) |
There was a problem hiding this comment.
nit: you could define the attributes once and reuse them since they are the same each time. Prevents creating a now struct each time to represent this and just reuses an immutable instance.
Should apply to all of these that have a known set of possible values ahead of time.
| cause := "too_low" | ||
| if tooHigh { | ||
| cause = "too_high" |
There was a problem hiding this comment.
nit: could simplify to "lower" and "higher" just too simplify use for dashboards or tools.
|
|
||
| var evmMillisecondBuckets = metric.WithExplicitBucketBoundaries( | ||
| 0.000025, 0.000050, 0.0001, 0.0005, 0.001, 0.0025, 0.005, 0.010, 0.020, 0.050, 0.075, 0.1, 0.25, 0.5, 1, 10, | ||
| ) |
| func mustAnteMetric[V any](v V, err error) V { | ||
| if err != nil { | ||
| panic(err) | ||
| } | ||
| return v |
There was a problem hiding this comment.
nit: you're doing this but in different ways across packages. Consider consolidating into a single place for all your metrics structs to reuse.
| # Start 4-node cluster with Prometheus and Grafana monitoring | ||
| docker-cluster-start-monitoring: docker-cluster-stop build-docker-node | ||
| @rm -rf $(PROJECT_HOME)/build/generated | ||
| @mkdir -p $(shell go env GOPATH)/pkg/mod | ||
| @mkdir -p $(shell go env GOCACHE) | ||
| @cd docker && \ | ||
| if [ "$${DOCKER_DETACH:-}" = "true" ]; then \ | ||
| DETACH_FLAG="-d"; \ | ||
| else \ | ||
| DETACH_FLAG=""; \ | ||
| fi; \ | ||
| DOCKER_PLATFORM=$(DOCKER_PLATFORM) USERID=$(shell id -u) GROUPID=$(shell id -g) GOCACHE=$(shell go env GOCACHE) NUM_ACCOUNTS=10 INVARIANT_CHECK_INTERVAL=${INVARIANT_CHECK_INTERVAL} UPGRADE_VERSION_LIST=${UPGRADE_VERSION_LIST} MOCK_BALANCES=${MOCK_BALANCES} GIGA_EXECUTOR=${GIGA_EXECUTOR} GIGA_OCC=${GIGA_OCC} RECEIPT_BACKEND=${RECEIPT_BACKEND} AUTOBAHN=${AUTOBAHN} GIGA_STORAGE=${GIGA_STORAGE} docker compose -f docker-compose.yml -f docker-compose.monitoring.yml up --no-attach grafana --no-attach prometheus $$DETACH_FLAG | ||
| .PHONY: docker-cluster-start-monitoring | ||
|
|
||
| # Stop monitoring containers (Prometheus and Grafana) and cluster | ||
| docker-cluster-stop-monitoring: | ||
| @cd docker && DOCKER_PLATFORM=$(DOCKER_PLATFORM) USERID=$(shell id -u) GROUPID=$(shell id -g) GOCACHE=$(shell go env GOCACHE) docker compose -f docker-compose.yml -f docker-compose.monitoring.yml down | ||
| .PHONY: docker-cluster-stop-monitoring |
There was a problem hiding this comment.
Any issues with testing this in your Harbour personal stack? Let me know if I can help. Ideally it's easy enough that you can use that instead of rolling net-new infra like this to test your changes


Summary
Migrates telemetry in the
x/evm/keeperandx/evm/antepackages from the legacytelemetry/utilmetricshelpers to the standardized OpenTelemetry Meter API, following the same pattern established in #3396 forappandapp/ante.x/evm/keeper/metrics.gowith a struct-based OTel instrument set registered viaotel.Meter("evm_keeper"), initialized once viaInitEvmKeeperMetrics()called fromapp.goafterSetupOtelMetricsProvider()x/evm/ante/metrics.gowith anevmAnteMetricsstruct viaotel.Meter("evm_ante"), initialized once viaInitEvmAnteMetrics()telemetry.ModuleMeasureSince/utilmetrics/seimetrics.SafeTelemetry*calls in all target files with dual-emitted OTel instruments; legacy calls remain withTODO(PLT-330)markers until dashboards are migratedctx.Context()/goCtxthrough all.Record()and.Add()call sitesevm_association_error_totalcounter that was registered by both keeper and ante intoevm_keeper_association_error_totalandevm_ante_association_error_totalNew metrics (OTel naming convention, exported via the process-wide MeterProvider)
ABCI phase durations — histograms bucketed at fine-grained ms thresholds:
evm_abci_begin_blocker_duration_secondsevm_abci_end_blocker_duration_secondsBlock fee:
evm_block_base_fee— synchronous gauge set each EndBlockEVM transaction counters:
evm_panics_totalevm_errors_total— bytypelabel (state_transition,stateDB_finalize,write_receipt,apply_message,vm_execution)evm_receipt_status_total— bystatuslabel (success,failed)Association errors:
evm_keeper_association_error_total— byscenarioandtypelabels (emitted from keeper)evm_ante_association_error_total— byscenarioandtypelabels (emitted from ante)Zero-storage cleanup:
evm_zero_storage_processed_keys_totalevm_zero_storage_pruned_keys_totalevm_zero_storage_pruned_bytes_totalAnte nonce/fee tracking:
evm_pending_nonce_total— byeventlabel (added,expired,rejected,accepted)evm_nonce_mismatch_total— bycauselabel (too_high,too_low)evm_effective_gas_price— histogram of effective gas price per EVM transactionLocal observability tooling
Adds
docker/docker-compose.monitoring.yml— a compose overlay that brings upsei-prometheusandsei-grafanaon the existinglocalnetnetwork alongside the four validator nodes. Prometheus scrapes the nodes directly by container hostname (port 1317) since it shares the network; Grafana is pre-provisioned with Prometheus as a data source and the existing dashboards directory.Two new Makefile targets:
make docker-cluster-start-monitoring— builds and starts all six containers; Prometheus/Grafana logs are suppressed from the terminal output (--no-attach) but remain accessible viadocker logsmake docker-cluster-stop-monitoring— tears everything down together viadocker compose downNote
Medium Risk
Touches EVM transaction/ABCI hot paths to add dual-emitted OpenTelemetry metrics alongside existing telemetry helpers; while intended to be non-functional, any context/label or timing changes could impact performance or panic behavior. Also adds optional localnet Prometheus/Grafana compose overlay and Makefile targets, which is low risk.
Overview
Migrates
x/evmkeeper and ante telemetry to OpenTelemetry Meter instruments, adding newmetrics.goregistrations plus tests, and wiringInitEvmKeeperMetrics()/InitEvmAnteMetrics()intoapp/app.gostartup.Updates EVM ante/keeper code paths to record OTel counters/histograms/gauges for pending nonce events, nonce mismatches, effective gas price, association errors, ABCI BeginBlock/EndBlock durations, base fee, tx processing errors/panics/receipt status, and zero-storage cleanup, while keeping legacy
utilmetrics/seimetricsemissions withTODO(PLT-330)notes for transition.Adds local observability tooling via
docker-compose.monitoring.yml(Prometheus + Grafana provisioning files) and new Makefile targetsdocker-cluster-start-monitoring/docker-cluster-stop-monitoringto run the 4-node localnet with monitoring enabled.Reviewed by Cursor Bugbot for commit 96669ca. Bugbot is set up for automated code reviews on this repo. Configure here.