feat(x/evm): migrate x/evm metrics to OpenTelemetry Meter API by amir-deris · Pull Request #3423 · sei-protocol/sei-chain

amir-deris · 2026-05-12T21:05:03Z

Summary

Migrates telemetry in the x/evm/keeper and x/evm/ante packages from the legacy telemetry/utilmetrics helpers to the standardized OpenTelemetry Meter API, following the same pattern established in #3396 for app and app/ante.

Adds x/evm/keeper/metrics.go with a struct-based OTel instrument set registered via otel.Meter("evm_keeper"), initialized once via InitEvmKeeperMetrics() called from app.go after SetupOtelMetricsProvider()
Adds x/evm/ante/metrics.go with an evmAnteMetrics struct via otel.Meter("evm_ante"), initialized once via InitEvmAnteMetrics()
Replaces telemetry.ModuleMeasureSince / utilmetrics / seimetrics.SafeTelemetry* calls in all target files with dual-emitted OTel instruments; legacy calls remain with TODO(PLT-330) markers until dashboards are migrated
Threads ctx.Context() / goCtx through all .Record() and .Add() call sites
Disambiguates the evm_association_error_total counter that was registered by both keeper and ante into evm_keeper_association_error_total and evm_ante_association_error_total

New metrics (OTel naming convention, exported via the process-wide MeterProvider)

ABCI phase durations — histograms bucketed at fine-grained ms thresholds:

evm_abci_begin_blocker_duration_seconds
evm_abci_end_blocker_duration_seconds

Block fee:

evm_block_base_fee — synchronous gauge set each EndBlock

EVM transaction counters:

evm_panics_total
evm_errors_total — by type label (state_transition, stateDB_finalize, write_receipt, apply_message, vm_execution)
evm_receipt_status_total — by status label (success, failed)

Association errors:

evm_keeper_association_error_total — by scenario and type labels (emitted from keeper)
evm_ante_association_error_total — by scenario and type labels (emitted from ante)

Zero-storage cleanup:

evm_zero_storage_processed_keys_total
evm_zero_storage_pruned_keys_total
evm_zero_storage_pruned_bytes_total

Ante nonce/fee tracking:

evm_pending_nonce_total — by event label (added, expired, rejected, accepted)
evm_nonce_mismatch_total — by cause label (too_high, too_low)
evm_effective_gas_price — histogram of effective gas price per EVM transaction

Local observability tooling

Adds docker/docker-compose.monitoring.yml — a compose overlay that brings up sei-prometheus and sei-grafana on the existing localnet network alongside the four validator nodes. Prometheus scrapes the nodes directly by container hostname (port 1317) since it shares the network; Grafana is pre-provisioned with Prometheus as a data source and the existing dashboards directory.

Two new Makefile targets:

make docker-cluster-start-monitoring — builds and starts all six containers; Prometheus/Grafana logs are suppressed from the terminal output (--no-attach) but remain accessible via docker logs
make docker-cluster-stop-monitoring — tears everything down together via docker compose down

Note

Medium Risk
Touches EVM transaction/ABCI hot paths to add dual-emitted OpenTelemetry metrics alongside existing telemetry helpers; while intended to be non-functional, any context/label or timing changes could impact performance or panic behavior. Also adds optional localnet Prometheus/Grafana compose overlay and Makefile targets, which is low risk.

Overview
Migrates x/evm keeper and ante telemetry to OpenTelemetry Meter instruments, adding new metrics.go registrations plus tests, and wiring InitEvmKeeperMetrics() / InitEvmAnteMetrics() into app/app.go startup.

Updates EVM ante/keeper code paths to record OTel counters/histograms/gauges for pending nonce events, nonce mismatches, effective gas price, association errors, ABCI BeginBlock/EndBlock durations, base fee, tx processing errors/panics/receipt status, and zero-storage cleanup, while keeping legacy utilmetrics/seimetrics emissions with TODO(PLT-330) notes for transition.

Adds local observability tooling via docker-compose.monitoring.yml (Prometheus + Grafana provisioning files) and new Makefile targets docker-cluster-start-monitoring / docker-cluster-stop-monitoring to run the 4-node localnet with monitoring enabled.

^{Reviewed by Cursor Bugbot for commit 96669ca. Bugbot is set up for automated code reviews on this repo. Configure here.}

github-actions · 2026-05-12T21:06:10Z

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

Build	Format	Lint	Breaking	Updated (UTC)
`✅ passed`	`✅ passed`	`✅ passed`	`✅ passed`	May 12, 2026, 10:47 PM

codecov · 2026-05-12T21:08:53Z

Codecov Report

❌ Patch coverage is 90.97744% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.39%. Comparing base (f2384dd) to head (96669ca).

Files with missing lines	Patch %	Lines
x/evm/keeper/msg_server.go	62.50%	6 Missing ⚠️
x/evm/ante/metrics.go	91.66%	1 Missing and 1 partial ⚠️
x/evm/ante/sig.go	85.71%	2 Missing ⚠️
x/evm/keeper/metrics.go	96.15%	1 Missing and 1 partial ⚠️

Additional details and impacted files

@@                                  Coverage Diff                                   @@
##           amir/plt-323-migrate-app-metrics-to-otel-meter-api    #3423      +/-   ##
======================================================================================
+ Coverage                                               59.37%   59.39%   +0.02%     
======================================================================================
  Files                                                    2114     2115       +1     
  Lines                                                  174919   174983      +64     
======================================================================================
+ Hits                                                   103859   103937      +78     
+ Misses                                                  62019    62004      -15     
- Partials                                                 9041     9042       +1

Flag	Coverage Δ
sei-chain-pr	`61.63% <90.97%> (+10.82%)`	⬆️
sei-db	`70.41% <ø> (ø)`

Flags with carried forward coverage won't be shown.

Files with missing lines	Coverage Δ
app/app.go	`68.19% <100.00%> (+0.03%)`	⬆️
x/evm/ante/fee.go	`73.43% <100.00%> (+0.42%)`	⬆️
x/evm/ante/preprocess.go	`80.23% <100.00%> (+0.23%)`	⬆️
x/evm/keeper/abci.go	`58.97% <100.00%> (+3.41%)`	⬆️
x/evm/keeper/evm.go	`74.81% <100.00%> (+0.18%)`	⬆️
x/evm/keeper/storage_cleanup.go	`89.39% <100.00%> (+0.50%)`	⬆️
x/evm/ante/metrics.go	`91.66% <91.66%> (ø)`
x/evm/ante/sig.go	`82.35% <85.71%> (+0.99%)`	⬆️
x/evm/keeper/metrics.go	`96.15% <96.15%> (ø)`
x/evm/keeper/msg_server.go	`80.36% <62.50%> (+1.94%)`	⬆️

... and 2 files with indirect coverage changes

amir-deris · 2026-05-12T22:31:44Z

+		else \
+			DETACH_FLAG=""; \
+		fi; \
+		DOCKER_PLATFORM=$(DOCKER_PLATFORM) USERID=$(shell id -u) GROUPID=$(shell id -g) GOCACHE=$(shell go env GOCACHE) NUM_ACCOUNTS=10 INVARIANT_CHECK_INTERVAL=${INVARIANT_CHECK_INTERVAL} UPGRADE_VERSION_LIST=${UPGRADE_VERSION_LIST} MOCK_BALANCES=${MOCK_BALANCES} GIGA_EXECUTOR=${GIGA_EXECUTOR} GIGA_OCC=${GIGA_OCC} RECEIPT_BACKEND=${RECEIPT_BACKEND} AUTOBAHN=${AUTOBAHN} GIGA_STORAGE=${GIGA_STORAGE} docker compose -f docker-compose.yml -f docker-compose.monitoring.yml up --no-attach grafana --no-attach prometheus $$DETACH_FLAG


Grafana and Prometheus were adding a lot of noise to the logs, so I added the --no-attach flag for both. Their logs can still be viewed with docker logs sei-grafana and docker logs sei-prometheus.

cursor · 2026-05-12T22:32:18Z

+
+var evmMillisecondBuckets = metric.WithExplicitBucketBoundaries(
+	0.000025, 0.000050, 0.0001, 0.0005, 0.001, 0.0025, 0.005, 0.010, 0.020, 0.050, 0.075, 0.1, 0.25, 0.5, 1, 10,
+)


Identical histogram bucket definitions duplicated across packages

Low Severity

evmMillisecondBuckets in x/evm/keeper/metrics.go is byte-for-byte identical to millisecondBuckets in app/metrics.go. Duplicating bucket boundary arrays across packages creates a maintenance risk — if SLO thresholds change, both must be updated in lockstep, and a missed update leads to silently inconsistent histogram resolution. These could be consolidated into a shared metrics utility package.

Additional Locations (1)

app/metrics.go#L21-L26

^{Reviewed by Cursor Bugbot for commit 72e518f. Configure here.}

cursor · 2026-05-12T22:32:18Z

 	if !ctx.IsCheckTx() && !ctx.IsReCheckTx() {
-		metrics.HistogramEvmEffectiveGasPrice(gp)
+		utilmetrics.HistogramEvmEffectiveGasPrice(gp) // TODO(PLT-330): remove once evm_effective_gas_price verified
+		evmAnteMetrics.effectiveGasPrice.Record(ctx.Context(), float64(gp.Uint64()))


Gas price Uint64 truncation silently corrupts OTel histogram

Low Severity

evmAnteMetrics.effectiveGasPrice.Record(ctx.Context(), float64(gp.Uint64())) calls Uint64() on a *big.Int. If the effective gas price ever exceeds math.MaxUint64, Uint64() silently returns the low 64 bits, recording a completely wrong value in the new OTel histogram. While the legacy metric had the same flaw, the new OTel instrument is intended to be the long-term replacement and could use a lossless *big.Int-to-float64 conversion instead.

^{Reviewed by Cursor Bugbot for commit 72e518f. Configure here.}

amir-deris · 2026-05-12T22:39:44Z

  evaluation_interval: 15s

 scrape_configs:
-  - job_name: 'cryptosim'


I will roll back these changes to the existing Prometheus and Grafana YAML files and create new configs instead.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 3 total unresolved issues (including 2 from previous reviews).

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 96669ca. Configure here.}

cursor · 2026-05-12T22:51:43Z

+		panic(err)
+	}
+	return v
+}


Duplicate generic helper functions across packages

Low Severity

mustMetric in x/evm/keeper/metrics.go and mustAnteMetric in x/evm/ante/metrics.go are functionally identical generic helper functions (both accept (V, error) and panic on error). This duplication increases maintenance burden — a shared utility (e.g. in utils/metrics) would avoid the redundancy.

Additional Locations (1)

x/evm/ante/metrics.go#L25-L31

^{Reviewed by Cursor Bugbot for commit 96669ca. Configure here.}
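
A shared helper along the lines suggested above might look like this. It is a sketch under assumptions: the name `Must` and the stand-in constructor `newInstrument` are hypothetical, and the real code would pass the `(instrument, error)` pair returned by an OTel meter constructor instead.

```go
package main

import (
	"errors"
	"fmt"
)

// Must returns v, or panics if err is non-nil. A single generic helper like
// this could replace the per-package mustMetric and mustAnteMetric copies.
func Must[V any](v V, err error) V {
	if err != nil {
		panic(err)
	}
	return v
}

// newInstrument is a hypothetical stand-in for an instrument constructor
// that returns a value and an error.
func newInstrument(name string) (string, error) {
	if name == "" {
		return "", errors.New("instrument name must not be empty")
	}
	return "instrument:" + name, nil
}

func main() {
	// The two return values of newInstrument are passed straight into Must.
	counter := Must(newInstrument("evm_panics_total"))
	fmt.Println(counter)
}
```

Because the helper panics, it only suits one-time initialization at startup, which matches how the init functions in this PR are described.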

bdchatham · 2026-05-13T00:46:10Z

+apiVersion: 1
+providers:
+  - name: default
+    orgId: 1
+    folder: ""
+    type: file
+    disableDeletion: false
+    updateIntervalSeconds: 30
+    options:
+      path: /var/lib/grafana/dashboards
+      foldersFromFilesStructure: false


I don't have context on these dashboards here. Are these to spin up dashboards for the docker compose personal stack? Seems like yes but want to confirm.

We're creating a mirror in the platform deployment, is that right?

bdchatham · 2026-05-13T00:48:25Z

@@ -0,0 +1,8 @@
+apiVersion: 1


Lots of docker compose pieces here. I'm slightly leaning towards us not creating these and just supporting them on the new platform Grafana. Seems like complexity & scope that isn't worth its weight to me, although I like that we are thinking about the tooling end of this.

bdchatham · 2026-05-13T00:55:31Z

 	if !ctx.IsCheckTx() && !ctx.IsReCheckTx() {
-		metrics.HistogramEvmEffectiveGasPrice(gp)
+		utilmetrics.HistogramEvmEffectiveGasPrice(gp) // TODO(PLT-330): remove once evm_effective_gas_price verified
+		evmAnteMetrics.effectiveGasPrice.Record(ctx.Context(), float64(gp.Uint64()))


bdchatham · 2026-05-13T01:00:14Z

 					// this nonce has already been mined, we cannot accept it again
-					metrics.IncrementPendingNonce("rejected")
+					utilmetrics.IncrementPendingNonce("rejected") // TODO(PLT-330): remove once evm_pending_nonce_total verified
+					evmAnteMetrics.pendingNonce.Add(ctx.Context(), 1, otelmetric.WithAttributes(attribute.String("event", "rejected")))


nit: you could define the attributes once and reuse them since they are the same each time. That avoids creating a new struct on every call and just reuses an immutable instance.

Should apply to all of these that have a known set of possible values ahead of time.

bdchatham · 2026-05-13T01:02:05Z

+		cause := "too_low"
+		if tooHigh {
+			cause = "too_high"


nit: could simplify to "lower" and "higher" just to simplify use for dashboards or tools.

bdchatham · 2026-05-13T01:04:06Z

+
+var evmMillisecondBuckets = metric.WithExplicitBucketBoundaries(
+	0.000025, 0.000050, 0.0001, 0.0005, 0.001, 0.0025, 0.005, 0.010, 0.020, 0.050, 0.075, 0.1, 0.25, 0.5, 1, 10,
+)


bdchatham · 2026-05-13T01:05:03Z

+func mustAnteMetric[V any](v V, err error) V {
+	if err != nil {
+		panic(err)
+	}
+	return v


nit: you're doing this but in different ways across packages. Consider consolidating into a single place for all your metrics structs to reuse.

bdchatham · 2026-05-13T01:07:27Z

+# Start 4-node cluster with Prometheus and Grafana monitoring
+docker-cluster-start-monitoring: docker-cluster-stop build-docker-node
+	@rm -rf $(PROJECT_HOME)/build/generated
+	@mkdir -p $(shell go env GOPATH)/pkg/mod
+	@mkdir -p $(shell go env GOCACHE)
+	@cd docker && \
+		if [ "$${DOCKER_DETACH:-}" = "true" ]; then \
+			DETACH_FLAG="-d"; \
+		else \
+			DETACH_FLAG=""; \
+		fi; \
+		DOCKER_PLATFORM=$(DOCKER_PLATFORM) USERID=$(shell id -u) GROUPID=$(shell id -g) GOCACHE=$(shell go env GOCACHE) NUM_ACCOUNTS=10 INVARIANT_CHECK_INTERVAL=${INVARIANT_CHECK_INTERVAL} UPGRADE_VERSION_LIST=${UPGRADE_VERSION_LIST} MOCK_BALANCES=${MOCK_BALANCES} GIGA_EXECUTOR=${GIGA_EXECUTOR} GIGA_OCC=${GIGA_OCC} RECEIPT_BACKEND=${RECEIPT_BACKEND} AUTOBAHN=${AUTOBAHN} GIGA_STORAGE=${GIGA_STORAGE} docker compose -f docker-compose.yml -f docker-compose.monitoring.yml up --no-attach grafana --no-attach prometheus $$DETACH_FLAG
+.PHONY: docker-cluster-start-monitoring
+
+# Stop monitoring containers (Prometheus and Grafana) and cluster
+docker-cluster-stop-monitoring:
+	@cd docker && DOCKER_PLATFORM=$(DOCKER_PLATFORM) USERID=$(shell id -u) GROUPID=$(shell id -g) GOCACHE=$(shell go env GOCACHE) docker compose -f docker-compose.yml -f docker-compose.monitoring.yml down
+.PHONY: docker-cluster-stop-monitoring


Any issues with testing this in your Harbour personal stack? Let me know if I can help. Ideally it's easy enough that you can use that instead of rolling net-new infra like this to test your changes.

Added otel metrics for x/evm package

50b1fb4

cursor Bot reviewed May 12, 2026

amir-deris changed the title ~~Amir/plt 329 migrate x evm metrics to otel~~ feat(x/evm): migrate x/evm metrics to OpenTelemetry Meter API May 12, 2026

amir-deris self-assigned this May 12, 2026

amir-deris added the non-app-hash-breaking label May 12, 2026

amir-deris added 2 commits May 12, 2026 15:21

Added new makefile target for bringing up grafana and prometheus containers with docker compose

68a5578

Made metric names unique

72e518f

amir-deris requested a review from bdchatham May 12, 2026 22:32

cursor Bot reviewed May 12, 2026

amir-deris requested a review from masih May 12, 2026 22:32

amir-deris changed the base branch from main to amir/plt-323-migrate-app-metrics-to-otel-meter-api May 12, 2026 22:34

Used new configs for grafana and prometheus instead of modifying existing ones

96669ca

cursor Bot reviewed May 12, 2026

bdchatham reviewed May 13, 2026
