-
Notifications
You must be signed in to change notification settings - Fork 878
feat(x/evm): migrate x/evm metrics to OpenTelemetry Meter API #3423
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: amir/plt-323-migrate-app-metrics-to-otel-meter-api
Are you sure you want to change the base?
Changes from all commits
50b1fb4
68a5578
72e518f
96669ca
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -318,6 +318,25 @@ docker-cluster-stop: | |
| @cd docker && DOCKER_PLATFORM=$(DOCKER_PLATFORM) USERID=$(shell id -u) GROUPID=$(shell id -g) GOCACHE=$(shell go env GOCACHE) docker compose down | ||
| .PHONY: localnet-stop | ||
|
|
||
| # Start 4-node cluster with Prometheus and Grafana monitoring | ||
| docker-cluster-start-monitoring: docker-cluster-stop build-docker-node | ||
| @rm -rf $(PROJECT_HOME)/build/generated | ||
| @mkdir -p $(shell go env GOPATH)/pkg/mod | ||
| @mkdir -p $(shell go env GOCACHE) | ||
| @cd docker && \ | ||
| if [ "$${DOCKER_DETACH:-}" = "true" ]; then \ | ||
| DETACH_FLAG="-d"; \ | ||
| else \ | ||
| DETACH_FLAG=""; \ | ||
| fi; \ | ||
| DOCKER_PLATFORM=$(DOCKER_PLATFORM) USERID=$(shell id -u) GROUPID=$(shell id -g) GOCACHE=$(shell go env GOCACHE) NUM_ACCOUNTS=10 INVARIANT_CHECK_INTERVAL=${INVARIANT_CHECK_INTERVAL} UPGRADE_VERSION_LIST=${UPGRADE_VERSION_LIST} MOCK_BALANCES=${MOCK_BALANCES} GIGA_EXECUTOR=${GIGA_EXECUTOR} GIGA_OCC=${GIGA_OCC} RECEIPT_BACKEND=${RECEIPT_BACKEND} AUTOBAHN=${AUTOBAHN} GIGA_STORAGE=${GIGA_STORAGE} docker compose -f docker-compose.yml -f docker-compose.monitoring.yml up --no-attach grafana --no-attach prometheus $$DETACH_FLAG | ||
| .PHONY: docker-cluster-start-monitoring | ||
|
|
||
| # Stop monitoring containers (Prometheus and Grafana) and cluster | ||
| docker-cluster-stop-monitoring: | ||
| @cd docker && DOCKER_PLATFORM=$(DOCKER_PLATFORM) USERID=$(shell id -u) GROUPID=$(shell id -g) GOCACHE=$(shell go env GOCACHE) docker compose -f docker-compose.yml -f docker-compose.monitoring.yml down | ||
| .PHONY: docker-cluster-stop-monitoring | ||
|
Comment on lines
+321
to
+338
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Any issues with testing this in your Harbour personal stack? Let me know if I can help. Ideally it's easy enough that you can use that instead of rolling net-new infra like this to test your changes |
||
|
|
||
| # Run GIGA EVM integration tests with a GIGA-enabled cluster | ||
| # This starts a fresh cluster with GIGA_EXECUTOR and GIGA_OCC enabled, | ||
| # runs the EVM GIGA tests, then stops the cluster. | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,35 @@ | ||
| services: | ||
| prometheus: | ||
| container_name: sei-prometheus | ||
| image: prom/prometheus:latest | ||
| ports: | ||
| - "9099:9090" | ||
| volumes: | ||
| - ./docker_compose_monitoring/prometheus.yaml:/etc/prometheus/prometheus.yml:ro | ||
| command: | ||
| - --config.file=/etc/prometheus/prometheus.yml | ||
| - --storage.tsdb.path=/prometheus | ||
| - --web.enable-lifecycle | ||
| networks: | ||
| - localnet | ||
|
|
||
| grafana: | ||
| container_name: sei-grafana | ||
| image: grafana/grafana:latest | ||
| ports: | ||
| - "3000:3000" | ||
| volumes: | ||
| - ./docker_compose_monitoring/grafana-datasource.yaml:/etc/grafana/provisioning/datasources/grafana-datasource.yaml:ro | ||
| - ./docker_compose_monitoring/grafana-dashboards.yaml:/etc/grafana/provisioning/dashboards/grafana-dashboards.yaml:ro | ||
| - ./monitornode/dashboards:/var/lib/grafana/dashboards:ro | ||
| environment: | ||
| - GF_SECURITY_ADMIN_USER=admin | ||
| - GF_SECURITY_ADMIN_PASSWORD=admin | ||
| - GF_USERS_ALLOW_SIGN_UP=false | ||
| depends_on: | ||
| - prometheus | ||
| networks: | ||
| - localnet | ||
|
|
||
| networks: | ||
| localnet: |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,11 @@ | ||
| apiVersion: 1 | ||
| providers: | ||
| - name: default | ||
| orgId: 1 | ||
| folder: "" | ||
| type: file | ||
| disableDeletion: false | ||
| updateIntervalSeconds: 30 | ||
| options: | ||
| path: /var/lib/grafana/dashboards | ||
| foldersFromFilesStructure: false | ||
|
Comment on lines
+1
to
+11
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't have context on these dashboards here. Are these to spin up dashboards for the docker compose personal stack? Seems like yes but want to confirm. We're creating a mirror in the platform deployment, is that right? |
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| apiVersion: 1 | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Lots of docker compose pieces here. I'm slightly leaning towards us not creating these and just supporting them on the new platform Grafana. Seems like complexity & scope that isn't worth it's weight to me although I like that we are thinking about the tooling end of this. |
||
| datasources: | ||
| - name: Prometheus | ||
| type: prometheus | ||
| access: proxy | ||
| url: http://prometheus:9090 | ||
| isDefault: true | ||
| editable: false | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,16 @@ | ||
| global: | ||
| scrape_interval: 15s | ||
| evaluation_interval: 15s | ||
|
|
||
| scrape_configs: | ||
| - job_name: 'sei-localnet' | ||
| metrics_path: '/metrics' | ||
| params: | ||
| format: ['prometheus'] | ||
| static_configs: | ||
| - targets: | ||
| - 'sei-node-0:1317' | ||
| - 'sei-node-1:1317' | ||
| - 'sei-node-2:1317' | ||
| - 'sei-node-3:1317' | ||
| scrape_interval: 5s |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -12,7 +12,7 @@ import ( | |
| sdkerrors "github.com/sei-protocol/sei-chain/sei-cosmos/types/errors" | ||
| upgradekeeper "github.com/sei-protocol/sei-chain/sei-cosmos/x/upgrade/keeper" | ||
| "github.com/sei-protocol/sei-chain/utils" | ||
| "github.com/sei-protocol/sei-chain/utils/metrics" | ||
| utilmetrics "github.com/sei-protocol/sei-chain/utils/metrics" | ||
| "github.com/sei-protocol/sei-chain/x/evm/derived" | ||
| evmkeeper "github.com/sei-protocol/sei-chain/x/evm/keeper" | ||
| "github.com/sei-protocol/sei-chain/x/evm/state" | ||
|
|
@@ -127,7 +127,8 @@ func (fc EVMFeeCheckDecorator) getMinimumFee(ctx sdk.Context) *big.Int { | |
| func (fc EVMFeeCheckDecorator) CalculatePriority(ctx sdk.Context, txData ethtx.TxData) *big.Int { | ||
| gp := txData.EffectiveGasPrice(utils.Big0) | ||
| if !ctx.IsCheckTx() && !ctx.IsReCheckTx() { | ||
| metrics.HistogramEvmEffectiveGasPrice(gp) | ||
| utilmetrics.HistogramEvmEffectiveGasPrice(gp) // TODO(PLT-330): remove once evm_effective_gas_price verified | ||
| evmAnteMetrics.effectiveGasPrice.Record(ctx.Context(), float64(gp.Uint64())) | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Gas price Uint64 truncation silently corrupts OTel histogramLow Severity
Reviewed by Cursor Bugbot for commit 72e518f. Configure here.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 👍🏻 |
||
| } | ||
| priority := sdk.NewDecFromBigInt(gp).Quo(fc.evmKeeper.GetPriorityNormalizer(ctx)).TruncateInt().BigInt() | ||
| if priority.Cmp(big.NewInt(antedecorators.MaxPriority)) > 0 { | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,59 @@ | ||
| package ante | ||
|
|
||
| import ( | ||
| "sync" | ||
|
|
||
| "go.opentelemetry.io/otel" | ||
| "go.opentelemetry.io/otel/metric" | ||
| ) | ||
|
|
||
| type evmAnteMetricsType struct { | ||
| once sync.Once | ||
|
|
||
| // Nonce tracking | ||
| pendingNonce metric.Int64Counter | ||
| nonceMismatch metric.Int64Counter | ||
|
|
||
| // Gas price histogram | ||
| effectiveGasPrice metric.Float64Histogram | ||
|
|
||
| // Association errors | ||
| associationError metric.Int64Counter | ||
| } | ||
|
|
||
| var evmAnteMetrics evmAnteMetricsType | ||
|
|
||
| func mustAnteMetric[V any](v V, err error) V { | ||
| if err != nil { | ||
| panic(err) | ||
| } | ||
| return v | ||
|
Comment on lines
+26
to
+30
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: you're doing this but in different ways across packages. Consider consolidating into a single place for all your metrics structs to reuse. |
||
| } | ||
|
|
||
| // InitEvmAnteMetrics registers all OTel instruments for the x/evm ante package. | ||
| // Safe to call concurrently; instruments are registered exactly once. | ||
| func InitEvmAnteMetrics() { | ||
| evmAnteMetrics.once.Do(func() { | ||
| meter := otel.Meter("evm_ante") | ||
|
|
||
| evmAnteMetrics.pendingNonce = mustAnteMetric(meter.Int64Counter( | ||
| "evm_pending_nonce_total", | ||
| metric.WithDescription("EVM pending nonce events by type (added, expired, rejected, accepted)"), | ||
| )) | ||
|
|
||
| evmAnteMetrics.nonceMismatch = mustAnteMetric(meter.Int64Counter( | ||
| "evm_nonce_mismatch_total", | ||
| metric.WithDescription("EVM nonce mismatches by cause (too_high, too_low)"), | ||
| )) | ||
|
|
||
| evmAnteMetrics.effectiveGasPrice = mustAnteMetric(meter.Float64Histogram( | ||
| "evm_effective_gas_price", | ||
| metric.WithDescription("Effective gas price for EVM transactions"), | ||
| )) | ||
|
|
||
| evmAnteMetrics.associationError = mustAnteMetric(meter.Int64Counter( | ||
| "evm_ante_association_error_total", | ||
| metric.WithDescription("EVM address association errors by scenario and address type"), | ||
| )) | ||
| }) | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,32 @@ | ||
| package ante | ||
|
|
||
| import ( | ||
| "context" | ||
| "testing" | ||
|
|
||
| "go.opentelemetry.io/otel" | ||
| "go.opentelemetry.io/otel/attribute" | ||
| otelmetric "go.opentelemetry.io/otel/metric" | ||
| "go.opentelemetry.io/otel/metric/noop" | ||
| ) | ||
|
|
||
| func TestInitEvmAnteMetricsNoPanic(t *testing.T) { | ||
| otel.SetMeterProvider(noop.NewMeterProvider()) | ||
| InitEvmAnteMetrics() | ||
| } | ||
|
|
||
| func TestEvmAnteMetricsAllInstrumentsUsable(t *testing.T) { | ||
| otel.SetMeterProvider(noop.NewMeterProvider()) | ||
| InitEvmAnteMetrics() | ||
|
|
||
| ctx := context.Background() | ||
|
|
||
| for _, event := range []string{"added", "expired", "rejected", "accepted"} { | ||
| evmAnteMetrics.pendingNonce.Add(ctx, 1, otelmetric.WithAttributes(attribute.String("event", event))) | ||
| } | ||
| for _, cause := range []string{"too_high", "too_low"} { | ||
| evmAnteMetrics.nonceMismatch.Add(ctx, 1, otelmetric.WithAttributes(attribute.String("cause", cause))) | ||
| } | ||
| evmAnteMetrics.effectiveGasPrice.Record(ctx, 1e9) | ||
| evmAnteMetrics.associationError.Add(ctx, 1, otelmetric.WithAttributes(attribute.String("scenario", "associate_tx_insufficient_funds"), attribute.String("type", "sei"))) | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -10,7 +10,10 @@ import ( | |
| tmtypes "github.com/sei-protocol/sei-chain/sei-tendermint/types" | ||
| "github.com/sei-protocol/seilog" | ||
|
|
||
| "github.com/sei-protocol/sei-chain/utils/metrics" | ||
| "go.opentelemetry.io/otel/attribute" | ||
| otelmetric "go.opentelemetry.io/otel/metric" | ||
|
|
||
| utilmetrics "github.com/sei-protocol/sei-chain/utils/metrics" | ||
| evmkeeper "github.com/sei-protocol/sei-chain/x/evm/keeper" | ||
| "github.com/sei-protocol/sei-chain/x/evm/types" | ||
| ) | ||
|
|
@@ -74,14 +77,16 @@ func (svd *EVMSigVerifyDecorator) AnteHandle(ctx sdk.Context, tx sdk.Tx, simulat | |
| ctx = ctx.WithCheckTxCallback(func(priority int64) { | ||
| txHash := tmtypes.Tx(ctx.TxBytes()).Hash() | ||
| svd.evmKeeper.AddPendingNonce(txHash, evmAddr, txNonce, priority) | ||
| metrics.IncrementPendingNonce("added") | ||
| utilmetrics.IncrementPendingNonce("added") // TODO(PLT-330): remove once evm_pending_nonce_total verified | ||
| evmAnteMetrics.pendingNonce.Add(ctx.Context(), 1, otelmetric.WithAttributes(attribute.String("event", "added"))) | ||
| }) | ||
|
|
||
| // if the mempool expires a transaction, this handler is invoked | ||
| ctx = ctx.WithExpireTxHandler(func() { | ||
| txHash := tmtypes.Tx(ctx.TxBytes()).Hash() | ||
| svd.evmKeeper.RemovePendingNonce(txHash) | ||
| metrics.IncrementPendingNonce("expired") | ||
| utilmetrics.IncrementPendingNonce("expired") // TODO(PLT-330): remove once evm_pending_nonce_total verified | ||
| evmAnteMetrics.pendingNonce.Add(ctx.Context(), 1, otelmetric.WithAttributes(attribute.String("event", "expired"))) | ||
| }) | ||
|
|
||
| if txNonce > nextNonce { | ||
|
|
@@ -100,7 +105,8 @@ func (svd *EVMSigVerifyDecorator) AnteHandle(ctx sdk.Context, tx sdk.Tx, simulat | |
|
|
||
| if txNonce < nextNonceToBeMined { | ||
| // this nonce has already been mined, we cannot accept it again | ||
| metrics.IncrementPendingNonce("rejected") | ||
| utilmetrics.IncrementPendingNonce("rejected") // TODO(PLT-330): remove once evm_pending_nonce_total verified | ||
| evmAnteMetrics.pendingNonce.Add(ctx.Context(), 1, otelmetric.WithAttributes(attribute.String("event", "rejected"))) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: you could define the attributes once and reuse them since they are the same each time. Prevents creating a now struct each time to represent this and just reuses an immutable instance. Should apply to all of these that have a known set of possible values ahead of time. |
||
| return abci.Rejected | ||
| } else if txNonce < nextPendingNonce { | ||
| // check if the sender still has enough funds to pay for gas | ||
|
|
@@ -112,14 +118,21 @@ func (svd *EVMSigVerifyDecorator) AnteHandle(ctx sdk.Context, tx sdk.Tx, simulat | |
| // this nonce is allowed to process as it is part of the | ||
| // consecutive nonces from nextNonceToBeMined to nextPendingNonce | ||
| // This logic allows multiple nonces from an account to be processed in a block. | ||
| metrics.IncrementPendingNonce("accepted") | ||
| utilmetrics.IncrementPendingNonce("accepted") // TODO(PLT-330): remove once evm_pending_nonce_total verified | ||
| evmAnteMetrics.pendingNonce.Add(ctx.Context(), 1, otelmetric.WithAttributes(attribute.String("event", "accepted"))) | ||
| return abci.Accepted | ||
| } | ||
| return abci.Pending | ||
| }) | ||
| } | ||
| } else if txNonce != nextNonce { | ||
| metrics.IncrementNonceMismatch(txNonce > nextNonce) | ||
| tooHigh := txNonce > nextNonce | ||
| utilmetrics.IncrementNonceMismatch(tooHigh) // TODO(PLT-330): remove once evm_nonce_mismatch_total verified | ||
| cause := "too_low" | ||
| if tooHigh { | ||
| cause = "too_high" | ||
|
Comment on lines
+131
to
+133
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: could simplify to "lower" and "higher" just too simplify use for dashboards or tools. |
||
| } | ||
| evmAnteMetrics.nonceMismatch.Add(ctx.Context(), 1, otelmetric.WithAttributes(attribute.String("cause", cause))) | ||
| return ctx, sdkerrors.ErrWrongSequence | ||
| } | ||
|
|
||
|
|
||


There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Grafana and Prometheus were adding a lot of noise to the logs so added --no-attach flag. Their logs can still be viewed with
docker logs sei-grafanaanddocker logs sei-prometheus