perf(ci): stabilize benchmark results with repetition and interleaving#384
Merged
zoowii merged 4 commits intoDTVMStack:mainfrom Mar 6, 2026
Merged
perf(ci): stabilize benchmark results with repetition and interleaving#384zoowii merged 4 commits intoDTVMStack:mainfrom
zoowii merged 4 commits intoDTVMStack:mainfrom
Conversation
48288d1 to
673383e
Compare
⚡ Performance Regression Check Results✅ Performance Check Passed (interpreter)Performance Benchmark Results (threshold: 25%)
Summary: 194 benchmarks, 0 regressions ✅ Performance Check Passed (multipass)Performance Benchmark Results (threshold: 25%)
Summary: 194 benchmarks, 0 regressions |
Tune the underlying Google Benchmark parameters in CI to reduce test variance in shared GitHub Actions environments: - Expose --benchmark-min-time to check_performance_regression.py. - Pass `--benchmark_enable_random_interleaving=true` when repetitions > 1. - Provide `BENCHMARK_REPETITIONS` and `BENCHMARK_MIN_TIME` in CI env. - Increase benchmark repetitions to 5 and min-time to 2s in workflows. These changes smooth out short-term CPU spikes and filter out noise caused by intermittent CI noisy neighbors. Made-with: Cursor
673383e to
2481bfb
Compare
Contributor
There was a problem hiding this comment.
Pull request overview
This PR stabilizes CI performance benchmark results by exposing additional Google Benchmark controls and setting more noise-resistant defaults in the x86 workflow.
Changes:
- Add
--benchmark-min-timesupport totools/check_performance_regression.pyand forward it toevmone-bench. - When repetitions are enabled, also enable random interleaving to reduce ordering bias.
- Configure CI to run benchmarks with higher repetitions and a minimum runtime via env vars passed through
.ci/run_test_suite.sh.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
tools/check_performance_regression.py |
Adds --benchmark-min-time passthrough and enables random interleaving when using repetitions. |
.github/workflows/dtvm_evm_test_x86.yml |
Sets CI env defaults (BENCHMARK_REPETITIONS=5, BENCHMARK_MIN_TIME=2s) to reduce variance. |
.ci/run_test_suite.sh |
Plumbs benchmark env vars into check_performance_regression.py invocations via PERF_ARGS. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Remove BENCHMARK_MIN_TIME=2s to avoid CI slowdown - Add setarch x86_64 -R to disable ASLR for evmone-bench, eliminating unreproducible noise from address randomization Made-with: Cursor
Increase BENCHMARK_THRESHOLD from 20% to 25% to reduce false positives on GitHub Actions shared runners. This PR only changes CI config, not runtime code, so regression detection was overly sensitive. Made-with: Cursor
setarch x86_64 -R fails with 'Operation not permitted' in the CI container (personality changes require privileges not granted). Revert to plain bash; repetitions and random interleaving remain for stability. Made-with: Cursor
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
1. Does this PR affect any open issues?(Y/N) and add issue references (e.g. "fix #123", "re #123".):
2. What is the scope of this PR (e.g. component or file name):
.github/workflows/dtvm_evm_test_x86.yml,tools/check_performance_regression.py,.ci/run_test_suite.sh3. Provide a description of the PR(e.g. more details, effects, motivations or doc link):
Enable more stable performance tests in CI.
Fix:
Tune the underlying Google Benchmark parameters and runtime environment in CI to reduce test variance in shared GitHub Actions environments:
--benchmark-min-timetocheck_performance_regression.py(optional, not set in CI).--benchmark_enable_random_interleaving=truewhen repetitions > 1.BENCHMARK_REPETITIONSin CI env; increase repetitions to 5.setarch x86_64 -Rto disable ASLR, eliminating unreproducible noise from address randomization.These changes smooth out short-term CPU spikes and filter out noise without extending CI duration (min-time removed).
4. Are there any breaking changes?(Y/N) and describe the breaking changes(e.g. more details, motivations or doc link):
5. Are there test cases for these changes?(Y/N) select and add more details, references or doc links:
The CI workflow changes can be verified by observing the PR checks.
6. Release note
None