perf(ci): disable ASLR for benchmark to reduce result variance #385
starwarfan wants to merge 2 commits into DTVMStack:main
Tune the underlying Google Benchmark parameters in CI to reduce test variance in shared GitHub Actions environments:
- Expose `--benchmark-min-time` in `check_performance_regression.py`.
- Pass `--benchmark_enable_random_interleaving=true` when repetitions > 1.
- Provide `BENCHMARK_REPETITIONS` and `BENCHMARK_MIN_TIME` in CI env.
- Increase benchmark repetitions to 5 and min-time to 2s in workflows.

These changes smooth out short-term CPU spikes and filter out noise caused by intermittent noisy neighbors on CI runners.

Made-with: Cursor
Use `setarch x86_64 -R` to run evmone-bench with ASLR disabled, eliminating the 'ASLR is enabled' warning and reducing unreproducible noise in performance regression results.

Made-with: Cursor
Pull request overview
This PR reduces benchmark result variance in CI by applying several noise-reduction techniques: disabling ASLR via setarch x86_64 -R, enabling random interleaving of benchmark repetitions, increasing the number of repetitions from 3 to 5, and setting a minimum benchmark execution time of 2 seconds.
Changes:
- Wraps the benchmark CI step with `setarch x86_64 -R` to disable ASLR, and sets `BENCHMARK_REPETITIONS=5` and `BENCHMARK_MIN_TIME=2s` as environment variables
- Adds a `--benchmark-min-time` CLI argument to `check_performance_regression.py` and enables `--benchmark_enable_random_interleaving=true` when repetitions > 1
- Plumbs the new `BENCHMARK_REPETITIONS` and `BENCHMARK_MIN_TIME` environment variables through `run_test_suite.sh` to the Python benchmark script
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `.github/workflows/dtvm_evm_test_x86.yml` | Sets `BENCHMARK_REPETITIONS=5`, `BENCHMARK_MIN_TIME=2s`, and wraps benchmark run with `setarch x86_64 -R` |
| `.ci/run_test_suite.sh` | Reads new env vars, constructs `PERF_ARGS`, and passes them to all `check_performance_regression.py` invocations |
| `tools/check_performance_regression.py` | Adds `--benchmark-min-time` argument, passes it to Google Benchmark, and adds `--benchmark_enable_random_interleaving=true` |

    +export BENCHMARK_REPETITIONS=5
    +export BENCHMARK_MIN_TIME=2s
     
    -bash .ci/run_test_suite.sh
    +setarch x86_64 -R bash .ci/run_test_suite.sh

The PR description only mentions disabling ASLR via setarch x86_64 -R, but the actual changes are broader:
- Adds `--benchmark_enable_random_interleaving=true` to the benchmark command
- Adds a new `--benchmark-min-time` CLI argument and plumbing
- Increases `BENCHMARK_REPETITIONS` from 3 to 5
- Sets `BENCHMARK_MIN_TIME=2s`
These are all reasonable variance-reduction measures, but the description should be updated to reflect the full scope of changes so reviewers and future readers understand the intent.

    @@ -208,15 +218,15 @@ for STACK_TYPE in ${STACK_TYPES[@]}; do
     echo "Running baseline benchmarks with library from base branch..."
     cp "$BENCHMARK_BASELINE_LIB"/libdtvmapi.so ./libdtvmapi.so
     SAVE_PATH=${BASELINE_CACHE:-/tmp/perf_baseline.json}
    -python3 check_performance_regression.py \
    +python3 check_performance_regression.py $PERF_ARGS \
     --save-baseline "$SAVE_PATH" \
     --lib ./libdtvmapi.so \
     --mode "$BENCHMARK_MODE" \
     --benchmark-dir test/evm-benchmarks/benchmarks
     
     echo "Running current benchmarks with PR library..."
     cp ../build/lib/libdtvmapi.so ./libdtvmapi.so
    -python3 check_performance_regression.py \
    +python3 check_performance_regression.py $PERF_ARGS \
     --baseline "$SAVE_PATH" \
     --threshold "$BENCHMARK_THRESHOLD" \
     --output-summary "$BENCHMARK_SUMMARY_FILE" \
    @@ -225,15 +235,15 @@ for STACK_TYPE in ${STACK_TYPES[@]}; do
     --benchmark-dir test/evm-benchmarks/benchmarks
     elif [ -n "$BENCHMARK_SAVE_BASELINE" ]; then
     echo "Saving performance baseline..."
    -python3 check_performance_regression.py \
    +python3 check_performance_regression.py $PERF_ARGS \
     --save-baseline "$BENCHMARK_SAVE_BASELINE" \
     --output-summary "$BENCHMARK_SUMMARY_FILE" \
     --lib ./libdtvmapi.so \
     --mode "$BENCHMARK_MODE" \
     --benchmark-dir test/evm-benchmarks/benchmarks
     elif [ -n "$BENCHMARK_BASELINE_FILE" ]; then
     echo "Checking performance regression against baseline..."
    -python3 check_performance_regression.py \
    +python3 check_performance_regression.py $PERF_ARGS \
     --baseline "$BENCHMARK_BASELINE_FILE" \
     --threshold "$BENCHMARK_THRESHOLD" \
     --output-summary "$BENCHMARK_SUMMARY_FILE" \
    @@ -242,7 +252,7 @@ for STACK_TYPE in ${STACK_TYPES[@]}; do
     --benchmark-dir test/evm-benchmarks/benchmarks
     else
     echo "Running benchmark suite without comparison..."
    -python3 check_performance_regression.py \
    +python3 check_performance_regression.py $PERF_ARGS \

Environment variables like BENCHMARK_REPETITIONS and BENCHMARK_MIN_TIME are concatenated into PERF_ARGS and then expanded unquoted in the python3 check_performance_regression.py $PERF_ARGS invocations in this block. If an attacker can influence these environment variables (e.g., via CI inputs or environment configuration), they could inject shell metacharacters such as ; or backticks and cause arbitrary commands to execute in the CI environment. To mitigate this, strictly validate these variables (for example, ensuring they are numeric-only) and pass them as separate, properly quoted arguments or via a shell array instead of constructing a raw argument string that is expanded unquoted.
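One way to act on this suggestion is sketched below. It is not the code from this PR: the function name `build_perf_args` and the `--repetitions` flag are assumptions made for illustration, and only `--benchmark-min-time` is taken from the PR itself. Unquoted expansion is subject to word splitting and pathname expansion, so the sketch validates each variable and collects the options in a bash array, which keeps every value a single argument when expanded with quotes.

```bash
#!/usr/bin/env bash
# Sketch only: collect benchmark options in a bash array, validating each
# environment variable first. "--repetitions" is an assumed flag name;
# "--benchmark-min-time" is the flag this PR adds.

build_perf_args() {
    local int_re='^[1-9][0-9]*$'
    local time_re='^[0-9]+(\.[0-9]+)?s?$'
    PERF_ARGS=()

    if [[ -n "${BENCHMARK_REPETITIONS:-}" ]]; then
        # Reject anything that is not a positive integer.
        if [[ ! "$BENCHMARK_REPETITIONS" =~ $int_re ]]; then
            echo "invalid BENCHMARK_REPETITIONS: $BENCHMARK_REPETITIONS" >&2
            return 1
        fi
        PERF_ARGS+=(--repetitions "$BENCHMARK_REPETITIONS")
    fi

    if [[ -n "${BENCHMARK_MIN_TIME:-}" ]]; then
        # Accept forms such as 2, 0.5 or 2s; reject everything else.
        if [[ ! "$BENCHMARK_MIN_TIME" =~ $time_re ]]; then
            echo "invalid BENCHMARK_MIN_TIME: $BENCHMARK_MIN_TIME" >&2
            return 1
        fi
        PERF_ARGS+=(--benchmark-min-time "$BENCHMARK_MIN_TIME")
    fi
}

# Intended call site (not executed here); quoting the expansion keeps each
# array element a single argument:
#   build_perf_args || exit 1
#   python3 check_performance_regression.py "${PERF_ARGS[@]}" --lib ./libdtvmapi.so
```

With this shape, a value such as `5; rm -rf /` fails validation before any command is built, and a valid value can never be split into extra arguments.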
⚡ Performance Regression Check Results
Summary
Disable ASLR during performance benchmark runs in CI to eliminate the `***WARNING*** ASLR is enabled, the results may have unreproducible noise in them.` message and reduce variance in regression results.

Changes
- Wrap `bash .ci/run_test_suite.sh` with `setarch x86_64 -R` in the performance regression check job
- `setarch x86_64 -R` runs the benchmark process with the `ADDR_NO_RANDOMIZE` personality, disabling ASLR for that process only (no system-wide change, no root required)

Motivation
Google Benchmark reports ASLR as a source of unreproducible noise. Disabling it for the benchmark process should make results more stable on shared GitHub Actions runners.
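To confirm that the `ADDR_NO_RANDOMIZE` personality actually reached the benchmark process, a small check like the following could be added to the script. This is a sketch under assumptions, not part of the PR: both function names are hypothetical, and it relies on Linux exposing the personality as a hex value in `/proc/<pid>/personality`.

```bash
#!/usr/bin/env bash
# Sketch only: detect whether a personality value has ASLR disabled.

# ADDR_NO_RANDOMIZE as defined in <linux/personality.h>.
ADDR_NO_RANDOMIZE=0x0040000

# Succeeds if the hex personality value (the format printed by
# /proc/<pid>/personality, e.g. 00040000) has the ADDR_NO_RANDOMIZE bit set.
personality_disables_aslr() {
    local hex="$1"
    (( (16#$hex & ADDR_NO_RANDOMIZE) != 0 ))
}

# Hypothetical helper: checks the current shell's own personality on Linux.
current_shell_has_aslr_disabled() {
    local hex
    read -r hex < /proc/self/personality || return 1
    personality_disables_aslr "$hex"
}
```

Because child processes inherit the personality, calling `current_shell_has_aslr_disabled` near the top of a script started through `setarch x86_64 -R` would be expected to succeed, while the same call in a plain shell would be expected to fail.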
Made with Cursor