
perf(ci): disable ASLR for benchmark to reduce result variance#385

Closed
starwarfan wants to merge 2 commits into DTVMStack:main from starwarfan:perf-ci-disable-aslr

Conversation

@starwarfan
Contributor

Summary

Disable ASLR during performance benchmark runs in CI to eliminate Google Benchmark's `***WARNING*** ASLR is enabled, the results may have unreproducible noise in them.` message and to reduce variance in regression results.

Changes

  • Wrap `bash .ci/run_test_suite.sh` with `setarch x86_64 -R` in the performance regression check job
  • `setarch x86_64 -R` runs the benchmark process with the `ADDR_NO_RANDOMIZE` personality flag, disabling ASLR for that process only (no system-wide change, no root required)
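In workflow terms, the change amounts to a CI step like the sketch below. The step name and YAML context are illustrative; the `run` lines mirror the change described above.

```yaml
- name: Performance regression check   # illustrative step name
  run: |
    export BENCHMARK_REPETITIONS=5
    export BENCHMARK_MIN_TIME=2s
    # ADDR_NO_RANDOMIZE applies only to this process tree; no root needed,
    # and /proc/sys/kernel/randomize_va_space is left untouched.
    setarch x86_64 -R bash .ci/run_test_suite.sh
```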

Motivation

Google Benchmark reports ASLR as a source of unreproducible noise. Disabling it for the benchmark process should make results more stable on shared GitHub Actions runners.

Made with Cursor

Tune the underlying Google Benchmark parameters in CI to reduce test
variance in shared GitHub Actions environments:
- Expose `--benchmark-min-time` to `check_performance_regression.py`.
- Pass `--benchmark_enable_random_interleaving=true` when repetitions > 1.
- Provide `BENCHMARK_REPETITIONS` and `BENCHMARK_MIN_TIME` in CI env.
- Increase benchmark repetitions to 5 and min-time to 2s in workflows.

These changes smooth out short-term CPU spikes and filter out variance
caused by intermittent noisy neighbors on shared CI runners.

Made-with: Cursor
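The plumbing described in the commit message above can be sketched roughly as follows. This is a minimal sketch, not the actual `run_test_suite.sh` code: `--benchmark-min-time` and `--benchmark_enable_random_interleaving=true` are named in this PR, while the `--repetitions` flag name and the default values are assumptions (and in the PR the interleaving flag is appended on the Python side, not in the shell script).

```shell
#!/usr/bin/env bash
# Sketch: translate the CI env vars into the argument string forwarded to
# check_performance_regression.py. Flag name --repetitions is assumed.
BENCHMARK_REPETITIONS="${BENCHMARK_REPETITIONS:-5}"
BENCHMARK_MIN_TIME="${BENCHMARK_MIN_TIME:-2s}"

PERF_ARGS="--repetitions ${BENCHMARK_REPETITIONS} --benchmark-min-time ${BENCHMARK_MIN_TIME}"

# Random interleaving only makes sense with more than one repetition
# (the PR adds this flag in the Python script when repetitions > 1).
if [ "${BENCHMARK_REPETITIONS}" -gt 1 ]; then
  PERF_ARGS="${PERF_ARGS} --benchmark_enable_random_interleaving=true"
fi

echo "${PERF_ARGS}"
```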
Use setarch x86_64 -R to run evmone-bench with ASLR disabled,
eliminating the 'ASLR is enabled' warning and reducing unreproducible
noise in performance regression results.

Made-with: Cursor
@zoowii zoowii requested a review from Copilot March 5, 2026 09:43
@starwarfan starwarfan marked this pull request as draft March 5, 2026 09:46
@starwarfan starwarfan closed this Mar 5, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR reduces benchmark result variance in CI by applying several noise-reduction techniques: disabling ASLR via setarch x86_64 -R, enabling random interleaving of benchmark repetitions, increasing the number of repetitions from 3 to 5, and setting a minimum benchmark execution time of 2 seconds.

Changes:

  • Wraps the benchmark CI step with setarch x86_64 -R to disable ASLR, and sets BENCHMARK_REPETITIONS=5 and BENCHMARK_MIN_TIME=2s as environment variables
  • Adds --benchmark-min-time CLI argument to check_performance_regression.py and enables --benchmark_enable_random_interleaving=true when repetitions > 1
  • Plumbs the new BENCHMARK_REPETITIONS and BENCHMARK_MIN_TIME environment variables through run_test_suite.sh to the Python benchmark script

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
.github/workflows/dtvm_evm_test_x86.yml Sets BENCHMARK_REPETITIONS=5, BENCHMARK_MIN_TIME=2s, and wraps benchmark run with setarch x86_64 -R
.ci/run_test_suite.sh Reads new env vars, constructs PERF_ARGS, and passes them to all check_performance_regression.py invocations
tools/check_performance_regression.py Adds --benchmark-min-time argument, passes it to Google Benchmark, and adds --benchmark_enable_random_interleaving=true


Comment on lines +370 to +373
export BENCHMARK_REPETITIONS=5
export BENCHMARK_MIN_TIME=2s

-bash .ci/run_test_suite.sh
+setarch x86_64 -R bash .ci/run_test_suite.sh

Copilot AI Mar 5, 2026


The PR description only mentions disabling ASLR via setarch x86_64 -R, but the actual changes are broader:

  1. Adds --benchmark_enable_random_interleaving=true to the benchmark command
  2. Adds a new --benchmark-min-time CLI argument and plumbing
  3. Increases BENCHMARK_REPETITIONS from 3 to 5
  4. Sets BENCHMARK_MIN_TIME=2s

These are all reasonable variance-reduction measures, but the description should be updated to reflect the full scope of changes so reviewers and future readers understand the intent.

Comment on lines 208 to +255
@@ -208,15 +218,15 @@ for STACK_TYPE in ${STACK_TYPES[@]}; do
echo "Running baseline benchmarks with library from base branch..."
cp "$BENCHMARK_BASELINE_LIB"/libdtvmapi.so ./libdtvmapi.so
SAVE_PATH=${BASELINE_CACHE:-/tmp/perf_baseline.json}
-python3 check_performance_regression.py \
+python3 check_performance_regression.py $PERF_ARGS \
--save-baseline "$SAVE_PATH" \
--lib ./libdtvmapi.so \
--mode "$BENCHMARK_MODE" \
--benchmark-dir test/evm-benchmarks/benchmarks

echo "Running current benchmarks with PR library..."
cp ../build/lib/libdtvmapi.so ./libdtvmapi.so
-python3 check_performance_regression.py \
+python3 check_performance_regression.py $PERF_ARGS \
--baseline "$SAVE_PATH" \
--threshold "$BENCHMARK_THRESHOLD" \
--output-summary "$BENCHMARK_SUMMARY_FILE" \
@@ -225,15 +235,15 @@ for STACK_TYPE in ${STACK_TYPES[@]}; do
--benchmark-dir test/evm-benchmarks/benchmarks
elif [ -n "$BENCHMARK_SAVE_BASELINE" ]; then
echo "Saving performance baseline..."
-python3 check_performance_regression.py \
+python3 check_performance_regression.py $PERF_ARGS \
--save-baseline "$BENCHMARK_SAVE_BASELINE" \
--output-summary "$BENCHMARK_SUMMARY_FILE" \
--lib ./libdtvmapi.so \
--mode "$BENCHMARK_MODE" \
--benchmark-dir test/evm-benchmarks/benchmarks
elif [ -n "$BENCHMARK_BASELINE_FILE" ]; then
echo "Checking performance regression against baseline..."
-python3 check_performance_regression.py \
+python3 check_performance_regression.py $PERF_ARGS \
--baseline "$BENCHMARK_BASELINE_FILE" \
--threshold "$BENCHMARK_THRESHOLD" \
--output-summary "$BENCHMARK_SUMMARY_FILE" \
@@ -242,7 +252,7 @@ for STACK_TYPE in ${STACK_TYPES[@]}; do
--benchmark-dir test/evm-benchmarks/benchmarks
else
echo "Running benchmark suite without comparison..."
-python3 check_performance_regression.py \
+python3 check_performance_regression.py $PERF_ARGS \

Copilot AI Mar 5, 2026


Environment variables like BENCHMARK_REPETITIONS and BENCHMARK_MIN_TIME are concatenated into PERF_ARGS and then expanded unquoted in the python3 check_performance_regression.py $PERF_ARGS invocations in this block. If an attacker can influence these environment variables (e.g., via CI inputs or environment configuration), they could inject shell metacharacters such as ; or backticks and cause arbitrary commands to execute in the CI environment. To mitigate this, strictly validate these variables (for example, ensuring they are numeric-only) and pass them as separate, properly quoted arguments or via a shell array instead of constructing a raw argument string that is expanded unquoted.
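The mitigation Copilot suggests could look like the sketch below: validate each env var against a strict pattern, then build the arguments as a bash array so every value stays a single word when expanded and shell metacharacters in the environment are never re-interpreted. The defaults and the duration pattern are assumptions.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed defaults mirroring the PR's CI env vars.
BENCHMARK_REPETITIONS="${BENCHMARK_REPETITIONS:-5}"
BENCHMARK_MIN_TIME="${BENCHMARK_MIN_TIME:-2s}"

# Reject anything that is not a plain integer / a "<number>s" duration,
# so values like "2; rm -rf /" or backticks can never reach the command line.
[[ "${BENCHMARK_REPETITIONS}" =~ ^[0-9]+$ ]] \
  || { echo "BENCHMARK_REPETITIONS must be a positive integer" >&2; exit 1; }
[[ "${BENCHMARK_MIN_TIME}" =~ ^[0-9]+(\.[0-9]+)?s?$ ]] \
  || { echo "BENCHMARK_MIN_TIME must look like '2s'" >&2; exit 1; }

# Array instead of a raw string: each element stays one word on expansion,
# e.g. python3 check_performance_regression.py "${PERF_ARGS[@]}"
PERF_ARGS=(--benchmark-min-time "${BENCHMARK_MIN_TIME}")
echo "${PERF_ARGS[@]}"
```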

@github-actions

github-actions bot commented Mar 5, 2026

⚡ Performance Regression Check Results

⚠️ Performance Regression Detected (interpreter)

No benchmark summary available.


⚠️ Performance Regression Detected (multipass)

No benchmark summary available.


