fix: retry rctx.execute() on SIGKILL (exit 137) for macOS 26 #2750

Open
yetanotheralex wants to merge 1 commit into aspect-build:main from yetanotheralex:fix/macos-26-rctx-execute-sigkill-retry

Conversation

@yetanotheralex

@yetanotheralex yetanotheralex commented Mar 6, 2026

Problem

macOS 26 (Tahoe) has a known bug in its networking framework's pthread_atfork handler that intermittently kills child processes with SIGKILL (exit code 137) when spawned via fork+exec. This affects Bazel's rctx.execute() calls in repository rules.

In rules_js, the npm_translate_lock repository rule calls rctx.execute() to run yq (for parsing pnpm-lock.yaml), mkdir, and cp (for copying input files). On macOS 26, these commands are intermittently killed, producing errors like:

ERROR: pnpm-lock.yaml parse error failed to parse pnpm lock file with yq.
'...yq .../pnpm-lock.yaml -o=json' exited with 137

or:

Failed to copy file. 'cp -f .npmrc ...' exited with 137

The failure is transient — retrying the same command succeeds.
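
For readers unfamiliar with the number in the errors above: 137 follows the common POSIX shell convention of reporting a signal death as 128 plus the signal number, and SIGKILL is signal 9. A minimal sketch of that arithmetic follows. The helper names are hypothetical and are not part of rules_js or Bazel.

```python
import signal


def shell_exit_code(sig):
    """Exit status a POSIX shell conventionally reports for a child killed by `sig`."""
    return 128 + int(sig)


def signal_from_exit_code(code):
    """Return the signal number implied by `code`, or None if it is not above 128."""
    if code > 128:
        return code - 128
    return None


# SIGKILL is signal 9 on Linux and macOS, so the conventional status is 137.
print(shell_exit_code(signal.SIGKILL))
print(signal_from_exit_code(137))
```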

Fix

Add an execute_with_retry() helper in utils.bzl that catches exit code 137 (SIGKILL) and retries up to 3 times. Apply it to all rctx.execute() call sites in:

  • npm/private/npm_translate_lock_state.bzl: _yaml_to_json() (yq), _copy_input_file() (mkdir, cp)
  • npm/private/utils.bzl: _reverse_force_copy() (mkdir, cp)

The function is also exported via utils.execute_with_retry so downstream consumers can use it.
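
As a rough illustration of the approach described above (not the actual code in this PR), a retry wrapper could look like the sketch below. It uses only syntax that Python and Starlark share, so it can be exercised outside Bazel with a stand-in object. The parameter names `max_attempts` and `sigkill_exit_code` are assumptions, as is reading "up to 3 times" as three attempts in total.

```python
def execute_with_retry(rctx, args, max_attempts = 3, sigkill_exit_code = 137, **kwargs):
    """Run rctx.execute(args, **kwargs), retrying while the command exits with 137.

    Assumption: `rctx` exposes execute() returning an object with a
    `return_code` field, as Bazel's repository_ctx does.
    """
    result = None

    # Starlark has no while loops, so bound the retries with a for loop.
    for _ in range(max_attempts):
        result = rctx.execute(args, **kwargs)
        if result.return_code != sigkill_exit_code:
            # Success or an unrelated failure: hand it straight back to the caller.
            return result

    # Every attempt was killed; return the last result so the caller can report it.
    return result
```

A call site would then swap `rctx.execute([...])` for `execute_with_retry(rctx, [...])` and keep its existing error handling. Any exit code other than 137 is returned on the first attempt, so genuine failures are not retried.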

References

Testing

Verified on macOS 26.3 (Tahoe) with Bazel 7.7.1 — the patch resolves the intermittent SIGKILL failures during npm_translate_lock repository rule evaluation.

@CLAassistant

CLAassistant commented Mar 6, 2026

CLA assistant check
All committers have signed the CLA.

@aspect-workflows

aspect-workflows Bot commented Mar 6, 2026

Bazel 7 (Test)

Buildkite build #12249 is running...


Bazel 8 (Test)

Buildkite build #12249 is running...


Bazel 9 (Test)

Buildkite build #12249 is running...


Bazel 7 (Test)

e2e/bzlmod

All tests were cache hits

7 tests (100.0%) were fully cached saving 594ms.


Bazel 8 (Test)

e2e/bzlmod

All tests were cache hits

7 tests (100.0%) were fully cached saving 639ms.


Bazel 9 (Test)

e2e/bzlmod

All tests were cache hits

7 tests (100.0%) were fully cached saving 598ms.


Bazel 7 (Test)

e2e/git_dep_metadata

All tests were cache hits

1 test (100.0%) was fully cached saving 30ms.


Bazel 8 (Test)

e2e/git_dep_metadata

All tests were cache hits

1 test (100.0%) was fully cached saving 26ms.


Bazel 9 (Test)

e2e/git_dep_metadata

All tests were cache hits

1 test (100.0%) was fully cached saving 30ms.


Bazel 7 (Test)

e2e/gyp_no_install_script

All tests were cache hits

2 tests (100.0%) were fully cached saving 112ms.


Bazel 8 (Test)

e2e/gyp_no_install_script

All tests were cache hits

1 test (100.0%) was fully cached saving 50ms.


Bazel 9 (Test)

e2e/gyp_no_install_script

All tests were cache hits

1 test (100.0%) was fully cached saving 46ms.


Bazel 7 (Test)

e2e/js_binary_workspace

All tests were cache hits

1 test (100.0%) was fully cached saving 44ms.


Bazel 8 (Test)

e2e/js_binary_workspace

All tests were cache hits

1 test (100.0%) was fully cached saving 30ms.


Bazel 9 (Test)

e2e/js_binary_workspace

All tests were cache hits

1 test (100.0%) was fully cached saving 35ms.


Bazel 7 (Test)

e2e/js_image_oci

All tests were cache hits

1 test (100.0%) was fully cached saving 3s.


Bazel 7 (Test)

e2e/npm_link_package

All tests were cache hits

2 tests (100.0%) were fully cached saving 172ms.


Bazel 8 (Test)

e2e/npm_link_package

All tests were cache hits

2 tests (100.0%) were fully cached saving 137ms.


Bazel 9 (Test)

e2e/npm_link_package

All tests were cache hits

2 tests (100.0%) were fully cached saving 170ms.


Bazel 7 (Test)

e2e/npm_link_package-esm

All tests were cache hits

2 tests (100.0%) were fully cached saving 189ms.


Bazel 8 (Test)

e2e/npm_link_package-esm

All tests were cache hits

2 tests (100.0%) were fully cached saving 138ms.


Bazel 9 (Test)

e2e/npm_link_package-esm

All tests were cache hits

2 tests (100.0%) were fully cached saving 173ms.


Bazel 7 (Test)

e2e/npm_link_package-rerooted

All tests were cache hits

2 tests (100.0%) were fully cached saving 207ms.


Bazel 8 (Test)

e2e/npm_link_package-rerooted

All tests were cache hits

2 tests (100.0%) were fully cached saving 139ms.


Bazel 9 (Test)

e2e/npm_link_package-rerooted

All tests were cache hits

2 tests (100.0%) were fully cached saving 186ms.


Bazel 7 (Test)

e2e/npm_translate_lock

All tests were cache hits

3 tests (100.0%) were fully cached saving 383ms.


Bazel 8 (Test)

e2e/npm_translate_lock

Buildkite build #12249 is running...


Bazel 9 (Test)

e2e/npm_translate_lock

All tests were cache hits

3 tests (100.0%) were fully cached saving 289ms.


Bazel 7 (Test)

e2e/npm_translate_lock_disable_hooks

Buildkite build #12249 is running...


Bazel 8 (Test)

e2e/npm_translate_lock_disable_hooks

All tests were cache hits

1 test (100.0%) was fully cached saving 62ms.


Bazel 9 (Test)

e2e/npm_translate_lock_disable_hooks

All tests were cache hits

1 test (100.0%) was fully cached saving 32ms.


Bazel 7 (Test)

e2e/npm_translate_lock_empty

All tests were cache hits

2 tests (100.0%) were fully cached saving 132ms.


Bazel 8 (Test)

e2e/npm_translate_lock_empty

All tests were cache hits

2 tests (100.0%) were fully cached saving 114ms.


Bazel 9 (Test)

e2e/npm_translate_lock_empty

All tests were cache hits

2 tests (100.0%) were fully cached saving 105ms.


Bazel 7 (Test)

e2e/npm_translate_lock_exclude_package_contents

All tests were cache hits

1 test (100.0%) was fully cached saving 33ms.


Bazel 8 (Test)

e2e/npm_translate_lock_exclude_package_contents

All tests were cache hits

1 test (100.0%) was fully cached saving 21ms.


Bazel 9 (Test)

e2e/npm_translate_lock_exclude_package_contents

All tests were cache hits

1 test (100.0%) was fully cached saving 86ms.


Bazel 7 (Test)

e2e/npm_translate_lock_multi

Buildkite build #12249 is running...


Bazel 8 (Test)

e2e/npm_translate_lock_multi

All tests were cache hits

2 tests (100.0%) were fully cached saving 54ms.


Bazel 9 (Test)

e2e/npm_translate_lock_multi

All tests were cache hits

2 tests (100.0%) were fully cached saving 113ms.


Bazel 7 (Test)

e2e/npm_translate_lock_partial_clone

All tests were cache hits

1 test (100.0%) was fully cached saving 26ms.


Bazel 8 (Test)

e2e/npm_translate_lock_partial_clone

All tests were cache hits

1 test (100.0%) was fully cached saving 30ms.


Bazel 9 (Test)

e2e/npm_translate_lock_partial_clone

All tests were cache hits

1 test (100.0%) was fully cached saving 38ms.


Bazel 7 (Test)

e2e/npm_translate_lock_replace_packages

All tests were cache hits

4 tests (100.0%) were fully cached saving 321ms.


Bazel 8 (Test)

e2e/npm_translate_lock_replace_packages

All tests were cache hits

4 tests (100.0%) were fully cached saving 249ms.


Bazel 9 (Test)

e2e/npm_translate_lock_replace_packages

All tests were cache hits

4 tests (100.0%) were fully cached saving 320ms.


Bazel 7 (Test)

e2e/npm_translate_lock_subdir_patch

Waiting for runner...


Bazel 8 (Test)

e2e/npm_translate_lock_subdir_patch

All tests were cache hits

1 test (100.0%) was fully cached saving 67ms.


Bazel 9 (Test)

e2e/npm_translate_lock_subdir_patch

All tests were cache hits

1 test (100.0%) was fully cached saving 50ms.


Bazel 7 (Test)

e2e/npm_translate_package_lock

Buildkite build #12249 is running...


Bazel 8 (Test)

e2e/npm_translate_package_lock

Buildkite build #12249 is running...


Bazel 9 (Test)

e2e/npm_translate_package_lock

Buildkite build #12249 is running...


Bazel 7 (Test)

e2e/npm_translate_yarn_lock

Buildkite build #12249 is running...


Bazel 8 (Test)

e2e/npm_translate_yarn_lock

Buildkite build #12249 is running...


Bazel 9 (Test)

e2e/npm_translate_yarn_lock

Buildkite build #12249 is running...


Bazel 7 (Test)

e2e/output_paths

Buildkite build #12249 is running...


Bazel 8 (Test)

e2e/output_paths

Buildkite build #12249 is running...


Bazel 9 (Test)

e2e/output_paths

All tests were cache hits

2 tests (100.0%) were fully cached saving 171ms.


Bazel 7 (Test)

e2e/package_json_module

Buildkite build #12249 is running...


Bazel 8 (Test)

e2e/package_json_module

Buildkite build #12249 is running...


Bazel 9 (Test)

e2e/package_json_module

Buildkite build #12249 is running...


Bazel 7 (Test)

e2e/patch_from_repo

All tests were cache hits

1 test (100.0%) was fully cached saving 25ms.


Bazel 7 (Test)

e2e/pnpm_lockfiles

Buildkite build #12249 is running...


Bazel 8 (Test)

e2e/pnpm_lockfiles

Buildkite build #12249 is running...


Bazel 9 (Test)

e2e/pnpm_lockfiles

Buildkite build #12249 is running...


Bazel 7 (Test)

e2e/pnpm_repo_install

Buildkite build #12249 is running...


Bazel 8 (Test)

e2e/pnpm_repo_install

All tests were cache hits

1 test (100.0%) was fully cached saving 772ms.


Bazel 9 (Test)

e2e/pnpm_repo_install

Buildkite build #12249 is running...


Bazel 7 (Test)

e2e/pnpm_version

All tests were cache hits

1 test (100.0%) was fully cached saving 44ms.


Bazel 8 (Test)

e2e/pnpm_version

Buildkite build #12249 is running...


Bazel 9 (Test)

e2e/pnpm_version

Waiting for runner...


Bazel 7 (Test)

e2e/pnpm_workspace

Buildkite build #12249 is running...


Bazel 8 (Test)

e2e/pnpm_workspace

Buildkite build #12249 is running...


Bazel 9 (Test)

e2e/pnpm_workspace

Buildkite build #12249 is running...


Bazel 7 (Test)

e2e/pnpm_workspace_deps

Buildkite build #12249 is running...


Bazel 8 (Test)

e2e/pnpm_workspace_deps

Buildkite build #12249 is running...


Bazel 9 (Test)

e2e/pnpm_workspace_deps

Buildkite build #12249 is running...


Bazel 7 (Test)

e2e/pnpm_workspace_rerooted

Buildkite build #12249 is running...


Bazel 8 (Test)

e2e/pnpm_workspace_rerooted

Buildkite build #12249 is running...


Bazel 9 (Test)

e2e/pnpm_workspace_rerooted

Buildkite build #12249 is running...


Bazel 7 (Test)

e2e/repo_mapping

All tests were cache hits

3 tests (100.0%) were fully cached saving 222ms.


Bazel 8 (Test)

e2e/repo_mapping

Waiting for runner...


Bazel 9 (Test)

e2e/repo_mapping

Buildkite build #12249 is running...


Bazel 7 (Test)

e2e/runfiles

Buildkite build #12249 is running...


Bazel 8 (Test)

e2e/runfiles

Buildkite build #12249 is running...


Bazel 9 (Test)

e2e/runfiles

Buildkite build #12249 is running...


Bazel 7 (Test)

e2e/stamped_package_json

Buildkite build #12249 is running...


Bazel 8 (Test)

e2e/stamped_package_json

Buildkite build #12249 is running...


Bazel 9 (Test)

e2e/stamped_package_json

Waiting for runner...


Bazel 7 (Test)

e2e/vendored_node

Buildkite build #12249 is running...


Bazel 8 (Test)

e2e/vendored_node

Waiting for runner...


Bazel 9 (Test)

e2e/vendored_node

Buildkite build #12249 is running...


Bazel 7 (Test)

e2e/vendored_tarfile

All tests were cache hits

1 test (100.0%) was fully cached saving 25ms.


Bazel 8 (Test)

e2e/vendored_tarfile

Buildkite build #12249 is running...


Bazel 9 (Test)

e2e/vendored_tarfile

All tests were cache hits

1 test (100.0%) was fully cached saving 32ms.


Bazel 7 (Test)

e2e/verify_patches

All tests were cache hits

2 tests (100.0%) were fully cached saving 92ms.


Bazel 8 (Test)

e2e/verify_patches

All tests were cache hits

2 tests (100.0%) were fully cached saving 120ms.


Bazel 9 (Test)

e2e/verify_patches

Buildkite build #12249 is running...


Bazel 7 (Test)

e2e/worker

Buildkite build #12249 is running...


Bazel 8 (Test)

e2e/worker

All tests were cache hits

1 test (100.0%) was fully cached saving 50ms.


Bazel 9 (Test)

e2e/worker

All tests were cache hits

1 test (100.0%) was fully cached saving 92ms.


Buildifier      Format

@yetanotheralex yetanotheralex marked this pull request as ready for review March 6, 2026 09:40

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bfca8bd270

Comment thread npm/private/utils.bzl Outdated
@yetanotheralex yetanotheralex force-pushed the fix/macos-26-rctx-execute-sigkill-retry branch from bfca8bd to 9f4106e Compare March 6, 2026 09:55
@jbedard
Member

jbedard commented Mar 6, 2026

This is a bug with golang and macos 26? Can we fix this where that golang is located instead?

@yetanotheralex
Author

> This is a bug with golang and macos 26? Can we fix this where that golang is located instead?

It's not a golang-specific bug -- it's a macOS 26 (Tahoe) bug in the OS networking framework's pthread_atfork handler that intermittently sends SIGKILL to any child process spawned via fork+exec.

During investigation we saw exit code 137 hitting not just yq (Go) but also cp, mkdir, openssl, and rg -- none of which are Go programs. The issue is on the parent side (Bazel's JVM calling ProcessBuilder/fork+exec), and the OS kills the child before it even gets a chance to run.

There's an open report on the skaffold project with the same root cause: GoogleContainerTools/skaffold#9925

Until Apple ships a fix for macOS 26, the retry at the rctx.execute() call site is the pragmatic workaround -- there's nothing Go (or any of the affected binaries) can do differently since the OS is killing the process externally.

@yetanotheralex yetanotheralex force-pushed the fix/macos-26-rctx-execute-sigkill-retry branch from 9f4106e to 5da5b3e Compare March 9, 2026 10:27
macOS 26 (Tahoe) has a bug in its networking framework's
pthread_atfork handler that intermittently kills child processes
with SIGKILL (exit 137) when spawned via fork+exec. This affects
all rctx.execute() calls in repository rules, causing yq, cp, and
mkdir commands to fail randomly during npm_translate_lock.

Add an execute_with_retry() wrapper in utils.bzl that catches exit
code 137 and retries up to 3 times. Apply it to all rctx.execute()
call sites in npm_translate_lock_state.bzl (yq lockfile parsing,
mkdir, cp) and utils.bzl (reverse_force_copy).

The failure is intermittent so retries reliably work around it.

References:
- GoogleContainerTools/skaffold#9925
- bazelbuild/bazel#27026

Made-with: Cursor
@yetanotheralex yetanotheralex force-pushed the fix/macos-26-rctx-execute-sigkill-retry branch from 5da5b3e to f9ede61 Compare March 9, 2026 14:03
@jbedard
Member

jbedard commented Mar 9, 2026

The bazel issue seems to be resolved in 6.6 and no longer an issue in bazel 7.

What version of bazel are you using?

@yetanotheralex
Author

> The bazel issue seems to be resolved in 6.6 and no longer an issue in bazel 7.
>
> What version of bazel are you using?

7.7.1

3 participants