Overview
Golang's os/exec uses the classic UNIX fork+exec pattern to start subprocesses. Since I upgraded to macOS 26, this very often seems to get stuck, leaving Skaffold's forked subprocesses spinning in a tight busy-wait loop and never managing to execute the intended program (kubectl, kustomize, etc.).
This appears to be because an atfork handler registered with pthread_atfork by a macOS system framework is buggy.
Moreover, these stuck processes can only be killed with kill -9 (SIGKILL).
Attaching debuggers reveals the following:
LLDB for system stack frames:
(lldb) bt
* thread #1, stop reason = EXC_BAD_ACCESS (code=1, address=0x135bfc9d7)
* frame #0: 0x0000000187db5194 libsystem_trace.dylib`_os_log_preferences_refresh + 56
frame #1: 0x0000000187db5afc libsystem_trace.dylib`os_log_type_enabled + 768
frame #2: 0x000000019fd2d55c libnetworkextension.dylib`NEFlowDirectorDestroy + 64
frame #3: 0x00000001911e309c Network`nw_path_release_globals + 164
frame #4: 0x000000019148e060 Network`nw_settings_child_has_forked() + 332
frame #5: 0x000000018808d950 libsystem_pthread.dylib`_pthread_atfork_child_handlers + 76
frame #6: 0x0000000187f38d50 libsystem_c.dylib`fork + 112
frame #7: 0x000000010046aeec skaffold`runtime.syscall.abi0 + 44
frame #8: 0x00000001004696bc skaffold`runtime.asmcgocall.abi0 + 124
frame #9: 0x00000001004696bc skaffold`runtime.asmcgocall.abi0 + 124
frame #10: 0x00000001004696bc skaffold`runtime.asmcgocall.abi0 + 124
The notable part here is the macOS Network framework attempting to do some cleanup as a side effect in its atfork child handler. This framework was probably initialized as a side effect of some x509 or networking logic within Skaffold.
And the traceback from the Go/Skaffold side, using delve:
(dlv) bt
0 0x0000000187db5194 in ???
at ?:-1
1 0x000000010483b368 in runtime.systemstack_switch
at /Users/andrew/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.25.4.darwin-arm64/src/runtime/asm_arm64.s:249
2 0x0000000104820284 in runtime.libcCall
at /Users/andrew/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.25.4.darwin-arm64/src/runtime/sys_libc.go:52
3 0x000000010483957c in syscall.rawSyscall
at /Users/andrew/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.25.4.darwin-arm64/src/runtime/sys_darwin.go:116
4 0x000000010485d3c8 in syscall.forkAndExecInChild
at /Users/andrew/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.25.4.darwin-arm64/src/syscall/exec_libc2.go:86
5 0x000000010485e9c8 in syscall.forkExec
at /Users/andrew/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.25.4.darwin-arm64/src/syscall/exec_unix.go:208
6 0x000000010485ed48 in syscall.StartProcess
at /Users/andrew/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.25.4.darwin-arm64/src/syscall/exec_unix.go:258
7 0x00000001048c08dc in os.startProcess
at /Users/andrew/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.25.4.darwin-arm64/src/os/exec_posix.go:55
8 0x00000001048bfd84 in os.StartProcess
at /Users/andrew/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.25.4.darwin-arm64/src/os/exec.go:266
9 0x0000000104fdf9d0 in os/exec.(*Cmd).Start
at /Users/andrew/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.25.4.darwin-arm64/src/os/exec/exec.go:725
10 0x0000000106a14d70 in github.com/GoogleContainerTools/skaffold/v2/pkg/skaffold/util.(*Commander).RunCmdOut
at ./pkg/skaffold/util/cmd.go:102
11 0x0000000106a14758 in github.com/GoogleContainerTools/skaffold/v2/pkg/skaffold/util.RunCmdOut
at ./pkg/skaffold/util/cmd.go:71
Here you can see that, sure enough, it is trying to start a subprocess via os/exec and getting stuck.
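For anyone trying to confirm they are hitting the same hang, a minimal sketch of a watchdog around (*Cmd).Start might look like the following. It assumes that Start does not return while the forked child is stuck before exec, which is consistent with the traces above but not something I have verified beyond them. The names startWithDeadline, errStartStalled and runEcho are hypothetical, and /bin/echo and the 5-second deadline are arbitrary choices for illustration.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"os/exec"
	"time"
)

// errStartStalled is a hypothetical sentinel error for this sketch.
var errStartStalled = errors.New("exec: Start did not return before the deadline")

// startWithDeadline calls cmd.Start on its own goroutine and stops waiting
// for it after d. It only detects a stall; it cannot unstick the child.
func startWithDeadline(cmd *exec.Cmd, d time.Duration) error {
	done := make(chan error, 1) // buffered so the goroutine never blocks on send
	go func() { done <- cmd.Start() }()

	select {
	case err := <-done:
		return err
	case <-time.After(d):
		return errStartStalled
	}
}

// runEcho starts /bin/echo under the watchdog and returns its output.
func runEcho() (string, error) {
	var out bytes.Buffer
	cmd := exec.Command("/bin/echo", "hello")
	cmd.Stdout = &out

	if err := startWithDeadline(cmd, 5*time.Second); err != nil {
		return "", err
	}
	if err := cmd.Wait(); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	out, err := runEcho()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```

If Start really is wedged, the goroutine calling it stays blocked, so this only reports the stall (for logging, or to know when to attach a debugger); the stuck child still has to be killed by hand.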
Repro case
Unfortunately, I wasn't able to isolate the exact variable that triggers this bug, so I don't have a standalone repro case to share. My hope is that others running into this in their setups will find this bug report.
The bug seems to involve a race condition or some other hidden variable. It mainly occurs when Skaffold is run from my company's deployment helper script (and yes, I carefully checked that script for any process/pipe job control issues). If I run skaffold standalone, it seems to happen less frequently.
Workaround/Fix
I ended up vibe coding a Golang library that mirrors os/exec's API, but uses posix_spawn instead of fork+exec: https://github.com/orospakr/spawnexec . It just calls posix_spawn from libc using cgo.
And I have a fork of Skaffold that replaces all uses of os/exec with spawnexec: https://github.com/orospakr/skaffold-spawnexec. Here's the commit: orospakr@7982734
Clone that fork and build it to use it.
This works perfectly, and now skaffold is rock solid for me.
Background
Even though the bug itself would seem to be in the atfork handler in macOS's networking framework, the current fork+exec strategy for launching subprocesses makes this kind of issue likely, and a fix from Apple likely isn't forthcoming.
While the fork+exec pattern has been considered for many decades to be the standard way of spawning processes on UNIX-like systems, there appears to be a growing consensus that it isn't a good pattern for spawning subprocesses, because of issues just like this one. Besides depending on side-effect-heavy (and potentially bug-prone) pthread atfork handlers, fork+exec does not have a good story for releasing complex OS resources (like what this macOS networking framework is trying to do) at fork time.
Using the alternative "spawn process" APIs (such as posix_spawn, available on both macOS and Linux) offered by modern operating systems avoids this kind of problem.
Further reading: a paper from Microsoft Research and collaborators ("A fork() in the road", HotOS '19) that discusses the underlying issues with fork+exec in great detail: https://www.microsoft.com/en-us/research/wp-content/uploads/2019/04/fork-hotos19.pdf
Golang's standard library (os/exec), however, continues to use fork+exec.
Basically, it seems to me that upstream Golang will need to address this issue and consider adopting posix_spawn in lieu of fork+exec in the os/exec package. This is a hard sell though, so in the meantime I've filed this bug here in Skaffold with a workaround.