Add Darwin/macOS support for SSS and Tang pins #546

Open
sini wants to merge 6 commits into latchset:master from sini:darwin-support

sini commented Mar 14, 2026

Summary

This adds macOS/Darwin compatibility to the SSS and Tang pins by replacing
Linux-specific APIs with POSIX equivalents and making the LUKS test suite
gracefully skip when cryptsetup is unavailable.

Tested on macOS 26 (aarch64-darwin) — Tang and SSS-based encryption and
decryption both work correctly.

This may also help with #541 and #504.

Changes

sss: remove unused sys/epoll.h include from clevis-encrypt-sss.c
This header was included but never used. Removing it fixes a compile error on
Darwin where sys/epoll.h does not exist.

sss: replace Linux epoll with POSIX poll in clevis-decrypt-sss.c
epoll is Linux-specific. The SSS decrypt pin monitors a small number of child
process file descriptors, so poll() is functionally equivalent and is available
on all POSIX platforms.

sss: replace pipe2 with portable pipe + fcntl in sss.c
pipe2(fd, O_CLOEXEC) is Linux-specific. The replacement uses pipe() followed
by fcntl(F_SETFD, FD_CLOEXEC) on both descriptors. This is safe because the
call() function operates in a single-threaded context (the fork follows
immediately), so there is no window for a descriptor leak.
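A minimal sketch of the portable replacement follows. The helper name `pipe_cloexec` is hypothetical (it is not taken from sss.c), and this version reads the existing descriptor flags with F_GETFD before setting FD_CLOEXEC, which is slightly more defensive than a bare fcntl(F_SETFD, FD_CLOEXEC).

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: create a pipe and mark both ends close-on-exec
 * using only POSIX calls. Returns 0 on success; on failure returns -1
 * with both descriptors closed and set to -1.
 *
 * Not atomic: another thread could fork() between pipe() and fcntl().
 * That is acceptable only in a single-threaded program. */
static int pipe_cloexec(int fds[2])
{
    if (pipe(fds) < 0)
        return -1;

    for (int i = 0; i < 2; i++) {
        int flags = fcntl(fds[i], F_GETFD);
        if (flags < 0 || fcntl(fds[i], F_SETFD, flags | FD_CLOEXEC) < 0) {
            close(fds[0]);
            close(fds[1]);
            fds[0] = fds[1] = -1;
            return -1;
        }
    }
    return 0;
}
```

A caller that wants one end to survive into the child would typically dup2() that end onto the target descriptor after fork(), since dup2() does not copy FD_CLOEXEC.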

luks: make cryptsetup optional in test suite
The LUKS test meson.build is included unconditionally by the parent build, but
all tests require cryptsetup, which is unavailable on macOS. Changed
find_program('cryptsetup', required: true) to required: false with an early
subdir_done() so the build completes without cryptsetup. No test logic is
altered — when cryptsetup is present, all tests run as before.

Motivation

Clevis is used in NixOS disk encryption workflows (Tang + SSS) to generate and
decrypt JWE-wrapped keys. Being able to run clevis encrypt tang and
clevis encrypt sss on macOS enables Darwin-based workstations to provision
NixOS hosts with encrypted disks without requiring a Linux VM.

Test plan

  • Builds on x86_64-linux (NixOS)
  • Builds on aarch64-darwin (macOS 26)
  • clevis encrypt tang / clevis decrypt round-trip on macOS
  • clevis encrypt sss / clevis decrypt round-trip on macOS
  • Existing Linux build and test suite unaffected (no behavior change when
    cryptsetup is present)

sini added 4 commits March 14, 2026 15:47
clevis-encrypt-sss.c includes sys/epoll.h but never uses any epoll
functions. This unused include prevents compilation on platforms that
lack epoll (e.g., macOS/Darwin).

clevis-decrypt-sss.c used epoll to monitor child process output file
descriptors. epoll is Linux-specific and prevents compilation on other
POSIX platforms such as macOS/Darwin.

Replace epoll with poll(), which is POSIX standard and functionally
equivalent for the small number of file descriptors monitored here.
The pollfds array is dynamically allocated to match the number of
child processes.

pipe2() is a Linux-specific extension (requires _GNU_SOURCE) that
atomically creates a pipe with flags. Replace it with the POSIX
equivalent: pipe() followed by fcntl(F_SETFD, FD_CLOEXEC).

The atomicity difference is irrelevant here since the program is
single-threaded — there is no risk of a concurrent fork leaking
file descriptors between the pipe() and fcntl() calls.

This enables compilation on platforms that lack pipe2(), such as
macOS/Darwin.

The LUKS test directory is included unconditionally by the parent
meson.build (only gated by cross-compilation), but all LUKS tests
require the cryptsetup binary. When cryptsetup is unavailable (e.g.,
on macOS/Darwin or minimal build environments), the build fails at
configure time.

Make cryptsetup optional and use subdir_done() to skip the entire
test directory when it is not found. This also avoids the
luksmeta_data.get('OLD_CRYPTSETUP') error that occurs when
libcryptsetup was not detected.

Move the jq find_program() call after the cryptsetup guard since
it is only used within LUKS tests.

sini commented Mar 16, 2026

I'll take a look and see if I can reproduce the timeout test failure locally.

EDIT: I reproduced it and think I have the root cause -- working on the fix, moving this to draft until I've resolved it.

sini marked this pull request as draft March 16, 2026 17:04

When a pin's file descriptor is closed after reading, it must be removed
from the poll set. Unlike epoll (which automatically removes closed fds),
poll() will continue to return events for closed fds.

Additionally, poll() can return POLLHUP/POLLERR/POLLNVAL when a child
process exits or encounters errors. These events must be handled to
avoid infinite loops, but only when there's no data to read (POLLIN not
set) - otherwise we might discard valid data from a process that wrote
output then exited.

This fix:
- Sets closed fds to -1 in the pollfds array (poll ignores negative fds)
- Handles error/hangup events by cleaning up failed pins, but only when
  POLLIN is not set, ensuring we read all available data first

sini commented Mar 16, 2026

The epoll-to-poll conversion had a subtle bug in fd lifecycle management. When a monitored fd is closed, epoll automatically drops it from its interest set. poll() doesn't: the fd stays in the caller's pollfds array, so poll() keeps reporting it (POLLHUP, or POLLNVAL once the fd has been closed) indefinitely, causing an infinite loop.

Why it wasn't caught initially: I only tested on Darwin (which was the whole point of the port), where the Tang/SSS functionality worked fine for my use case. That was an oversight on my part; I've now verified that the whole suite passes on all tested platforms. The bug only surfaced when running the full Linux test suite, which exercises edge cases like child process failures and invalid configurations.

The fix:

  • Set closed fds to -1 so poll ignores them (matching epoll's auto-removal)
  • Handle POLLHUP/POLLERR/POLLNVAL to clean up failed children - but only when there's no POLLIN, since a process can write valid data and immediately exit (setting both POLLHUP and POLLIN). Reading the data first ensures we don't discard valid output.
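
The two rules above can be sketched as a small drain loop. This is an illustrative reduction, not the code in clevis-decrypt-sss.c: `drain_all` is a hypothetical name, every entry is assumed to have been registered with `events = POLLIN`, all output goes into one shared buffer, and a full buffer or an interrupted poll() is handled crudely.

```c
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: read from every descriptor in pfds until all of
 * them have been retired. Returns the total bytes read, or -1 if poll()
 * fails. Assumes each entry was set up with events = POLLIN. */
static long drain_all(struct pollfd *pfds, size_t n, char *out, size_t cap)
{
    size_t total = 0;
    size_t open_fds = 0;

    for (size_t i = 0; i < n; i++) {
        if (pfds[i].fd >= 0)
            open_fds++;
    }

    while (open_fds > 0) {
        if (poll(pfds, (nfds_t)n, -1) < 0)
            return -1;

        for (size_t i = 0; i < n; i++) {
            short ev = pfds[i].revents;

            if (pfds[i].fd < 0 || ev == 0)
                continue;

            if (ev & POLLIN) {
                /* Data first, even if POLLHUP is set as well: the child
                 * may have written valid output and then exited. */
                ssize_t r = read(pfds[i].fd, out + total, cap - total);
                if (r > 0) {
                    total += (size_t)r;
                    continue;   /* EOF or hangup shows up on a later pass */
                }
                /* r == 0 (EOF, or buffer full in this sketch) or r < 0
                 * (error): fall through and retire the entry. */
            }

            /* Reached on POLLHUP/POLLERR/POLLNVAL without POLLIN, or after
             * EOF/error on read. POLLNVAL means the fd is not open, so
             * there is nothing to close. */
            if (!(ev & POLLNVAL))
                close(pfds[i].fd);
            pfds[i].fd = -1;    /* poll() ignores negative fds */
            open_fds--;
        }
    }
    return (long)total;
}
```

Setting the entry to -1 rather than compacting the array keeps each index stable, which is convenient if the index is also used to look up per-child state.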

sini marked this pull request as ready for review March 16, 2026 17:15