bpf: Add support for sleepable tracepoint programs#11398

Closed
kernel-patches-daemon-bpf[bot] wants to merge 5 commits into bpf-next_base from series/1066584=>bpf-next

Conversation

@kernel-patches-daemon-bpf

Pull request for series with
subject: bpf: Add support for sleepable tracepoint programs
version: 4
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=1066584

@kernel-patches-daemon-bpf
Author

Upstream branch: ca0f39a
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1066584
version: 4

@kernel-patches-review-bot

commit 2b9c4f9a741437688351ae80490daa5b8ce7556e
Author: Mykyta Yatsenko <yatsenko@meta.com>

selftests/bpf: Add tests for sleepable tracepoint programs

Adds functional tests for sleepable tracepoint programs (tp_btf.s,
raw_tp.s, tp.s) that attach to nanosleep, plus a negative test for
non-faultable tracepoints and updates to the verifier/sleepable.c tests.

> Add a negative test (test_sleepable_raw_tp_fail) that verifies
> sleepable programs are rejected on non-faultable tracepoints
> (sched_switch).

The commit message references test_sleepable_raw_tp_fail, but this
name doesn't appear anywhere in the committed code. It looks like a
leftover from v2 of the series, where the negative test lived in a
separate file (progs/test_sleepable_raw_tp_fail.c).

In this version the negative test is handle_sched_switch in
progs/test_sleepable_tracepoints.c, tested via the RUN_TESTS macro.
Should the commit message be updated to reflect the current naming?


AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

In-Reply-To-Subject: selftests/bpf: Add tests for sleepable tracepoint programs
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/23070834076

@kernel-patches-review-bot

commit b7b6029199ace23f1ddaf7529035791cd2799bf4
Author: Mykyta Yatsenko <yatsenko@meta.com>

libbpf: Add section handlers for sleepable tracepoints

Add SEC_DEF entries for sleepable tracepoint variants and update
attach_tp() and attach_raw_tp() to parse the new section name
prefixes.

> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 0662d72bad20..ef79a980c0bb 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c

[ ... ]

> @@ -12997,11 +13009,19 @@ static int attach_tp(const struct bpf_program *prog, long cookie, struct bpf_lin
>  	if (!sec_name)
>  		return -ENOMEM;
>
> -	/* extract "tp/<category>/<name>" or "tracepoint/<category>/<name>" */
> -	if (str_has_pfx(prog->sec_name, "tp/"))
> -		tp_cat = sec_name + sizeof("tp/") - 1;
> -	else
> -		tp_cat = sec_name + sizeof("tracepoint/") - 1;
> +	/* extract "<prefix><category>/<name>" */
> +	tp_cat = NULL;
> +	for (i = 0; i < ARRAY_SIZE(prefixes); i++) {
> +		if (str_has_pfx(prog->sec_name, prefixes[i])) {
> +			tp_cat = sec_name + strlen(prefixes[i]);
> +			break;
> +		}
> +	}
> +	if (!tp_cat) {
> +		free(sec_name);
> +		return -EINVAL;
> +	}

Should attach_tp() handle bare SEC("tp.s") and SEC("tracepoint.s")
the same way it handles bare SEC("tp") and SEC("tracepoint")?

sec_def_matches() allows bare section names for "+" suffixed
SEC_DEF entries -- when sec_name[len] == '\0' it returns true.
So SEC("tp.s") matches SEC_DEF("tp.s+") and reaches attach_tp().

The existing bare-name guard only checks for "tp" and "tracepoint":

    if (strcmp(prog->sec_name, "tp") == 0 ||
        strcmp(prog->sec_name, "tracepoint") == 0)
        return 0;

Since none of the prefixes ("tp.s/", "tp/", "tracepoint.s/",
"tracepoint/") match the bare string "tp.s", tp_cat stays NULL
and the function returns -EINVAL. In bpf_object__attach_skeleton()
this becomes a fatal error that stops the entire skeleton attach.

attach_raw_tp() in the same patch handles the equivalent bare case
correctly -- when sec_name[pfx_len] == '\0' it returns 0:

    for (i = 0; i < ARRAY_SIZE(prefixes); i++) {
        ...
        if (prog->sec_name[pfx_len] == '\0')
            return 0;
        ...
    }

The convention across libbpf attach handlers (attach_kprobe,
attach_raw_tp) is to return 0 with *link = NULL for bare section
names, signaling "no auto-attach" without error.



AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

In-Reply-To-Subject: libbpf: Add section handlers for sleepable tracepoints
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/23070834076

@kernel-patches-review-bot

commit ac1bb22ffd7320682f2a63c12ab84193029f707b
Author: Mykyta Yatsenko <yatsenko@meta.com>

bpf: Add sleepable support for classic tracepoint programs

Adds trace_call_bpf_faultable() for faultable tracepoints supporting
sleepable BPF programs, restructures perf_syscall_enter/exit to run
BPF before preempt_disable, and gates sleepable tracepoint programs
to syscall tracepoints at attach time.

> diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
> index 37317b81fcda..e9ddf9db2b56 100644
> --- a/kernel/trace/trace_syscalls.c
> +++ b/kernel/trace/trace_syscalls.c
> @@ -1372,26 +1372,27 @@ static DECLARE_BITMAP(enabled_perf_exit_syscalls, NR_syscalls);
>  static int sys_perf_refcount_enter;
>  static int sys_perf_refcount_exit;
>
> -static int perf_call_bpf_enter(struct trace_event_call *call, struct pt_regs *regs,
> +static int perf_call_bpf_enter(struct trace_event_call *call,
>  			       struct syscall_metadata *sys_data,
> -			       struct syscall_trace_enter *rec)
> +			       int syscall_nr, unsigned long *args)
>  {
>  	struct syscall_tp_t {
>  		struct trace_entry ent;
>  		int syscall_nr;
>  		unsigned long args[SYSCALL_DEFINE_MAXARGS];
>  	} __aligned(8) param;
> +	struct pt_regs regs;
>  	int i;
>
>  	BUILD_BUG_ON(sizeof(param.ent) < sizeof(void *));
>
> -	/* bpf prog requires 'regs' to be the first member in the ctx (a.k.a. &param) */
> -	perf_fetch_caller_regs(regs);
> +	/* bpf prog requires 'regs' to be the first member in the ctx */
> +	perf_fetch_caller_regs(&regs);

Should regs be zero-initialized here? The perf_fetch_caller_regs()
comment in include/linux/perf_event.h states:

    NOTE: assumes @regs is otherwise already 0 filled; this is
    important for things like PERF_SAMPLE_REGS_INTR.

The function only sets ip, sp, cs, and flags, leaving all other
register fields at whatever happened to be on the stack.

Previously, the regs pointer came from perf_trace_buf_alloc() which
returned this_cpu_ptr(&__perf_regs[rctx]), a per-cpu variable that
is zero-initialized at boot. The new stack-local struct does not
get that zeroing.

If a BPF program calls bpf_perf_event_output(), bpf_perf_event_output_tp()
extracts this pointer:

    struct pt_regs *regs = *(struct pt_regs **)tp_buff;

and passes it through to perf_event_output(), where uninitialized
register fields could end up in perf samples.

Would initializing to zero fix this?

    struct pt_regs regs = {};

> +	*(struct pt_regs **)&param = &regs;

[ ... ]

> @@ -1515,21 +1519,22 @@ static void perf_sysenter_disable(struct trace_event_call *call)
>  		syscall_fault_buffer_disable();
>  }
>
> -static int perf_call_bpf_exit(struct trace_event_call *call, struct pt_regs *regs,
> -			      struct syscall_trace_exit *rec)
> +static int perf_call_bpf_exit(struct trace_event_call *call,
> +			      int syscall_nr, long ret_val)
>  {
>  	struct syscall_tp_t {
>  		struct trace_entry ent;
>  		int syscall_nr;
>  		unsigned long ret;
>  	} __aligned(8) param;
> -
> -	/* bpf prog requires 'regs' to be the first member in the ctx (a.k.a. &param) */
> -	perf_fetch_caller_regs(regs);
> +	struct pt_regs regs;
> +
> +	/* bpf prog requires 'regs' to be the first member in the ctx */
> +	perf_fetch_caller_regs(&regs);

Same uninitialized regs concern as perf_call_bpf_enter() above.

> +	*(struct pt_regs **)&param = &regs;

[ ... ]



AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

In-Reply-To-Subject: bpf: Add sleepable support for classic tracepoint programs
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/23070834076

@kernel-patches-daemon-bpf
Author

Upstream branch: bb41fce
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1066584
version: 4

@kernel-patches-daemon-bpf
Author

Upstream branch: bb41fce
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1066584
version: 4

@kernel-patches-daemon-bpf
Author

Upstream branch: 202e42e
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1066584
version: 4

@kernel-patches-daemon-bpf
Author

Upstream branch: 6c8e1a9
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1066584
version: 4

mykyta5 added 2 commits March 16, 2026 14:21
Rework __bpf_trace_run() to support sleepable BPF programs by using
explicit RCU flavor selection, following the uprobe_prog_run() pattern.

For sleepable programs, use rcu_read_lock_trace() for lifetime
protection and add a might_fault() annotation. For non-sleepable
programs, use the regular rcu_read_lock(). Replace the combined
rcu_read_lock_dont_migrate() with separate rcu_read_lock()/
migrate_disable() calls, since sleepable programs need
rcu_read_lock_trace() instead of rcu_read_lock().

Remove the preempt_disable_notrace/preempt_enable_notrace pair from
the faultable tracepoint BPF probe wrapper in bpf_probe.h, since
preemption management is now handled inside __bpf_trace_run().

This enables both BTF-based raw tracepoints (tp_btf.s) and classic
raw tracepoints (raw_tp.s) to run sleepable BPF programs when
attached to faultable tracepoints (e.g. syscall tracepoints).

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Add trace_call_bpf_faultable(), a variant of trace_call_bpf() for
faultable tracepoints that supports sleepable BPF programs. It uses
rcu_read_lock_trace() for lifetime protection and
bpf_prog_run_array_uprobe() for per-program RCU flavor selection,
following the uprobe_prog_run() pattern. Uses preempt-safe
this_cpu_inc_return/this_cpu_dec for the bpf_prog_active recursion
counter since preemption is enabled in this context.

Restructure perf_syscall_enter() and perf_syscall_exit() to run BPF
filter before perf event processing. Previously, BPF ran after the
per-cpu perf trace buffer was allocated under preempt_disable,
requiring cleanup via perf_swevent_put_recursion_context() on filter.
Now BPF runs in faultable context before preempt_disable, reading
syscall arguments from local variables instead of the per-cpu trace
record, removing the dependency on buffer allocation. This allows
sleepable BPF programs to execute and avoids unnecessary buffer
allocation when BPF filters the event. The perf event submission
path (buffer allocation, fill, submit) remains under preempt_disable
as before.

Add an attach-time check in __perf_event_set_bpf_prog() to reject
sleepable BPF_PROG_TYPE_TRACEPOINT programs on non-syscall
tracepoints, since only syscall tracepoints run in faultable context.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
mykyta5 added 3 commits March 16, 2026 14:21
Allow BPF_PROG_TYPE_RAW_TRACEPOINT, BPF_PROG_TYPE_TRACEPOINT, and
BPF_TRACE_RAW_TP (tp_btf) programs to be sleepable by adding them
to can_be_sleepable().

For BTF-based raw tracepoints (tp_btf), add a load-time check in
bpf_check_attach_target() that rejects sleepable programs attaching
to non-faultable tracepoints with a descriptive error message.

For classic raw tracepoints (raw_tp), add an attach-time check in
bpf_raw_tp_link_attach() that rejects sleepable programs on
non-faultable tracepoints. The attach-time check is needed because
the tracepoint name is not known at load time for classic raw_tp.

Replace the verbose error message that enumerates allowed program
types with a generic "program of this type cannot be sleepable"
message, since the list of sleepable-capable types keeps growing.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Add SEC_DEF entries for sleepable tracepoint variants:
  - "tp_btf.s+"     for sleepable BTF-based raw tracepoints
  - "raw_tp.s+"     for sleepable classic raw tracepoints
  - "raw_tracepoint.s+" (alias)
  - "tp.s+"         for sleepable classic tracepoints
  - "tracepoint.s+" (alias)

Update attach_raw_tp() to recognize "raw_tp.s" and
"raw_tracepoint.s" prefixes when extracting the tracepoint name.

Rewrite attach_tp() to use a prefix array including "tp.s/" and
"tracepoint.s/" variants for proper section name parsing.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Add functional tests for sleepable tracepoint programs that attach to
the nanosleep syscall and use bpf_copy_from_user() to read user memory:

  - tp_btf: BTF-based raw tracepoint using SEC("tp_btf.s/sys_enter")
    with PT_REGS_PARM1_SYSCALL (non-CO-RE macro for BTF programs).

  - classic: Classic raw tracepoint using SEC("raw_tp.s/sys_enter")
    with PT_REGS_PARM1_CORE_SYSCALL (CO-RE macro needed for classic).

  - tracepoint: Classic tracepoint using
    SEC("tp.s/syscalls/sys_enter_nanosleep") receiving
    struct syscall_trace_enter with direct access to args[].

Add a negative test (test_sleepable_raw_tp_fail) that verifies
sleepable programs are rejected on non-faultable tracepoints
(sched_switch).

Update verifier/sleepable.c tests:
  - Add "sleepable raw tracepoint accept" test for sys_enter.
  - Rename reject test and update error message to match the new
    descriptive "Sleepable program cannot attach to non-faultable
    tracepoint" message.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
@kernel-patches-daemon-bpf
Author

Upstream branch: 2364959
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1066584
version: 4

kernel-patches-daemon-bpf bot deleted the series/1066584=>bpf-next branch March 18, 2026 22:11