kernelCTF: add CVE-2024-26921_lts_cos #310

Open
lambdasprocket wants to merge 1 commit into google:master from lambdasprocket:CVE-2024-26921_lts_cos

Conversation

@lambdasprocket
Contributor

No description provided.

@koczkatamas koczkatamas added the kCTF: vuln OK The submission exploits the claims vulnerability (passed manual verification) label Jan 19, 2026
"push %r12\n"
"push %rbx\n"
"push %rbp\n"
"lea -0x1838f1(%rip), %r15\n"
Collaborator

Magic number offset used in inline assembly. Add a comment explaining what the offset -0x1838f1 computes.

Check the 'Name and/or comment numeric constants' section of the style guide.



if (!action)
    rtnl_qdisc_plug_set_limit(qdisc, 0x100000);
Collaborator

Magic number used for qdisc plug limit. Use a named constant or add a comment explaining the limit.

See the 'Name and/or comment numeric constants' section of the style guide.

Comment on lines +612 to +613
add_qdisc_plug(0x10000, 0, 0);
add_qdisc_plug(0x10000, 0, 3);
Collaborator

Successive calls to the same modification function without explanation. Add a comment explaining why two sequential actions are required on the Qdisc.

See the 'Explain duplicated lines' section of the style guide.

iph->saddr = inet_addr("10.77.77.1");
iph->daddr = inet_addr("10.6.0.1");
iph->id = 0;
iph->tos = 0x99;
Collaborator

Magic number 0x99 used for IP TOS. Add a comment or use a define for the TOS value.

Check the 'Name and/or comment numeric constants' section of the style guide.

unsigned int xattr_fd_idx = 0;
char fname[512];

g_payload_location = g_page_offset_base + 0x50000020;
Collaborator

Magic number offset 0x50000020 used for payload target calculation. Document why this offset is chosen (e.g., physmap spray reliability).

See the 'Name and/or comment numeric constants' section of the style guide.

if (pid) {
    set_cpu(0);
    int sock = send_packet();
    sleep(10000);
Collaborator

Uncommented sleep() call for 10000 seconds. Explain the purpose of this long wait, or use a synchronization primitive instead.

See the 'Sleeping & waiting' section of the style guide.
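
As an illustration of the synchronization alternative mentioned above, here is a minimal sketch in which one process blocks on a pipe until the other reports that its step is done. `wait_for_child_ready()` and `do_setup_step()` are hypothetical names, not part of the submission, and the parent/child roles may need to be swapped to match the exploit.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the step the other process must finish first. */
static void do_setup_step(void)
{
}

/*
 * Fork a child, let it run the setup step, and block in the parent until
 * the child reports completion over a pipe. Returns the byte received
 * (1 on success) or -1 on error.
 */
int wait_for_child_ready(void)
{
    int pipefd[2];
    char ready = 0;

    if (pipe(pipefd) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: do the work, then signal the parent and exit. */
        close(pipefd[0]);
        do_setup_step();
        ready = 1;
        ssize_t written = write(pipefd[1], &ready, 1);
        _exit(written == 1 ? 0 : 1);
    }

    /* Parent: read() blocks until the child writes or exits. */
    close(pipefd[1]);
    ssize_t n = read(pipefd[0], &ready, 1);
    close(pipefd[0]);

    int status = 0;
    waitpid(pid, &status, 0);
    assert(WIFEXITED(status));

    return n == 1 ? ready : -1;
}
```

If a process only has to stay alive, which is what the `sleep(10000)` calls appear to be for, a `pause()` loop with a one-line comment states that intent more directly.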

sleep(10000);
}

sleep(1);
Collaborator

Uncommented sleep() call. Add a comment explaining what state change is expected during this 1 second.

See the 'Sleeping & waiting' section of the style guide.


asm volatile(
"movq 0x820(%%r13), %%r14\n"
"movq $0x10, (%%r14)\n"
Collaborator

Magic number used for assigning a struct pointer/offset. Add a comment explaining what field is being overwritten and what $0x10 means.

See the 'Name and/or comment numeric constants' section of the style guide.

);

asm volatile(
"movq 0x780(%%r13), %%r14\n"
Collaborator

Hardcoded struct offset 0x780 used to access nsproxy. Document the magic offset with a comment explaining it accesses nsproxy within task_struct.

Check the 'Name and/or comment numeric constants' section of the style guide.
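
One possible way to address the offset comments in the inline assembly is to name each constant once and splice it into the instruction text with preprocessor stringification. The sketch below only builds and returns the instruction string so that it can be checked in user space. The macro and function names are hypothetical, and it is an assumption that `r13` holds the `task_struct` pointer at this point.

```c
#include <assert.h>
#include <string.h>

/*
 * Offset of nsproxy inside task_struct, as identified in the review comment.
 * The value is specific to the targeted kernel build.
 */
#define TASK_STRUCT_NSPROXY_OFFSET 0x780

/* Two-level stringification so the macro is expanded before being quoted. */
#define STR_(x) #x
#define STR(x) STR_(x)

/*
 * Instruction text as it would appear inside an asm volatile() block that
 * takes operands (hence the doubled percent signs).
 * Assumption: r13 points at the task_struct.
 */
static const char LOAD_NSPROXY_INSN[] =
    "movq " STR(TASK_STRUCT_NSPROXY_OFFSET) "(%%r13), %%r14\n";

const char *load_nsproxy_insn(void)
{
    /* The named constant must end up in the generated instruction text. */
    assert(strstr(LOAD_NSPROXY_INSN, "0x780") != NULL);
    return LOAD_NSPROXY_INSN;
}
```

The same pattern would apply to the other offsets flagged in this review, with one `#define` and one comment per field.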

);

asm volatile(
"movq 0x820(%%r13), %%r14\n"
Collaborator

Magic number offset (0x820) used in assembly for accessing a structure field. Add a comment explaining what field the offset refers to.

See the 'Name and/or comment numeric constants' section of the style guide.


The second thing to consider is the kmalloc cache used to allocate struct sock. Most socket families have a dedicated cache, but some use a regular kmalloc(), giving us a simple way to reallocate the freed object without performing a cross-cache attack.

And finally, some sockets use a SOCK_RCU_FREE flag which causes sk_destruct() to wait for an RCU grace period before freeing the sock object and this would also make exploitation much harder.
Collaborator

Suggested change
And finally, some sockets use a SOCK_RCU_FREE flag which causes sk_destruct() to wait for an RCU grace period before freeing the sock object and this would also make exploitation much harder.
And finally, some sockets use a SOCK_RCU_FREE flag which causes sk_destruct() to wait for an RCU grace period before freeing the sock object. Because the vulnerable "use" happens synchronously later in the exact same netfilter pipeline, an RCU delay means the memory wouldn't actually be returned to the allocator until after the kernel dereferences the dangling pointer. This would make exploitation much harder, as we would need to artificially stall the packet's execution to wait for the grace period.

@@ -0,0 +1,176 @@
## Overview

Let's look at what we need to perform the attack.
Collaborator

Please add a short summary or "battle plan" paragraph. The reader needs an overview of what we're trying to do.

How about something like this:

Suggested change
Let's look at what we need to perform the attack.
To exploit this Use-After-Free, we must trigger the bug and overwrite the freed memory entirely within the synchronous execution path of ip_local_out(). The high-level strategy is:
1. Send a locally generated, fragmented IP packet so ip_send_skb() pushes its sk_buff (skb) into the netfilter hooks via ip_local_out().
2. Have ip_defrag() process the skb, dropping the final reference to its associated socket (skb->sk) and freeing the socket's memory.
3. Use a subsequent netfilter hook to immediately allocate over the freed skb->sk memory with our controlled payload.
4. Let a later netfilter hook dereference the forged skb->sk object to gain execution control.
Let's look at what we need to perform the attack.


### Device driver to call ip_local_out()

Because we send our packets at layer 2, ip_send_skb() won't be called and we need to find another way to trigger ip_local_out().
Collaborator

Suggested change
Because we send our packets at layer 2, ip_send_skb() won't be called and we need to find another way to trigger ip_local_out().
Because `AF_PACKET` injects frames directly at Layer 2 (skipping the Layer 3 IP output stack entirely), functions like `ip_send_skb()` are bypassed. If sent through a standard interface, our packet would never hit the netfilter hooks. We need an alternative way to force the kernel to pass our crafted `skb` into `ip_local_out()`.
The IPvlan driver provides the perfect mechanism. When processing outbound IPv4 traffic, the driver manually sets up the routing and directly invokes `ip_local_out()`:


### A way to close the socket fd before the ip_defrag() call

When our packet reaches ip_defrag(), the socket won't be freed if it is still referenced by the open file descriptor.
Collaborator

Suggested change
When our packet reaches ip_defrag(), the socket won't be freed if it is still referenced by the open file descriptor.
When our `skb` reaches `ip_defrag()`, the function drops a reference to the socket (`skb->sk`). However, the socket memory is only freed if this drops the reference count (`sk_refcnt`) to zero. If the user-space file descriptor (fd) is still open, it holds an active reference, preventing the allocation from being freed.
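
The reference-count condition can be modeled with a toy counter. This sketch is a deliberate simplification: it folds the kernel's socket accounting into a single count, and every name in it (`toy_sock`, `defrag_frees_sock()`) is hypothetical. It only shows why the fd has to be closed before the reference held through `skb->sk` is dropped.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct sock; only the reference count is modeled. */
struct toy_sock {
    int refcnt;
    bool freed;
};

static void toy_sock_hold(struct toy_sock *sk)
{
    assert(!sk->freed);
    sk->refcnt++;
}

/* Drop one reference; the object is "freed" only when the count hits zero. */
static bool toy_sock_put(struct toy_sock *sk)
{
    assert(!sk->freed && sk->refcnt > 0);
    if (--sk->refcnt == 0)
        sk->freed = true;
    return sk->freed;
}

/*
 * Returns true if the reference dropped on behalf of the skb (the step the
 * writeup attributes to ip_defrag()) ends up freeing the socket.
 */
bool defrag_frees_sock(bool fd_closed_first)
{
    /* The open file descriptor holds the initial reference. */
    struct toy_sock sk = { .refcnt = 1, .freed = false };

    toy_sock_hold(&sk);        /* the queued skb takes its own reference */
    if (fd_closed_first)
        toy_sock_put(&sk);     /* close(fd) drops the fd's reference */

    return toy_sock_put(&sk);  /* reference dropped for skb->sk */
}
```

In this model, `defrag_frees_sock(false)` leaves the object alive because the fd still holds a reference, while `defrag_frees_sock(true)` frees it.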

When our packet reaches ip_defrag(), the socket won't be freed if it is still referenced by the open file descriptor.
We can call close() only after sendmsg() returns. The syscall returns after the packet is enqueued to the output device, so we could try to win a race and close the fd in time, but there is a simpler way.

sch_plug queuing discipline can be used to stop the packets from being dequeued from a network device until a command to "unplug" is received through the netlink API.
Collaborator

Suggested change
sch_plug queuing discipline can be used to stop the packets from being dequeued from a network device until a command to "unplug" is received through the netlink API.
To achieve deterministic execution, we use the `sch_plug` queuing discipline. `sch_plug` can be attached to a network device to pause its egress queue, holding outbound packets until an explicit "unplug" command is received via the Netlink API. This allows us to cleanly suspend the packet's journey right before it enters the vulnerable `ip_local_out()` path.
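
To make the plug/unplug behavior concrete, here is a toy user-space model of a plugged queue. The action values mirror the `TCQ_PLUG_*` enum in `<linux/pkt_sched.h>`. Everything else (`toy_plug_queue`, the helper names, the reduced state) is a hypothetical simplification and not the `sch_plug` implementation, which also tracks epochs and a byte limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Values mirror the TCQ_PLUG_* enum in <linux/pkt_sched.h>. */
enum toy_plug_action {
    TOY_PLUG_BUFFER = 0,
    TOY_PLUG_RELEASE_ONE = 1,
    TOY_PLUG_RELEASE_INDEFINITE = 2,
    TOY_PLUG_LIMIT = 3,
};

struct toy_plug_queue {
    unsigned int queued; /* packets currently held */
    bool throttled;      /* true while dequeue is blocked */
};

void toy_plug_init(struct toy_plug_queue *q)
{
    assert(q != NULL);
    q->queued = 0;
    q->throttled = true; /* this model starts out plugged */
}

void toy_plug_enqueue(struct toy_plug_queue *q)
{
    q->queued++;
}

/* Returns true if a packet left the queue. */
bool toy_plug_dequeue(struct toy_plug_queue *q)
{
    if (q->throttled || q->queued == 0)
        return false;
    q->queued--;
    return true;
}

void toy_plug_change(struct toy_plug_queue *q, enum toy_plug_action action)
{
    switch (action) {
    case TOY_PLUG_BUFFER:
        q->throttled = true;
        break;
    case TOY_PLUG_RELEASE_INDEFINITE:
        q->throttled = false;
        break;
    default:
        /* RELEASE_ONE and LIMIT are not modeled in this sketch. */
        break;
    }
}

/* True if a packet stays queued until the unplug command arrives. */
bool packet_held_until_unplug(void)
{
    struct toy_plug_queue q;

    toy_plug_init(&q);
    toy_plug_enqueue(&q);                    /* the send call returns here */
    bool left_early = toy_plug_dequeue(&q);  /* blocked while plugged */

    toy_plug_change(&q, TOY_PLUG_RELEASE_INDEFINITE);
    bool left_after = toy_plug_dequeue(&q);  /* released after the unplug */

    return !left_early && left_after;
}
```

The window between the enqueue and the release is where user space gets the chance to close the socket fd.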

So if we are able to craft a valid struct xfrm_policy that matches our connection, we will be able to get RIP control.

This policy is prepared in prepare_policy().
The fake object for the sock itself is simple - we just need to set the sk_policy pointer and sk_mark value.
Collaborator

How does sk->sk_mark play into this? Please elaborate.


So if we are able to craft a valid struct xfrm_policy that matches our connection, we will be able to get RIP control.

This policy is prepared in prepare_policy().
Collaborator

Please explain in detail in the writeup which policy you prepare here and how you prepare it.


## Privilege escalation

Our ROP is executed from the ksoftirqd context, so we can't do a traditional commit_creds() to modify the current process's privileges.
Collaborator

Since the ROP execution happens in ksoftirqd, it would be great to explicitly tie this back to the add_qdisc_plug unplug command in the text. A single sentence explaining that unplugging the queue defers packet processing to the softirq context would bridge the gap.


We chose the rarely used kexec_file_load() syscall and overwrote its code with our get_root function, which does all the traditional privilege escalation and namespace escape work: commit_creds(init_cred), switch_task_namespaces(pid, init_nsproxy), etc.

This function also returns a special value (0x777) that our user space code can use to detect if the system was already compromised.
Collaborator

The text mentions that get_root returns 0x777 for user-space to check. However, looking at the inline assembly for get_root and the syscall invocation in main(), this logic seems to have been removed from the code. You should probably delete this sentence from the writeup so readers aren't looking for code that isn't there.


This function also returns a special value (0x777) that our user space code can use to detect if the system was already compromised.

Patching the kernel function is done in rop_patch_kernel_code(): it calls set_memory_rw() on the destination memory and uses copy_user_generic() to write the new code there.
Collaborator

The writeup mentions using copy_user_generic(), but your code actually stages the payload via xattrs into the direct mapping and uses memcpy().



3 participants