Add option to skip zero pages during VM migration #117

Draft

arctic-alpaca wants to merge 3 commits into cyberus-technology:gardenlinux from arctic-alpaca:omit-zero-pages-2

Conversation

@arctic-alpaca commented Mar 23, 2026

Important

Drafted for now until we have a clearer picture of how to incorporate this into the stats tracking.

Improvement of #112.

A VM may have previously unused memory that is still zeroed (or memory zeroed by the guest, though that is less likely). This memory does not need to be transferred during a migration, as the migration destination provides zeroed memory to the VM anyway. This PR adds an option to skip zero pages during migration.

Zero-page skipping now scales with the number of connections because it is done in the connection threads. I think that is an intuitive way to scale without adding an extra parameter to configure the number of zero-page-scanning threads.
By moving the work into the connection threads, we also no longer require all memory to be scanned for zero pages before the first byte of memory is sent to the destination. Memory is scanned in chunks as it is passed to the connection threads.
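As a rough illustration of the per-chunk scan (not the PR's actual code; `PAGE_SIZE`, the function names, and the `(offset, length)` representation are assumptions), detecting and coalescing non-zero pages within a chunk could look like:

```rust
// Illustrative sketch only, assuming 4 KiB pages.
const PAGE_SIZE: usize = 4096;

/// Returns true if the page is entirely zero-filled.
fn page_is_zero(page: &[u8]) -> bool {
    page.iter().all(|&b| b == 0)
}

/// Splits `chunk` into the `(offset, length)` byte ranges that actually need
/// to be transferred, skipping zero pages and merging adjacent non-zero pages.
fn non_zero_ranges(chunk: &[u8]) -> Vec<(usize, usize)> {
    let mut ranges: Vec<(usize, usize)> = Vec::new();
    for (i, page) in chunk.chunks(PAGE_SIZE).enumerate() {
        if !page_is_zero(page) {
            let start = i * PAGE_SIZE;
            match ranges.last_mut() {
                // Extend the previous run if it ends where this page starts.
                Some((s, l)) if *s + *l == start => *l += page.len(),
                _ => ranges.push((start, page.len())),
            }
        }
    }
    ranges
}
```

Each connection thread can run such a scan on the chunk it is about to send, which is how the skipping naturally parallelizes with the connection count.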

All benchmarks were run between livemig-dellemc-2tb-1 and livemig-dellemc-2tb-2, and each setup was run only once, so expect some variation between runs.

| Setup | Migration time | % of no skip |
| --- | --- | --- |
| 32 GiB, 4 vCPU, no memtouch, no skip, 1 connection | 18666 ms | 100% |
| 32 GiB, 4 vCPU, no memtouch, with skip, 1 connection | 2297 ms | 12% |
| 1 TiB, 32 vCPU, no memtouch, no skip, 8 connections | 90462 ms | 100% |
| 1 TiB, 32 vCPU, no memtouch, with skip, 8 connections | 7855 ms | 9% |
| 32 GiB, 4 vCPU, with 50% memtouch, no skip, 1 connection | 16700 ms | 100% |
| 32 GiB, 4 vCPU, with 50% memtouch, with skip, 1 connection | 10991 ms | 66% |
| 1 TiB, 32 vCPU, with 50% memtouch, no skip, 8 connections | 95710 ms | 100% |
| 1 TiB, 32 vCPU, with 50% memtouch, with skip, 8 connections | 56691 ms | 59% |
| 32 GiB, 4 vCPU, with 100% memtouch, no skip, 1 connection | 18350 ms | 100% |
| 32 GiB, 4 vCPU, with 100% memtouch, with skip, 1 connection | 19221 ms | 105% |
| 1 TiB, 32 vCPU, with 100% memtouch, no skip, 8 connections | 114490 ms | 100% |
| 1 TiB, 32 vCPU, with 100% memtouch, with skip, 8 connections | 108835 ms | 95% |
| 32 GiB, 4 vCPU, with 100% memtouch, no skip, 2 connections | 11604 ms | 100% |
| 32 GiB, 4 vCPU, with 100% memtouch, with skip, 2 connections | 9708 ms | 84% |
Benchmark commands
  • 32GiB, 4 vCPU, no memtouch, no skip, 1 connection

    > cargo run --release --bin cloud-hypervisor -- --api-socket /tmp/jschindel_chv1.sock --kernel result/linux_6_19.bzImage --cmdline "console=ttyS0" --serial tty --console off --initramfs result/initrd_default --seccomp log -vv --memory size=32G --cpus boot=4
    > cargo run --release --bin ch-remote -- --api-socket /tmp/jschindel_chv1.sock send-migration tcp:192.168.123.2:7868 --downtime 200 --migration-timeout 12000
  • 32GiB, 4 vCPU, no memtouch, with skip, 1 connection

    > cargo run --release --bin cloud-hypervisor -- --api-socket /tmp/jschindel_chv1.sock --kernel result/linux_6_19.bzImage --cmdline "console=ttyS0" --serial tty --console off --initramfs result/initrd_default --seccomp log -vv --memory size=32G --cpus boot=4
    > cargo run --release --bin ch-remote -- --api-socket /tmp/jschindel_chv1.sock send-migration tcp:192.168.123.2:7868 --downtime 200 --migration-timeout 12000 --skip-zero-pages
  • 1TiB, 32vCPU, no memtouch, no skip, 8 connections

    > cargo run --release --bin cloud-hypervisor -- --api-socket /tmp/jschindel_chv1.sock --kernel result/linux_6_19.bzImage --cmdline "console=ttyS0" --serial tty --console off --initramfs result/initrd_default --seccomp log -vv --memory size=1024G --cpus boot=32
    > cargo run --release --bin ch-remote -- --api-socket /tmp/jschindel_chv1.sock send-migration tcp:192.168.123.2:7868 --downtime 200 --migration-timeout 12000 --connections 8
  • 1TiB, 32vCPU, no memtouch, with skip, 8 connections

    > cargo run --release --bin cloud-hypervisor -- --api-socket /tmp/jschindel_chv1.sock --kernel result/linux_6_19.bzImage --cmdline "console=ttyS0" --serial tty --console off --initramfs result/initrd_default --seccomp log -vv --memory size=1024G --cpus boot=32
    > cargo run --release --bin ch-remote -- --api-socket /tmp/jschindel_chv1.sock send-migration tcp:192.168.123.2:7868 --downtime 200 --migration-timeout 12000 --connections 8 --skip-zero-pages
  • 32GiB, 4 vCPU, with 50% memtouch, no skip, 1 connection

    > cargo run --release --bin cloud-hypervisor -- --api-socket /tmp/jschindel_chv1.sock --kernel result/linux_6_19.bzImage --cmdline "console=ttyS0" --serial tty --console off --initramfs result/initrd_default --seccomp log -vv --memory size=32G --cpus boot=4
    > cargo run --release --bin ch-remote -- --api-socket /tmp/jschindel_chv1.sock send-migration tcp:192.168.123.2:7868 --downtime 200 --migration-timeout 12000
    > memtouch --rw_ratio 100 --thread_mem 4096 --num_threads 4 --once
  • 32GiB, 4 vCPU, with 50% memtouch, with skip, 1 connection

    > cargo run --release --bin cloud-hypervisor -- --api-socket /tmp/jschindel_chv1.sock --kernel result/linux_6_19.bzImage --cmdline "console=ttyS0" --serial tty --console off --initramfs result/initrd_default --seccomp log -vv --memory size=32G --cpus boot=4
    > cargo run --release --bin ch-remote -- --api-socket /tmp/jschindel_chv1.sock send-migration tcp:192.168.123.2:7868 --downtime 200 --migration-timeout 12000 --skip-zero-pages
    > memtouch --rw_ratio 100 --thread_mem 4096 --num_threads 4 --once
  • 1TiB, 32vCPU, with 50% memtouch, no skip, 8 connections

    > cargo run --release --bin cloud-hypervisor -- --api-socket /tmp/jschindel_chv1.sock --kernel result/linux_6_19.bzImage --cmdline "console=ttyS0" --serial tty --console off --initramfs result/initrd_default --seccomp log -vv --memory size=1024G --cpus boot=32
    > cargo run --release --bin ch-remote -- --api-socket /tmp/jschindel_chv1.sock send-migration tcp:192.168.123.2:7868 --downtime 200 --migration-timeout 12000 --connections 8
    > memtouch --rw_ratio 100 --thread_mem 16192 --num_threads 32 --once
  • 1TiB, 32vCPU, with 50% memtouch, with skip, 8 connections

    > cargo run --release --bin cloud-hypervisor -- --api-socket /tmp/jschindel_chv1.sock --kernel result/linux_6_19.bzImage --cmdline "console=ttyS0" --serial tty --console off --initramfs result/initrd_default --seccomp log -vv --memory size=1024G --cpus boot=32
    > cargo run --release --bin ch-remote -- --api-socket /tmp/jschindel_chv1.sock send-migration tcp:192.168.123.2:7868 --downtime 200 --migration-timeout 12000 --connections 8 --skip-zero-pages
    > memtouch --rw_ratio 100 --thread_mem 16192 --num_threads 32 --once
  • 32GiB, 4 vCPU, with 100% memtouch, no skip, 1 connection

    > cargo run --release --bin cloud-hypervisor -- --api-socket /tmp/jschindel_chv1.sock --kernel result/linux_6_19.bzImage --cmdline "console=ttyS0" --serial tty --console off --initramfs result/initrd_default --seccomp log -vv --memory size=32G --cpus boot=4
    > cargo run --release --bin ch-remote -- --api-socket /tmp/jschindel_chv1.sock send-migration tcp:192.168.123.2:7868 --downtime 200 --migration-timeout 12000
    > memtouch --rw_ratio 100 --thread_mem 8128 --num_threads 4 --once
  • 32GiB, 4 vCPU, with 100% memtouch, with skip, 1 connection

    > cargo run --release --bin cloud-hypervisor -- --api-socket /tmp/jschindel_chv1.sock --kernel result/linux_6_19.bzImage --cmdline "console=ttyS0" --serial tty --console off --initramfs result/initrd_default --seccomp log -vv --memory size=32G --cpus boot=4
    > cargo run --release --bin ch-remote -- --api-socket /tmp/jschindel_chv1.sock send-migration tcp:192.168.123.2:7868 --downtime 200 --migration-timeout 12000 --skip-zero-pages
    > memtouch --rw_ratio 100 --thread_mem 8128 --num_threads 4 --once
  • 1TiB, 32vCPU, with 100% memtouch, no skip, 8 connections

    > cargo run --release --bin cloud-hypervisor -- --api-socket /tmp/jschindel_chv1.sock --kernel result/linux_6_19.bzImage --cmdline "console=ttyS0" --serial tty --console off --initramfs result/initrd_default --seccomp log -vv --memory size=1024G --cpus boot=32
    > cargo run --release --bin ch-remote -- --api-socket /tmp/jschindel_chv1.sock send-migration tcp:192.168.123.2:7868 --downtime 200 --migration-timeout 12000 --connections 8
    > memtouch --rw_ratio 100 --thread_mem 32512 --num_threads 32 --once
  • 1TiB, 32vCPU, with 100% memtouch, with skip, 8 connections

    > cargo run --release --bin cloud-hypervisor -- --api-socket /tmp/jschindel_chv1.sock --kernel result/linux_6_19.bzImage --cmdline "console=ttyS0" --serial tty --console off --initramfs result/initrd_default --seccomp log -vv --memory size=1024G --cpus boot=32
    > cargo run --release --bin ch-remote -- --api-socket /tmp/jschindel_chv1.sock send-migration tcp:192.168.123.2:7868 --downtime 200 --migration-timeout 12000 --connections 8 --skip-zero-pages
    > memtouch --rw_ratio 100 --thread_mem 32512 --num_threads 32 --once
  • 32GiB, 4 vCPU, 100% memtouch, no skip, 2 connections

    > cargo run --release --bin cloud-hypervisor -- --api-socket /tmp/jschindel_chv1.sock --kernel result/linux_6_19.bzImage --cmdline "console=ttyS0" --serial tty --console off --initramfs result/initrd_default --seccomp log -vv --memory size=32G --cpus boot=4
    > cargo run --release --bin ch-remote -- --api-socket /tmp/jschindel_chv1.sock send-migration tcp:192.168.123.2:7868 --downtime 200 --migration-timeout 12000 --connections 2
    > memtouch --rw_ratio 100 --thread_mem 8128 --num_threads 4 --once
  • 32GiB, 4 vCPU, 100% memtouch, with skip, 2 connections

    > cargo run --release --bin cloud-hypervisor -- --api-socket /tmp/jschindel_chv1.sock --kernel result/linux_6_19.bzImage --cmdline "console=ttyS0" --serial tty --console off --initramfs result/initrd_default --seccomp log -vv --memory size=32G --cpus boot=4
    > cargo run --release --bin ch-remote -- --api-socket /tmp/jschindel_chv1.sock send-migration tcp:192.168.123.2:7868 --downtime 200 --migration-timeout 12000 --connections 2 --skip-zero-pages
    > memtouch --rw_ratio 100 --thread_mem 8128 --num_threads 4 --once


@amphi left a comment


Did you check whether this still works with the live-migration statistics? `bytes_to_transmit` is calculated from the `iteration_table` before the zero pages are removed.

I think this could even break the downtime cutoff, because sending a lot of zero pages could lead to a very high `bytes_per_sec` value.

We should talk to @phip1611 about this.
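To make the worry concrete, here is a toy calculation (illustrative arithmetic only, not code from the PR): if the counter still includes skipped zero pages, the derived rate is inflated by the zero-page ratio.

```rust
// If `bytes_to_transmit` counts pages that were actually skipped, dividing it
// by wall-clock time yields an inflated rate.
fn naive_bytes_per_sec(bytes_counted: u64, elapsed_secs: f64) -> f64 {
    bytes_counted as f64 / elapsed_secs
}

// The rate of bytes that really crossed the wire.
fn actual_bytes_per_sec(bytes_sent: u64, elapsed_secs: f64) -> f64 {
    bytes_sent as f64 / elapsed_secs
}
```

For example, counting 32 GiB while only 4 GiB cross a 1 GiB/s link in 4 s yields a naive 8 GiB/s against an actual 1 GiB/s, so a downtime cutoff based on the naive value would fire far too early.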

Comment on lines +541 to +551

```rust
// Amount of bytes by which the gpa undershoots the page boundary.
let gpa_page_undershoot = {
    // Amount of bytes by which the gpa overshoots the page boundary.
    let offset = memory_range.gpa % page_size_u64;
    if offset > 0 {
        page_size_u64 - offset
    } else {
        0
    }
};
```
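A worked version of that computation (extracted into a free function for illustration; only the arithmetic is from the diff, the function signature is not), assuming a 4 KiB page size:

```rust
// Bytes by which `gpa` undershoots the next page boundary, i.e. how many
// bytes are missing until the range becomes page aligned.
fn gpa_page_undershoot(gpa: u64, page_size: u64) -> u64 {
    // Bytes by which `gpa` overshoots the previous page boundary.
    let offset = gpa % page_size;
    if offset > 0 {
        page_size - offset
    } else {
        0
    }
}
```

So a range starting at gpa 0x800 undershoots the 0x1000 boundary by 0x800 bytes, while a range starting exactly at 0x1000 has an undershoot of zero.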


Did you check whether this really happens? To me it would be very odd if those ranges were not page aligned.

Author

@arctic-alpaca commented Mar 24, 2026


The obvious example is the `MemoryRangeTableIterator`, which will create `MemoryRange`s that aren't page aligned if the chunk size isn't a multiple of the page size (which isn't enforced).

In general, I don't want to introduce a panic for something we can handle instead. Adding a warn-level log message might be a good idea, though.

Member


Without knowing all the details right now: how about a `debug_assert!`? This way, we catch issues in the libvirt-tests, see https://github.com/cyberus-technology/libvirt-tests/blob/059637c2128db9e4d0a37cf2c34f810ad6ce959b/flake.nix#L70

We don't ship Cloud Hypervisor with debug assertions to the customer, so it should be fine.

Author


How about just not checking the overshoot and undershoot for being zero-filled? We can simply cut them off and create a `MemoryRange` from them. This way, the zero-page checking is less complex and we don't panic.
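A sketch of that idea under illustrative names (not from the PR): split a range into an unaligned head, a page-aligned middle, and an unaligned tail, where only the middle would be zero-checked and the head and tail are emitted unconditionally.

```rust
// Hypothetical helper; name and signature are illustrative, not from the PR.
// Returns `(head_len, aligned_len, tail_len)` with
// `head_len + aligned_len + tail_len == length`.
fn split_range(gpa: u64, length: u64, page_size: u64) -> (u64, u64, u64) {
    // Bytes before the first page boundary inside the range.
    let head = ((page_size - gpa % page_size) % page_size).min(length);
    let rest = length - head;
    // Bytes after the last page boundary inside the range.
    let tail = rest % page_size;
    let aligned = rest - tail;
    (head, aligned, tail)
}
```

The head and tail are at most one page each, so the extra data sent is bounded while the zero-check logic only ever sees page-aligned, page-sized memory.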

Comment on lines +568 to +596

```rust
if gpa_page_undershoot != 0
    && !guest_memory_is_equal(
        current_gpa,
        &ZERO_PAGE[..gpa_page_undershoot as usize],
        guest_memory,
    )?
{
    current_length += gpa_page_undershoot;
}

for page_start in (0..page_amount)
    .map(|page_index| page_index * page_size_u64 + first_page_boundary)
{
    // If the current page is zero, we push all previous non-zero pages to
    // `processed_data` and set `current_gpa` to the end of the zero page while
    // resetting the length.
    if guest_memory_is_equal(page_start, &ZERO_PAGE, guest_memory)? {
        if current_length != 0 {
            processed_data.push(MemoryRange {
                gpa: current_gpa,
                length: current_length,
            });
        }
        current_gpa += current_length + page_size_u64;
        current_length = 0;
    } else {
        current_length += page_size_u64;
    }
}
```

Let's assume we have a range with:

  • 2kB zeroes,
  • 4kB dirty,
  • 4kB zeroes,

with `current_gpa = 0x800` (2048) and `current_length = 0`.

| Step | `current_gpa` | `current_length` |
| --- | --- | --- |
| 1 | 0x800 | 0 |
| 2 | 0x800 | 4096 |
| 3 | 0x1800 | 0 |

Where

  • step 1: Handling of unaligned memory
  • step 2: Handling of first aligned page
  • step 3: Handling of second aligned page

After step three, you push `gpa: 0x800` and `length: 4096` into `processed_data`, which is wrong, because you should push `gpa: 0x1000` and `length: 4096`.

Or am I missing something here?

I think your tests do not cover this case.
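For reference, a simplified, hypothetical reformulation (names and signature are not from the PR) that records the start of each non-zero run explicitly avoids deriving the push position from `current_gpa` arithmetic; for the aligned part of the scenario above it yields the expected `(0x1000, 4096)`.

```rust
// `pages[i] == true` means page i is zero-filled; pages start at
// `first_page_gpa` and are `page_size` bytes each. Returns `(gpa, length)`
// pairs covering the non-zero runs.
fn non_zero_page_ranges(first_page_gpa: u64, pages: &[bool], page_size: u64) -> Vec<(u64, u64)> {
    let mut out = Vec::new();
    let mut run_start: Option<u64> = None;
    for (i, &is_zero) in pages.iter().enumerate() {
        let gpa = first_page_gpa + i as u64 * page_size;
        if is_zero {
            // Flush the current non-zero run, if any.
            if let Some(start) = run_start.take() {
                out.push((start, gpa - start));
            }
        } else if run_start.is_none() {
            run_start = Some(gpa);
        }
    }
    // Flush a run that extends to the end of the region.
    if let Some(start) = run_start {
        out.push((start, first_page_gpa + pages.len() as u64 * page_size - start));
    }
    out
}
```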

Author


You're right, thanks for catching that! Will fix and add test(s).

Comment on lines +1335 to +1336

```yaml
skip-zero-pages:
  type: boolean
```

I think this should be `skip_zero_pages` to make it match the actual implementation.

Comment on lines +313 to +314

```rust
/// Skip zero-filled pages when sending VM memory to the receiver.
pub skip_zero_pages: bool,
```

Does that need some `#[serde(default)]`? I am not sure what happens when you use HTTP instead of ch-remote and the field has no `#[serde(default)]`.

Author


I think having `#[serde(default)]` is the correct approach 👍
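A minimal sketch of the annotated field (only `skip_zero_pages` and its doc comment are from the diff; the struct name is illustrative), assuming the surrounding request type derives `Deserialize`:

```rust
#[derive(serde::Deserialize)]
struct SendMigrationData {
    // ... other fields ...

    /// Skip zero-filled pages when sending VM memory to the receiver.
    /// `#[serde(default)]` makes the field optional on the wire: a request
    /// body that omits it deserializes to `false` instead of erroring.
    #[serde(default)]
    skip_zero_pages: bool,
}
```

Without the attribute, serde treats a missing non-`Option` field as a hard deserialization error, which is exactly the raw-HTTP case raised above.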

In the `vm_send_memory` call, we're now skipping all pages
completely filled with zeroes. This reduces the memory that needs to be
transferred during migration if the VM has zero pages in its memory.

On-behalf-of: SAP julian.schindel@sap.com
Signed-off-by: Julian Schindel <julian.schindel@cyberus-technology.de>

# Conflicts:
#	vmm/src/lib.rs
@arctic-alpaca arctic-alpaca marked this pull request as draft March 26, 2026 15:04