Add option to skip zero pages during VM migration#112

Closed
arctic-alpaca wants to merge 4 commits into cyberus-technology:gardenlinux from arctic-alpaca:omit-zero-pages-pr

Conversation


@arctic-alpaca arctic-alpaca commented Mar 18, 2026

A VM may have previously unused memory that is still zeroed (or memory zeroed by the guest, though that is less likely). This memory doesn't need to be transferred during a migration, as the migration destination provides zeroed memory to the VM anyway. This PR adds an option to skip zero pages during migration.
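For context, detecting such pages boils down to checking whether every byte of a page-sized region is zero. A minimal sketch (on a plain byte slice; the PR itself operates on guest memory via `MemoryRangeTable`, and `is_zero_page` is a hypothetical helper, not code from this PR):

```rust
/// Returns true if the given page-sized buffer contains only zero bytes.
/// Hypothetical illustration of the zero-page check this PR performs.
fn is_zero_page(page: &[u8]) -> bool {
    // Comparing 8 bytes at a time gives the compiler room to vectorize.
    page.chunks_exact(8)
        .all(|chunk| u64::from_le_bytes(chunk.try_into().unwrap()) == 0)
        && page.chunks_exact(8).remainder().iter().all(|&b| b == 0)
}

fn main() {
    let zero = vec![0u8; 4096];
    let mut dirty = vec![0u8; 4096];
    dirty[1234] = 1;
    assert!(is_zero_page(&zero));
    assert!(!is_zero_page(&dirty));
    println!("ok");
}
```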

I'll leave this PR as a draft until benchmarking is done. Benchmarking is currently blocked by a bug in the timeout logic; results will be added as soon as possible. Comments are already welcome.

Benchmarking was done with this branch, which is based on #111.

All benchmarks were done between livemig-dellemc-2tb-1 and livemig-dellemc-2tb-2 and run only once per setup.

| Setup | Migration time | % of no skip |
| --- | --- | --- |
| 32 GiB, 4 vCPU, no memtouch, no skip, 1 connection | 47083 ms | 100% |
| 32 GiB, 4 vCPU, no memtouch, with skip, 1 connection | 2245 ms | 5% |
| 1 TiB, 32 vCPU, no memtouch, no skip, 8 connections | 90250 ms | 100% |
| 1 TiB, 32 vCPU, no memtouch, with skip, 8 connections | 44176 ms | 49% |
| 32 GiB, 4 vCPU, with memtouch, no skip, 1 connection | 47634 ms | 100% |
| 32 GiB, 4 vCPU, with memtouch, with skip, 1 connection | 50437 ms | 106% |
| 1 TiB, 32 vCPU, with memtouch, no skip, 8 connections | 101556 ms | 100% |
| 1 TiB, 32 vCPU, with memtouch, with skip, 8 connections | 179236 ms | 176% |

In the "32 GiB, 4 vCPU, with memtouch, with skip, 1 connection" and "1 TiB, 32 vCPU, with memtouch, with skip, 8 connections" cases, memtouch got OOM-killed.

Benchmark commands
  • 32GiB, 4 vCPU, no memtouch, no skip, 1 connection

    > cargo run --release --bin cloud-hypervisor -- --api-socket /tmp/jschindel_chv1.sock --kernel result/linux_6_19.bzImage --cmdline "console=ttyS0" --serial tty --console off --initramfs result/initrd_default --seccomp log -vv --memory size=32G --cpus boot=4
    > cargo run --release --bin ch-remote -- --api-socket /tmp/jschindel_chv1.sock send-migration tcp:192.168.123.2:7868 --downtime 200 --migration-timeout 12000
  • 32GiB, 4vCPU, no memtouch, with skip, 1 connection

    > cargo run --release --bin cloud-hypervisor -- --api-socket /tmp/jschindel_chv1.sock --kernel result/linux_6_19.bzImage --cmdline "console=ttyS0" --serial tty --console off --initramfs result/initrd_default --seccomp log -vv --memory size=32G --cpus boot=4
    > cargo run --release --bin ch-remote -- --api-socket /tmp/jschindel_chv1.sock send-migration tcp:192.168.123.2:7868 --downtime 200 --migration-timeout 12000 --skip-zero-pages
  • 1TiB, 32vCPU, no memtouch, no skip, 8 connections

    > cargo run --release --bin cloud-hypervisor -- --api-socket /tmp/jschindel_chv1.sock --kernel result/linux_6_19.bzImage --cmdline "console=ttyS0" --serial tty --console off --initramfs result/initrd_default --seccomp log -vv --memory size=1024G --cpus boot=32
    > cargo run --release --bin ch-remote -- --api-socket /tmp/jschindel_chv1.sock send-migration tcp:192.168.123.2:7868 --downtime 200 --migration-timeout 12000 --connections 8
  • 1TiB, 32vCPU, no memtouch, with skip, 8 connections

    > cargo run --release --bin cloud-hypervisor -- --api-socket /tmp/jschindel_chv1.sock --kernel result/linux_6_19.bzImage --cmdline "console=ttyS0" --serial tty --console off --initramfs result/initrd_default --seccomp log -vv --memory size=1024G --cpus boot=32
    > cargo run --release --bin ch-remote -- --api-socket /tmp/jschindel_chv1.sock send-migration tcp:192.168.123.2:7868 --downtime 200 --migration-timeout 12000 --connections 8 --skip-zero-pages
  • 32GiB, 4 vCPU, with memtouch, no skip, 1 connection

    > cargo run --release --bin cloud-hypervisor -- --api-socket /tmp/jschindel_chv1.sock --kernel result/linux_6_19.bzImage --cmdline "console=ttyS0" --serial tty --console off --initramfs result/initrd_default --seccomp log -vv --memory size=32G --cpus boot=4
    > cargo run --release --bin ch-remote -- --api-socket /tmp/jschindel_chv1.sock send-migration tcp:192.168.123.2:7868 --downtime 200 --migration-timeout 12000
    > memtouch --rw_ratio 100 --thread_mem 8128 --num_threads 4 --once
  • 32GiB, 4 vCPU, with memtouch, with skip, 1 connection

    > cargo run --release --bin cloud-hypervisor -- --api-socket /tmp/jschindel_chv1.sock --kernel result/linux_6_19.bzImage --cmdline "console=ttyS0" --serial tty --console off --initramfs result/initrd_default --seccomp log -vv --memory size=32G --cpus boot=4
    > cargo run --release --bin ch-remote -- --api-socket /tmp/jschindel_chv1.sock send-migration tcp:192.168.123.2:7868 --downtime 200 --migration-timeout 12000 --skip-zero-pages
    > memtouch --rw_ratio 100 --thread_mem 8128 --num_threads 4 --once
  • 1TiB, 32vCPU, with memtouch, no skip, 8 connections

    > cargo run --release --bin cloud-hypervisor -- --api-socket /tmp/jschindel_chv1.sock --kernel result/linux_6_19.bzImage --cmdline "console=ttyS0" --serial tty --console off --initramfs result/initrd_default --seccomp log -vv --memory size=1024G --cpus boot=32
    > cargo run --release --bin ch-remote -- --api-socket /tmp/jschindel_chv1.sock send-migration tcp:192.168.123.2:7868 --downtime 200 --migration-timeout 12000 --connections 8
    > memtouch --rw_ratio 100 --thread_mem 32512 --num_threads 32 --once
  • 1TiB, 32vCPU, with memtouch, with skip, 8 connections

    > cargo run --release --bin cloud-hypervisor -- --api-socket /tmp/jschindel_chv1.sock --kernel result/linux_6_19.bzImage --cmdline "console=ttyS0" --serial tty --console off --initramfs result/initrd_default --seccomp log -vv --memory size=1024G --cpus boot=32
    > cargo run --release --bin ch-remote -- --api-socket /tmp/jschindel_chv1.sock send-migration tcp:192.168.123.2:7868 --downtime 200 --migration-timeout 12000 --connections 8 --skip-zero-pages
    > memtouch --rw_ratio 100 --thread_mem 32512 --num_threads 32 --once

The numbers are good for unused machines, but don't look great for machines with full memory utilization. The obvious optimization would be to split the zero-page checking between multiple threads, but I don't want to make this PR more complex. Since the zero-page skipping can be toggled, it should only be applied to newly created or not heavily used VMs. Open to suggestions and opinions though.
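The multi-threaded split mentioned above could look roughly like this. A hedged sketch only: it scans a flat byte slice with scoped threads, whereas the real implementation would have to work on guest memory ranges; `non_zero_pages` is a hypothetical name:

```rust
use std::thread;

/// Hypothetical sketch: split a memory region into per-thread chunks and
/// count non-zero pages in parallel using scoped threads.
fn non_zero_pages(mem: &[u8], page_size: usize, threads: usize) -> usize {
    let pages = mem.len() / page_size;
    let pages_per_thread = pages.div_ceil(threads);
    thread::scope(|s| {
        let mut handles = Vec::new();
        // Each chunk covers a whole number of pages, so no page straddles
        // two threads.
        for chunk in mem.chunks(pages_per_thread * page_size) {
            handles.push(s.spawn(move || {
                chunk
                    .chunks(page_size)
                    .filter(|page| page.iter().any(|&b| b != 0))
                    .count()
            }));
        }
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

fn main() {
    let mut mem = vec![0u8; 64 * 4096];
    mem[4096] = 1;          // page 1 is dirty
    mem[10 * 4096 + 7] = 2; // page 10 is dirty
    let count = non_zero_pages(&mem, 4096, 4);
    assert_eq!(count, 2);
    println!("{count}");
}
```

`thread::scope` keeps the borrow of `mem` valid without `Arc`, which is why it fits this kind of read-only scan well.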

@arctic-alpaca arctic-alpaca force-pushed the omit-zero-pages-pr branch 3 times, most recently from e6b541c to b50b666 Compare March 18, 2026 09:52
@arctic-alpaca arctic-alpaca marked this pull request as ready for review March 18, 2026 13:38
@olivereanderson

> | Setup | Migration time | % of no skip |
> | --- | --- | --- |
> | 32 GiB, 4 vCPU, no memtouch, no skip, 1 connection | 47083 ms | 100% |
> | 32 GiB, 4 vCPU, no memtouch, with skip, 1 connection | 2245 ms | 0,05% |
> | 1 TiB, 32 vCPU, no memtouch, no skip, 8 connections | 90250 ms | 100% |
> | 1 TiB, 32 vCPU, no memtouch, with skip, 8 connections | 44176 ms | 0,49% |
> | 32 GiB, 4 vCPU, with memtouch, no skip, 1 connection | 47634 ms | 100% |
> | 32 GiB, 4 vCPU, with memtouch, with skip, 1 connection | 50437 ms | 106% |
> | 1 TiB, 32 vCPU, with memtouch, no skip, 8 connections | 101556 ms | 100% |
> | 1 TiB, 32 vCPU, with memtouch, with skip, 8 connections | 179236 ms | 176% |

I think you forgot to multiply by 100 for the "with skip" entries.

@arctic-alpaca
Author

I think you forgot to multiply by 100 for the "with skip" entries.

That's what you get when you try to finish something quickly before a meeting 🤦 Fixed, thanks.

In the `MemoryRangeTable::partition` call, we're now skipping all pages
completely filled with zeroes. This reduces the memory that needs to be
transferred during migration if the VM has zero pages in its memory.

On-behalf-of: SAP julian.schindel@sap.com
Signed-off-by: Julian Schindel <julian.schindel@cyberus-technology.de>

@amphi amphi left a comment


Overall great code and thanks for doing the benchmarks! But I have to admit that I have the feeling that all of this is a lot of complexity for the small advantages we get in only some very special scenarios.

Can you maybe take a look at whether we could implement this in vm_send_memory (or somewhere near it)? There we already have the guest_memory, and we could do it multi-threaded (if we use multiple TCP connections). Or maybe in some other place, but this feels a bit intrusive (again, for the small win we get).

Sorry for being so negative about this change

Comment on lines +407 to +409
// As far as I can tell, `MemoryRange` should always start and end on page boundaries,
// but there are no type-level guarantees, so we handle page boundaries and overshoot
// to be safe.


I am really unsure whether we want to silently fix those memory regions if they don't start and end on page boundaries. Maybe just make it a debug_assert so we see it in the tests if this assumption is incorrect.

Member


Yup, non-trivial drive-by changes are usually better handled in a dedicated PR! :)

Author


If there are no documented invariants, I'd be cautious about failing on something we can handle. For example, the lengths of the MemoryRanges returned by this iterator don't fall on page boundaries.

To fix this properly, MemoryRange should enforce the page boundaries, not this method.

I'm not sure what the best way to handle this in this PR is.
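For reference, the "handle page boundaries and overshoot" behavior under discussion amounts to widening an arbitrary range outward to the enclosing page-aligned range. A hypothetical illustration (`align_to_pages` is not code from this PR; it assumes a power-of-two page size):

```rust
/// Expand an arbitrary (start, length) range outward to the enclosing
/// page-aligned range, "overshooting" rather than truncating.
/// Assumes `page_size` is a power of two.
fn align_to_pages(start: u64, len: u64, page_size: u64) -> (u64, u64) {
    let aligned_start = start & !(page_size - 1);       // round start down
    let end = start + len;
    let aligned_end = end.next_multiple_of(page_size);  // round end up
    (aligned_start, aligned_end - aligned_start)
}

fn main() {
    // A range starting mid-page is widened to whole pages.
    assert_eq!(align_to_pages(0x1800, 0x1000, 0x1000), (0x1000, 0x2000));
    // An already page-aligned range is unchanged.
    assert_eq!(align_to_pages(0x2000, 0x3000, 0x1000), (0x2000, 0x3000));
    println!("ok");
}
```

A debug_assert variant would instead check `start % page_size == 0 && len % page_size == 0` and panic in tests when the invariant is violated.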

Comment on lines 303 to 307


What's the reason for removing this comment?



Seems a bit broken, you deleted

    /// Return the next memory range in the table, making sure that
    /// the returned range is not larger than `chunk_size`.
    ///
    /// **Note**: Do not rely on the order of the ranges returned by this
    /// iterator. This allows for a more efficient implementation.

over MemoryRangeTableIterator::next. This comment is what my question is about.

Author


I moved it to the struct. It's not very visible in the trait method implementation and overrides the existing documentation of the trait method.

@arctic-alpaca
Author

Can you maybe take a look at whether we could implement this in vm_send_memory (or somewhere near it)? There we already have the guest_memory, and we could do it multi-threaded (if we use multiple TCP connections). Or maybe in some other place, but this feels a bit intrusive (again, for the small win we get).

Initially I was looking into doing this deeper in the call stack, but was under the impression that iterating over the complete MemoryRangeTable multiple times (after the MemoryRangeTable::partition call) would be problematic. With the benchmarks, I can now look properly into the tradeoffs, will do so 👍

Sorry for being so negative about this change

No worries 😃


@olivereanderson olivereanderson left a comment


Thank you for working on this. This looks quite reasonable to me 👍

I notice that the benchmarks are all done with parameters/data I expect to be edge cases/outliers.
In other words, both 0% writes and 100% writes are a bit absurd.

It would be interesting to see memtouch numbers with:

  • 2/3 reads (1/3 writes)
  • 1/2 reads (1/2 writes)
  • 1/3 reads (2/3 writes)

As I expect that to be a better approximation of most realistic workloads.

Comment on lines +358 to +363
/// Removes all-zero-pages from [`MemoryRangeTableIterator::data`] and populates
/// [`MemoryRangeTableIterator::zero_removed_data`] with the non-zero-pages.
///
/// # Panics
///
/// Panics if a memory range is not valid for [`MemoryRangeTableIterator::guest_memory`].


Please add a line to the documentation explaining what the returned bool means.

}

for page_start in
(0..page_amount).map(|page_index| page_index * page_size_u64 + first_page_boundary)


Aside: I would be curious to know whether the compiler inlines this function and replaces the multiplication with a shift (the page size is a power of two).
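The equivalence behind that aside can be demonstrated directly. Since the page size is a power of two, multiplying by it is the same as shifting left by its number of trailing zero bits, and optimizing compilers perform this strength reduction when the constant is known (a small sketch, not code from this PR):

```rust
fn main() {
    let page_size: u64 = 4096; // power of two: 2^12
    let page_index: u64 = 37;
    // Multiplication by a power-of-two constant and a left shift by
    // trailing_zeros() compute the same value; the compiler normally
    // emits the shift form.
    let by_mul = page_index * page_size;
    let by_shift = page_index << page_size.trailing_zeros();
    assert_eq!(by_mul, by_shift);
    println!("{by_mul}");
}
```

Whether the closure itself gets inlined is a separate question that is easiest to answer by inspecting the optimized assembly (e.g. `cargo asm` or a release-mode disassembly).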

@arctic-alpaca
Author

In other words both 0% writes and 100% writes is a bit absurd.

I agree for 0%, but I'm not sure that 100% is as absurd. It doesn't indicate that 100% of memory is currently in use, rather that every page of memory has been written to during the lifetime of the VM and not been zeroed again. But that's just my assumption, I have nothing to back it up.

@olivereanderson

In other words, both 0% writes and 100% writes are a bit absurd.

I agree for 0%, but I'm not sure that 100% is as absurd. It doesn't indicate that 100% of memory is currently in use, rather that every page of memory has been written to during the lifetime of the VM and not been zeroed again.

You are right that calling that absurd might be a bit exaggerated, but I still don't think it will be that common.

Feel free to keep the 0% and 100% cases, but it would still be useful to see the numbers with the parameters I suggested.

@arctic-alpaca arctic-alpaca marked this pull request as draft March 20, 2026 08:54
@arctic-alpaca
Author

arctic-alpaca commented Mar 20, 2026

Redrafting and going to open a new PR where the zero-page scanning happens in vm_send_memory. I need a bit of time for benchmarking and more in-depth testing, but initial numbers look promising.

@arctic-alpaca
Author

Superseded by #117.
