Add option to skip zero pages during VM migration #117

arctic-alpaca wants to merge 3 commits into cyberus-technology:gardenlinux from
Conversation
Force-pushed from 7cf2128 to c76d937
Force-pushed from c76d937 to 429ba34
amphi
left a comment
Did you check whether this still works with the live-migration statistics? bytes_to_transmit is calculated from the iteration_table before the zero pages are removed.
I think this could even break the downtime cutoff, because sending a lot of zero pages could lead to a very high bytes_per_sec value.
We should talk to @phip1611 about this.
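To illustrate the concern with made-up numbers (the helper name here is hypothetical, not the actual stats code): if `bytes_to_transmit` still counts the skipped zero pages, the derived rate overstates what actually crossed the wire.

```rust
// Hypothetical numbers to illustrate the statistics concern.
// If skipped zero bytes still count towards bytes_to_transmit, the derived
// bytes_per_sec is inflated relative to what was actually sent.
fn rate(bytes_counted: u64, elapsed_secs: f64) -> f64 {
    bytes_counted as f64 / elapsed_secs
}

fn main() {
    let bytes_to_transmit: u64 = 1 << 30; // counted from the iteration_table
    let zero_bytes_skipped: u64 = bytes_to_transmit / 2;
    let elapsed = 1.0; // seconds needed to send only the non-zero half

    let reported = rate(bytes_to_transmit, elapsed);
    let actual = rate(bytes_to_transmit - zero_bytes_skipped, elapsed);

    // The reported rate is twice the real throughput here, which could make
    // the downtime cutoff trigger too early.
    assert_eq!(reported, 2.0 * actual);
}
```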
```rust
// Amount of bytes by which the gpa undershoots the page boundary.
let gpa_page_undershoot = {
    // Amount of bytes by which the gpa overshoots the page boundary.
    let offset = memory_range.gpa % page_size_u64;
    if offset > 0 {
        page_size_u64 - offset
    } else {
        0
    }
};
```
Did you take a look whether this really happens? To me it would be super weird if those ranges are not page aligned.
The obvious example is the `MemoryRangeTableIterator`, which will create `MemoryRange`s that aren't page aligned if the chunk size isn't a multiple of the page size (which isn't enforced).
In general, I don't want to introduce a panic for something we can handle instead. Adding a warn-level log message might be a good idea though.
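For illustration, a small sketch (names are illustrative, not the iterator's actual code) of how a chunk size that isn't a multiple of the page size yields ranges whose starting gpa is not page aligned:

```rust
// Illustrative sketch: starting offsets of chunks over a page-aligned
// region. With a chunk size that is not a multiple of the page size,
// later chunks start at non-aligned gpas.
fn chunk_starts(base_gpa: u64, total_len: u64, chunk_size: u64) -> Vec<u64> {
    (0..total_len)
        .step_by(chunk_size as usize)
        .map(|off| base_gpa + off)
        .collect()
}

fn main() {
    const PAGE_SIZE: u64 = 4096;
    // 6000-byte chunks over a 16 KiB region.
    let starts = chunk_starts(0, 4 * PAGE_SIZE, 6000);
    assert_eq!(starts, vec![0, 6000, 12000]);
    // The second and third chunks are not page aligned.
    assert!(starts.iter().any(|gpa| gpa % PAGE_SIZE != 0));
}
```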
Without knowing all the details right now: how about a `debug_assert!`? This way, we catch things in libvirt-tests, see https://github.com/cyberus-technology/libvirt-tests/blob/059637c2128db9e4d0a37cf2c34f810ad6ce959b/flake.nix#L70
We don't ship cloud hypervisor with debug assertions to customers, so it should be fine.
How about just not checking the overshoot and undershoot for being zero filled? We can just cut them off and create a MemoryRange from them. This way, the zero page checking is less complex and we don't panic.
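A sketch of that suggestion (the helper is hypothetical, not the PR's code): split the range into an unaligned head, a page-aligned middle, and an unaligned tail; head and tail are emitted unconditionally, and only the middle would be scanned for zero pages.

```rust
// Hypothetical helper: split a range into (head, aligned middle, tail),
// each as a (gpa, length) pair. Head and tail are sent as-is without
// zero-checking; only the middle is scanned.
fn split_range(gpa: u64, length: u64, page_size: u64) -> ((u64, u64), (u64, u64), (u64, u64)) {
    let end = gpa + length;
    let first = ((gpa + page_size - 1) / page_size) * page_size; // round up
    let last = (end / page_size) * page_size; // round down
    if first >= last {
        // Range never covers a full page: treat it all as head.
        return ((gpa, length), (first, 0), (first, 0));
    }
    ((gpa, first - gpa), (first, last - first), (last, end - last))
}

fn main() {
    // 0x800..0x3000: head 0x800..0x1000, middle 0x1000..0x3000, empty tail.
    let (head, middle, tail) = split_range(0x800, 0x2800, 4096);
    assert_eq!(head, (0x800, 0x800));
    assert_eq!(middle, (0x1000, 0x2000));
    assert_eq!(tail, (0x3000, 0));
}
```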
vm-migration/src/protocol.rs (Outdated)
```rust
if gpa_page_undershoot != 0
    && !guest_memory_is_equal(
        current_gpa,
        &ZERO_PAGE[..gpa_page_undershoot as usize],
        guest_memory,
    )?
{
    current_length += gpa_page_undershoot;
}
```
```rust
for page_start in (0..page_amount)
    .map(|page_index| page_index * page_size_u64 + first_page_boundary)
{
    // If the current page is zero, we push all previous non-zero pages to
    // `processed_data` and set `current_gpa` to the end of the zero page while
    // resetting the length.
    if guest_memory_is_equal(page_start, &ZERO_PAGE, guest_memory)? {
        if current_length != 0 {
            processed_data.push(MemoryRange {
                gpa: current_gpa,
                length: current_length,
            });
        }
        current_gpa += current_length + page_size_u64;
        current_length = 0;
    } else {
        current_length += page_size_u64;
    }
}
```
Let's assume we have a range with:
- 2kB zeroes,
- 4kB dirty,
- 4kB zeroes,
current_gpa = 0x800 (2048) and current_length = 0
| Step | current_gpa | current_length |
|---|---|---|
| 1 | 0x800 | 0 |
| 2 | 0x800 | 4096 |
| 3 | 0x1800 | 0 |

Where
- step 1: handling of unaligned memory
- step 2: handling of the first aligned page
- step 3: handling of the second aligned page
After step three, you push gpa: 0x800 and length: 4096 into processed_data, which is wrong, because you should push gpa: 0x1000 and length: 4096.
Or am I missing something here?
I think your tests do not cover this case.
You're right, thanks for catching that! Will fix and add test(s).
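One possible fix, sketched at page granularity (function and types here are illustrative, not the PR's actual code): record the gpa where a non-zero run starts when its first dirty page is seen, instead of deriving the start by adding lengths to `current_gpa`.

```rust
// Illustrative fix sketch: anchor each non-zero run at the page where it
// begins, so skipped leading zeroes cannot shift the pushed gpa.
#[derive(Debug, PartialEq)]
struct MemoryRange {
    gpa: u64,
    length: u64,
}

/// `pages[i] == true` means the i-th aligned page is all zeroes.
fn coalesce_non_zero(pages: &[bool], page_size: u64, first_page_boundary: u64) -> Vec<MemoryRange> {
    let mut out = Vec::new();
    let mut run_start = 0u64; // gpa where the current non-zero run began
    let mut run_len = 0u64;
    for (i, &is_zero) in pages.iter().enumerate() {
        let page_start = first_page_boundary + i as u64 * page_size;
        if is_zero {
            if run_len != 0 {
                out.push(MemoryRange { gpa: run_start, length: run_len });
                run_len = 0;
            }
        } else {
            if run_len == 0 {
                run_start = page_start; // anchor the run at the page itself
            }
            run_len += page_size;
        }
    }
    if run_len != 0 {
        out.push(MemoryRange { gpa: run_start, length: run_len });
    }
    out
}

fn main() {
    // The reviewer's trace: after the zero undershoot region is skipped,
    // the aligned pages are [dirty, zero] starting at 0x1000.
    let ranges = coalesce_non_zero(&[false, true], 4096, 0x1000);
    assert_eq!(ranges, vec![MemoryRange { gpa: 0x1000, length: 4096 }]);
}
```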
```yaml
skip-zero-pages:
  type: boolean
```
I think this should be skip_zero_pages to make it match the actual implementation.
```rust
/// Skip zero-filled pages when sending VM memory to the receiver.
pub skip_zero_pages: bool,
```
Does that need some #[serde(default)]? I am not sure what happens when you use HTTP instead of ch-remote and the field has no #[serde(default)].
I think having #[serde(default)] is the correct approach 👍
Force-pushed from 429ba34 to 5ac4b1e
In the `vm_send_memory` call, we're now skipping all pages completely filled with zeroes. This reduces the memory that needs to be transferred during migration if the VM has zero pages in its memory.

On-behalf-of: SAP julian.schindel@sap.com
Signed-off-by: Julian Schindel <julian.schindel@cyberus-technology.de>
Force-pushed from 5ac4b1e to 4e05eda
On-behalf-of: SAP julian.schindel@sap.com
Signed-off-by: Julian Schindel <julian.schindel@cyberus-technology.de>

# Conflicts: # vmm/src/lib.rs
Force-pushed from 4e05eda to 208c259
Important
Drafted for now until we have a clearer picture of how to incorporate this into the stats tracking.
Improvement of #112.
A VM may have previously unused memory that is still zeroed (or memory zeroed by the guest, but that's more unlikely). This memory doesn't need to be transferred during a migration as the migration destination provides zeroed memory to the VM anyway. This PR adds the option to skip zero pages during migration.
The zero page skipping now scales with the number of connections because it's done in the connection thread. I think that's an intuitive way to scale without adding an additional parameter to configure the number of zero page skipping threads.
By moving to the connection threads, we also don't require all memory to be scanned for zero pages before sending the first byte of memory to the destination. Memory is scanned in chunks as it's passed to the connection threads.
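The per-chunk scan can be pictured like this (illustrative sketch, not the PR's actual code): each connection thread checks its chunk page by page and only transfers the pages that aren't all zeroes.

```rust
// Illustrative sketch of the per-chunk zero scan done in a connection thread.
const PAGE_SIZE: usize = 4096;

fn is_zero_page(page: &[u8]) -> bool {
    page.iter().all(|&b| b == 0)
}

/// Indices of pages within `chunk` that actually need to be transferred.
fn non_zero_pages(chunk: &[u8]) -> Vec<usize> {
    chunk
        .chunks(PAGE_SIZE)
        .enumerate()
        .filter(|(_, page)| !is_zero_page(page))
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    let mut chunk = vec![0u8; 3 * PAGE_SIZE];
    chunk[PAGE_SIZE + 7] = 0xAA; // one dirty byte in the second page
    assert_eq!(non_zero_pages(&chunk), vec![1]); // only page 1 is sent
}
```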
All benchmarks were done between `livemig-dellemc-2tb-1` and `livemig-dellemc-2tb-2` and run only once per setup, so expect some variation between runs.

Benchmark commands
32 GiB, 4 vCPU, no memtouch, no skip, 1 connection
32 GiB, 4 vCPU, no memtouch, with skip, 1 connection
1 TiB, 32 vCPU, no memtouch, no skip, 8 connections
1 TiB, 32 vCPU, no memtouch, with skip, 8 connections
32 GiB, 4 vCPU, with 50% memtouch, no skip, 1 connection
32 GiB, 4 vCPU, with 50% memtouch, with skip, 1 connection
1 TiB, 32 vCPU, with 50% memtouch, no skip, 8 connections
1 TiB, 32 vCPU, with 50% memtouch, with skip, 8 connections
32 GiB, 4 vCPU, with 100% memtouch, no skip, 1 connection
32 GiB, 4 vCPU, with 100% memtouch, with skip, 1 connection
1 TiB, 32 vCPU, with 100% memtouch, no skip, 8 connections
1 TiB, 32 vCPU, with 100% memtouch, with skip, 8 connections
32 GiB, 4 vCPU, 100% memtouch, no skip, 2 connections
32 GiB, 4 vCPU, 100% memtouch, with skip, 2 connections