Add option to skip zero pages during VM migration #112
arctic-alpaca wants to merge 4 commits into cyberus-technology:gardenlinux
Conversation
Force-pushed from e6b541c to b50b666
I think you forgot to multiply by 100 for the "with skip" entries.
That's what you get when you try to finish something quickly before a meeting 🤦 Fixed, thanks.
Force-pushed from b50b666 to 622c373
In the `MemoryRangeTable::partition` call, we're now skipping all pages completely filled with zeroes. This reduces the memory that needs to be transferred during migration if the VM has zero pages in its memory. On-behalf-of: SAP julian.schindel@sap.com Signed-off-by: Julian Schindel <julian.schindel@cyberus-technology.de>
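The commit above says all-zero pages are skipped in `MemoryRangeTable::partition`. As a minimal sketch of what such a check could look like (the `is_zero_page` helper and `PAGE_SIZE` constant are hypothetical, not the PR's actual code):

```rust
/// Illustrative page size; real code would query the platform's page size.
const PAGE_SIZE: usize = 4096;

/// Returns true if the page contains only zero bytes.
/// The compiler typically vectorizes this comparison.
fn is_zero_page(page: &[u8]) -> bool {
    page.iter().all(|&b| b == 0)
}

fn main() {
    let zero = vec![0u8; PAGE_SIZE];
    let mut dirty = vec![0u8; PAGE_SIZE];
    dirty[123] = 1;
    assert!(is_zero_page(&zero));
    assert!(!is_zero_page(&dirty));
}
```

Pages for which such a check returns true can simply be omitted from the ranges handed to the migration sender, since the destination starts from zeroed memory anyway.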
Force-pushed from 622c373 to 46c65f2
amphi left a comment:
Overall great code, and thanks for doing the benchmarks! But I have to admit I have the feeling that all of this is a lot of complexity for the small advantage we get in only some very special scenarios.
Can you maybe take a look at whether we could implement this in vm_send_memory (or somewhere near that)? There we already have the guest_memory, and we could do it multi-threaded (if we use multiple TCP connections). Or maybe in some other place, but this feels a bit intrusive (again, for the small win we get).
Sorry for being so negative about this change.
// As far as I can tell, `MemoryRange` should always start and end on page boundaries,
// but there are no type-level guarantees, so we handle page boundaries and overshoot
// to be safe.
I am really unsure whether we want to silently fix those memory regions if they don't start and end on page boundaries. Maybe just make it a debug_assert so we see it in the tests if this assumption is not correct.
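The suggestion above, sketched out: assert the alignment invariant in debug and test builds instead of silently correcting it, so violations surface during testing but cost nothing in release builds. `PAGE_SIZE` and the parameter names are illustrative:

```rust
const PAGE_SIZE: u64 = 4096; // illustrative; real code would query this

/// Checks the assumed invariant: ranges start and end on page boundaries.
/// `debug_assert!` is compiled out in release builds, so this only fires
/// in tests and debug runs.
fn check_range(gpa: u64, length: u64) {
    debug_assert!(gpa % PAGE_SIZE == 0, "range start {gpa:#x} not page-aligned");
    debug_assert!(length % PAGE_SIZE == 0, "range length {length:#x} not page-aligned");
}

fn main() {
    check_range(0x1000, 0x2000); // aligned: passes in all builds
}
```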
Yup, non-trivial drive-by changes are usually better handled in a dedicated PR! :)
If there are no documented invariants, I'd be cautious about breaking on something we can handle. The lengths of the MemoryRanges returned by this iterator don't fall on page boundaries, for example.
To fix this properly, MemoryRange should enforce the page boundaries, not this method.
I'm not sure what the best way to handle this in this PR is.
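The point above is that `MemoryRange` itself, not this method, should enforce page boundaries. A minimal sketch of what a boundary-enforcing constructor could look like (all names, including `PageAlignedRange`, are hypothetical, not the project's types):

```rust
const PAGE_SIZE: u64 = 4096; // illustrative page size

/// A range that can only be constructed page-aligned, so downstream code
/// (e.g. partitioning) never has to re-check the invariant.
#[derive(Debug, Clone, Copy)]
struct PageAlignedRange {
    gpa: u64,
    length: u64,
}

impl PageAlignedRange {
    /// Returns `None` for misaligned input instead of silently fixing it.
    fn new(gpa: u64, length: u64) -> Option<Self> {
        if gpa % PAGE_SIZE == 0 && length % PAGE_SIZE == 0 {
            Some(Self { gpa, length })
        } else {
            None
        }
    }
}

fn main() {
    assert!(PageAlignedRange::new(0x1000, 0x2000).is_some());
    assert!(PageAlignedRange::new(0x1001, 0x2000).is_none());
}
```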
What's the reason to remove this comment?
Seems a bit broken; you deleted
/// Return the next memory range in the table, making sure that
/// the returned range is not larger than `chunk_size`.
///
/// **Note**: Do not rely on the order of the ranges returned by this
/// iterator. This allows for a more efficient implementation.
over `MemoryRangeTableIterator::next`. This comment is what my question is about.
I moved it to the struct. It's not very visible in the trait method implementation and overrides the existing documentation of the trait method.
Initially I was looking into doing this deeper in the call stack, but was under the impression that iterating over the complete
No worries 😃
Thank you for working on this. This looks quite reasonable to me 👍
I notice that the benchmarks are all done with parameters/data I expect to be edge cases/outliers.
In other words, both 0% writes and 100% writes are a bit absurd.
It would be interesting to see memtouch numbers with:
- 2/3 reads (1/3 writes)
- 1/2 reads (1/2 writes)
- 1/3 reads (2/3 writes)
As I expect that to be a better approximation of most realistic workloads.
/// Removes all-zero-pages from [`MemoryRangeTableIterator::data`] and populates
/// [`MemoryRangeTableIterator::zero_removed_data`] with the non-zero-pages.
///
/// # Panics
///
/// Panics if a memory range is not valid for [`MemoryRangeTableIterator::guest_memory`].
Please add a line to the documentation explaining what the returned bool means.
for page_start in
    (0..page_amount).map(|page_index| page_index * page_size_u64 + first_page_boundary)
Aside: I would be curious to know whether the compiler inlines this function and replaces the multiplication with a shift (the page size is a power of two).
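The aside above can be checked directly: for a power-of-two page size, multiplying a page index by the page size equals a left shift by `log2(page_size)`, a strength reduction optimizing compilers routinely apply. A small hedged sketch (values illustrative):

```rust
const PAGE_SIZE: u64 = 4096; // 2^12, a power of two

fn main() {
    // Number of trailing zero bits gives log2 for a power of two.
    let shift = PAGE_SIZE.trailing_zeros();
    assert_eq!(shift, 12);
    // The multiplication and the shift agree for every page index.
    for page_index in 0..1024u64 {
        assert_eq!(page_index * PAGE_SIZE, page_index << shift);
    }
}
```

Whether the compiler also inlines the closure itself is best confirmed by inspecting the optimized assembly (e.g. on Compiler Explorer).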
I agree for 0%, but I'm not sure that 100% is as absurd. This doesn't indicate that 100% of the memory is currently in use, rather that every page of the memory has been written to during the lifetime of the VM and not been zeroed again. But that's just my assumption, I have nothing to back this up.
You are right that calling that absurd might be a bit exaggerated, but I still don't think it will be that common. Feel free to keep the 0% and 100% cases, but it would still be useful to see the numbers with the parameters I suggested.
Redrafting and going to open a new PR where the zero-page scanning happens in
Superseded by #117.
A VM may have previously unused memory that is still zeroed (or memory zeroed by the guest, though that is less likely). This memory doesn't need to be transferred during a migration because the migration destination provides zeroed memory to the VM anyway. This PR adds the option to skip zero pages during migration.
I'll leave this PR as a draft until benchmarking is done. Benchmarking is currently blocked on a bug in the timeout logic and will be added as soon as possible. Comments are already welcome.

Benchmarking was done with this branch, which is based on #111.
All benchmarks were done between `livemig-dellemc-2tb-1` and `livemig-dellemc-2tb-2` and run only once per setup.

In the `32 GiB, 4 vCPU, with memtouch, with skip, 1 connection` and `1 TiB, 32 vCPU, with memtouch, with skip, 8 connections` cases, `memtouch` got OOM killed.

Benchmark commands:
32 GiB, 4 vCPU, no memtouch, no skip, 1 connection
32 GiB, 4 vCPU, no memtouch, with skip, 1 connection
1 TiB, 32 vCPU, no memtouch, no skip, 8 connections
1 TiB, 32 vCPU, no memtouch, with skip, 8 connections
32 GiB, 4 vCPU, with memtouch, no skip, 1 connection
32 GiB, 4 vCPU, with memtouch, with skip, 1 connection
1 TiB, 32 vCPU, with memtouch, no skip, 8 connections
1 TiB, 32 vCPU, with memtouch, with skip, 8 connections
The numbers are good for unused machines, but don't look great for machines with full memory utilization. The obvious optimization would be to split the zero-page checking between multiple threads, but I don't want to make this PR more complex. Since zero-page skipping can be toggled, it can be enabled only for newly created or not heavily used VMs. Open to suggestions and opinions, though.
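For the multi-threaded optimization mentioned above, a hedged sketch of splitting the zero-page scan across worker threads with `std::thread::scope`. The chunking strategy and all names are illustrative, not the PR's code:

```rust
const PAGE_SIZE: usize = 4096; // illustrative page size

fn is_zero_page(page: &[u8]) -> bool {
    page.iter().all(|&b| b == 0)
}

/// Counts zero pages in `memory`, giving each worker thread a contiguous
/// slice of pages. Scoped threads let us borrow `memory` without `Arc`.
fn count_zero_pages_parallel(memory: &[u8], workers: usize) -> usize {
    let pages: Vec<&[u8]> = memory.chunks(PAGE_SIZE).collect();
    let chunk = ((pages.len() + workers - 1) / workers).max(1);
    std::thread::scope(|s| {
        let handles: Vec<_> = pages
            .chunks(chunk)
            .map(|part| s.spawn(move || part.iter().filter(|p| is_zero_page(p)).count()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

fn main() {
    let mut memory = vec![0u8; 64 * PAGE_SIZE];
    memory[5 * PAGE_SIZE] = 1; // dirty a single page
    assert_eq!(count_zero_pages_parallel(&memory, 4), 63);
}
```

Since the scan is memory-bandwidth-bound, the win from more threads flattens out once the memory bus is saturated, which is another reason to keep it simple in a first version.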