chore: Reduce Guest virtual machine RAM to 475GB and use hugepages #8934
Open
frankdavid wants to merge 1 commit into master
Using hugepages reduces the work the host has to do to manage the memory pages of the QEMU process. In my experiment, after allocating 430GB in the guest VM in a pattern with gaps, the QEMU process no longer gets stuck after the VM shuts down. This should hopefully remedy the slow QEMU shutdowns we've seen in production on certain nodes.

The downsides:

1. The Host must allocate the hugepages during startup, which adds ~30 seconds to Host startup.
2. QEMU allocates hugepages eagerly during startup, which adds ~5 seconds to Guest VM startup without SEV and ~30 seconds with SEV.

However, we save on QEMU process cleanup time, which can be anywhere between 30s and 100s. Above 40s, libvirt times out and puts the Guest domain into a zombie state, which requires a Host reboot.

The RAM was reduced from 480GB to 475GB. On SEV nodes, the kernel uses ~15GB for various metadata of the guest memory (this is unrelated to using hugepages). With 480GB (+15GB metadata + the host's own kernel and processes), we were dangerously close to running out of memory. QEMU eagerly takes all the huge pages for itself, so this RAM cannot be used for anything else. By reducing it to 475GB, we leave enough RAM for the Host's other functions.
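To make the sizing argument concrete, here is a minimal Rust sketch of the arithmetic. It assumes either 2 MiB or 1 GiB huge pages (the description does not say which size the host uses) and takes the ~15GB SEV metadata figure from the description; the total physical RAM used in `main` is a hypothetical value, since the PR does not state it. `hugepages_needed` and `host_headroom_gib` are illustrative helpers, not code from this PR.

```rust
/// One GiB in bytes (2^30).
const GIB: u64 = 1 << 30;

/// Huge pages needed to back `guest_gib` GiB of guest RAM, rounding up
/// so that a partially used page still counts as a whole one.
fn hugepages_needed(guest_gib: u64, page_size_bytes: u64) -> u64 {
    (guest_gib * GIB + page_size_bytes - 1) / page_size_bytes
}

/// GiB left for the host kernel and processes after the guest's hugepage
/// pool and the SEV metadata are carved out; `None` means overcommitted.
fn host_headroom_gib(total_gib: u64, guest_gib: u64, sev_metadata_gib: u64) -> Option<u64> {
    total_gib.checked_sub(guest_gib)?.checked_sub(sev_metadata_gib)
}

fn main() {
    let page_2m: u64 = 2 << 20; // 2 MiB huge pages
    let page_1g: u64 = 1 << 30; // 1 GiB huge pages
    println!("475 GiB needs {} x 2 MiB pages", hugepages_needed(475, page_2m)); // 243200
    println!("475 GiB needs {} x 1 GiB pages", hugepages_needed(475, page_1g)); // 475

    // HYPOTHETICAL host size: the PR does not state the nodes' physical RAM.
    let hypothetical_total_gib = 512;
    for guest_gib in [480, 475] {
        // ~15 GiB of SEV metadata, as estimated in the PR description.
        let headroom = host_headroom_gib(hypothetical_total_gib, guest_gib, 15);
        println!("guest {guest_gib} GiB -> host headroom {headroom:?} GiB");
    }
}
```

Because the hugepage pool is unavailable to anything else once QEMU takes it, every GiB removed from the guest goes directly into the host's headroom, and the host's own kernel and processes have to fit in whatever `host_headroom_gib` returns.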
andrewbattat approved these changes on Feb 19, 2026
ic-os/components/early-boot/initramfs-tools/hostos/setup-hugepages/setup-hugepages (3 resolved conversations)
Comment on lines +19 to +21

```diff
 /// `HUGEPAGE_ALLOCATION_GB` in the `setup-hugepages` script.
 #[cfg(not(feature = "dev"))]
-const DEFAULT_VM_MEMORY_GB: u32 = 480;
+const DEFAULT_VM_MEMORY_GIB: u32 = 475;
```
Contributor:
Is it GB or GiB? We're using both.
Should `HUGEPAGE_ALLOCATION_GB` be renamed `HUGEPAGE_ALLOCATION_GIB`?
Contributor (Author):
Yeah, it's all powers of 2. For RAM sizes, "GB" usually means 2^30 bytes; for disks it means 10^9 bytes. But that's very confusing, so I'll use GiB everywhere.
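One way to settle the GB/GiB ambiguity beyond naming conventions is to carry the unit in the type. The following Rust sketch uses hypothetical `Gib` and `Gb` newtypes (not part of the codebase) and shows that 475 GiB is about 510 decimal GB.

```rust
/// RAM-style size in gibibytes: 1 GiB = 2^30 bytes (IEC binary prefix).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Gib(u64);

/// Disk-style size in decimal gigabytes: 1 GB = 10^9 bytes (SI prefix).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Gb(u64);

impl Gib {
    /// Size in bytes: multiply by 2^30.
    fn bytes(self) -> u64 {
        self.0 << 30
    }
}

impl Gb {
    /// Size in bytes: multiply by 10^9.
    fn bytes(self) -> u64 {
        self.0 * 1_000_000_000
    }
}

fn main() {
    let ram = Gib(475);
    let bytes = ram.bytes();
    // The same byte count expressed in decimal GB, as a disk vendor would.
    let as_gb = bytes as f64 / Gb(1).bytes() as f64;
    println!("{ram:?} = {bytes} bytes = {as_gb:.2} GB"); // Gib(475) = 510027366400 bytes = 510.03 GB
}
```

With distinct types, passing a disk size in `Gb` where a RAM size in `Gib` is expected becomes a compile error rather than a silent discrepancy of roughly 7%.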
basvandijk approved these changes on Feb 19, 2026
Bownairo reviewed on Feb 19, 2026
```sh
# Allocate huge pages during early boot

set -e
```
Contributor:
nit: maybe `set -Eeuo pipefail`, but we don't actually set this in most of our scripts.
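Only the header of `setup-hugepages` appears in this thread. As a rough illustration of what an early-boot hugepage reservation usually involves, here is a Rust sketch against the kernel's per-size sysfs knob (`/sys/kernel/mm/hugepages/hugepages-<size>kB/nr_hugepages`). The 1 GiB page size, the helper names, and the read-back check are assumptions for illustration, not taken from the script. Reserving early in boot matters because each huge page needs physically contiguous memory, which is easiest to find before memory becomes fragmented.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Per-size `nr_hugepages` knob under a sysfs root, e.g.
/// `/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages` for 1 GiB pages.
fn nr_hugepages_path(sysfs_root: &Path, page_size_kib: u64) -> PathBuf {
    sysfs_root
        .join("kernel/mm/hugepages")
        .join(format!("hugepages-{page_size_kib}kB"))
        .join("nr_hugepages")
}

/// Requests `count` huge pages, then reads the knob back, because the kernel
/// may grant fewer pages than requested without reporting a write error.
fn reserve_hugepages(sysfs_root: &Path, page_size_kib: u64, count: u64) -> io::Result<u64> {
    let path = nr_hugepages_path(sysfs_root, page_size_kib);
    fs::write(&path, count.to_string())?;
    let granted = fs::read_to_string(&path)?
        .trim()
        .parse::<u64>()
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
    if granted < count {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("requested {count} huge pages, kernel granted {granted}"),
        ));
    }
    Ok(granted)
}

fn main() -> io::Result<()> {
    // Fake sysfs tree in a temp dir so the sketch runs unprivileged;
    // on a real host the root would be `/sys` and this would need root.
    let root = std::env::temp_dir().join("hugepages-sketch");
    let knob = nr_hugepages_path(&root, 1_048_576);
    fs::create_dir_all(knob.parent().unwrap())?;
    let granted = reserve_hugepages(&root, 1_048_576, 475)?;
    println!("reserved {granted} x 1 GiB huge pages");
    Ok(())
}
```

The read-back is the important step: a write to `nr_hugepages` can succeed even when the kernel only satisfies part of the request, so the value read afterwards is the real size of the pool.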
Comment on lines +7 to +16

```sh
prereqs() {
    echo ""
}

case $1 in
prereqs)
    prereqs
    exit 0
    ;;
esac
```
```diff
@@ -455,6 +455,6 @@ impl TestEnvAttribute for UnassignedRecordConfig {
 pub fn bare_metal_vm_spec() -> VmSpec {
```
Contributor:
This isn't actually used, right? We never request bare-metal resources from Farm, so maybe `vm_spec` should be optional until we actually do?
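The reviewer's suggestion could look roughly like this in Rust. `NodeConfig`, `resource_request`, and the fields of this `VmSpec` are hypothetical stand-ins (the real `VmSpec` definition is not shown in the thread); the point is that an `Option` makes "no bare-metal request yet" explicit instead of carrying an unused spec.

```rust
/// HYPOTHETICAL stand-in for the test driver's VM spec; fields are illustrative.
#[derive(Clone, Debug, PartialEq)]
struct VmSpec {
    vcpus: u32,
    memory_gib: u32,
}

/// HYPOTHETICAL node config: `vm_spec` stays `None` until bare-metal
/// resources are actually requested from Farm.
#[derive(Debug, Default)]
struct NodeConfig {
    vm_spec: Option<VmSpec>,
}

/// Builds a resource request only when a spec is present.
fn resource_request(cfg: &NodeConfig) -> Option<String> {
    cfg.vm_spec
        .as_ref()
        .map(|s| format!("{} vCPUs, {} GiB", s.vcpus, s.memory_gib))
}

fn main() {
    // Default config: no spec, so nothing is requested.
    let unset = NodeConfig::default();
    assert!(resource_request(&unset).is_none());

    // Illustrative values only.
    let set = NodeConfig {
        vm_spec: Some(VmSpec { vcpus: 8, memory_gib: 16 }),
    };
    println!("{:?}", resource_request(&set));
}
```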