Skip to content

Msr aware cpu profiles test#180

Open
scholzp wants to merge 3 commits intocyberus-technology:mainfrom
scholzp:msr_aware_cpu_profiles
Open

Msr aware cpu profiles test#180
scholzp wants to merge 3 commits intocyberus-technology:mainfrom
scholzp:msr_aware_cpu_profiles

Conversation

@scholzp
Copy link
Copy Markdown
Contributor

@scholzp scholzp commented Mar 13, 2026

This test checks if all forbidden MSRs are inaccessible for the guest. In case we find some MSRs accessible, we first collect them before letting the test fail to gather better debug information.

I run this test on livemig-dellemc-2tb-1 and it succeeded.

@scholzp scholzp force-pushed the msr_aware_cpu_profiles branch 3 times, most recently from 4a382a0 to af4ff1f Compare March 17, 2026 13:46
@scholzp scholzp marked this pull request as ready for review March 17, 2026 13:59
@scholzp scholzp self-assigned this Mar 17, 2026
Copy link
Copy Markdown
Collaborator

@hertrste hertrste left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some small suggestions but I am fine in general :)

@scholzp scholzp force-pushed the msr_aware_cpu_profiles branch 2 times, most recently from b5ff4e7 to 5ff21a2 Compare March 18, 2026 16:03
Copy link
Copy Markdown
Member

@phip1611 phip1611 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally, LGTM!

@scholzp scholzp force-pushed the msr_aware_cpu_profiles branch from 5ff21a2 to bfb9dca Compare March 18, 2026 16:53
@scholzp scholzp force-pushed the msr_aware_cpu_profiles branch 2 times, most recently from 418953e to fc4cc8c Compare March 19, 2026 07:23
Copy link
Copy Markdown
Contributor

@amphi amphi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me overall, I just have some small comments and questions.

forbidden MSRs we check in this test, therefore, consist of MSRs that might be present on the host but not in
the profile we restrict the VM to.

Note: Needs to run on an Intel CPU
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should check that somehow? Maybe not in this PR, but in the future this may become necessary. Do we want to open a ticket for that?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you. Currently these test are not run in our CI pipeline and they have to be executed manually. I think ensuring that they run on the right CPUs is part of including them in our regular Ci runs.

Comment on lines +124 to +127
for msr_id in msrs_to_check:
if f"0x{msr_id:08x}" in cmd_output:
msr_output.append(msr_id)
forbidden_msr_count += 1
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add some example output and what you do with that? Otherwise it is very hard to verify whether this does the right thing.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added more extensive comment around this code and example output of msr.

Copy link
Copy Markdown
Contributor

@olivereanderson olivereanderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work.

I have a few suggestions for improved documents and I think we should also
try to understand the list of "allowed forbidden MSRs"/"BENIGN_FORBIDDEN_MSRS" a bit better before merging this.

Comment on lines +1573 to +1598
0x00000000,
0x00000001,
0x00000017,
0x000000C1,
0x000000C2,
0x00000179,
0x00000186,
0x00000187,
0x00000198,
0x00000199,
0x000001D9,
0x000001DD,
0x000001DE,
0x00000011,
0x00000012,
0x0000002A,
0x0000002C,
0x000000CD,
0x0000011E,
0x000001DB,
0x000001DC,
0x00000606,
0x00000611,
0x00000619,
0x00000639,
0x00000641,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tpressure it would be good if we could figure out why KVM does not put these in KVM_GET_MSR_INDEX_LIST, but still makes them available to the guest.

If all of the following are true:

  1. They have default values
  2. They are always available (as long as an Intel CPU is in use)
  3. They cannot be modified by the guest

then we should be good.

My testing so far indicates that they are also available to the guest when using CPU models in QEMU, but maybe more care is taken to transfer them during live migration. Currently these will not be transferred by Cloud hypervisor.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added an issue to track this list here https://github.com/cobaltcore-dev/cobaltcore/issues/478

We use the `msr` tool for detecting MSRs. We make use of this in a
test added in a followup commit.
The `msr` tool comes with it's own kernel module that must be loaded
in order for `msr` to work.

Signed-off-by: Pascal Scholz <pascal.scholz@cyberus-technology.de>
On-behalf-of: SAP pascal.scholz@sap.com
@scholzp scholzp force-pushed the msr_aware_cpu_profiles branch from 34c568d to c131f0c Compare March 24, 2026 09:56
@scholzp scholzp force-pushed the msr_aware_cpu_profiles branch from c131f0c to 5dacb48 Compare March 24, 2026 10:12
Copy link
Copy Markdown
Contributor

@amphi amphi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very very nice! I'll leave the second approval to @olivereanderson , because I am not sure whether his question needs to be answered before merging.

Comment on lines +107 to +109
Note: This test needs to run on a relatively recent Intel CPU that
fulfills the respective requirements for the CPU profiles used in this
test.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make this a bit more explicit? What do we need? Sapphire Rapids and newer? Or something else?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add that the CPU must be able to execute Qemu's Icelake-Server-v7 profile.

In my opinion this is already as explicit as it could be. It doesn't make sense to go with "Sapphire Rapid and newer" because no one can guarantee that there are no feature changes across CPU generations and there are even differences between Sapphire Rapids for Servers and Workstations (e.g. SGX support), for instance. We could restrict this to Xeon Scaleable Gen 4 and higher, but again we don't know if upcoming consumer CPUs will at some point be compatible with the CPU profile. Less do we know what Intel has in mind when it comes to market separation and marketing, which could heavily influence naming and CPU feature set in the respective markte segments. So I think referring to the CPU profile is the best/most explicit requirement we can name at this point. Naming any other restriction will probably not age well.

README.md Outdated
- tests that check migrations between hosts with different NUMA configurations
- `cpu_profiles`
- tests that run on hosts with different CPU profiles, including migration tests
- need to run on dedicated hardware
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is dedicated hardware? I think you mean Intel CPUs (maybe also some specific generation)? Please be as specific as possible.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, all we can do is note that the CPU to execute these tests must be able to run the respecitve CPU profile.

Copy link
Copy Markdown
Contributor

@olivereanderson olivereanderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this!

We might want to try this with a sapphire rapids profile as well, but that can easily be done in a follow up PR built on top of the excellent work you did here.

I have opened an issue with regards to the benign forbidden MSRs which we should investigate in more detail when we have more time. @tpressure might want to take a look at that issue already now before approving this PR though.

@scholzp scholzp force-pushed the msr_aware_cpu_profiles branch from 5dacb48 to 73d8626 Compare March 25, 2026 08:20
scholzp added 2 commits March 25, 2026 15:33
Add a test that checks if any fobidden MSR can be accessed by the
guest.

Forbidden MSRs are those that are available on the host but not
in the respective CPU profile.

Signed-off-by: Pascal Scholz <pascal.scholz@cyberus-technology.de>
On-behalf-of: SAP pascal.scholz@sap.com
Signed-off-by: Pascal Scholz <pascal.scholz@cyberus-technology.de>
On-behalf-of: SAP pascal.scholz@sap.com
@scholzp scholzp force-pushed the msr_aware_cpu_profiles branch from 73d8626 to 8a1f209 Compare March 25, 2026 14:33
@scholzp
Copy link
Copy Markdown
Contributor Author

scholzp commented Mar 25, 2026

I'll wait for @tpressure 's okay before hitting the merge button.

@phip1611
Copy link
Copy Markdown
Member

I'll wait for @tpressure 's okay before hitting the merge button.

Make sure to ping him directly - a lot of things are going on right now, he might miss a notification from github :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants