Conversation
4a382a0 to
af4ff1f
Compare
hertrste
left a comment
There was a problem hiding this comment.
some small suggestions but I am fine in general :)
b5ff4e7 to
5ff21a2
Compare
5ff21a2 to
bfb9dca
Compare
418953e to
fc4cc8c
Compare
amphi
left a comment
There was a problem hiding this comment.
Looks good to me overall, I just have some small comments and questions.
tests/testsuite_cpu_profiles.py
Outdated
| forbidden MSRs we check in this test, therefore, consist of MSRs that might be present on the host but not in | ||
| the profile we restrict the VM to. | ||
|
|
||
| Note: Needs to run on an Intel CPU |
There was a problem hiding this comment.
I think we should check that somehow? Maybe not in this PR, but in the future this may become necessary. Do we want to open a ticket for that?
There was a problem hiding this comment.
I agree with you. Currently these test are not run in our CI pipeline and they have to be executed manually. I think ensuring that they run on the right CPUs is part of including them in our regular Ci runs.
tests/testsuite_cpu_profiles.py
Outdated
| for msr_id in msrs_to_check: | ||
| if f"0x{msr_id:08x}" in cmd_output: | ||
| msr_output.append(msr_id) | ||
| forbidden_msr_count += 1 |
There was a problem hiding this comment.
Can you please add some example output and what you do with that? Otherwise it is very hard to verify whether this does the right thing.
There was a problem hiding this comment.
I've added more extensive comment around this code and example output of msr.
olivereanderson
left a comment
There was a problem hiding this comment.
Good work.
I have a few suggestions for improved documents and I think we should also
try to understand the list of "allowed forbidden MSRs"/"BENIGN_FORBIDDEN_MSRS" a bit better before merging this.
| 0x00000000, | ||
| 0x00000001, | ||
| 0x00000017, | ||
| 0x000000C1, | ||
| 0x000000C2, | ||
| 0x00000179, | ||
| 0x00000186, | ||
| 0x00000187, | ||
| 0x00000198, | ||
| 0x00000199, | ||
| 0x000001D9, | ||
| 0x000001DD, | ||
| 0x000001DE, | ||
| 0x00000011, | ||
| 0x00000012, | ||
| 0x0000002A, | ||
| 0x0000002C, | ||
| 0x000000CD, | ||
| 0x0000011E, | ||
| 0x000001DB, | ||
| 0x000001DC, | ||
| 0x00000606, | ||
| 0x00000611, | ||
| 0x00000619, | ||
| 0x00000639, | ||
| 0x00000641, |
There was a problem hiding this comment.
@tpressure it would be good if we could figure out why KVM does not put these in KVM_GET_MSR_INDEX_LIST, but still makes them available to the guest.
If all of the following are true:
- They have default values
- They are always available (as long as an Intel CPU is in use)
- They cannot be modified by the guest
then we should be good.
My testing so far indicates that they are also available to the guest when using CPU models in QEMU, but maybe more care is taken to transfer them during live migration. Currently these will not be transferred by Cloud hypervisor.
There was a problem hiding this comment.
I have added an issue to track this list here https://github.com/cobaltcore-dev/cobaltcore/issues/478
We use the `msr` tool for detecting MSRs. We make use of this in a test added in a followup commit. The `msr` tool comes with it's own kernel module that must be loaded in order for `msr` to work. Signed-off-by: Pascal Scholz <pascal.scholz@cyberus-technology.de> On-behalf-of: SAP pascal.scholz@sap.com
34c568d to
c131f0c
Compare
c131f0c to
5dacb48
Compare
amphi
left a comment
There was a problem hiding this comment.
Very very nice! I'll leave the second approval to @olivereanderson , because I am not sure whether his question needs to be answered before merging.
tests/testsuite_cpu_profiles.py
Outdated
| Note: This test needs to run on a relatively recent Intel CPU that | ||
| fulfills the respective requirements for the CPU profiles used in this | ||
| test. |
There was a problem hiding this comment.
can you make this a bit more explicit? What do we need? Sapphire Rapids and newer? Or something else?
There was a problem hiding this comment.
I'll add that the CPU must be able to execute Qemu's Icelake-Server-v7 profile.
In my opinion this is already as explicit as it could be. It doesn't make sense to go with "Sapphire Rapid and newer" because no one can guarantee that there are no feature changes across CPU generations and there are even differences between Sapphire Rapids for Servers and Workstations (e.g. SGX support), for instance. We could restrict this to Xeon Scaleable Gen 4 and higher, but again we don't know if upcoming consumer CPUs will at some point be compatible with the CPU profile. Less do we know what Intel has in mind when it comes to market separation and marketing, which could heavily influence naming and CPU feature set in the respective markte segments. So I think referring to the CPU profile is the best/most explicit requirement we can name at this point. Naming any other restriction will probably not age well.
README.md
Outdated
| - tests that check migrations between hosts with different NUMA configurations | ||
| - `cpu_profiles` | ||
| - tests that run on hosts with different CPU profiles, including migration tests | ||
| - need to run on dedicated hardware |
There was a problem hiding this comment.
What is dedicated hardware? I think you mean Intel CPUs (maybe also some specific generation)? Please be as specific as possible.
There was a problem hiding this comment.
Again, all we can do is note that the CPU to execute these tests must be able to run the respecitve CPU profile.
olivereanderson
left a comment
There was a problem hiding this comment.
Thanks for working on this!
We might want to try this with a sapphire rapids profile as well, but that can easily be done in a follow up PR built on top of the excellent work you did here.
I have opened an issue with regards to the benign forbidden MSRs which we should investigate in more detail when we have more time. @tpressure might want to take a look at that issue already now before approving this PR though.
5dacb48 to
73d8626
Compare
Add a test that checks if any fobidden MSR can be accessed by the guest. Forbidden MSRs are those that are available on the host but not in the respective CPU profile. Signed-off-by: Pascal Scholz <pascal.scholz@cyberus-technology.de> On-behalf-of: SAP pascal.scholz@sap.com
Signed-off-by: Pascal Scholz <pascal.scholz@cyberus-technology.de> On-behalf-of: SAP pascal.scholz@sap.com
73d8626 to
8a1f209
Compare
|
I'll wait for @tpressure 's okay before hitting the merge button. |
Make sure to ping him directly - a lot of things are going on right now, he might miss a notification from github :) |
This test checks if all forbidden MSRs are inaccessible for the guest. In case we find some MSRs accessible, we first collect them before letting the test fail to gather better debug information.
I run this test on
livemig-dellemc-2tb-1and it succeeded.