Skip to content

Add version negotiation for migration protocol#114

Open
Coffeeri wants to merge 3 commits intocyberus-technology:gardenlinuxfrom
Coffeeri:feat/migration-proto-versioning
Open

Add version negotiation for migration protocol#114
Coffeeri wants to merge 3 commits intocyberus-technology:gardenlinuxfrom
Coffeeri:feat/migration-proto-versioning

Conversation

@Coffeeri
Copy link
Copy Markdown

@Coffeeri Coffeeri commented Mar 19, 2026

This PR implements https://github.com/cobaltcore-dev/cobaltcore/issues/464 and introduces version negotiation for the migration protocol.

Steps before Merging

  • Add testing: migration between two CHV versions (one with version negotiation and one without)

@Coffeeri Coffeeri marked this pull request as draft March 19, 2026 09:20
@Coffeeri Coffeeri force-pushed the feat/migration-proto-versioning branch 2 times, most recently from 32cd5c1 to 4d7571b Compare March 19, 2026 09:26
@Coffeeri Coffeeri self-assigned this Mar 19, 2026
@Coffeeri Coffeeri force-pushed the feat/migration-proto-versioning branch 9 times, most recently from 333946d to 4777340 Compare March 20, 2026 09:22
@phip1611 phip1611 self-requested a review March 23, 2026 10:20
@Coffeeri Coffeeri force-pushed the feat/migration-proto-versioning branch 3 times, most recently from 1e44c2e to 97ead4b Compare March 23, 2026 11:30
@hertrste
Copy link
Copy Markdown
Collaborator

A second requirement is that this PR must not introduce a breaking protocol change by itself. To achieve that, it reuses the existing 6 padding/ reserved bytes in the START request and OK response at the beginning of the migration handshake instead of adding a new message type.

Would it be possible to 1.) change CHV so it ignores new fields in Commands that it does not yet know about and gracefully handle missing fields in responses and 2.) add new fields that encode the supported versions?

This way, we could do changes to the migration protocol later easier, as long as they don't change the semantics of existing fields. We also shouldn't be breaking then (if we do it in those 2 steps and do not skip versions when upgrading).

Not sure if really required but at least have a short discussion about it.

Copy link
Copy Markdown
Member

@phip1611 phip1611 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the great work! Left some remarks

@arctic-alpaca
Copy link
Copy Markdown

Would it be possible to 1.) change CHV so it ignores new fields in Commands that it does not yet know about and gracefully handle missing fields in responses and 2.) add new fields that encode the supported versions?

CHV doesn't know anything about the bytes it reads from the network. The only way to go from bytes to struct is to apply some known conversion, which basically means treating a predefined amount of bytes as type T. CHV can ignore some defined amount of unused space (like the padding that @Coffeeri is using in this PR).
Ignoring unknown fields can, as far as I know, only be done by using a completely different data format like JSON or any other self describing data format.

If either side does not implement version negotiation yet, the handshake falls back to the legacy v0 behavior.

Isn't that just treating the current implementation as version 0 and the negotiation just works/fails depending on whether both sides support version 0?

@Coffeeri I think the general design of the handshake is good. Please assign me as one of the reviewers when code reviews are in order.

@Coffeeri
Copy link
Copy Markdown
Author

Coffeeri commented Mar 24, 2026

Would it be possible to 1.) change CHV so it ignores new fields in Commands that it does not yet know about and gracefully handle missing fields in responses and 2.) add new fields that encode the supported versions?

This way, we could do changes to the migration protocol later easier, as long as they don't change the semantics of existing fields. We also shouldn't be breaking then (if we do it in those 2 steps and do not skip versions when upgrading).

@hertrste
I think this would only be possible when using a serialized data type as @arctic-alpaca described. The C-style struct of Request/Response limits us here.

Isn't that just treating the current implementation as version 0 and the negotiation just works/fails depending on whether both sides support version 0?

Yes, exactly. We should handle v0 (impl without version negation) as a separate version, but we don't have to support it in the future if we have a migration path via an intermediate migration protocol version.
But in some prior iterations of mine, I had made the v0 implementation explicit, such as explicitly sending the legacy start request/ok response. We can simplify this and don't need to branch this out, as these fields are ignored in the worst case.


Does the overall design suite the others as well?
I think we have settled on:

  • exploit the prior six byte padding for version transport in START/ first OK
  • selected version must be negotiated to the highest common version

Open questions to the discussion:

  • How will we use the selected version: Will we simply plumb the selected version through our codebase and branch on selected version? I think having complete separate crates for three versions [prior, current, nightly], as suggested by @arctic-alpaca, would be very difficult as we don't have a clear migration logic separation, much of which lives in the vmm crate.
  • How should we define our current and prior versions as an elegant data structure? I agree with @phip1611 that the following versioning and aliases could be improved:
impl MigrationProtocolVersion {
    pub const LEGACY: Self = Self::V0;
    pub const CURRENT: Self = Self::V1;

We should be careful not to model this as something like our planned 3-version support [prior, current, nightly], because protocol versions need stable, explicit identities per release. Each concrete protocol revision should keep its own defined version number, rather than being derived from moving labels. I'd like it to be explicit.

Copy link
Copy Markdown

@amphi amphi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does the sender have a "minimum supported version"? I think the sender should just send using the newest version it supports, and the receiver has to decide whether it can support that version.

Otherwise, we'd give the sender the ability to send using an older protocol than it supports. In which case would that be helpful?

@Coffeeri Coffeeri force-pushed the feat/migration-proto-versioning branch 5 times, most recently from 7b5f772 to f21e912 Compare March 25, 2026 11:55
@Coffeeri Coffeeri requested a review from phip1611 March 25, 2026 12:03
@Coffeeri Coffeeri changed the title WIP: add version negotiation for migration protocol Add version negotiation for migration protocol Mar 25, 2026
@Coffeeri Coffeeri force-pushed the feat/migration-proto-versioning branch 5 times, most recently from ecb08f6 to daedb5d Compare March 25, 2026 16:28
@Coffeeri Coffeeri force-pushed the feat/migration-proto-versioning branch 5 times, most recently from 2149a1a to 5561c41 Compare March 26, 2026 08:22
@Coffeeri Coffeeri force-pushed the feat/migration-proto-versioning branch from 5561c41 to d36d30a Compare March 26, 2026 10:36
Add protocol-side support for migration protocol versioning.

Use the existing 6-byte Start command header, which was
previously zero padding, to carry the sender's migration
protocol version without changing the wire layout.

Store the version as a little-endian u16 in the first two
bytes and ignore the remaining four bytes. A zeroed command
header continues to mean a legacy v0 sender.

This keeps the message flow unchanged for rollout:
Start is still followed by plain OK or Error (Aborted), and no new
command is needed.

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
Validate the sender's migration protocol version when
handling the initial Start request.

Read the version from the Start command header, accept only
the supported version window n-1..=n, and reject unsupported
versions with Error. A rejected Start moves the receiver to
the aborted state.

This keeps compatibility one-way, from older protocol
versions to newer ones, and leaves later version-based
branching on the receiver side.

Log the protocol version on both sender and receiver to make
the active migration path visible.

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
Print the supported vm-migration protocol version range in
cloud-hypervisor --version as an extra line:

  vm-migration protocol versions v0-v1

This makes the currently supported compatibility window
visible without having to inspect the migration code.

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
@Coffeeri Coffeeri force-pushed the feat/migration-proto-versioning branch from d36d30a to d22e787 Compare March 26, 2026 10:54
@Coffeeri Coffeeri marked this pull request as ready for review March 26, 2026 10:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants