Enhancement: Add drift detection and automatic reconciliation by eshulman2 · Pull Request #668 · k-orc/openstack-resource-controller

eshulman2 · 2026-02-03T16:04:33Z

Proposal for drift detection feature.

mandre

Which parts of the code need changing? I'd expect the proposal to detail how `shouldReconcile` changes.

mandre · 2026-03-11T09:54:34Z

enhancements/drift-detection.md

+
+1. On the next reconciliation, ORC attempts to fetch the resource by the ID stored in `status.id`
+2. If not found and the resource was originally created by ORC (not imported), ORC recreates it
+3. The new resource ID is stored in `status.id`


My question wasn't really about the obvious case where a resource with a weak dependency is deleted out of band, but about the case where there is a hard dependency. An example with Subnet -> Network would be more telling. What happens if a network is re-created? Will the subnet be recreated as well?

Inconsistent states can and will happen: someone force-deletes a resource, OpenStack has a bug, an operator changes the database directly... We should explain what we do whenever we hit such a case.

This probably deserves a separate section.

mdbooth

A few comments, but looks very reasonable to me.

mdbooth · 2026-03-19T16:08:28Z

enhancements/drift-detection.md

+
+This ensures the controller automatically triggers reconciliation after the configured period. Controller-runtime's work queue handles deduplication of reconcile requests, so no additional time-based checks are needed in `shouldReconcile`.
+
+**Resources in terminal error are not resynced**: When a resource is in a terminal error state (e.g., invalid configuration, unrecoverable OpenStack error), periodic resync is not scheduled. Terminal errors indicate issues that cannot be resolved through automatic retry and require manual intervention to fix the underlying problem. This prevents wasted reconciliation cycles on resources that are known to be in an unrecoverable state.


My memory definitely needs a refresh, but do we sometimes set a terminal state due to an issue that can't be resolved automatically but might be resolved manually by an administrator? e.g. A resource is in ERROR state which is later cleared after admin fixes some underlying issue. If so, it might be worth including terminal states in any resync process.
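To make the trade-off in this comment concrete, here is a minimal Go sketch of the scheduling decision. It is not ORC code: `shouldScheduleResync` and the `resyncTerminal` switch are hypothetical names introduced only for illustration. With `resyncTerminal` set to false it matches the quoted proposal text (terminal errors are never resynced); with it set to true it matches the alternative raised here.

```go
package main

import "fmt"

// shouldScheduleResync is a hypothetical helper, not part of ORC.
//   enabled:        drift detection is turned on (the proposal has it off by default)
//   terminal:       the object is currently in a terminal error state
//   resyncTerminal: assumed opt-in switch modelling the reviewer's suggestion
func shouldScheduleResync(enabled, terminal, resyncTerminal bool) bool {
	if !enabled {
		return false
	}
	if terminal {
		// Proposal as quoted: no resync. Reviewer's alternative: resync anyway,
		// so an error cleared manually by an administrator is eventually noticed.
		return resyncTerminal
	}
	return true
}

func main() {
	fmt.Println("healthy object:", shouldScheduleResync(true, false, false))
	fmt.Println("terminal, proposal behaviour:", shouldScheduleResync(true, true, false))
	fmt.Println("terminal, reviewer's alternative:", shouldScheduleResync(true, true, true))
}
```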

mdbooth · 2026-03-19T16:17:12Z

enhancements/drift-detection.md

+
+1. On the next reconciliation, ORC attempts to fetch the resource by the ID stored in `status.id`
+2. If not found and the resource was originally created by ORC (not imported), ORC recreates it
+3. The new resource ID is stored in `status.id`


I suspect this will need careful thought per resource. In the specific case of network and subnet, IIRC a subnet cannot exist without a network, so if the network has been deleted then the subnet has also been deleted. From this POV, 'hard' dependencies may actually be the easy case.
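For reference, the three quoted steps reduce to a small decision on the result of the lookup by `status.id`. The sketch below is a minimal illustration in Go, not ORC's implementation: `Action` and `decideOnLookup` are hypothetical names. The imported-resource branch follows the proposal's statement that an externally deleted imported resource is always a terminal error. The sketch deliberately says nothing about dependent resources such as Subnet -> Network, which is the open question in this thread.

```go
package main

import "fmt"

// Action is what a reconciler would do after looking up status.id.
// The type and its values are hypothetical, for illustration only.
type Action string

const (
	ActionNone          Action = "none"
	ActionRecreate      Action = "recreate"
	ActionTerminalError Action = "terminal-error"
)

// decideOnLookup maps the lookup result to an action.
//   found:    the resource identified by status.id still exists in OpenStack
//   imported: the ORC object adopted an existing resource instead of creating it
func decideOnLookup(found, imported bool) Action {
	switch {
	case found:
		return ActionNone
	case imported:
		// Recreating would not restore the original resource.
		return ActionTerminalError
	default:
		// Created by ORC and now missing: recreate, then store the new ID in status.id.
		return ActionRecreate
	}
}

func main() {
	fmt.Println(decideOnLookup(true, false))
	fmt.Println(decideOnLookup(false, false))
	fmt.Println(decideOnLookup(false, true))
}
```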

mdbooth · 2026-03-19T16:20:41Z

enhancements/drift-detection.md

+
+#### Implementation Details
+
+At the end of a successful reconciliation (when no other reschedule is pending), the controller schedules the next resync:


What do you want to do on controller restart?

mdbooth · 2026-03-19T16:24:16Z

enhancements/drift-detection.md

+
+For **imported resources** that are deleted externally, this is always a terminal error regardless of drift detection settings, because the resource was not created by ORC and recreating it would not restore the original resource.
+
+**Note on dependent resources**: OpenStack enforces referential integrity for most resources (e.g., Networks cannot be deleted while Subnets exist). If resources are deleted through means that bypass these checks (direct database manipulation, OpenStack bugs), drift detection preserves ORC's existing reconciliation behavior:


I think ORC's behaviour is undefined in the presence of bugs and direct DB manipulation, tbh. I'm not sure how you could define it.

mdbooth · 2026-03-19T16:27:35Z

enhancements/drift-detection.md

+**Risk**: Multiple controllers or systems may be managing the same OpenStack resources, leading to conflicts where changes are repeatedly overwritten.
+
+**Mitigation**:
+- Document that ORC should be the sole manager of resources it creates


I think we already do this?

mdbooth · 2026-03-19T16:29:01Z

enhancements/drift-detection.md

+
+**Mitigation**:
+- Disabled by default; when enabled, recommend conservative intervals (e.g., 10 hours)
+- Add random jitter to resync times to avoid thundering herd: since reconciliation already uses "requeue after X duration", jitter simply adds a random offset (e.g., ±10%) to the resync period, spreading resyncs over time rather than having them fire simultaneously


Unless it all gets reset when the controller restarts. You might want to store a last synced time in the status? Or some other solution.
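One way to picture the suggestion is to derive the requeue delay from a persisted last-synced time rather than from controller uptime. The Go sketch below is a minimal illustration under stated assumptions: the helper names (`jitter`, `remainingUntilResync`), the `examplePeriod` constant, and the idea of reading `lastSynced` from the object's status are all hypothetical and not taken from ORC. The resulting duration is the kind of value a controller-runtime reconciler would return as `RequeueAfter`.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// examplePeriod mirrors the "conservative interval" example in the proposal.
const examplePeriod = 10 * time.Hour

// jitter shifts period by a random fraction in [-frac, +frac].
// u must be in [0, 1); callers would normally pass rand.Float64().
func jitter(period time.Duration, frac, u float64) time.Duration {
	offset := (2*u - 1) * frac * float64(period)
	return period + time.Duration(offset)
}

// remainingUntilResync returns how long to wait, given the time elapsed since
// the last successful sync. Zero means "resync now".
func remainingUntilResync(sinceLastSync, period time.Duration) time.Duration {
	if sinceLastSync < 0 {
		sinceLastSync = 0 // guard against clock skew
	}
	if sinceLastSync >= period {
		return 0
	}
	return period - sinceLastSync
}

func main() {
	// Assumption: lastSynced would be read back from the object's status
	// after a controller restart instead of being reset.
	lastSynced := time.Now().Add(-4 * time.Hour)
	period := jitter(examplePeriod, 0.10, rand.Float64())
	wait := remainingUntilResync(time.Since(lastSynced), period)
	fmt.Printf("requeue after %s\n", wait.Round(time.Minute))
}
```

Because the delay is computed from the stored timestamp, a restart neither resets every timer to the full period nor forces all objects to resync at once; the jitter still spreads out objects that were last synced at about the same time.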

github-actions bot added the `semver:patch` (No API change) label Feb 3, 2026

This was referenced Feb 4, 2026

Never retry on conflicts - Quota exceeded #667

Open

Drift detection #655

Open

mandre reviewed Feb 16, 2026

eshulman2 force-pushed the drift-d-proposal branch from 3134799 to a9f9abf on February 17, 2026 14:05

eshulman2 requested a review from mandre February 17, 2026 14:05

mandre reviewed Mar 11, 2026

eshulman2 force-pushed the drift-d-proposal branch from a9f9abf to eda8a6f on March 11, 2026 14:09

eshulman2 requested a review from mandre March 11, 2026 14:22

mandre reviewed Mar 12, 2026

eshulman2 force-pushed the drift-d-proposal branch from eda8a6f to c610a4f on March 15, 2026 08:47

eshulman2 requested a review from mandre March 16, 2026 11:13

mandre reviewed Mar 19, 2026

eshulman2 force-pushed the drift-d-proposal branch from c610a4f to 0fa5f18 on March 19, 2026 10:22

eshulman2 requested a review from mandre March 19, 2026 10:25

Enhancement: Add drift detection and automatic reconciliation

ab42db9

Proposal for drift detection feature.

eshulman2 force-pushed the drift-d-proposal branch from 0fa5f18 to ab42db9 on March 19, 2026 10:50

mdbooth reviewed Mar 19, 2026

3 participants