feat: CommittedResource CRD controller added#765
Conversation
|
Warning Rate limit exceeded
To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
📝 WalkthroughWalkthroughAdds a CommittedResourceController and associated indexing, API fields, CRD schema, tests, docs, and naming changes to manage CommittedResource lifecycle, child Reservation creation/deletion, and status conditions driven by commitment state and placement outcomes. Changes
Sequence Diagram(s)sequenceDiagram
actor User
participant CR as CommittedResource (API)
participant CRC as CommittedResourceController
participant RM as ReservationManager
participant RES as Reservation CRs
participant Scheduler as Scheduler
User->>CR: Update spec.state / spec.allowRejection
CR->>CRC: Watch/Enqueue -> Reconcile
activate CRC
CRC->>CRC: Ensure finalizer
alt state == planned or pending
CRC->>CR: Patch Ready=False (not active / Rejected)
else state == guaranteed or confirmed
CRC->>RM: ApplyCommitmentState(spec, NamePrefix)
activate RM
RM->>RM: Compute slots using flavor knowledge
RM->>RES: Create/Update child Reservations (commitmentUUID)
RM-->>CRC: applyResult (accepted/removed/touched)
deactivate RM
CRC->>Scheduler: (via Reservation reconciler) request placement
Scheduler-->>RES: Assign TargetHost
CRC->>CR: Patch Status.AcceptedAmount, conditions Ready=True
else state == superseded/expired or deletion
CRC->>RES: Delete child Reservations (by commitmentUUID)
CRC->>CR: Remove finalizer / mark Ready=False
end
deactivate CRC
CR-->>User: Status updated
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning Review ran into problems🔥 ProblemsGit: Failed to clone repository. Please run the Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ee88e01c-0f3b-4c88-b341-34dfab148251
📒 Files selected for processing (13)
api/v1alpha1/committed_resource_types.gocmd/manager/main.godocs/reservations/committed-resource-reservations.mdinternal/scheduling/reservations/commitments/committed_resource_controller.gointernal/scheduling/reservations/commitments/committed_resource_controller_test.gointernal/scheduling/reservations/commitments/committed_resource_integration_test.gointernal/scheduling/reservations/commitments/field_index.gointernal/scheduling/reservations/commitments/reservation_controller.gointernal/scheduling/reservations/commitments/reservation_controller_test.gointernal/scheduling/reservations/commitments/reservation_manager.gointernal/scheduling/reservations/commitments/reservation_manager_test.gointernal/scheduling/reservations/commitments/state.gointernal/scheduling/reservations/commitments/syncer_test.go
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
internal/scheduling/reservations/commitments/committed_resource_controller_test.go (1)
281-355: Add a failure case with pre-existing accepted Reservations.These cases only fail from a cold start, so both
AllowRejection=trueandfalselegitimately end up with zero children. The new behavior difference is on a resize/update failure afterStatus.AcceptedAmountand childReservationCRs already exist:trueshould roll back to the accepted set, whilefalseshould keep the partially applied children and requeue. A regression there would still pass this suite today.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/commitments/committed_resource_controller_test.go` around lines 281 - 355, The test suite TestCommittedResourceController_PlacementFailure misses cases where Reservations already exist and Status.AcceptedAmount is non-zero; add test cases that prepopulate the fake k8sClient with accepted Reservation CRs and set cr.Status.AcceptedAmount accordingly, then call CommittedResourceController.Reconcile and assert divergent behavior: when cr.Spec.AllowRejection==true the controller should roll back to the accepted set (delete/restore child Reservation CRs and end with Rejected/no requeue), whereas when AllowRejection==false it should keep the partially applied children and requeue (expect RequeueAfter>0); use existing helpers like newTestCommittedResource, newCRTestClient, countChildReservations and assertCondition to set up and verify each scenario.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/reservations/committed-resource-reservations.md`:
- Line 120: The docs incorrectly state `Ready=Reserving`; update the text to
reflect the actual condition tuple written by the controller: use `Ready=False,
Reason=Reserving` (i.e. `Type=Ready`, `Status=False`, `Reason=Reserving`). Also
ensure the surrounding sentence that mentions `Spec.AllowRejection=true` and
retry vs reject behavior remains consistent with this corrected condition
representation.
In
`@internal/scheduling/reservations/commitments/committed_resource_controller_test.go`:
- Around line 357-387: The test TestCommittedResourceController_BadSpec uses the
valid enum CommittedResourceTypeCores but asserts permanent rejection; change
the test to use a truly invalid resource type (e.g. set Spec.ResourceType =
v1alpha1.CommittedResourceType("invalid-resource-type")) so the bad-spec path
exercised by CommittedResourceController.Reconcile and the assertions
(assertCondition and countChildReservations) match the public API;
alternatively, if you intend to support "cores", update the controller logic
that validates Spec.ResourceType to accept CommittedResourceTypeCores instead of
rejecting it.
In
`@internal/scheduling/reservations/commitments/committed_resource_controller.go`:
- Around line 77-84: The reconcilePending handler currently logs
rollback/cleanup errors (from deleteChildReservations or rollbackToAccepted) but
still marks the CommittedResource terminally Rejected; change the flow so that
if deleteChildReservations(ctx, cr) or rollbackToAccepted(ctx, cr) returns an
error you return that error (so the controller retries) instead of calling
setNotReady(..., "Rejected", ...), and only call setNotReady(..., "Rejected",
...) once the cleanup/rollback succeeds; apply this change in reconcilePending
around the applyReservationState branch and in the similar branch that uses
rollbackToAccepted so failed rollbacks prevent finalizing the resource until
cleanup succeeds.
---
Nitpick comments:
In
`@internal/scheduling/reservations/commitments/committed_resource_controller_test.go`:
- Around line 281-355: The test suite
TestCommittedResourceController_PlacementFailure misses cases where Reservations
already exist and Status.AcceptedAmount is non-zero; add test cases that
prepopulate the fake k8sClient with accepted Reservation CRs and set
cr.Status.AcceptedAmount accordingly, then call
CommittedResourceController.Reconcile and assert divergent behavior: when
cr.Spec.AllowRejection==true the controller should roll back to the accepted set
(delete/restore child Reservation CRs and end with Rejected/no requeue), whereas
when AllowRejection==false it should keep the partially applied children and
requeue (expect RequeueAfter>0); use existing helpers like
newTestCommittedResource, newCRTestClient, countChildReservations and
assertCondition to set up and verify each scenario.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 98e3a6ea-2517-4505-aa05-7071cb8e4bdd
📒 Files selected for processing (6)
api/v1alpha1/committed_resource_types.godocs/reservations/committed-resource-reservations.mdhelm/library/cortex/files/crds/cortex.cloud_committedresources.yamlinternal/scheduling/reservations/commitments/committed_resource_controller.gointernal/scheduling/reservations/commitments/committed_resource_controller_test.gointernal/scheduling/reservations/commitments/committed_resource_integration_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/scheduling/reservations/commitments/committed_resource_integration_test.go
| func TestCommittedResourceController_BadSpec(t *testing.T) { | ||
| // Invalid resource type → permanent Rejected regardless of AllowRejection (bad spec won't fix itself). | ||
| scheme := newCRTestScheme(t) | ||
| cr := &v1alpha1.CommittedResource{ | ||
| ObjectMeta: metav1.ObjectMeta{ | ||
| Name: "test-cr", | ||
| Finalizers: []string{crFinalizer}, | ||
| }, | ||
| Spec: v1alpha1.CommittedResourceSpec{ | ||
| CommitmentUUID: "test-uuid-1234", | ||
| FlavorGroupName: "test-group", | ||
| ResourceType: v1alpha1.CommittedResourceTypeCores, | ||
| Amount: resource.MustParse("4"), | ||
| AvailabilityZone: "test-az", | ||
| ProjectID: "test-project", | ||
| DomainID: "test-domain", | ||
| State: v1alpha1.CommitmentStatusConfirmed, | ||
| }, | ||
| } | ||
| k8sClient := newCRTestClient(scheme, cr, newTestFlavorKnowledge()) | ||
| controller := &CommittedResourceController{Client: k8sClient, Scheme: scheme, Conf: Config{}} | ||
|
|
||
| if _, err := controller.Reconcile(context.Background(), reconcileReq(cr.Name)); err != nil { | ||
| t.Fatalf("reconcile: %v", err) | ||
| } | ||
|
|
||
| assertCondition(t, k8sClient, cr.Name, metav1.ConditionFalse, "Rejected") | ||
| if got := countChildReservations(t, k8sClient, cr.Spec.CommitmentUUID); got != 0 { | ||
| t.Errorf("expected 0 child reservations after bad-spec rejection, got %d", got) | ||
| } | ||
| } |
There was a problem hiding this comment.
This test locks in a behavior that contradicts the public API.
CommittedResourceTypeCores is still a schema-valid resourceType in api/v1alpha1/committed_resource_types.go and the generated CRD/docs, but this test asserts it must be permanently rejected as a bad spec. Either the controller needs the advertised arithmetic-only cores path, or cores needs to be removed from the exposed API until it is implemented.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@internal/scheduling/reservations/commitments/committed_resource_controller_test.go`
around lines 357 - 387, The test TestCommittedResourceController_BadSpec uses
the valid enum CommittedResourceTypeCores but asserts permanent rejection;
change the test to use a truly invalid resource type (e.g. set Spec.ResourceType
= v1alpha1.CommittedResourceType("invalid-resource-type")) so the bad-spec path
exercised by CommittedResourceController.Reconcile and the assertions
(assertCondition and countChildReservations) match the public API;
alternatively, if you intend to support "cores", update the controller logic
that validates Spec.ResourceType to accept CommittedResourceTypeCores instead of
rejecting it.
Test Coverage ReportTest Coverage 📊: 69.1% |
Introduces committed_resource_controller.go that watches CommittedResource CRDs and owns all child Reservation CRUD.