feat: adding operator-controlled per-resource-type config of committed resources#792
Conversation
📝 Walkthrough

This PR refactors the commitment eligibility determination system from metadata-driven checks to configuration-based flags. Flavor group commitment handling is now driven by operator-controlled per-resource-type configuration.

Changes: Commitment Eligibility Refactor
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed

❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Test Coverage Report 📊: 69.1%
Actionable comments posted: 1
🧹 Nitpick comments (6)
internal/scheduling/reservations/commitments/integration_test.go (3)
824-835: ⚡ Quick win: Avoid a hardcoded child CommitmentUUID in the deletion test setup.

Using a fixed UUID here makes the test fragile against future helper/default changes. Bind the child reservation to cr.Spec.CommitmentUUID directly.

Proposed change

```diff
 child := &v1alpha1.Reservation{
 	ObjectMeta: metav1.ObjectMeta{
 		Name: "my-cr-0",
 		Labels: map[string]string{
 			v1alpha1.LabelReservationType: v1alpha1.ReservationTypeLabelCommittedResource,
 		},
 	},
 	Spec: v1alpha1.ReservationSpec{
 		Type: v1alpha1.ReservationTypeCommittedResource,
 		CommittedResourceReservation: &v1alpha1.CommittedResourceReservationSpec{
-			CommitmentUUID: "test-uuid-1234",
+			CommitmentUUID: cr.Spec.CommitmentUUID,
 		},
 	},
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/commitments/integration_test.go` around lines 824 - 835, The child reservation in the test is using a hardcoded CommitmentUUID ("test-uuid-1234"); update the child reservation creation (the v1alpha1.Reservation literal with CommittedResourceReservationSpec) to set CommitmentUUID to the parent commitment's value (use cr.Spec.CommitmentUUID) instead of the fixed string so the child binds to the actual commitment under test; locate the child object construction in integration_test.go and replace the literal with a reference to cr.Spec.CommitmentUUID.
942-956: ⚡ Quick win: Don’t swallow reconciliation/list errors in retry loops.

Ignoring errors in these loops can hide the real failure mode and make these tests noisy/flaky. Prefer explicit error checks (or helper methods that t.Fatalf).

Proposed change pattern

```diff
 for range 3 {
-	env.crController.Reconcile(ctx, crReq) //nolint:errcheck
+	if _, err := env.crController.Reconcile(ctx, crReq); err != nil {
+		t.Fatalf("CR reconcile: %v", err)
+	}
 	var resList v1alpha1.ReservationList
-	env.k8sClient.List(ctx, &resList, client.MatchingLabels{ //nolint:errcheck
+	if err := env.k8sClient.List(ctx, &resList, client.MatchingLabels{
 		v1alpha1.LabelReservationType: v1alpha1.ReservationTypeLabelCommittedResource,
-	})
+	}); err != nil {
+		t.Fatalf("list reservations: %v", err)
+	}
 	for _, res := range resList.Items {
 		resReq := ctrl.Request{NamespacedName: types.NamespacedName{Name: res.Name}}
-		env.resController.Reconcile(ctx, resReq) //nolint:errcheck
-		env.resController.Reconcile(ctx, resReq) //nolint:errcheck
+		if _, err := env.resController.Reconcile(ctx, resReq); err != nil {
+			t.Fatalf("reservation reconcile %s: %v", res.Name, err)
+		}
+		if _, err := env.resController.Reconcile(ctx, resReq); err != nil {
+			t.Fatalf("reservation reconcile %s: %v", res.Name, err)
+		}
 	}
-	env.crController.Reconcile(ctx, crReq) //nolint:errcheck
+	if _, err := env.crController.Reconcile(ctx, crReq); err != nil {
+		t.Fatalf("CR reconcile: %v", err)
+	}
 }
```

Also applies to: 1120-1132
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/commitments/integration_test.go` around lines 942 - 956, The retry loop swallows errors from env.crController.Reconcile, env.k8sClient.List and env.resController.Reconcile making failures noisy; update the loop to capture returned errors from these calls (for the calls using ctx, crReq and resReq) and fail the test immediately (e.g. t.Fatalf or a helper assert) when an error is non-nil, and likewise check the error from env.k8sClient.List before iterating resList.Items; apply the same explicit error-checking/failure pattern to the similar block at lines ~1120-1132.
410-426: ⚡ Quick win: Filter child reservations by commitment UUID instead of name prefix.

Prefix matching is brittle and can misclassify children if naming changes. Listing by idxReservationByCommitmentUUID aligns this helper with controller ownership logic.

Proposed change

```diff
 func (e *intgEnv) listChildReservations(t *testing.T, crName string) []v1alpha1.Reservation {
 	t.Helper()
+	cr := e.getCR(t, crName)
 	var list v1alpha1.ReservationList
 	if err := e.k8sClient.List(context.Background(), &list, client.MatchingLabels{
 		v1alpha1.LabelReservationType: v1alpha1.ReservationTypeLabelCommittedResource,
+	}, client.MatchingFields{
+		idxReservationByCommitmentUUID: cr.Spec.CommitmentUUID,
 	}); err != nil {
 		t.Fatalf("list reservations: %v", err)
 	}
-	prefix := crName + "-"
-	var children []v1alpha1.Reservation
-	for _, r := range list.Items {
-		if strings.HasPrefix(r.Name, prefix) {
-			children = append(children, r)
-		}
-	}
-	return children
+	return list.Items
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/commitments/integration_test.go` around lines 410 - 426, Replace brittle name-prefix filtering in listChildReservations with field-based lookup by commitment UUID: change the function (listChildReservations) to accept the commitment UUID (or derive it before calling) and call e.k8sClient.List with client.MatchingFields{idxReservationByCommitmentUUID: commitmentUUID} (instead of MatchingLabels + prefix scan), then return the returned list.Items directly (or filter only by exact UUID if needed) and remove the strings.HasPrefix logic.

internal/scheduling/reservations/commitments/api/info_test.go (1)
141-293: ⚡ Quick win: Add explicit HasQuota assertions in this config-driven flags test.

This test claims to validate HasQuota, but currently only checks HasCapacity and HandlesCommitments. Adding HasQuota checks here would close the regression gap for the new config field.

Example assertions to add

```diff
 if !ramResource.HasCapacity {
 	t.Error("hw_version_hana_fixed_ram: expected HasCapacity=true")
 }
+if ramResource.HasQuota {
+	t.Error("hw_version_hana_fixed_ram: expected HasQuota=false")
+}
 if !coresResource.HasCapacity {
 	t.Error("hw_version_hana_fixed_cores: expected HasCapacity=true")
 }
+if coresResource.HasQuota {
+	t.Error("hw_version_hana_fixed_cores: expected HasQuota=false")
+}
 if !v2RamResource.HasCapacity {
 	t.Error("hw_version_v2_variable_ram: expected HasCapacity=true")
 }
+if v2RamResource.HasQuota {
+	t.Error("hw_version_v2_variable_ram: expected HasQuota=false")
+}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/commitments/api/info_test.go` around lines 141 - 293, The test TestHandleInfo_ResourceFlagsFromConfig is missing assertions for HasQuota; add explicit checks that each resource in serviceInfo.Resources (hw_version_hana_fixed_ram, hw_version_hana_fixed_cores, hw_version_hana_fixed_instances, hw_version_v2_variable_ram, hw_version_v2_variable_cores, hw_version_v2_variable_instances) has HasQuota == false (since the provided FlavorGroupResourceConfig only sets HandlesCommitments/HasCapacity and no HasQuota), and fail the test with t.Error/t.Fatalf messages if any resource is missing or HasQuota is unexpectedly true to ensure the config-driven HasQuota behavior is covered.

helm/bundles/cortex-nova/values.yaml (1)
177-192: Consider a rollout guard for the wildcard default.

With "*" configured to handlesCommitments: false for all resource types, commitment acceptance is effectively off everywhere unless overridden per environment. Consider adding a deployment-time guard (or release-note callout) so this isn’t accidentally rolled out without overrides where commitments are expected.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@helm/bundles/cortex-nova/values.yaml` around lines 177 - 192, The wildcard default under flavorGroupResourceConfig ("*") currently sets handlesCommitments: false for ram/cores/instances which disables commitments cluster-wide; add a deployment guard so this default cannot be accidentally rolled out without explicit per-environment overrides: implement a Helm template validation or values-check that inspects .Values.flavorGroupResourceConfig."*" and fails (or emits a clear blocking error) if handlesCommitments is false and no per-environment flavorGroup keys present, or add an explicit requireCommitmentsOverride flag in values (e.g., requireCommitmentsOverride: true) that must be set to allow the wildcard false default; reference flavorGroupResourceConfig, "*", and handlesCommitments when adding the check and include a clear release-note/warning message instructing operators to supply environment-specific overrides.

internal/scheduling/reservations/commitments/api/info.go (1)
95-100: ⚡ Quick win: Update the buildServiceInfo docblock to reflect config-driven behavior.

The comment still describes fixed-ratio-derived commitment handling, but the implementation now sources flags from ResourceConfigForGroup(...).

Proposed comment-only fix

```diff
-// For each flavor group, three resources are registered:
-//   - _ram: RAM resource (unit = multiples of smallest flavor RAM, HandlesCommitments=true only if fixed ratio)
-//   - _cores: CPU cores resource (unit = 1, HandlesCommitments=false)
-//   - _instances: Instance count resource (unit = 1, HandlesCommitments=false)
-// All flavor groups report usage; only those with fixed RAM/core ratio accept commitments.
+// For each flavor group, three resources are registered:
+//   - _ram: RAM resource (unit = multiples of smallest flavor RAM)
+//   - _cores: CPU cores resource (unit = 1)
+//   - _instances: Instance count resource (unit = 1)
+// Resource capability flags (HasCapacity, HasQuota, HandlesCommitments) are sourced from
+// API config via ResourceConfigForGroup(...).
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/commitments/api/info.go` around lines 95 - 100, The buildServiceInfo docblock is outdated: it still states commitment handling is determined by a fixed RAM/core ratio, but the code now consults ResourceConfigForGroup(...) for per-group flags; update the comment in buildServiceInfo to describe that for each flavor group three resources (_ram, _cores, _instances) are registered and that the HandlesCommitments and other behavior are driven by the ResourceConfigForGroup result (e.g., whether the config enables commitments for that group), not by fixed-ratio logic—reference buildServiceInfo and ResourceConfigForGroup in the updated comment.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/scheduling/reservations/commitments/api/change_commitments.go`:
- Around line 250-253: The handler mutates cr.Annotations (setting
v1alpha1.AnnotationCreatorRequestID via
reservations.GlobalRequestIDFromContext(ctx)) but snapshots only capture Spec so
rollbacks miss restoring annotations; update the snapshot/restore logic to
include Annotations (deep-copy the map) when creating a snapshot and restore
that map on rollback so the previous annotation state is fully reverted; ensure
any snapshot struct or methods that reference the resource snapshot (e.g., where
Spec is captured) are extended to store and restore Annotations and use safe nil
checks and map copies when applying the restored annotations back to cr.
---
Nitpick comments:
In `@helm/bundles/cortex-nova/values.yaml`:
- Around line 177-192: The wildcard default under flavorGroupResourceConfig
("*") currently sets handlesCommitments: false for ram/cores/instances which
disables commitments cluster-wide; add a deployment guard so this default cannot
be accidentally rolled out without explicit per-environment overrides: implement
a Helm template validation or values-check that inspects
.Values.flavorGroupResourceConfig."*" and fails (or emits a clear blocking
error) if handlesCommitments is false and no per-environment flavorGroup keys
present, or add an explicit requireCommitmentsOverride flag in values (e.g.,
requireCommitmentsOverride: true) that must be set to allow the wildcard false
default; reference flavorGroupResourceConfig, "*", and handlesCommitments when
adding the check and include a clear release-note/warning message instructing
operators to supply environment-specific overrides.
In `@internal/scheduling/reservations/commitments/api/info_test.go`:
- Around line 141-293: The test TestHandleInfo_ResourceFlagsFromConfig is
missing assertions for HasQuota; add explicit checks that each resource in
serviceInfo.Resources (hw_version_hana_fixed_ram, hw_version_hana_fixed_cores,
hw_version_hana_fixed_instances, hw_version_v2_variable_ram,
hw_version_v2_variable_cores, hw_version_v2_variable_instances) has HasQuota ==
false (since the provided FlavorGroupResourceConfig only sets
HandlesCommitments/HasCapacity and no HasQuota), and fail the test with
t.Error/t.Fatalf messages if any resource is missing or HasQuota is unexpectedly
true to ensure the config-driven HasQuota behavior is covered.
In `@internal/scheduling/reservations/commitments/api/info.go`:
- Around line 95-100: The buildServiceInfo docblock is outdated: it still states
commitment handling is determined by a fixed RAM/core ratio, but the code now
consults ResourceConfigForGroup(...) for per-group flags; update the comment in
buildServiceInfo to describe that for each flavor group three resources (_ram,
_cores, _instances) are registered and that the HandlesCommitments and other
behavior are driven by the ResourceConfigForGroup result (e.g., whether the
config enables commitments for that group), not by fixed-ratio logic—reference
buildServiceInfo and ResourceConfigForGroup in the updated comment.
In `@internal/scheduling/reservations/commitments/integration_test.go`:
- Around line 824-835: The child reservation in the test is using a hardcoded
CommitmentUUID ("test-uuid-1234"); update the child reservation creation (the
v1alpha1.Reservation literal with CommittedResourceReservationSpec) to set
CommitmentUUID to the parent commitment's value (use cr.Spec.CommitmentUUID)
instead of the fixed string so the child binds to the actual commitment under
test; locate the child object construction in integration_test.go and replace
the literal with a reference to cr.Spec.CommitmentUUID.
- Around line 942-956: The retry loop swallows errors from
env.crController.Reconcile, env.k8sClient.List and env.resController.Reconcile
making failures noisy; update the loop to capture returned errors from these
calls (for the calls using ctx, crReq and resReq) and fail the test immediately
(e.g. t.Fatalf or a helper assert) when an error is non-nil, and likewise check
the error from env.k8sClient.List before iterating resList.Items; apply the same
explicit error-checking/failure pattern to the similar block at lines
~1120-1132.
- Around line 410-426: Replace brittle name-prefix filtering in
listChildReservations with field-based lookup by commitment UUID: change the
function (listChildReservations) to accept the commitment UUID (or derive it
before calling) and call e.k8sClient.List with
client.MatchingFields{idxReservationByCommitmentUUID: commitmentUUID} (instead
of MatchingLabels + prefix scan), then return the returned list.Items directly
(or filter only by exact UUID if needed) and remove the strings.HasPrefix logic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8402c52f-f682-4a04-abf5-6e39a57adecc
📒 Files selected for processing (15)
- helm/bundles/cortex-nova/values.yaml
- internal/scheduling/reservations/commitments/api/change_commitments.go
- internal/scheduling/reservations/commitments/api/change_commitments_e2e_test.go
- internal/scheduling/reservations/commitments/api/change_commitments_test.go
- internal/scheduling/reservations/commitments/api/info.go
- internal/scheduling/reservations/commitments/api/info_test.go
- internal/scheduling/reservations/commitments/committed_resource_controller.go
- internal/scheduling/reservations/commitments/committed_resource_controller_test.go
- internal/scheduling/reservations/commitments/committed_resource_integration_test.go
- internal/scheduling/reservations/commitments/config.go
- internal/scheduling/reservations/commitments/flavor_group_eligibility.go
- internal/scheduling/reservations/commitments/integration_test.go
- internal/scheduling/reservations/commitments/reservation_controller.go
- internal/scheduling/reservations/commitments/reservation_manager.go
- internal/scheduling/reservations/commitments/reservation_manager_test.go
💤 Files with no reviewable changes (3)
- internal/scheduling/reservations/commitments/flavor_group_eligibility.go
- internal/scheduling/reservations/commitments/committed_resource_integration_test.go
- internal/scheduling/reservations/commitments/committed_resource_controller_test.go
```go
if cr.Annotations == nil {
	cr.Annotations = make(map[string]string)
}
cr.Annotations[v1alpha1.AnnotationCreatorRequestID] = reservations.GlobalRequestIDFromContext(ctx)
```
Rollback is incomplete after introducing annotation mutation.
At lines 250-253, the handler mutates cr.Annotations, but snapshots preserve only Spec. If a batch rolls back, the previous annotation state is not restored (and can be lost on delete/recreate), so rollback is no longer all-or-nothing.
Proposed fix (snapshot + restore annotations)

```diff
 import (
 	"context"
 	"encoding/json"
 	"errors"
 	"fmt"
+	"maps"
 	"net/http"
 	"sort"
 	"strings"
 	"time"
@@
 type crSnapshot struct {
 	crName   string
 	prevSpec *v1alpha1.CommittedResourceSpec
+	prevAnnotations map[string]string
 	wasDeleted bool
 }
@@
 } else {
 	specCopy := existing.Spec
 	snap.prevSpec = &specCopy
+	if existing.Annotations != nil {
+		snap.prevAnnotations = make(map[string]string, len(existing.Annotations))
+		maps.Copy(snap.prevAnnotations, existing.Annotations)
+	}
 }
@@
 if snap.wasDeleted {
 	if snap.prevSpec == nil {
 		return // was absent before deletion attempt; nothing to undo
 	}
 	cr := &v1alpha1.CommittedResource{}
 	cr.Name = snap.crName
 	cr.Spec = *snap.prevSpec
+	if snap.prevAnnotations != nil {
+		cr.Annotations = make(map[string]string, len(snap.prevAnnotations))
+		maps.Copy(cr.Annotations, snap.prevAnnotations)
+	}
 	if err := k8sClient.Create(ctx, cr); client.IgnoreAlreadyExists(err) != nil {
 		logger.Error(err, "failed to re-create CommittedResource CRD during rollback", "name", snap.crName)
 	}
 	return
 }
@@
 cr.Spec = *snap.prevSpec
+if snap.prevAnnotations != nil {
+	cr.Annotations = make(map[string]string, len(snap.prevAnnotations))
+	maps.Copy(cr.Annotations, snap.prevAnnotations)
+} else {
+	cr.Annotations = nil
+}
 if err := k8sClient.Update(ctx, cr); err != nil {
 	logger.Error(err, "failed to restore CommittedResource CRD spec during rollback", "name", snap.crName)
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/scheduling/reservations/commitments/api/change_commitments.go`
around lines 250 - 253, The handler mutates cr.Annotations (setting
v1alpha1.AnnotationCreatorRequestID via
reservations.GlobalRequestIDFromContext(ctx)) but snapshots only capture Spec so
rollbacks miss restoring annotations; update the snapshot/restore logic to
include Annotations (deep-copy the map) when creating a snapshot and restore
that map on rollback so the previous annotation state is fully reverted; ensure
any snapshot struct or methods that reference the resource snapshot (e.g., where
Spec is captured) are extended to store and restore Annotations and use safe nil
checks and map copies when applying the restored annotations back to cr.
## Summary

- Bumps the cortex library chart appVersion to `sha-ab6eb45d` to track the latest commit on main (feat: adding operator-controlled per-resource-type config of committed resources #792)

This ensures the release PR #793 includes the correct appVersion for all changes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: cortex-ai-agents[bot] <cortex-ai-agents[bot]@users.noreply.github.com>
Co-authored-by: Philipp Matthes <27271818+PhilippMatthes@users.noreply.github.com>
Replaces the hard-coded flavor group eligibility logic with operator-controlled per-resource-type config, and enables flavor groups with a variable CPU:RAM ratio to accept committed resources.