feat: add kvm project usage metrics #803
📝 Walkthrough
Adds a KVM project utilization KPI: data models for instance counts and capacity usage, Prometheus metric descriptors, and Collect methods that query the OpenStack-backed SQL database to emit per-project instance and capacity metrics. It also adds shared KVM host types and label logic with tests, and simplifies VMware metric emission.
Changes
KVM Project Utilization KPI
Sequence Diagram
sequenceDiagram
participant Prometheus as Prometheus<br/>(scraper)
participant Collect as KVMProjectUtilizationKPI<br/>.Collect()
participant KubeClient as Kubernetes<br/>Client
participant SQLDb as OpenStack<br/>SQL DB
participant MetricCh as Metric<br/>Channel
Prometheus->>Collect: Request metrics
Collect->>KubeClient: getKVMHosts()
KubeClient-->>Collect: HypervisorList (hosts)
Collect->>SQLDb: queryProjectInstanceCount()
SQLDb-->>Collect: Instance counts by<br/>project/host/flavor/az
Collect->>MetricCh: Emit cortex_kvm_project_instances<br/>per host/project/flavor
Collect->>SQLDb: queryProjectCapacityUsage()
SQLDb-->>Collect: Capacity (vCPU, RAM, Disk)<br/>by project/host/az
Collect->>MetricCh: Emit cortex_kvm_project_capacity_usage<br/>(vcpu, memory, disk per host/project)
MetricCh-->>Prometheus: Metrics collected
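The diagram shows Collect joining the KVM hosts listed by Kubernetes with instance counts from SQL. Below is a minimal, stdlib-only sketch of that join-and-emit step. The types `instanceCountRow` and `metric` and the label names are hypothetical stand-ins for the real query structs and Prometheus const metrics, and skipping rows whose host is not a KVM hypervisor is an assumption the diagram implies but does not state.

```go
package main

import "fmt"

// metric is a hypothetical stand-in for a Prometheus const metric.
type metric struct {
	Name   string
	Labels map[string]string
	Value  float64
}

// instanceCountRow is a hypothetical shape for one row returned by
// queryProjectInstanceCount(), grouped by project, host, flavor and AZ.
type instanceCountRow struct {
	ProjectID   string
	ComputeHost string
	FlavorName  string
	AZ          string
	Count       float64
}

// emitInstanceCounts sends one cortex_kvm_project_instances sample per SQL
// row whose compute host is among the KVM hosts listed by Kubernetes.
// Label names are illustrative, not the PR's actual labels.
func emitInstanceCounts(kvmHosts map[string]bool, rows []instanceCountRow, ch chan<- metric) {
	for _, r := range rows {
		if !kvmHosts[r.ComputeHost] {
			continue // assumption: rows on non-KVM hosts are skipped
		}
		ch <- metric{
			Name: "cortex_kvm_project_instances",
			Labels: map[string]string{
				"project_id":        r.ProjectID,
				"compute_host":      r.ComputeHost,
				"flavor_name":       r.FlavorName,
				"availability_zone": r.AZ,
			},
			Value: r.Count,
		}
	}
}

func main() {
	kvmHosts := map[string]bool{"node001": true}
	rows := []instanceCountRow{
		{"p1", "node001", "m1.small", "az-1", 3},
		{"p1", "vmware-host", "m1.small", "az-1", 5},
	}
	// Buffer the channel so the sketch needs no separate receiver goroutine.
	ch := make(chan metric, len(rows))
	emitInstanceCounts(kvmHosts, rows, ch)
	close(ch)
	for m := range ch {
		fmt.Println(m.Name, m.Labels["compute_host"], m.Value)
	}
}
```

The buffered channel only keeps this sketch single-goroutine; in a real `Collect`, the Prometheus registry drains the channel while the collector runs.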
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/knowledge/kpis/plugins/infrastructure/kvm_project_utilization.go`:
- Around lines 196-198: The SQL uses "WHERE s.status NOT IN ('DELETED','ERROR')",
which still counts non-running states such as SHUTOFF or PAUSED. In the query
that builds cortex_kvm_project_instances (the SQL block around the GROUP BY),
change the WHERE clause to filter explicitly for running instances (e.g.
s.status = 'ACTIVE' or s.status IN ('ACTIVE', 'RESCUED'), depending on your
definition of running) instead of using NOT IN. Alternatively, if you prefer to
keep the broader semantics, update the metric name and help text of
cortex_kvm_project_instances to reflect the set of statuses actually returned.
- Around lines 100-122: The loop over projectCapacityUsages uses
host.getHostLabels(), whose availability_zone value can be stale or incorrect.
Prefer the availability zone returned by the SQL query instead (e.g.
projectCapacityUsage.AvailabilityZone, or whichever struct field holds
s.os_ext_az_availability_zone). In the projectCapacityUsages loop (the block
that calls k.queryProjectCapacityUsage, host.getHostLabels(), and
k.capacityUsagePerProjectAndHost), replace the availability_zone entry in
hostLabels with the value from projectCapacityUsage before appending the
resource kind and calling prometheus.MustNewConstMetric.
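To make the status-filter difference concrete, the sketch below mirrors both WHERE clauses as Go predicates over Nova server status strings. The function names are hypothetical and not part of the PR; the real filtering happens in SQL.

```go
package main

import "fmt"

// countedByNotIn mirrors the current filter:
// s.status NOT IN ('DELETED','ERROR').
func countedByNotIn(status string) bool {
	return status != "DELETED" && status != "ERROR"
}

// countedAsRunning mirrors the suggested filter: s.status = 'ACTIVE'.
func countedAsRunning(status string) bool {
	return status == "ACTIVE"
}

func main() {
	// SHUTOFF and PAUSED instances pass the NOT IN filter but are not running.
	for _, s := range []string{"ACTIVE", "SHUTOFF", "PAUSED", "ERROR"} {
		fmt.Printf("%-8s notIn=%v running=%v\n", s, countedByNotIn(s), countedAsRunning(s))
	}
}
```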
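A minimal sketch of the suggested availability_zone override, assuming host labels are a positional slice with known names. `hostLabelNames` and `withAvailabilityZone` are hypothetical; the real label order is whatever `getHostLabels()` returns, and falling back to the host value when the SQL zone is empty is an assumption not stated in the review. The helper copies the slice before editing it, so the per-host labels are never mutated and appending a resource kind for one project cannot overwrite another's labels through a shared backing array.

```go
package main

import "fmt"

// hostLabelNames is a hypothetical label order; the real order comes from
// getHostLabels() in shared.go.
var hostLabelNames = []string{"compute_host", "availability_zone"}

// withAvailabilityZone returns a copy of hostLabels with the
// availability_zone value replaced by az, the zone reported by the SQL row.
// Assumption: an empty az keeps the host-derived value.
func withAvailabilityZone(names, hostLabels []string, az string) []string {
	// Reserve one extra slot for the resource kind appended by the caller.
	out := make([]string, len(hostLabels), len(hostLabels)+1)
	copy(out, hostLabels)
	if az == "" {
		return out
	}
	for i, name := range names {
		if name == "availability_zone" && i < len(out) {
			out[i] = az
		}
	}
	return out
}

func main() {
	hostLabels := []string{"node001", "az-stale"}
	for _, resource := range []string{"vcpu", "memory", "disk"} {
		// Override the zone first, then append the resource kind.
		labels := append(withAvailabilityZone(hostLabelNames, hostLabels, "az-1"), resource)
		fmt.Println(labels)
	}
}
```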
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 0a6d5c1c-5447-4395-b95e-461aec03048f
📒 Files selected for processing (5)
internal/knowledge/kpis/plugins/infrastructure/kvm_project_utilization.go
internal/knowledge/kpis/plugins/infrastructure/kvm_project_utilization_test.go
internal/knowledge/kpis/plugins/infrastructure/shared.go
internal/knowledge/kpis/plugins/infrastructure/shared_test.go
internal/knowledge/kpis/plugins/infrastructure/vmware_project_utilization.go
Test Coverage Report
Test Coverage 📊: 69.2%
### Release digest — 2026-05-07 — [#814](#814)

#### cortex v0.0.47 (sha-7d1745d8)

**New features:**

- `ProjectQuota` CRD with per-resource, per-AZ quota breakdown and PAYG calculation ([#796](#796))
- `FlavorGroupCapacity` CRD + background capacity controller for pre-computed per-flavor VM slot capacity per (flavor group × AZ) ([#728](#728))
- Capacity reporting in `POST /commitments/v1/report-capacity` now uses real `FlavorGroupCapacity` CRD values (replaces placeholder zeros)
- CommittedResource usage reconciler — moves usage calculation into CRD status, feeding both LIQUID API and quota controller ([#800](#800))
- KVM OS version label on host capacity metrics ([#810](#810))
- KVM project usage metrics (running VMs / resource usage per project/flavor) ([#803](#803))
- `domain_id` + name on vmware project capacity metrics ([#802](#802))
- `domain_id` in vmware project commitment KPI ([#806](#806))
- Weighing explainer for scheduling decisions ([#808](#808))

**Refactors:**

- KVM host capacity metric moved to infrastructure plugins package ([#809](#809))
- Deprecated per-compute KPIs removed (`flavor_running_vms`, `host_running_vms`, `resource_capacity_kvm`) ([#807](#807))
- Bundle-specific RBAC templates moved from library chart into `cortex-ironcore` / `cortex-pods` bundles ([#797](#797))
- Webhook templates moved back into `cortex-nova` ([#805](#805))
- `testlib.Ptr` replaced with native `new()` ([#801](#801))

**Fixes:**

- Remove `ignoreAllocations` from kvm-report-capacity pipeline to unblock older admission webhook ([#812](#812))
- Suppress nova scheduling alerts on transient `no such host` DNS errors
- Add `identity-domains` as KPI dependency
- Rename hypervisor `ClusterRoleBinding` to avoid `roleRef` conflict on redeploy ([#804](#804))

#### cortex-nova v0.0.60 (sha-7d1745d8)

Includes cortex v0.0.47. Adds Prometheus datasources and KPI CRD templates for KVM project usage/utilization, and updated RBAC for `FlavorGroupCapacity` + `ProjectQuota` CRDs.