feat: add multi-cluster installation configs for observability-metrics-prometheus Helm chart by akila-i · Pull Request #73 · openchoreo/community-modules

akila-i · 2026-04-08T09:38:16Z

Purpose

This pull request introduces significant enhancements to the Observability Metrics Prometheus module, focusing on supporting multi-cluster topologies and improving deployment flexibility. The changes add explicit support for three installation modes—singleCluster, multiClusterReceiver, and multiClusterExporter—along with corresponding Helm chart logic, documentation, and validation to guide users through setup and troubleshooting.

Key changes:

Multi-cluster topology support:

Added support for three installation modes (singleCluster, multiClusterReceiver, multiClusterExporter) with corresponding configuration options in values.yaml and detailed documentation in README.md. [1] [2] [3]
Introduced a new PrometheusAgent deployment, RBAC, and configuration for the multiClusterExporter mode, enabling remote write to a central receiver. [1] [2]
Added a HTTPRoute manifest for remote write traffic in multiClusterReceiver mode, allowing exporter clusters to send metrics to the central receiver.

Helm chart validation and logic:

Implemented a validation template to enforce correct installation mode selection and required values, preventing misconfiguration at install/upgrade time. [1] [2]
Updated adapter deployment and service templates to only deploy in appropriate modes, ensuring resources are not unnecessarily created in exporter mode. [1] [2] [3] [4]

Documentation and versioning:

Expanded the README.md with detailed installation, verification, and troubleshooting instructions for all supported topologies.
Bumped chart and app versions to 0.2.6 to reflect the new features.

Approach

Summarize the solution and implementation details.

Related Issues

Include any related issues that are resolved by this PR.

Checklist

Tests added or updated (unit, integration, etc.)
Samples updated (if applicable)

Remarks

Add any additional context, known issues, or TODOs related to this PR.

Summary by CodeRabbit

New Features
- Support for three installation modes: single-cluster, multi-cluster receiver, and multi-cluster exporter.
- New config options for remote-write endpoint and gateway hostnames for multi-cluster setups.
Documentation
- Expanded installation guide with separate single-cluster vs multi-cluster steps, exporter/receiver procedures, verification queries, and troubleshooting tips.
Chores
- Chart version bumped to 0.3.0.

coderabbitai · 2026-04-08T09:38:31Z

📝 Walkthrough

Walkthrough

Adds three installation modes (singleCluster, multiClusterExporter, multiClusterReceiver) via new Helm values and conditional templates, introduces RBAC and PrometheusAgent for exporters, an HTTPRoute for receivers, validation helper, README updates, and bumps Helm chart version to 0.3.0.

Changes

Cohort / File(s)	Summary
Documentation & chart version `observability-metrics-prometheus/README.md`, `observability-metrics-prometheus/helm/Chart.yaml`	README expanded with installation modes, multi-cluster steps, verification, and troubleshooting. Helm chart version and appVersion bumped to `0.3.0`.
Configuration & validation `observability-metrics-prometheus/helm/values.yaml`, `observability-metrics-prometheus/helm/templates/_helpers.tpl`, `observability-metrics-prometheus/helm/templates/validate.yaml`	Adds `global.installationMode` and `prometheusCustomizations` values. New Helm helper `observability-metrics-prometheus.validate` enforces allowed modes and required fields for multiClusterExporter; `validate.yaml` invokes it.
Adapter rendering changes `observability-metrics-prometheus/helm/templates/adapter/deployment.yaml`, `observability-metrics-prometheus/helm/templates/adapter/service.yaml`	Adapter Deployment and Service are now conditionally omitted when `global.installationMode == "multiClusterExporter"`.
Multi-cluster exporter resources `observability-metrics-prometheus/helm/templates/prometheus-agent-rbac.yaml`, `observability-metrics-prometheus/helm/templates/prometheus-agent.yaml`	New RBAC (ServiceAccount, ClusterRole, ClusterRoleBinding) and a `PrometheusAgent` resource rendered when `global.installationMode == "multiClusterExporter"`, configured to remote-write to `prometheusCustomizations.http.observabilityPlaneUrl`.
Multi-cluster receiver resources `observability-metrics-prometheus/helm/templates/prometheus-remote-write-route.yaml`	New `HTTPRoute` resource rendered when `global.installationMode == "multiClusterReceiver"` to expose `/api/v1/write` and forward to `openchoreo-observability-prometheus:9091`.

Sequence Diagram(s)

sequenceDiagram
    participant Agent as PrometheusAgent\n(MultiCluster Exporter)
    participant K8s as Kubernetes\n(ServiceDiscovery)
    participant Gateway as Gateway\n(HTTPRoute / observability receiver)
    participant Receiver as Prometheus\n(MultiCluster Receiver)

    Agent->>K8s: discover targets (pods, nodes, services)
    K8s-->>Agent: metrics endpoints
    Agent->>Gateway: HTTP remote_write -> observabilityPlaneUrl (/api/v1/write)
    Gateway->>Receiver: forward remote_write traffic to Prometheus receiver
    Receiver->>Receiver: ingest and store time series

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I nibble logs and metrics bright,

agents scurry through the night,
receivers listen, routes align,
three modes hop—single, export, combine,
small paws clap for observability's light.

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title accurately describes the main change: adding multi-cluster installation configurations to the Helm chart, which is the primary objective reflected throughout the changeset.
Description check	✅ Passed	The PR description provides a comprehensive Purpose section detailing key changes and multi-cluster topology support, but the Approach, Related Issues, and Remarks sections are incomplete with only placeholder text.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codecov-commenter · 2026-04-08T09:39:06Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (2)

observability-metrics-prometheus/helm/templates/prometheus-remote-write-route.yaml (1)

4-28: Consider validating hostnames for multiClusterReceiver mode.

The HTTPRoute is rendered without hostnames if .Values.prometheusCustomizations.http.hostnames is empty. While this may work for some gateway configurations, exporter clusters typically need a DNS hostname to reach the receiver. Consider adding validation in _helpers.tpl to require hostnames for multiClusterReceiver mode, similar to how observabilityPlaneUrl is required for multiClusterExporter.

Additionally, the parent gateway name gateway-default is hardcoded. If users have a different gateway name, this would need modification.
  parentRefs:
-    - name: gateway-default
+    - name: {{ .Values.prometheusCustomizations.gateway.name | default "gateway-default" }}
      namespace: {{ .Release.Namespace }}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@observability-metrics-prometheus/helm/templates/prometheus-remote-write-route.yaml`
around lines 4 - 28, The HTTPRoute template for multiClusterReceiver renders
without validating hostnames and hardcodes the parent gateway name, so add a
validation helper in _helpers.tpl that enforces
.Values.prometheusCustomizations.http.hostnames is non-empty when
.Values.global.installationMode == "multiClusterReceiver" (like the
observabilityPlaneUrl check for multiClusterExporter) and update the template to
reference a configurable gateway name value (e.g., .Values.global.gatewayName)
instead of the literal "gateway-default"; ensure error messages and
documentation mention the required hostnames and configurable gateway value.

observability-metrics-prometheus/helm/templates/prometheus-agent-rbac.yaml (1)

8-9: Avoid fixed RBAC names to prevent cross-release collisions.

Using a fixed cluster-scoped ClusterRole/ClusterRoleBinding name can conflict when the chart is installed more than once in a cluster.

Proposed patch

 metadata:
-  name: openchoreo-observability-agent
+  name: {{ printf "%s-observability-agent" .Release.Name | trunc 63 | trimSuffix "-" }}
@@
 metadata:
-  name: openchoreo-observability-agent
+  name: {{ printf "%s-observability-agent" .Release.Name | trunc 63 | trimSuffix "-" }}
@@
 metadata:
-  name: openchoreo-observability-agent
+  name: {{ printf "%s-observability-agent" .Release.Name | trunc 63 | trimSuffix "-" }}
 roleRef:
   apiGroup: rbac.authorization.k8s.io
   kind: ClusterRole
-  name: openchoreo-observability-agent
+  name: {{ printf "%s-observability-agent" .Release.Name | trunc 63 | trimSuffix "-" }}
 subjects:
   - kind: ServiceAccount
-    name: openchoreo-observability-agent
+    name: {{ printf "%s-observability-agent" .Release.Name | trunc 63 | trimSuffix "-" }}

Also applies to: 14-15, 38-46

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@observability-metrics-prometheus/helm/templates/prometheus-agent-rbac.yaml`
around lines 8 - 9, The RBAC resources use a fixed name
("openchoreo-observability-agent") which can collide across releases; update the
metadata.name fields for the ServiceAccount/ClusterRole/ClusterRoleBinding to be
release-scoped (e.g., incorporate {{ .Release.Name }} or the chart fullname
template) instead of the constant string so each release gets unique names;
apply the same change to the ClusterRole/ClusterRoleBinding declarations
referenced at the other locations (lines around the
ClusterRole/ClusterRoleBinding blocks) so all RBAC objects are namespaced or
suffixed/prefixed by the release identifier.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@observability-metrics-prometheus/helm/templates/_helpers.tpl`:
- Around line 13-20: The template accesses nested values directly
(prometheusCustomizations.http.observabilityPlaneUrl and
$kps.prometheus.enabled) which can cause nil dereferences; update the
_helpers.tpl check to perform nil-safe reads by verifying parent keys exist (use
hasKey/.Values or default/lookup patterns) before accessing .http or
.prometheus, and short-circuit the validation if those parents are missing so
you still emit the intended fail messages safely; specifically guard access to
.Values.prometheusCustomizations, .Values.prometheusCustomizations.http, and the
$kps.prometheus object (from index .Values "kube-prometheus-stack") before
checking observabilityPlaneUrl or enabled.

In `@observability-metrics-prometheus/README.md`:
- Around line 115-116: Replace the misleading metric suggestion
`prometheus_tsdb_symbol_table_size_bytes` with more appropriate receiver/import
error metrics: recommend using `prometheus_remote_storage_samples_failed_total`
to detect remote-write/import failures and/or
`prometheus_tsdb_out_of_order_samples_total` for out-of-order sample problems;
update the README line that currently lists
`rate(prometheus_tsdb_symbol_table_size_bytes[5m])` to suggest
`rate(prometheus_remote_storage_samples_failed_total[5m])` and optionally
`rate(prometheus_tsdb_out_of_order_samples_total[5m])` so the guidance points to
real import-error signals.

---

Nitpick comments:
In `@observability-metrics-prometheus/helm/templates/prometheus-agent-rbac.yaml`:
- Around line 8-9: The RBAC resources use a fixed name
("openchoreo-observability-agent") which can collide across releases; update the
metadata.name fields for the ServiceAccount/ClusterRole/ClusterRoleBinding to be
release-scoped (e.g., incorporate {{ .Release.Name }} or the chart fullname
template) instead of the constant string so each release gets unique names;
apply the same change to the ClusterRole/ClusterRoleBinding declarations
referenced at the other locations (lines around the
ClusterRole/ClusterRoleBinding blocks) so all RBAC objects are namespaced or
suffixed/prefixed by the release identifier.

In
`@observability-metrics-prometheus/helm/templates/prometheus-remote-write-route.yaml`:
- Around line 4-28: The HTTPRoute template for multiClusterReceiver renders
without validating hostnames and hardcodes the parent gateway name, so add a
validation helper in _helpers.tpl that enforces
.Values.prometheusCustomizations.http.hostnames is non-empty when
.Values.global.installationMode == "multiClusterReceiver" (like the
observabilityPlaneUrl check for multiClusterExporter) and update the template to
reference a configurable gateway name value (e.g., .Values.global.gatewayName)
instead of the literal "gateway-default"; ensure error messages and
documentation mention the required hostnames and configurable gateway value.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 730f074b-206f-47d4-9b5d-e025f37c267d

📥 Commits

Reviewing files that changed from the base of the PR and between bd7385c and 6c41b36.

📒 Files selected for processing (10)

observability-metrics-prometheus/README.md
observability-metrics-prometheus/helm/Chart.yaml
observability-metrics-prometheus/helm/templates/_helpers.tpl
observability-metrics-prometheus/helm/templates/adapter/deployment.yaml
observability-metrics-prometheus/helm/templates/adapter/service.yaml
observability-metrics-prometheus/helm/templates/prometheus-agent-rbac.yaml
observability-metrics-prometheus/helm/templates/prometheus-agent.yaml
observability-metrics-prometheus/helm/templates/prometheus-remote-write-route.yaml
observability-metrics-prometheus/helm/templates/validate.yaml
observability-metrics-prometheus/helm/values.yaml

coderabbitai · 2026-04-08T09:43:32Z

+3. Check that queue capacity and batch settings are appropriate for your metrics volume
+4. Monitor central Prometheus for import errors: `rate(prometheus_tsdb_symbol_table_size_bytes[5m])`


⚠️ Potential issue | 🟡 Minor

Consider using a more appropriate metric for detecting import errors.

prometheus_tsdb_symbol_table_size_bytes indicates symbol table size, not import errors. For troubleshooting remote write issues at the receiver, consider suggesting prometheus_remote_storage_samples_failed_total or checking for prometheus_tsdb_out_of_order_samples_total instead.

📝 Suggested fix

-4. Monitor central Prometheus for import errors: `rate(prometheus_tsdb_symbol_table_size_bytes[5m])` +4. Monitor central Prometheus for import errors: `rate(prometheus_remote_storage_samples_failed_total[5m])` or `rate(prometheus_tsdb_out_of_order_samples_total[5m])`

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@observability-metrics-prometheus/README.md` around lines 115 - 116, Replace the misleading metric suggestion `prometheus_tsdb_symbol_table_size_bytes` with more appropriate receiver/import error metrics: recommend using `prometheus_remote_storage_samples_failed_total` to detect remote-write/import failures and/or `prometheus_tsdb_out_of_order_samples_total` for out-of-order sample problems; update the README line that currently lists `rate(prometheus_tsdb_symbol_table_size_bytes[5m])` to suggest `rate(prometheus_remote_storage_samples_failed_total[5m])` and optionally `rate(prometheus_tsdb_out_of_order_samples_total[5m])` so the guidance points to real import-error signals.

…s-prometheus Helm chart Signed-off-by: Akila-I <akila.99g@gmail.com>

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

observability-metrics-prometheus/helm/templates/prometheus-remote-write-route.yaml (1)

12-13: Make parent gateway name configurable.

gateway-default is hardcoded at Line 12, which can break receiver mode in environments using a different Gateway name.

♻️ Proposed change

   parentRefs:
-    - name: gateway-default
+    - name: {{ default "gateway-default" .Values.prometheusCustomizations.http.gatewayName }}
       namespace: {{ .Release.Namespace }}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In
`@observability-metrics-prometheus/helm/templates/prometheus-remote-write-route.yaml`
around lines 12 - 13, The parent gateway name is hardcoded as "gateway-default"
in prometheus-remote-write-route.yaml; make it configurable by replacing the
literal with a Helm value (e.g., .Values.gateway.name or .Values.gatewayName)
and provide a sensible default (still "gateway-default") in values.yaml; update
the template to use that value where the gateway name is set so receiver mode
works with custom Gateway names (refer to the gateway name entry in the
prometheus-remote-write-route.yaml template).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@observability-metrics-prometheus/helm/templates/prometheus-agent-rbac.yaml`:
- Around line 14-43: ClusterRole and ClusterRoleBinding use the static name
"openchoreo-prometheus-agent" which will conflict across releases; change the
metadata.name for the ClusterRole and ClusterRoleBinding and the roleRef.name
that points to it to a release-scoped name (e.g., derive from the Helm release
name) in prometheus-agent-rbac.yaml, and update prometheus-agent.yaml to
reference that same release-scoped name so both templates stay consistent;
ensure all occurrences of "openchoreo-prometheus-agent" (metadata.name,
roleRef.name, any subject names) are replaced with the templated release-scoped
identifier used across both files.

---

Nitpick comments:
In
`@observability-metrics-prometheus/helm/templates/prometheus-remote-write-route.yaml`:
- Around line 12-13: The parent gateway name is hardcoded as "gateway-default"
in prometheus-remote-write-route.yaml; make it configurable by replacing the
literal with a Helm value (e.g., .Values.gateway.name or .Values.gatewayName)
and provide a sensible default (still "gateway-default") in values.yaml; update
the template to use that value where the gateway name is set so receiver mode
works with custom Gateway names (refer to the gateway name entry in the
prometheus-remote-write-route.yaml template).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cfcf1e9b-6888-419d-9905-e5ce627a753c

📥 Commits

Reviewing files that changed from the base of the PR and between 6c41b36 and fa43d27.

📒 Files selected for processing (10)

observability-metrics-prometheus/README.md
observability-metrics-prometheus/helm/Chart.yaml
observability-metrics-prometheus/helm/templates/_helpers.tpl
observability-metrics-prometheus/helm/templates/adapter/deployment.yaml
observability-metrics-prometheus/helm/templates/adapter/service.yaml
observability-metrics-prometheus/helm/templates/prometheus-agent-rbac.yaml
observability-metrics-prometheus/helm/templates/prometheus-agent.yaml
observability-metrics-prometheus/helm/templates/prometheus-remote-write-route.yaml
observability-metrics-prometheus/helm/templates/validate.yaml
observability-metrics-prometheus/helm/values.yaml

✅ Files skipped from review due to trivial changes (2)

observability-metrics-prometheus/helm/Chart.yaml
observability-metrics-prometheus/helm/values.yaml

🚧 Files skipped from review as they are similar to previous changes (2)

observability-metrics-prometheus/helm/templates/_helpers.tpl
observability-metrics-prometheus/README.md

coderabbitai bot reviewed Apr 8, 2026

View reviewed changes

nilushancosta reviewed Apr 9, 2026

View reviewed changes

Comment thread observability-metrics-prometheus/helm/templates/prometheus-agent.yaml Outdated

feat: add multi-cluster installation configs for observability-metric…

fa43d27

…s-prometheus Helm chart Signed-off-by: Akila-I <akila.99g@gmail.com>

akila-i force-pushed the multi-cluster-prom branch from 6c41b36 to fa43d27 Compare April 16, 2026 02:58

coderabbitai bot reviewed Apr 16, 2026

View reviewed changes

Comment thread observability-metrics-prometheus/helm/templates/prometheus-agent-rbac.yaml

nilushancosta approved these changes Apr 16, 2026

View reviewed changes

akila-i merged commit b5cf76e into openchoreo:main Apr 16, 2026
5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add multi-cluster installation configs for observability-metrics-prometheus Helm chart#73

feat: add multi-cluster installation configs for observability-metrics-prometheus Helm chart#73
akila-i merged 1 commit intoopenchoreo:mainfrom
akila-i:multi-cluster-prom

akila-i commented Apr 8, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Apr 8, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Uh oh!

codecov-commenter commented Apr 8, 2026

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

coderabbitai bot Apr 8, 2026

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		3. Check that queue capacity and batch settings are appropriate for your metrics volume
		4. Monitor central Prometheus for import errors: `rate(prometheus_tsdb_symbol_table_size_bytes[5m])`

Conversation

akila-i commented Apr 8, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Purpose

Approach

Related Issues

Checklist

Remarks

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Apr 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Uh oh!

codecov-commenter commented Apr 8, 2026

Codecov Report

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

akila-i commented Apr 8, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Apr 8, 2026 •

edited

Loading