
Conversation


@ductrung-nguyen ductrung-nguyen commented Dec 15, 2025

This PR contains several different changes, each in dedicated commits.
Because they overlap in places, it is difficult to split them into separate PRs.

Note: Why do we mainly use annotations to control operator behavior instead of adding CR fields? Quick answer: maintaining patches to the CRDs is a nightmare for us. Moreover, CRD deployment is out of our control, and we often need to wait for the updated CRDs to be rolled out. This takes time and slows down our development process.


1.0. Prefix-based metadata filtering

Description

This PR implements prefix-based metadata filtering to give users control over how labels and annotations propagate from the Custom Resource (CR) to the underlying StatefulSet and Pods.

Previously, almost all metadata was copied everywhere. Now you can use specific prefixes to target where you want your labels to go:

  • sts-only.*: Keys starting with this are added only to the StatefulSet metadata (and the prefix is stripped!). They won't appear on the Pod Template, so adding them won't trigger a pod rollout.
  • pod-only.*: Keys are added only to the Pod Template (and excluded from the StatefulSet metadata).
  • System prefixes: We now automatically exclude kubectl.kubernetes.io/* and operator.splunk.com/* prefixes to prevent clutter and conflicts.

Regular labels without these prefixes work exactly as before, propagating to both (see the sketch below).
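
For illustration, here is a minimal sketch of the routing rule, assuming hypothetical helper names (the actual functions in util.go and statefulset_util.go may be organized differently):

```go
package common

import "strings"

// Illustrative constants; the actual prefixes handled in util.go may differ slightly.
const (
	stsOnlyPrefix = "sts-only."
	podOnlyPrefix = "pod-only."
)

// systemPrefixes are never propagated from the CR to child objects.
var systemPrefixes = []string{"kubectl.kubernetes.io/", "operator.splunk.com/"}

// filterMetadataForTarget is a hypothetical helper showing the intended routing:
// sts-only.* keys land (with the prefix stripped) on the StatefulSet, pod-only.*
// keys land on the Pod template, and everything else is copied to both.
func filterMetadataForTarget(src map[string]string, forStatefulSet bool) map[string]string {
	out := map[string]string{}
	for k, v := range src {
		if hasAnyPrefix(k, systemPrefixes) {
			continue // system keys are excluded everywhere
		}
		switch {
		case strings.HasPrefix(k, stsOnlyPrefix):
			if forStatefulSet {
				out[strings.TrimPrefix(k, stsOnlyPrefix)] = v // prefix is stripped
			}
		case strings.HasPrefix(k, podOnlyPrefix):
			if !forStatefulSet {
				out[k] = v // kept as-is on the Pod template only
			}
		default:
			out[k] = v // regular keys propagate to both targets
		}
	}
	return out
}

func hasAnyPrefix(s string, prefixes []string) bool {
	for _, p := range prefixes {
		if strings.HasPrefix(s, p) {
			return true
		}
	}
	return false
}
```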

Key Changes

  • pkg/splunk/common/util.go: Added the core filtering logic. It checks for sts-only. and pod-only. prefixes.
  • pkg/splunk/common/statefulset_util.go: Implemented SyncParentMetaToStatefulSet which handles the transformation (stripping sts-only.) and syncing logic.
  • pkg/splunk/enterprise/configuration.go: Updated the main reconciliation loop to use these new syncing functions instead of the old blind copy.
  • pkg/splunk/splkcontroller/statefulset.go: Added logic to detect if only StatefulSet metadata changed (so we can update the STS object without touching the Pod specs).
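
The detection idea in statefulset.go can be sketched roughly as follows (hypothetical helper name; the real comparison may look at more fields):

```go
package splkcontroller

import (
	appsv1 "k8s.io/api/apps/v1"
	apiequality "k8s.io/apimachinery/pkg/api/equality"
)

// onlyStatefulSetMetaChanged is a hypothetical check: if the desired and current
// objects differ only in StatefulSet-level labels or annotations while the Pod
// template is identical, the STS can be updated in place without a pod rollout.
func onlyStatefulSetMetaChanged(current, desired *appsv1.StatefulSet) bool {
	templateEqual := apiequality.Semantic.DeepEqual(current.Spec.Template, desired.Spec.Template)
	metaEqual := apiequality.Semantic.DeepEqual(current.Labels, desired.Labels) &&
		apiequality.Semantic.DeepEqual(current.Annotations, desired.Annotations)
	return templateEqual && !metaEqual
}
```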

Testing and Verification

Unit tests covering the new filtering and syncing logic are included.

For E2E validation:

  1. Update the existing Splunk CR.
  2. Add a label sts-only.my-tag: true.
    • Verified: The StatefulSet has my-tag: true.
    • Verified: The Pods do not have my-tag (and thus adding it later does not trigger a rollout).
  3. Add a label pod-only.scrapper: true.
    • Verified: The Pods have pod-only.scrapper: true.
    • Verified: The StatefulSet does not have this label.
  4. Add a generic label environment: prod.
    • Verified: Both STS and Pods have environment: prod.
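
Put together, the three labels above are expected to fan out like this (purely illustrative, not code from the PR):

```go
package common

// Expected fan-out for the three labels used in the steps above
// (the real assertions live in the unit tests).
var (
	crLabels = map[string]string{
		"sts-only.my-tag":   "true",
		"pod-only.scrapper": "true",
		"environment":       "prod",
	}
	// StatefulSet metadata: sts-only. prefix stripped, pod-only.* excluded.
	wantStatefulSetLabels = map[string]string{"my-tag": "true", "environment": "prod"}
	// Pod template metadata: pod-only.* kept verbatim, sts-only.* excluded.
	wantPodTemplateLabels = map[string]string{"pod-only.scrapper": "true", "environment": "prod"}
)
```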

Related Issues

#1652

PR Checklist

  • Code changes adhere to the project's coding standards.
  • Relevant unit and integration tests are included.
  • Documentation has been updated accordingly.
  • All tests pass locally.
  • The PR description follows the project's guidelines.

2.0. Flexible scaling

Description

Makes StatefulSet scaling more flexible by fixing two annoying behaviors:

  1. Scale-down no longer blocks on unhealthy pods - Previously the operator waited for all pods to be ready before scaling down, which doesn't make sense: removing pods doesn't require the remaining pods to be healthy. Now it simply checks whether there are more replicas than desired and proceeds.

  2. Scale-up has a configurable timeout - Instead of waiting forever for pods to be ready, users can now set a timeout via an annotation. After the timeout, scale-up proceeds anyway (with a warning logged). The default is still "wait forever" to maintain backward compatibility.

Also includes some defensive improvements to scale-down:

  • Better bounds checking in PrepareScaleDown() to handle edge cases like manually deleted pods
  • Fallback mechanism to query Cluster Manager directly when CR status is stale
  • Empty peer status now treated as "decommission complete" instead of blocking

Key Changes

New annotations:

  • operator.splunk.com/scale-up-ready-wait-timeout: maximum time to wait for all pods to be ready before scaling up
  • operator.splunk.com/scale-up-wait-started: internal annotation tracking when the scale-up wait started

Behavior of the new operator.splunk.com/scale-up-ready-wait-timeout annotation at a glance (a parsing sketch follows the table):

  Annotation value        Result
  (missing)               proceed immediately
  "invalid"               proceed immediately
  "0" or "0s"             proceed immediately
  "5m"                    wait up to 5 minutes
  "-1" or any negative    wait forever

Testing and Verification

Unit tests added: please check the code for details.

Manual E2E testing: performed by repeatedly and quickly scaling the cluster up and down.

Related Issues

#1646

PR Checklist

  • Code changes adhere to the project's coding standards.
  • Relevant unit and integration tests are included.
  • Documentation has been updated accordingly.
  • All tests pass locally.
  • The PR description follows the project's guidelines.

3.0. Support updating CPU or changing the VolumeClaimTemplate, with a CPU-awareness option

Description

This PR introduces 3 major enhancements for better scaling and updates when changing the CPU or Volume Claim Template:

  1. CPU-aware scaling: Keeps your total cluster CPU capacity constant when you change Pod sizes. It automatically adjusts replica counts up or down to match the new Pod specs.
  2. Parallel pod updates: Speeds up rolling restarts by updating multiple pods at once (e.g., 25%).
  3. Automatic VolumeClaimTemplate Updates: Safely handles changes to VolumeClaimTemplates (which are normally immutable in K8s).
    • Storage Expansion: Automatically expands PVCs.
    • Immutable Changes: Safely recreates the StatefulSet (orphaning pods) or manages migration for changes like StorageClass or access modes.

We also handle the safe ordering of scaling operations (scale-up first, then scale-down, or vice versa) to ensure capacity is never lost.

This change is fully backward compatible.
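
The CPU-preserving idea boils down to keeping replicas x per-pod CPU roughly constant; a minimal sketch, with the function name and rounding being assumptions rather than the actual implementation:

```go
package splkcontroller

import (
	"math"

	"k8s.io/apimachinery/pkg/api/resource"
)

// preservedReplicaCount keeps total CPU (replicas x per-pod CPU request) roughly
// constant when the per-pod request changes. Rounding up means capacity never
// drops below the previous total; the real code may round or clamp differently.
// Example: 3 replicas at 8 CPUs each, new request of 4 CPUs -> 6 replicas.
func preservedReplicaCount(currentReplicas int32, oldPodCPU, newPodCPU resource.Quantity) int32 {
	oldMilli, newMilli := oldPodCPU.MilliValue(), newPodCPU.MilliValue()
	if oldMilli == 0 || newMilli == 0 {
		return currentReplicas // nothing sensible to preserve; leave replicas unchanged
	}
	totalMilli := int64(currentReplicas) * oldMilli // total cluster CPU in millicores
	return int32(math.Ceil(float64(totalMilli) / float64(newMilli)))
}
```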

Key Changes

New Annotations:

  • operator.splunk.com/preserve-total-cpu - enables CPU-preserving replica adjustment
  • operator.splunk.com/parallel-pod-updates - controls concurrent pod updates (< 1.0 = percentage, >= 1.0 = absolute count)

Behavior:

  • Scale-up: happens immediately when CPU decreases per pod (more pods needed)
  • Scale-down: happens gradually with safety checks (waits for new-spec pods to be ready, respects CPU bounds)
  • Parallel updates: defaults to 1 pod at a time, configurable via annotation
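
A rough sketch of how the parallel-pod-updates value could be interpreted (helper name and rounding behavior are assumptions, not the actual code):

```go
package splkcontroller

import "math"

// parallelUpdateBudget interprets the operator.splunk.com/parallel-pod-updates value
// as described above: values below 1.0 are a fraction of the replica count, values
// of 1.0 and above are an absolute pod count. Defaults to one pod at a time.
func parallelUpdateBudget(value float64, replicas int32) int32 {
	if value <= 0 {
		return 1 // default: one pod at a time
	}
	if value < 1.0 {
		// Percentage mode, e.g. 0.25 with 8 replicas -> 2 pods per batch.
		n := int32(math.Floor(value * float64(replicas)))
		if n < 1 {
			n = 1
		}
		return n
	}
	// Absolute mode, e.g. 3 -> update up to 3 pods concurrently.
	n := int32(value)
	if n > replicas {
		n = replicas
	}
	return n
}
```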

Testing and Verification

  • Added unit tests for all new helper functions
  • Added integration-style tests for CPU-aware scaling scenarios
  • Added tests for percentage and absolute parallel update modes
  • Existing tests updated to reflect new scale-down prioritization behavior
    ...

Related Issues

#1645
#1647

PR Checklist

  • Code changes adhere to the project's coding standards.
  • Relevant unit and integration tests are included.
  • Documentation has been updated accordingly.
  • All tests pass locally.
  • The PR description follows the project's guidelines.

4.0. Improving startup performance on large persistent volumes by configuring fsGroupChangePolicy

Description

This PR adds support for configuring fsGroupChangePolicy on Splunk pod security contexts to improve startup performance on large persistent volumes.

When pods with large PVs start up, Kubernetes can spend a long time recursively changing ownership/permissions on all files. By setting the policy to OnRootMismatch, we skip this expensive operation if the root dir already has the correct fsGroup - which is the case for most restarts.
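
A minimal sketch of the precedence logic described below, assuming a hypothetical helper name (the real getFSGroupChangePolicy() in configuration.go may validate and log differently):

```go
package enterprise

import corev1 "k8s.io/api/core/v1"

// Annotation key as introduced in this PR.
const fsGroupChangePolicyAnnotation = "operator.splunk.com/fs-group-change-policy"

// getFSGroupChangePolicySketch shows the precedence: annotation > spec field >
// default OnRootMismatch. Invalid or empty values simply fall through.
func getFSGroupChangePolicySketch(annotations map[string]string, specValue string) corev1.PodFSGroupChangePolicy {
	pick := func(v string) (corev1.PodFSGroupChangePolicy, bool) {
		switch v {
		case string(corev1.FSGroupChangeAlways):
			return corev1.FSGroupChangeAlways, true
		case string(corev1.FSGroupChangeOnRootMismatch):
			return corev1.FSGroupChangeOnRootMismatch, true
		}
		return "", false // invalid or empty: ignore and fall through
	}
	if p, ok := pick(annotations[fsGroupChangePolicyAnnotation]); ok {
		return p // annotation wins over the spec field
	}
	if p, ok := pick(specValue); ok {
		return p
	}
	return corev1.FSGroupChangeOnRootMismatch // new default for faster startups
}
```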

Key Changes

  • api/v4/common_types.go: Added FSGroupChangePolicy field to the common Spec struct. Users can now set this in their CR spec with validation for Always or OnRootMismatch values.

  • pkg/splunk/enterprise/configuration.go: Added getFSGroupChangePolicy() helper that implements the precedence logic (annotation > spec > default). The default is now OnRootMismatch for better out-of-the-box perf.

  • pkg/splunk/splkcontroller/statefulset.go: Added the FSGroupChangePolicyAnnotation constant for the operator.splunk.com/fs-group-change-policy annotation.

  • docs/CustomResources.md: Added docs explaining the feature, configuration options (spec field vs annotation), and the precedence rules.

  • CRD YAML files: Updated all CRD definitions to include the new field (generated via make generate and make manifest).

Testing and Verification

  • Added unit tests for getFSGroupChangePolicy() covering all precedence scenarios
  • Tested with both spec field and annotation configurations
  • Verified invalid annotation values fall back gracefully with warning log
  • Manual testing on cluster with large PVs showed significant startup time improvement

Related Issues

#1648

PR Checklist

  • Code changes adhere to the project's coding standards.
  • Relevant unit and integration tests are included.
  • Documentation has been updated accordingly.
  • All tests pass locally.
  • The PR description follows the project's guidelines.

Copilot AI review requested due to automatic review settings December 15, 2025 13:32
@ductrung-nguyen ductrung-nguyen changed the title Amadeus patches Amadeus patches: Advanced Statefulset scaling + Support for updating VolumeClaimTemplates Dec 15, 2025

Copilot AI left a comment


Pull request overview

This PR introduces significant enhancements to StatefulSet scaling behavior in the Splunk Operator, focusing on CPU-aware scaling, parallel pod updates, and configurable scale-up timeout mechanisms. However, the PR description is incomplete ("To be updated...."), making it difficult to understand the full context and motivation.

Key Changes

  • CPU-preserving scaling that maintains total CPU allocation when pod CPU requests change
  • Parallel pod update support to speed up large cluster updates (configurable via annotation)
  • Configurable scale-up timeout to handle scenarios where pods don't become ready

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

Summary per file:

  • pkg/splunk/test/controller.go: Added PodList support to mock client for testing
  • pkg/splunk/splkcontroller/statefulset.go: Core implementation of CPU-aware scaling, parallel updates, and timeout mechanisms
  • pkg/splunk/splkcontroller/statefulset_test.go: Comprehensive tests for scale-up timeout and CPU-aware scaling
  • pkg/splunk/splkcontroller/statefulset_parallel_test.go: New test suite for parallel pod update functionality
  • pkg/splunk/splkcontroller/statefulset_cpu_test.go: New test suite for CPU-aware scaling features
  • pkg/splunk/splkcontroller/statefulset_cpu_scaledown_test.go: Tests for CPU-aware scale-down operations
  • pkg/splunk/enterprise/indexercluster.go: Enhanced PrepareScaleDown with fallback cleanup mechanism
  • pkg/splunk/enterprise/indexercluster_test.go: Updated tests for new scale-down behavior and zombie peer prevention
  • pkg/splunk/enterprise/searchheadcluster_test.go: Test updates to match new scale-down call patterns
  • docs/CustomResources.md: Documentation for new scaling behavior annotations


Comment on lines 1145 to 1377
```go
minTimeout = 30 * time.Second
maxTimeout = 24 * time.Hour
```

Copilot AI Dec 15, 2025


The hardcoded minimum timeout of 30 seconds and maximum of 24 hours may not be appropriate for all cluster sizes and operational requirements. The minimum value of 30 seconds could be too long for small test clusters, while 24 hours may be too long for production environments where faster feedback is desired. Consider making these values configurable or documenting the rationale for these specific limits.

@ductrung-nguyen ductrung-nguyen marked this pull request as draft December 19, 2025 22:12
@vivekr-splunk vivekr-splunk self-requested a review January 22, 2026 15:43
@ductrung-nguyen ductrung-nguyen marked this pull request as ready for review January 22, 2026 15:56