
HDDS-14635. kubernetes check fails after restarting datanodes #9772

Open
adoroszlai wants to merge 1 commit into apache:master from adoroszlai:HDDS-14635

@adoroszlai
Contributor

What changes were proposed in this pull request?

Fix failure in kubernetes check due to read failure after datanodes are restarted:

Ozone Client Key Validator                                            | FAIL |
255 != 0
  • Wait for pipeline to be open before trying to read.
  • Increase client retry count and wait interval.
  • Fix log configuration in k8s config to allow suppressing logger for CLI commands (respect hadoop.root.logger property). This is required for ozone admin --json command output to be valid JSON.
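The first two items both amount to "poll until ready, with a bounded number of attempts". A minimal sketch of that pattern in shell follows. The helper name `retry_until` and its arguments are hypothetical and do not come from the patch, which may implement the wait differently.

```shell
# Hypothetical helper (not part of the patch): run a command repeatedly
# until it succeeds or the attempt budget is exhausted.
# Usage: retry_until <max_attempts> <sleep_seconds> <command> [args...]
retry_until() {
  local max_attempts="$1"
  local interval="$2"
  shift 2

  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Success on any attempt ends the wait immediately.
    if "$@"; then
      return 0
    fi
    # Sleep only between attempts, not after the final failure.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$interval"
    fi
    attempt=$((attempt + 1))
  done

  # All attempts failed.
  return 1
}
```

A call such as `retry_until 30 5 assert_pipeline_exists` could then gate the read, assuming the check returns non-zero while no pipeline is open. The values 30 and 5 are placeholders, not the retry count or interval used in the patch.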

https://issues.apache.org/jira/browse/HDDS-14635

How was this patch tested?

ozone/test.sh passed 10x:
https://github.com/adoroszlai/ozone/actions/runs/22041622888

All kubernetes checks passed 10x:
https://github.com/adoroszlai/ozone/actions/runs/22042121647

Regular CI:
https://github.com/adoroszlai/ozone/actions/runs/22042489756

@adoroszlai adoroszlai self-assigned this Feb 15, 2026

assert_pipeline_exists() {
  local count
  count=$(execute_command_in_container scm-0 ozone admin pipeline list --state OPEN --filter-by-factor THREE --json | jq -r 'length')

Can it be the case that there are only EC pipelines? Any reason why we're waiting only for Ratis pipelines?
