Skip to content

TRT-2471: Consider mass failures during CR query#3285

Open
xueqzhan wants to merge 4 commits intoopenshift:mainfrom
xueqzhan:mass-failure
Open

TRT-2471: Consider mass failures during CR query#3285
xueqzhan wants to merge 4 commits intoopenshift:mainfrom
xueqzhan:mass-failure

Conversation

@xueqzhan
Copy link
Contributor

@xueqzhan xueqzhan commented Feb 19, 2026

We use a list of key tests to detect mass failure situation. When one of those key tests appear in the failed test list in a job, we don't consider the other tests for the purpose of regression calculation. This will clear the board and only show regression in the key test component.

Current list of key tests:

	"install should succeed: overall",
	"[sig-cluster-lifecycle] Cluster completes upgrade",
	"[Jira:\"Test Framework\"] there should not be mass test failures",

The list is ordered. Only the lowest indexed one is counted if multiple key tests appear in a job's failed tests.

Summary by CodeRabbit

  • New Features

    • Added key-test filtering with priority-based handling for component readiness reports.
    • Added a new "4.22-main-mass-failure" component readiness view/configuration for enhanced reporting.
  • Tests

    • Added unit tests validating exclusive key-test filtering and priority-based behavior.
  • Chores

    • Minor code formatting adjustments.

@openshift-ci-robot
Copy link

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: automatic mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Feb 19, 2026
@openshift-ci-robot
Copy link

openshift-ci-robot commented Feb 19, 2026

@xueqzhan: This pull request references TRT-2471 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

We use a list of key tests to detect mass failure situation. When one of those key tests appear in the failed test list in a job, we don't consider the other tests for the purpose of regression calculation. This will clear the board and only show regression in the key test component.

Current list of key tests:

  "install should succeed: overall",
  "[sig-cluster-lifecycle] Cluster completes upgrade",
  "[Jira:\"Test Framework\"] there should not be mass test failures",

The list is ordered. Only the lowest indexed one is counted if multiple key tests appear in a job's failed tests.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from dgoodwin and stbenjam February 19, 2026 18:27
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 19, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 316eaf3 and 698c26f.

📒 Files selected for processing (3)
  • config/views.yaml
  • pkg/api/componentreadiness/middleware/releasefallback/releasefallback.go
  • pkg/api/componentreadiness/query/querygenerators.go

Walkthrough

Adds support for priority-based "key test" filtering: callers can supply KeyTestNames which alter SQL generation to restrict results to the highest-priority failing key test per job. Changes propagate the new field through request types, query generation, middleware, tests, and configuration.

Changes

Cohort / File(s) Summary
Query Logic & Tests
pkg/api/componentreadiness/query/querygenerators.go, pkg/api/componentreadiness/query/querygenerators_test.go
BuildComponentReportQuery signature extended to accept keyTestNames ...string. Adds CTEs (key_test_priorities, jobs_with_highest_priority_test) and priority CASE helper; query uses junit_data derived table and injects filters/params for key tests. Unit tests added validating CTEs, parameter propagation, and exclusive-test logic.
Middleware / Fallback Path
pkg/api/componentreadiness/middleware/releasefallback/releasefallback.go
Propagates KeyTestNames through cache key and GetDataFromCacheOrGenerate invocation so fallback/fetch paths include key-test filtering.
API Types
pkg/apis/api/componentreport/reqopts/types.go
Adds public KeyTestNames []string to Advanced request options with documentation explaining exclusive-key-test semantics.
Configuration
config/views.yaml
Replaces prior view with new 4.22-main-mass-failure component_readiness entry including key_test_names, expanded variant matrix, advanced options, and metrics/regression config.
Server Formatting
pkg/sippyserver/server.go
Minor whitespace change after utils.ParseComponentReportRequest calls (no behavior change).

Sequence Diagram(s)

sequenceDiagram
  participant Client as Client
  participant Server as SippyServer
  participant Middleware as ReleaseFallbackMiddleware
  participant QueryGen as QueryGenerator
  participant BQ as BigQuery
  participant Cache as Cache

  Client->>Server: HTTP request (includes Advanced.KeyTestNames)
  Server->>Middleware: Parse request -> pass req options
  Middleware->>Cache: getCacheKey(reqOptions with KeyTestNames)
  alt cache miss / generate
    Middleware->>QueryGen: BuildComponentReportQuery(..., keyTestNames)
    QueryGen->>BQ: Execute generated SQL (includes key_test_priorities CTE)
    BQ-->>QueryGen: query results
    QueryGen->>Cache: store results
    Cache-->>Middleware: results
  else cache hit
    Cache-->>Middleware: cached results
  end
  Middleware-->>Server: component report data
  Server-->>Client: HTTP response (filtered by key tests)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (3 warnings, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 27.27% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Go Error Handling ⚠️ Warning json.Marshal error ignored in PreAnalysis method without justification, potentially masking cache key generation failures Capture and check json.Marshal error, wrap with fmt.Errorf using %w for context preservation
Single Responsibility And Clear Naming ⚠️ Warning BuildComponentReportQuery violates single responsibility by handling multiple concerns: query construction, CTE generation, priority filtering logic, and parameter management. Extract key-test-priority CTE construction and priority-filtering WHERE clause into dedicated helper methods to keep BuildComponentReportQuery focused on single abstraction level.
Sql Injection Prevention ❓ Inconclusive Unable to execute shell commands to retrieve code context. The function calls attempted to access specific lines from a Go source file but cannot be executed in this environment. Please provide the actual code content or file excerpts you want analyzed, or clarify what verification check should be performed.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'TRT-2471: Consider mass failures during CR query' accurately captures the main change: implementing mass-failure detection logic in component readiness (CR) query generation using key tests to filter regressions.
Excessive Css In React Should Use Styles ✅ Passed PR contains only Go source files and YAML configs; no React components present, making this check not applicable.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.5.0)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 19, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: xueqzhan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 19, 2026
@openshift-ci-robot
Copy link

openshift-ci-robot commented Feb 19, 2026

@xueqzhan: This pull request references TRT-2471 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

We use a list of key tests to detect mass failure situation. When one of those key tests appear in the failed test list in a job, we don't consider the other tests for the purpose of regression calculation. This will clear the board and only show regression in the key test component.

Current list of key tests:

  "install should succeed: overall",
  "[sig-cluster-lifecycle] Cluster completes upgrade",
  "[Jira:\"Test Framework\"] there should not be mass test failures",

The list is ordered. Only the lowest indexed one is counted if multiple key tests appear in a job's failed tests.

Summary by CodeRabbit

Release Notes

  • New Features
  • Added --exclude-mass-failures command-line flag to filter out tests that commonly fail across multiple components, improving accuracy of component readiness reports
  • Enhanced component readiness query engine to support exclusive test filtering for more granular control over reliability metrics

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/api/componentreadiness/middleware/releasefallback/releasefallback.go (1)

541-567: ⚠️ Potential issue | 🟡 Minor

Remove unused exclusiveTestNames parameter from FetchTestStatusResults.

The exclusiveTestNames parameter is declared in the function signature but never used in the function body. Query-level filtering for exclusive test names is already handled by BuildComponentReportQuery (which incorporates the filtering into the SQL query itself), making this parameter redundant. Either remove the parameter or document why it's retained.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/api/componentreadiness/middleware/releasefallback/releasefallback.go`
around lines 541 - 567, The FetchTestStatusResults function's signature includes
an unused exclusiveTestNames parameter; remove that parameter from the function
declaration (query.FetchTestStatusResults) and then update all call sites (e.g.,
this file's fallbackTestQueryGenerator.getTestFallbackRelease where it's called
as query.FetchTestStatusResults(ctx, baseQuery,
f.ReqOptions.AdvancedOption.ExclusiveTestNames)) to call the new two-argument
form query.FetchTestStatusResults(ctx, baseQuery). Ensure you update any other
references, associated unit tests, and imports accordingly so compilation
succeeds.
🧹 Nitpick comments (2)
pkg/api/componentreadiness/queryparamparser_test.go (1)

405-405: Consider adding test coverage for non-nil exclusiveTestNames.

The test correctly passes nil for backward compatibility, but there are no test cases verifying behavior when exclusiveTestNames is provided. Consider adding a test case to verify opts.AdvancedOption.ExclusiveTestNames is correctly populated.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/api/componentreadiness/queryparamparser_test.go` at line 405, Add a unit
test in queryparamparser_test.go that calls utils.ParseComponentReportRequest
with a non-nil exclusiveTestNames slice (instead of nil) and asserts that the
returned options.AdvancedOption.ExclusiveTestNames is populated with the same
values; specifically exercise ParseComponentReportRequest (the call in the
existing test that currently passes nil) and add assertions on the returned
options.AdvancedOption.ExclusiveTestNames to confirm it contains the expected
test names and preserves order/contents.
pkg/api/componentreadiness/query/querygenerators.go (1)

670-707: Remove unused exclusiveTestNames parameter from FetchTestStatusResults function signature.

The exclusiveTestNames parameter is accepted but never referenced in the function body. Exclusive test filtering is handled at query construction time in BuildComponentReportQuery, so this parameter serves no purpose in FetchTestStatusResults. Removing it will simplify the API.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/api/componentreadiness/query/querygenerators.go` around lines 670 - 707,
The FetchTestStatusResults function accepts an unused exclusiveTestNames
parameter; remove exclusiveTestNames from the function signature of
FetchTestStatusResults and update all call sites to stop passing that argument
(the query already handles exclusivity in BuildComponentReportQuery), keeping
the function body unchanged other than the signature; ensure any imports or
interfaces that referenced the old signature are updated accordingly so
compilation succeeds.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/api/componentreadiness/query/querygenerators.go`:
- Around line 50-51: Remove the duplicated comment line "// partition." that
appears immediately before the dedupedJunitTable declaration; locate the comment
just above the dedupedJunitTable variable in querygenerators.go and delete the
extra duplicate so only a single "// partition." comment remains.

---

Outside diff comments:
In `@pkg/api/componentreadiness/middleware/releasefallback/releasefallback.go`:
- Around line 541-567: The FetchTestStatusResults function's signature includes
an unused exclusiveTestNames parameter; remove that parameter from the function
declaration (query.FetchTestStatusResults) and then update all call sites (e.g.,
this file's fallbackTestQueryGenerator.getTestFallbackRelease where it's called
as query.FetchTestStatusResults(ctx, baseQuery,
f.ReqOptions.AdvancedOption.ExclusiveTestNames)) to call the new two-argument
form query.FetchTestStatusResults(ctx, baseQuery). Ensure you update any other
references, associated unit tests, and imports accordingly so compilation
succeeds.

---

Nitpick comments:
In `@pkg/api/componentreadiness/query/querygenerators.go`:
- Around line 670-707: The FetchTestStatusResults function accepts an unused
exclusiveTestNames parameter; remove exclusiveTestNames from the function
signature of FetchTestStatusResults and update all call sites to stop passing
that argument (the query already handles exclusivity in
BuildComponentReportQuery), keeping the function body unchanged other than the
signature; ensure any imports or interfaces that referenced the old signature
are updated accordingly so compilation succeeds.

In `@pkg/api/componentreadiness/queryparamparser_test.go`:
- Line 405: Add a unit test in queryparamparser_test.go that calls
utils.ParseComponentReportRequest with a non-nil exclusiveTestNames slice
(instead of nil) and asserts that the returned
options.AdvancedOption.ExclusiveTestNames is populated with the same values;
specifically exercise ParseComponentReportRequest (the call in the existing test
that currently passes nil) and add assertions on the returned
options.AdvancedOption.ExclusiveTestNames to confirm it contains the expected
test names and preserves order/contents.

@openshift-ci-robot
Copy link

Scheduling required tests:
/test e2e

@petr-muller
Copy link
Member

/cc

@openshift-ci openshift-ci bot requested a review from petr-muller February 23, 2026 17:16
Copy link
Contributor

@dgoodwin dgoodwin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't immediately see how to test this, I tried adding the tests to my view but it didn't take effect. I'm going to pause review here, lets get this updated so it actually works our of the box, drop the cli arg, and setup 4.22-main to have our three tests configured.

fs.StringVar(&f.ComponentReadinessViewsFile, "views", "", "Optional yaml file for predefined Component Readiness views")
fs.DurationVar(&f.CRTimeRoundingFactor, "component-readiness-time-rounding-factor", defaultCRTimeRoundingFactor, factorUsage)
fs.StringVar(&f.CORSAllowedOrigin, "cors-allowed-origin", "*", "Optional allowed origin for CORS")
fs.BoolVar(&f.ExcludeMassFailures, "exclude-mass-failures", false, "Exclude other tests from jobs containing tests known to cause mass failures")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we skip the cli flag, this should just be an attribute of the view and nothing else IMO. If the view provides exclusive tests names, enable the collapse functionality.

Please update 4.22-main and the rhcos10 views to use this.

IgnoreDisruption bool `json:"ignore_disruption" yaml:"ignore_disruption"`
FlakeAsFailure bool `json:"flake_as_failure" yaml:"flake_as_failure"`
IncludeMultiReleaseAnalysis bool `json:"include_multi_release_analysis" yaml:"include_multi_release_analysis"`
ExclusiveTestNames []string `json:"exclusive_test_names,omitempty" yaml:"exclusive_test_names,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a better name or a clear godoc, or possibly both. Very difficult for someone to know what this does based on the current name. I almost prefer KeyTestNames, with a description that if these tests fail, no other failures in the job will count as regressions.

@openshift-ci-robot
Copy link

openshift-ci-robot commented Feb 25, 2026

@xueqzhan: This pull request references TRT-2471 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

We use a list of key tests to detect mass failure situation. When one of those key tests appear in the failed test list in a job, we don't consider the other tests for the purpose of regression calculation. This will clear the board and only show regression in the key test component.

Current list of key tests:

  "install should succeed: overall",
  "[sig-cluster-lifecycle] Cluster completes upgrade",
  "[Jira:\"Test Framework\"] there should not be mass test failures",

The list is ordered. Only the lowest indexed one is counted if multiple key tests appear in a job's failed tests.

Summary by CodeRabbit

Release Notes

  • New Features

  • Added ability to filter component readiness reports by specific test names with priority-based selection to focus on critical test results.

  • Added new component readiness configuration variant for enhanced release tracking.

  • Tests

  • Expanded test coverage for exclusive test filtering and query execution logic.

  • Chores

  • Enhanced configuration validation for component readiness options.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@xueqzhan
Copy link
Contributor Author

@dgoodwin The latest change created a new view 4.22-main-mass-failure and moved the key test list to this view.

In my test, it cuts the current 209 unresolved regressions to 176. As of now, this mostly affects install. We do not yet have many jobs with the new mass failure junit.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/api/componentreadiness/middleware/releasefallback/releasefallback.go (1)

515-525: ⚠️ Potential issue | 🟠 Major

Cache key omits KeyTestNames — fallback results may be served cross-view.

fallbackTestQueryGeneratorCacheKey includes IgnoreDisruption from AdvancedOption but not KeyTestNames. Since 4.22-main and 4.22-main-mass-failure share identical base/variant config and differ only in key_test_names, they will hash to the same cache key. One view's cached fallback data will be incorrectly served to the other.

🐛 Proposed fix
 type fallbackTestQueryGeneratorCacheKey struct {
 	BaseRelease string
 	BaseStart   time.Time
 	BaseEnd     time.Time
 	// IgnoreDisruption is the only field within AdvancedOption that is used here
 	IgnoreDisruption bool
+	KeyTestNames     []string
 	IncludeVariants  map[string][]string
 	VariantDBGroupBy sets.String
 }

And in getCacheKey():

 func (f *fallbackTestQueryGenerator) getCacheKey() fallbackTestQueryGeneratorCacheKey {
 	return fallbackTestQueryGeneratorCacheKey{
 		BaseRelease:      f.BaseRelease,
 		BaseStart:        f.BaseStart,
 		BaseEnd:          f.BaseEnd,
 		IgnoreDisruption: f.ReqOptions.AdvancedOption.IgnoreDisruption,
+		KeyTestNames:     f.ReqOptions.AdvancedOption.KeyTestNames,
 		IncludeVariants:  f.ReqOptions.VariantOption.IncludeVariants,
 		VariantDBGroupBy: f.ReqOptions.VariantOption.DBGroupBy,
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/api/componentreadiness/middleware/releasefallback/releasefallback.go`
around lines 515 - 525, The cache key struct fallbackTestQueryGeneratorCacheKey
currently omits AdvancedOption.KeyTestNames which causes different views (e.g.,
4.22-main vs 4.22-main-mass-failure) to collide; add a KeyTestNames field (e.g.,
KeyTestNames []string) to fallbackTestQueryGeneratorCacheKey and update
getCacheKey() to include KeyTestNames when building the key — ensure you
canonicalize the slice (sort and dedupe if necessary, or convert to a
deterministic string) before concatenation so the generated cache key is stable
and distinct per unique set/order of key test names.
♻️ Duplicate comments (1)
pkg/api/componentreadiness/query/querygenerators.go (1)

50-51: Duplicate comment line // partition. on line 50.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/api/componentreadiness/query/querygenerators.go` around lines 50 - 51,
Remove the duplicate comment line "// partition." that appears immediately above
the dedupedJunitTable declaration in querygenerators.go; keep a single, clear
comment (or remove it entirely if redundant) near the dedupedJunitTable variable
to avoid repeated comments and ensure only one occurrence of the explanatory
comment remains.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@config/views.yaml`:
- Around line 173-180: Confirm whether the mass-failure view should
independently auto-file Jira tickets: specifically review the
regression_tracking.enabled and automate_jira.enabled keys in this new
mass-failure view and decide one of two fixes — either set one of these flags to
false to prevent duplicate ticket creation across the two views (disable
automate_jira.enabled or regression_tracking.enabled in the mass-failure view),
or add deduplication logic in the Jira automation pipeline to ignore duplicates
when both views detect the same regression; update the mass-failure view
configuration accordingly and, if choosing deduplication, implement the check in
the Jira automation flow that references view id/name and failing test signature
to avoid duplicate tickets.

In `@pkg/api/componentreadiness/query/querygenerators.go`:
- Line 671: Remove the unused keyTestNames parameter from
FetchTestStatusResults: update the function signature in querygenerators.go to
no longer accept keyTestNames, remove any references to it in the function body
(there are none), and update all call sites (notably in querygenerators.go and
releasefallback.go) to call FetchTestStatusResults(ctx, query) with the new
signature; re-run compilation/tests and fix any import or variable mismatches
caused by the changed call sites.
- Around line 332-348: The CTE key_test_priorities currently filters raw junit
rows with "success_val = 0" which will include flaked failures; update the WHERE
clause inside the key_test_priorities CTE in querygenerators.go to exclude
flakes (add "AND flake_count = 0") or switch the CTE to read from
dedupedJunitTable instead of the raw junit table so the dedup logic is applied
(modify the CTE that selects from "%s.%s AS junit" named in the WITH block and
adjust parameters accordingly).

---

Outside diff comments:
In `@pkg/api/componentreadiness/middleware/releasefallback/releasefallback.go`:
- Around line 515-525: The cache key struct fallbackTestQueryGeneratorCacheKey
currently omits AdvancedOption.KeyTestNames which causes different views (e.g.,
4.22-main vs 4.22-main-mass-failure) to collide; add a KeyTestNames field (e.g.,
KeyTestNames []string) to fallbackTestQueryGeneratorCacheKey and update
getCacheKey() to include KeyTestNames when building the key — ensure you
canonicalize the slice (sort and dedupe if necessary, or convert to a
deterministic string) before concatenation so the generated cache key is stable
and distinct per unique set/order of key test names.

---

Duplicate comments:
In `@pkg/api/componentreadiness/query/querygenerators.go`:
- Around line 50-51: Remove the duplicate comment line "// partition." that
appears immediately above the dedupedJunitTable declaration in
querygenerators.go; keep a single, clear comment (or remove it entirely if
redundant) near the dedupedJunitTable variable to avoid repeated comments and
ensure only one occurrence of the explanatory comment remains.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 7c39f99 and 19eea47.

📒 Files selected for processing (7)
  • config/views.yaml
  • pkg/api/componentreadiness/middleware/releasefallback/releasefallback.go
  • pkg/api/componentreadiness/query/querygenerators.go
  • pkg/api/componentreadiness/query/querygenerators_test.go
  • pkg/apis/api/componentreport/reqopts/types.go
  • pkg/flags/component_readiness.go
  • pkg/sippyserver/server.go
✅ Files skipped from review due to trivial changes (1)
  • pkg/sippyserver/server.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/api/componentreadiness/query/querygenerators_test.go
  • pkg/flags/component_readiness.go

Comment on lines +332 to +348
if len(keyTestNames) > 0 {
// Create a CTE that identifies the highest priority (lowest index) key test in each job
// This ensures when multiple key tests appear in the same job, only the highest priority one is used
withClause = fmt.Sprintf(`WITH key_test_priorities AS (
SELECT
prowjob_build_id,
test_name,
-- Find the index/priority of each test (lower index = higher priority)
CASE
%s
END AS test_priority
FROM %s.%s AS junit
WHERE modified_time >= DATETIME(@From)
AND modified_time < DATETIME(@To)
AND test_name IN UNNEST(@KeyTestNames)
AND success_val = 0
),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

key_test_priorities may suppress jobs where a key test merely flaked.

The CTE filters success_val = 0 against the raw (non-deduped) table. For a key test that flaked, the raw table contains a failure row with success_val = 0, so the job would still be matched. The deduped table (dedupedJunitTable) resolves this by preferring the flake row, but the CTE bypasses that logic entirely.

If a key test flakes, should the remaining tests in that job be suppressed from regression analysis? If not, add AND flake_count = 0 to the CTE filter (or consider querying the deduped table instead):

💡 Suggested fix
 					FROM %s.%s AS junit
 					WHERE modified_time >= DATETIME(`@From`)
 					AND modified_time < DATETIME(`@To`)
 					AND test_name IN UNNEST(`@KeyTestNames`)
-					AND success_val = 0
+					AND success_val = 0
+					AND flake_count = 0
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/api/componentreadiness/query/querygenerators.go` around lines 332 - 348,
The CTE key_test_priorities currently filters raw junit rows with "success_val =
0" which will include flaked failures; update the WHERE clause inside the
key_test_priorities CTE in querygenerators.go to exclude flakes (add "AND
flake_count = 0") or switch the CTE to read from dedupedJunitTable instead of
the raw junit table so the dedup logic is applied (modify the CTE that selects
from "%s.%s AS junit" named in the WITH block and adjust parameters
accordingly).

@openshift-ci-robot
Copy link

openshift-ci-robot commented Feb 25, 2026

@xueqzhan: This pull request references TRT-2471 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

We use a list of key tests to detect mass failure situation. When one of those key tests appear in the failed test list in a job, we don't consider the other tests for the purpose of regression calculation. This will clear the board and only show regression in the key test component.

Current list of key tests:

  "install should succeed: overall",
  "[sig-cluster-lifecycle] Cluster completes upgrade",
  "[Jira:\"Test Framework\"] there should not be mass test failures",

The list is ordered. Only the lowest indexed one is counted if multiple key tests appear in a job's failed tests.

Summary by CodeRabbit

Release Notes

  • New Features

  • Added key test filtering capability with priority-based handling for component readiness analysis

  • Introduced new configuration variant for enhanced test status reporting

  • Tests

  • Added comprehensive unit tests for exclusive test filtering and priority-based test logic validation

  • Chores

  • Minor code formatting adjustments

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot
Copy link

Scheduling required tests:
/test e2e

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
pkg/api/componentreadiness/query/querygenerators.go (3)

50-50: ⚠️ Potential issue | 🟡 Minor

Remove duplicated comment line.

Line 50 repeats the prior line’s “partition.” text and looks like an edit artifact.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/api/componentreadiness/query/querygenerators.go` at line 50, Remove the
duplicated comment line "partition." in
pkg/api/componentreadiness/query/querygenerators.go by editing the comment block
around the partition-related documentation (the comment immediately above the
code in querygenerators.go) so only a single "partition." line remains; simply
delete the repeated line to eliminate the edit artifact and leave the
surrounding comment text intact.

671-671: Drop unused keyTestNames parameter from FetchTestStatusResults.

The parameter is not used in the function body, so it expands API surface without behavior.

Proposed fix
-func FetchTestStatusResults(ctx context.Context, query *bigquery.Query, keyTestNames []string) (map[string]bq.TestStatus, []error) {
+func FetchTestStatusResults(ctx context.Context, query *bigquery.Query) (map[string]bq.TestStatus, []error) {
-	baseStatus, baseErrs := FetchTestStatusResults(ctx, baseQuery, b.ReqOptions.AdvancedOption.KeyTestNames)
+	baseStatus, baseErrs := FetchTestStatusResults(ctx, baseQuery)
-	sampleStatus, sampleErrs := FetchTestStatusResults(ctx, sampleQuery, s.ReqOptions.AdvancedOption.KeyTestNames)
+	sampleStatus, sampleErrs := FetchTestStatusResults(ctx, sampleQuery)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/api/componentreadiness/query/querygenerators.go` at line 671, The
function FetchTestStatusResults has an unused parameter keyTestNames; remove
this parameter from its signature and all call sites to reduce API surface.
Update the function declaration in querygenerators.go to take only (ctx
context.Context, query *bigquery.Query), then search for every caller that
passes keyTestNames to FetchTestStatusResults and remove that argument. Run and
update any affected unit tests or interfacing code to reflect the new signature.

332-348: ⚠️ Potential issue | 🟠 Major

Exclude flaked key tests from failure-priority CTE.

Line 347 treats any success_val = 0 key-test row as a hard failure in raw junit data. That can include flaked attempts and incorrectly suppress other failures in the same job.

Proposed fix
 							FROM %s.%s AS junit
 							WHERE modified_time >= DATETIME(`@From`)
 							AND modified_time < DATETIME(`@To`)
 							AND test_name IN UNNEST(`@KeyTestNames`)
 							AND success_val = 0
+							AND flake_count = 0
 						),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/api/componentreadiness/query/querygenerators.go` around lines 332 - 348,
The CTE key_test_priorities currently treats any success_val = 0 row as a hard
failure and thus can pick flaked attempts; update the WHERE clause inside the
key_test_priorities CTE (the SELECT from %s.%s AS junit that computes
test_priority) to also exclude flaked attempts by adding a condition that
filters out the flake-marked rows (e.g., AND (is_flaky = FALSE OR flaked = 0) or
the appropriate schema field that indicates a flake) so only genuine non-flaky
failures are considered when computing test_priority.
🧹 Nitpick comments (1)
pkg/api/componentreadiness/query/querygenerators_test.go (1)

106-108: Add coverage that flaked key tests are not treated as failed key tests.

Given the key-test suppression logic, add an assertion that the CTE excludes flakes (e.g., flake_count = 0) so this behavior can’t regress silently.

Proposed test assertion update
 				assert.Contains(t, commonQuery, "AND success_val = 0",
 					"CTE should only identify jobs where key tests FAILED (success_val = 0)")
+				assert.Contains(t, commonQuery, "AND flake_count = 0",
+					"CTE should exclude flaked key-test rows from failure-priority selection")

Also applies to: 216-218

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/api/componentreadiness/query/querygenerators_test.go` around lines 106 -
108, Add assertions in querygenerators_test.go that the generated CTE excludes
flaked key tests by checking the query string variable commonQuery contains the
flake filter; specifically, add assert.Contains(t, commonQuery, "AND flake_count
= 0", "CTE should exclude flaked key tests (flake_count = 0)") at the same two
places where you already assert success_val and KeyTestNames (i.e., adjacent to
the existing checks on commonQuery at the first and second occurrences).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/api/componentreadiness/middleware/releasefallback/releasefallback.go`:
- Around line 542-545: The fallback cache key generators
fallbackTestQueryGeneratorCacheKey and
fallbackTestQueryReleasesGeneratorCacheKey are missing
AdvancedOption.KeyTestNames, causing cached queries built by
BuildComponentReportQuery to be reused incorrectly; update both key builders to
incorporate the KeyTestNames value(s) from f.ReqOptions.AdvancedOption (e.g.,
serialize or join the slice) into the key composition so different KeyTestNames
produce distinct cache keys wherever query.BuildComponentReportQuery is called.

---

Duplicate comments:
In `@pkg/api/componentreadiness/query/querygenerators.go`:
- Line 50: Remove the duplicated comment line "partition." in
pkg/api/componentreadiness/query/querygenerators.go by editing the comment block
around the partition-related documentation (the comment immediately above the
code in querygenerators.go) so only a single "partition." line remains; simply
delete the repeated line to eliminate the edit artifact and leave the
surrounding comment text intact.
- Line 671: The function FetchTestStatusResults has an unused parameter
keyTestNames; remove this parameter from its signature and all call sites to
reduce API surface. Update the function declaration in querygenerators.go to
take only (ctx context.Context, query *bigquery.Query), then search for every
caller that passes keyTestNames to FetchTestStatusResults and remove that
argument. Run and update any affected unit tests or interfacing code to reflect
the new signature.
- Around line 332-348: The CTE key_test_priorities currently treats any
success_val = 0 row as a hard failure and thus can pick flaked attempts; update
the WHERE clause inside the key_test_priorities CTE (the SELECT from %s.%s AS
junit that computes test_priority) to also exclude flaked attempts by adding a
condition that filters out the flake-marked rows (e.g., AND (is_flaky = FALSE OR
flaked = 0) or the appropriate schema field that indicates a flake) so only
genuine non-flaky failures are considered when computing test_priority.

---

Nitpick comments:
In `@pkg/api/componentreadiness/query/querygenerators_test.go`:
- Around line 106-108: Add assertions in querygenerators_test.go that the
generated CTE excludes flaked key tests by checking the query string variable
commonQuery contains the flake filter; specifically, add assert.Contains(t,
commonQuery, "AND flake_count = 0", "CTE should exclude flaked key tests
(flake_count = 0)") at the same two places where you already assert success_val
and KeyTestNames (i.e., adjacent to the existing checks on commonQuery at the
first and second occurrences).

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 19eea47 and 316eaf3.

📒 Files selected for processing (6)
  • config/views.yaml
  • pkg/api/componentreadiness/middleware/releasefallback/releasefallback.go
  • pkg/api/componentreadiness/query/querygenerators.go
  • pkg/api/componentreadiness/query/querygenerators_test.go
  • pkg/apis/api/componentreport/reqopts/types.go
  • pkg/sippyserver/server.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/sippyserver/server.go
  • config/views.yaml

@openshift-ci-robot
Copy link

openshift-ci-robot commented Feb 26, 2026

@xueqzhan: This pull request references TRT-2471 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

We use a list of key tests to detect mass failure situation. When one of those key tests appear in the failed test list in a job, we don't consider the other tests for the purpose of regression calculation. This will clear the board and only show regression in the key test component.

Current list of key tests:

  "install should succeed: overall",
  "[sig-cluster-lifecycle] Cluster completes upgrade",
  "[Jira:\"Test Framework\"] there should not be mass test failures",

The list is ordered. Only the lowest indexed one is counted if multiple key tests appear in a job's failed tests.

Summary by CodeRabbit

  • New Features

  • Added key-test filtering with priority-based handling for component readiness reports.

  • Added a new "4.22-main-mass-failure" component readiness view/configuration for enhanced reporting.

  • Tests

  • Added unit tests validating exclusive key-test filtering and priority-based behavior.

  • Chores

  • Minor code formatting adjustments.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot
Copy link

Scheduling required tests:
/test e2e

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 26, 2026

@xueqzhan: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants