CMP-3825: Add testcase to cover OCP-45421 upstream#1074
rhmdnd merged 1 commit into ComplianceAsCode:master from
Conversation
@Anna-Koudelkova: This pull request references CMP-3825, which is a valid Jira issue. Warning: the referenced Jira issue has an invalid target version for the branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.
Force-pushed a09d62a to e2b7a84 (Compare)
Force-pushed df1f4a0 to c2a1ddc (Compare)
/retest

/test e2e-aws-parallel
rhmdnd left a comment:
A few comments inline, specifically around using a smaller profile and bolstering the assertion that the suite remains in a pending state.
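On the smaller-profile point: as a rough sketch (not the PR's actual code), the test could bind a one-rule TailoredProfile instead of a full profile so each scan finishes quickly. The profile name, rule name, and rationale below are illustrative assumptions:

// Sketch only: a single-rule TailoredProfile keeps the node scan small.
// "ocp4-file-permissions-kubelet-conf" is a placeholder; any node-level
// rule available on the cluster would work.
tp := &compv1alpha1.TailoredProfile{
    ObjectMeta: metav1.ObjectMeta{
        Name:      "strict-node-single-rule", // hypothetical name
        Namespace: f.OperatorNamespace,
    },
    Spec: compv1alpha1.TailoredProfileSpec{
        Title:       "Single-rule profile for the strict node scan test",
        Description: "Minimal profile so the e2e test runs quickly",
        EnableRules: []compv1alpha1.RuleReferenceSpec{
            {
                Name:      "ocp4-file-permissions-kubelet-conf",
                Rationale: "One node rule is enough to exercise the binding",
            },
        },
    },
}
// The ScanSettingBinding would then reference this object with
// Kind: "TailoredProfile" instead of a full Profile.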
    t.Fatal(err)
}

// Wait for binding to be deleted
As I understand it, ScanSettingBindings own ComplianceSuites, which own ComplianceScans. If we update the logic here to wait for the scans to be cleaned up, we should be safe to move forward with recreating the binding.
If we just check the binding, we run the risk of suites and scans still being cleaned up when the new binding is created.
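A hedged sketch of that suggestion, assuming the operator labels each scan with its owning suite (the compv1alpha1.SuiteLabel key and the framework constants are assumptions to verify against the real API):

// Sketch: poll until no ComplianceScans belonging to the old suite remain,
// so recreating the binding can't race with the previous cleanup.
err := wait.PollImmediate(framework.RetryInterval, framework.Timeout, func() (bool, error) {
    scans := &compv1alpha1.ComplianceScanList{}
    opts := []client.ListOption{
        client.InNamespace(f.OperatorNamespace),
        client.MatchingLabels{compv1alpha1.SuiteLabel: bindingName}, // assumed label key
    }
    if err := f.Client.List(context.TODO(), scans, opts...); err != nil {
        return false, err
    }
    return len(scans.Items) == 0, nil
})
if err != nil {
    t.Fatalf("timed out waiting for scans from %s to be cleaned up: %v", bindingName, err)
}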
Force-pushed c2a1ddc to cb2eb9f (Compare)
rhmdnd left a comment:
I think I found the spot in the tests that's causing a transient issue with the serial CI.
One serial suite failed on the transient issue we've been chasing down in #1100, but the other serial suite failed on the test being added here, and it seems to be timing-related.
I made a recommendation inline to see if that helps.
suite := &compv1alpha1.ComplianceSuite{}
if err := f.Client.Get(context.TODO(), types.NamespacedName{Name: bindingName, Namespace: f.OperatorNamespace}, suite); err != nil {
    if apierrors.IsNotFound(err) {
        t.Fatalf("ComplianceSuite %s not found", bindingName)
Interesting - this is getting hit in the serial tests:
2026/03/04 13:46:52 waiting until suite test-strict-node-scan-configuration-binding reaches target status 'DONE'. Current status: LAUNCHING
2026/03/04 13:46:57 waiting until suite test-strict-node-scan-configuration-binding reaches target status 'DONE'. Current status: RUNNING
2026/03/04 13:47:02 waiting until suite test-strict-node-scan-configuration-binding reaches target status 'DONE'. Current status: RUNNING
2026/03/04 13:47:07 waiting until suite test-strict-node-scan-configuration-binding reaches target status 'DONE'. Current status: RUNNING
2026/03/04 13:47:12 waiting until suite test-strict-node-scan-configuration-binding reaches target status 'DONE'. Current status: RUNNING
2026/03/04 13:47:17 waiting until suite test-strict-node-scan-configuration-binding reaches target status 'DONE'. Current status: RUNNING
2026/03/04 13:47:22 waiting until suite test-strict-node-scan-configuration-binding reaches target status 'DONE'. Current status: RUNNING
2026/03/04 13:47:27 waiting until suite test-strict-node-scan-configuration-binding reaches target status 'DONE'. Current status: RUNNING
2026/03/04 13:47:32 waiting until suite test-strict-node-scan-configuration-binding reaches target status 'DONE'. Current status: RUNNING
2026/03/04 13:47:37 waiting until suite test-strict-node-scan-configuration-binding reaches target status 'DONE'. Current status: RUNNING
2026/03/04 13:47:42 waiting until suite test-strict-node-scan-configuration-binding reaches target status 'DONE'. Current status: RUNNING
2026/03/04 13:47:47 waiting until suite test-strict-node-scan-configuration-binding reaches target status 'DONE'. Current status: RUNNING
2026/03/04 13:47:52 waiting until suite test-strict-node-scan-configuration-binding reaches target status 'DONE'. Current status: RUNNING
2026/03/04 13:47:57 waiting until suite test-strict-node-scan-configuration-binding reaches target status 'DONE'. Current status: RUNNING
2026/03/04 13:48:02 waiting until suite test-strict-node-scan-configuration-binding reaches target status 'DONE'. Current status: AGGREGATING
2026/03/04 13:48:07 waiting until suite test-strict-node-scan-configuration-binding reaches target status 'DONE'. Current status: AGGREGATING
2026/03/04 13:48:17 ComplianceScan ready (DONE)
2026/03/04 13:48:17 All scans in ComplianceSuite have finished (test-strict-node-scan-configuration-binding)
main_test.go:2471: ComplianceSuite test-strict-node-scan-configuration-binding not found
--- FAIL: TestStrictNodeScanConfiguration (95.19s)
The "waiting until suite" lines must be coming from line 2433. I wonder if we're experiencing this failure because we're creating the binding on line 2460 and then asking for the ComplianceSuite only a few lines later (2469).
I suspect the ScanSettingBinding controller hasn't created the ComplianceSuite yet, which is why the test can't find it. We could add something like the following at line 2464:
// Wait for the ComplianceSuite to be created by the SSB controller
suite := &compv1alpha1.ComplianceSuite{}
suiteKey := types.NamespacedName{Name: bindingName, Namespace: f.OperatorNamespace}
if err := wait.PollImmediate(framework.RetryInterval, framework.Timeout, func() (bool, error) {
    err := f.Client.Get(context.TODO(), suiteKey, suite)
    if apierrors.IsNotFound(err) {
        return false, nil
    }
    if err != nil {
        return false, err
    }
    return true, nil
}); err != nil {
    t.Fatalf("timed out waiting for ComplianceSuite %s to be created: %v", bindingName, err)
}

Then, we could simplify the PENDING loop check (starting on line 2466) to:
deadline := time.Now().Add(30 * time.Second)
for time.Now().Before(deadline) {
    if err := f.Client.Get(context.TODO(), suiteKey, suite); err != nil {
        t.Fatalf("failed to get ComplianceSuite %s: %v", bindingName, err)
    }
    if suite.Status.Phase != compv1alpha1.PhasePending {
        t.Fatalf("suite left PENDING state (expected to remain PENDING for 30s): phase=%s", suite.Status.Phase)
    }
    time.Sleep(framework.RetryInterval)
}
Anna-Koudelkova replied:
Ok, I have rebased and updated it based on your suggestion, so I am now curious how this will show up in the serial tests! It worked just fine when I ran it locally (I always let it run one more time before committing or updating), and I have not seen this behavior before. Hopefully that's it! 🙂
Force-pushed cb2eb9f to 5d9890b (Compare)
Force-pushed 5d9890b to 0ff38d6 (Compare)
rhmdnd left a comment:
Looks good - my comments are primarily cosmetic or low-risk given the CI cluster configurations we're using.
Curious to see what happens with the serial tests now.
Thanks for updating!
}

-func (f *Framework) waitForScanCleanup() error {
+func (f *Framework) WaitForScanCleanup() error {
For a minute I thought this was going to be a problem, given this asserts that all scans are cleaned up. I don't think we'd be able to use this in the parallel tests, but it should be fine in the serial tests since the suite only runs one test at a time.
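For readers without the diff handy, a minimal sketch of what a helper like WaitForScanCleanup plausibly does (the constants and client calls are assumptions, not the PR's exact code), which also shows why it only suits serial runs:

// Sketch only: poll until no ComplianceScans exist in the operator
// namespace. In a parallel suite this would also wait on other tests'
// scans, which is why it is only safe when tests run one at a time.
func (f *Framework) WaitForScanCleanup() error {
    return wait.PollImmediate(RetryInterval, Timeout, func() (bool, error) {
        scans := &compv1alpha1.ComplianceScanList{}
        if err := f.Client.List(context.TODO(), scans, client.InNamespace(f.OperatorNamespace)); err != nil {
            return false, err
        }
        return len(scans.Items) == 0, nil
    })
}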
| t.Fatalf("suite left PENDING state (expected to remain PENDING for 30s): phase=%s", suite.Status.Phase) | ||
| } | ||
| time.Sleep(framework.RetryInterval) | ||
| } |
A minor linting fix would be to bump this } to the same level of indentation as the for. Also, gofmt can fix that automatically if you have it set up to run when you save files in your editor. Purely cosmetic, but it helps with style consistency over the long haul.
if err != nil {
    t.Fatal(err)
}
nodeName := workerNodes[0].Name
If workerNodes is an empty list, this will panic. Not a huge issue since all our CI clusters should have worker nodes provisioned, but we could update this in a follow-on patch to safeguard against that case:
workerNodes, err := f.GetNodesWithSelector(map[string]string{
    "node-role.kubernetes.io/worker": "",
})
if err != nil {
    t.Fatal(err)
}
if len(workerNodes) == 0 {
    t.Skip("No worker nodes are available to test strict node scanning configuration")
}
@Anna-Koudelkova: The following tests failed, say /retest to rerun all failed tests:
rhmdnd left a comment:
Looks like the parallel testing failed on a metrics flake, which we need to investigate and see if we can fix (this has cropped up in the past, too).
=== NAME TestSingleScanSucceeds
main_test.go:720: failed to assert metrics for scan test-single-scan-succeeds: error getting output exit status 1
--- FAIL: TestSingleScanSucceeds (100.42s)
Otherwise - we got a clean serial run on both architectures.
[APPROVALNOTIFIER] This PR is APPROVED

This pull request has been approved by: Anna-Koudelkova, rhmdnd. Approvers can indicate their approval by writing /approve in a comment.
Testcase passes on an OCP 4.21 cluster: