[release-4.21] OCPBUGS-77502: test: add EC2 AMI related verification to Windows ImageType test#7830
Conversation
Simplify the test to focus on core validation: - Remove all scaling logic (orthogonal to ImageType validation) - Keep BuildNodePoolManifest to create Windows ImageType NodePool - Framework already waits for nodes to be ready - Run method validates: 1. ValidPlatformImageType condition shows Windows AMI 2. Nodes are running expected RHCOS-based OS (WinLI AMI) The Windows License Included (WinLI) AMI is RHCOS with Windows licensing metadata, so nodes report linux OS with RHCOS image. This addresses review feedback that scaling tests are orthogonal to ImageType validation, and the framework already handles node readiness checks. test: add EC2 AMI verification to NodePoolImageTypeTest Add critical validation layer to verify EC2 instances are actually using the Windows LI AMI, not just that the controller found one. This is the most important validation because: - AWS bills based on the AMI used by EC2 instances - Windows licensing compliance depends on using WinLI AMI - A bug could result in the controller finding the correct AMI but applying a different one to EC2 instances Validation layers (in order of criticality): 1. EC2 instance AMI matches expected Windows LI AMI (NEW) 2. NodePool ValidPlatformImageType condition shows Windows AMI 3. Nodes boot successfully with RHCOS The EC2 verification queries AWS API to get the actual ImageId of each instance and compares it to the expected Windows LI AMI extracted from the NodePool condition message. This addresses the root requirement from OCPSTRAT-1949: customers running Windows VMs within OCP-Virt on ROSA-HCP must use nodes with Windows LI metadata to be compliant with AWS EC2 LI terms. test: wait for settled ValidPlatformImageType condition status Fix race condition where test would proceed while condition Status is still Unknown. The wait predicate now explicitly checks for settled status (True or False) rather than treating empty Status as wildcard. The previous implementation used ConditionPredicate with only Type set, which matches any Status including Unknown due to wildcard behavior in the Matches() function. This caused the test to continue before the controller completed AMI discovery. The fix uses a custom predicate that: - Waits for condition Type to exist - Ensures Status is True or False (settled state) - Rejects Status=Unknown (controller still discovering) This ensures deterministic test results and aligns with the test logic that expects Status to be either True (AMI found) or False (AMI not available), never Unknown. Addresses review comment on PR openshift#7680. test: improve EC2 instance validation robustness Address code review feedback to improve error handling and validation: 1. Add context cancellation check in EC2 validation loop - Prevents unnecessary AWS API calls if context is cancelled - Provides early exit with clear error message 2. Safer providerID parsing with validation - Replace unsafe string slicing with strings.Split - Validate providerID format before extraction - Check for empty instance ID after extraction - Add clear error messages for malformed providerIDs 3. Stricter AMI ID regex pattern - Match exactly 8 hex chars (old format: ami-12345678) - Or exactly 17 hex chars (new format: ami-0abcdef1234567890) - Previous pattern matched 8-17 chars, allowing invalid lengths These changes improve test reliability and provide better error messages when validation fails. test: fix import alias for metav1 types Fix type mismatch error where v1.ConditionTrue was being compared with condition.Status (metav1.ConditionStatus type). Changed import alias from 'v1' to 'metav1' for consistency and updated all references to use the correct type names: - v1.ObjectMeta -> metav1.ObjectMeta - v1.ConditionTrue -> metav1.ConditionTrue - v1.ConditionFalse -> metav1.ConditionFalse This fixes the CI error: invalid operation: condition.Status == v1.ConditionTrue (mismatched types corev1.ConditionStatus and metav1.ConditionStatus) test: use corev1.ConditionStatus constants for NodePool condition check Fix type mismatch error where NodePool condition.Status (corev1.ConditionStatus) was being compared with metav1.ConditionTrue/False (metav1.ConditionStatus). The hyperv1.NodePool.Status.Conditions use corev1.Condition type, so we must compare with corev1.ConditionTrue and corev1.ConditionFalse constants. test: use t.Skip when Windows AMI unavailable When Windows AMI is not available for a region/version combination, use t.Skipf instead of logging and returning. This properly marks the test as skipped rather than passed, which is semantically correct since the test cannot verify functionality when preconditions aren't met. This improves CI/CD metrics by distinguishing between: - Test passed: functionality verified - Test skipped: preconditions not met
|
@openshift-cherrypick-robot: Ignoring requests to cherry-pick non-bug issues: CNTRLPLANE-753 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: openshift-cherrypick-robot The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
/lgtm |
|
@openshift-cherrypick-robot: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/auto-cc |
|
/retitle [release-4.21] OCPBUGS-77502: test: add EC2 AMI related verification to Windows ImageType test |
|
@openshift-cherrypick-robot: This pull request references Jira Issue OCPBUGS-77502, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/jira refresh |
|
@wangke19: This pull request references Jira Issue OCPBUGS-77502, which is invalid:
Comment DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/close |
|
@wangke19: Closed this PR. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@openshift-cherrypick-robot: This pull request references Jira Issue OCPBUGS-77502. The bug has been updated to no longer refer to the pull request using the external bug tracker. All external bug links have been closed. The bug has been moved to the NEW state. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
This is an automated cherry-pick of #7680
/assign wangke19