[rb] ensure the grid is properly restarted in tests when there is a problem #16842

titusfortner · 2026-01-04T21:55:27Z

User description

We sometimes get test failures if the grid dies and it affects all tests after that. This should isolate the problem to the one place it happens, or if it is expected, will properly refresh.

💥 What does this PR do?

Add method to server class to verify process & socket and then ping the status endpoint with 2 second timeout so it won't hang
Add #ensure_grid method to test_environment to make sure the grid is good every time a remote driver is created, and before(:suite)

🔧 Implementation Notes

This gets called on every #reset_driver! or when driver is used after a #quit_driver; By default it happens at the end of every failing test.

💡 Additional Considerations

I'm seeing a bunch of hard coded driver instantiation in the test code we need to remove

🔄 Types of changes

Bug fix (backwards compatible)
New feature (non-breaking change which adds functionality and tests!)

PR Type

Bug fix, Enhancement

Description

Add status_ok? method to verify grid health via process, socket, and HTTP status endpoint
Implement ensure_grid method to automatically restart grid before remote driver creation
Improve error handling in reset_remote_server with try-catch-finally pattern
Call ensure_grid in remote_driver and before(:suite) to prevent cascading test failures

Diagram Walkthrough

flowchart LR
  A["Test Suite Start"] -->|before:suite| B["ensure_grid"]
  C["Remote Driver Creation"] -->|remote_driver| B
  B -->|check status| D["status_ok?"]
  D -->|unhealthy| E["reset_remote_server"]
  E -->|start| F["Grid Running"]
  D -->|healthy| F

File Walkthrough

Relevant files

Enhancement

server.rb `Add grid health status verification method` rb/lib/selenium/server.rb Add `status_ok?` method to verify grid health by checking process alive status, socket connection, and HTTP status endpoint Includes 2-second timeout on HTTP request to prevent hanging Returns false on any StandardError to handle edge cases gracefully	+11/-0
spec_helper.rb `Use ensure_grid in test suite setup` rb/spec/integration/selenium/webdriver/spec_helper.rb Replace `remote_server.start` with `ensure_grid` call in `before(:suite)` hook Ensures grid is properly initialized and healthy before test suite runs	+1/-1
test_environment.rb `Add grid health checks and auto-restart logic` rb/spec/integration/selenium/webdriver/spec_support/test_environment.rb Wrap `remote_server.stop` in error handling to prevent failures from blocking cleanup Change `remote_server?` to check `status_ok?` instead of nil check for actual health verification Add `ensure_grid` method that restarts grid if it's not healthy Call `ensure_grid` in `remote_driver` method before creating remote driver instances	+15/-3

Documentation

server.rbs `Update type signatures for status_ok?` rb/sig/lib/selenium/server.rbs Add type signature for new `status_ok?` method returning boolean	+2/-0

qodo-code-review · 2026-01-04T21:55:52Z

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢	No security concerns identified No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
⚪	🎫 No ticket provided Create ticket/issue
Codebase Duplication Compliance
⚪	Codebase context is not defined Follow the guide to enable codebase context checks.
Custom Compliance
🟢	Generic: Comprehensive Audit Trails Objective: To create a detailed and reliable record of critical system actions for security analysis and compliance. Status: Passed Learn more about managing compliance generic rules or creating your own custom rules
	Generic: Meaningful Naming and Self-Documenting Code Objective: Ensure all identifiers clearly express their purpose and intent, making code self-documenting Status: Passed Learn more about managing compliance generic rules or creating your own custom rules
	Generic: Secure Error Handling Objective: To prevent the leakage of sensitive system information through error messages while providing sufficient detail for internal debugging. Status: Passed Learn more about managing compliance generic rules or creating your own custom rules
	Generic: Security-First Input Validation and Data Handling Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent vulnerabilities Status: Passed Learn more about managing compliance generic rules or creating your own custom rules
🔴	Generic: Robust Error Handling and Edge Case Management Objective: Ensure comprehensive error handling that provides meaningful context and graceful degradation Status: Swallowed exception: The new `status_ok?` method rescues `StandardError` and returns `false` without logging any context, making grid health check failures difficult to diagnose. Referred Code def status_ok? return false unless @process&.alive? && socket.connected? Net::HTTP.start(@host, @port, open_timeout: 2, read_timeout: 2) do \|http\| response = http.get('/wd/hub/status') response.is_a?(Net::HTTPSuccess) end rescue StandardError false end Learn more about managing compliance generic rules or creating your own custom rules
⚪	Generic: Secure Logging Practices Objective: To ensure logs are useful for debugging and auditing without exposing sensitive information like PII, PHI, or cardholder data. Status: Log data review: The new warning log interpolates `e.message`, which may include environment- or network-specific details, and should be reviewed to ensure it cannot leak sensitive information in CI logs. Referred Code begin @remote_server&.stop rescue StandardError => e WebDriver.logger.warn("Remote server stop failed: #{e.class}: #{e.message}") ensure @remote_server = nil end Learn more about managing compliance generic rules or creating your own custom rules
Update

Compliance status legend

🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

qodo-code-review · 2026-01-04T21:57:33Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	Use @socket instead of undefined socket Fix an undefined variable error in the server health check by using the `@socket` instance variable instead of the local `socket` variable. rb/lib/selenium/server.rb [223-228] -return false unless @process&.alive? && socket.connected? +return false unless @process&.alive? && @socket&.connected? Net::HTTP.start(@host, @port, open_timeout: 2, read_timeout: 2) do \|http\| response = http.get('/wd/hub/status') response.is_a?(Net::HTTPSuccess) end Apply / Chat Suggestion importance[1-10]: 9 __ Why: The suggestion correctly identifies a `NameError` because `socket` is undefined. The proposed fix to use `@socket&.connected?` is incorrect as `@socket` is not an instance variable; however, it correctly points to the line with the bug and the nature of the problem, which is a critical flaw in the new health check feature.	High
	Avoid creating an unstarted server instance Modify `reset_remote_server` to avoid creating a new, unstarted server instance by removing the final `remote_server` call. rb/spec/integration/selenium/webdriver/spec_support/test_environment.rb [127-136] def reset_remote_server begin @remote_server&.stop rescue StandardError => e WebDriver.logger.warn("Remote server stop failed: #{e.class}: #{e.message}") ensure @remote_server = nil end - remote_server end Apply / Chat Suggestion importance[1-10]: 8 __ Why: The suggestion correctly identifies that `reset_remote_server` creates a new, unstarted server instance, which is not the method's intended behavior. Removing the final `remote_server` call fixes this logical bug and improves the test setup's reliability.	Medium
	Fix incorrect method chaining on nil Fix a potential `NoMethodError` in `ensure_grid` by separating the server reset and start calls onto different lines. rb/spec/integration/selenium/webdriver/spec_support/test_environment.rb [142-146] def ensure_grid return if remote_server? - reset_remote_server.start + reset_remote_server + remote_server.start end Apply / Chat Suggestion importance[1-10]: 8 __ Why: This suggestion correctly identifies a `NoMethodError` that would occur due to incorrect method chaining. The proposed change to separate the `reset_remote_server` and `remote_server.start` calls is essential for the `ensure_grid` method to function correctly.	Medium
Learned best practice	Narrow exception handling scope Rescue only the expected network/HTTP exceptions (timeouts/connection errors) so real bugs (e.g., `NoMethodError`) don’t get silently converted into `false`; optionally log unexpected exceptions. rb/lib/selenium/server.rb [222-231] def status_ok? return false unless @process&.alive? && socket.connected? Net::HTTP.start(@host, @port, open_timeout: 2, read_timeout: 2) do \|http\| response = http.get('/wd/hub/status') response.is_a?(Net::HTTPSuccess) end -rescue StandardError +rescue Net::OpenTimeout, Net::ReadTimeout, Errno::ECONNREFUSED, Errno::EHOSTUNREACH, SocketError, IOError false end Apply / Chat Suggestion importance[1-10]: 6 __ Why: Relevant best practice - Avoid rescuing broad `StandardError` at integration boundaries; instead handle expected failures explicitly (and/or log) so unexpected errors aren’t silently masked.	Low
Update

.github/workflows/bazel.yml

…roblem

titusfortner requested a review from p0deje January 4, 2026 21:55

selenium-ci added the C-rb Ruby Bindings label Jan 4, 2026

titusfortner requested a review from aguspe January 4, 2026 21:55

qodo-code-review bot added the Review effort 2/5 label Jan 4, 2026

titusfortner force-pushed the rb_ensure_grid branch 2 times, most recently from ac7d38d to 56b26d9 Compare January 5, 2026 15:21

p0deje approved these changes Jan 5, 2026

View reviewed changes

.github/workflows/bazel.yml Outdated Show resolved Hide resolved

.github/workflows/bazel.yml Outdated Show resolved Hide resolved

titusfortner added 2 commits January 5, 2026 11:01

[rb] ensure the grid is properly restarted in tests when there is a p…

c5279e9

…roblem

more robust protection for driver issues

840429d

titusfortner force-pushed the rb_ensure_grid branch from 56b26d9 to 840429d Compare January 5, 2026 17:02

titusfortner merged commit 8b5aed0 into trunk Jan 5, 2026
33 checks passed

titusfortner deleted the rb_ensure_grid branch January 5, 2026 18:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

[rb] ensure the grid is properly restarted in tests when there is a problem #16842

[rb] ensure the grid is properly restarted in tests when there is a problem #16842

titusfortner commented Jan 4, 2026 •

edited by qodo-code-review bot

Loading

Uh oh!

qodo-code-review bot commented Jan 4, 2026 •

edited

Loading

Uh oh!

qodo-code-review bot commented Jan 4, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Uh oh!

[rb] ensure the grid is properly restarted in tests when there is a problem #16842

[rb] ensure the grid is properly restarted in tests when there is a problem #16842

Conversation

titusfortner commented Jan 4, 2026 • edited by qodo-code-review bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

User description

💥 What does this PR do?

🔧 Implementation Notes

💡 Additional Considerations

🔄 Types of changes

PR Type

Description

Diagram Walkthrough

File Walkthrough

Uh oh!

qodo-code-review bot commented Jan 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Compliance Guide 🔍

Uh oh!

qodo-code-review bot commented Jan 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

titusfortner commented Jan 4, 2026 •

edited by qodo-code-review bot

Loading

qodo-code-review bot commented Jan 4, 2026 •

edited

Loading

qodo-code-review bot commented Jan 4, 2026 •

edited

Loading