Skip to content

Conversation

@titusfortner
Copy link
Member

@titusfortner titusfortner commented Jan 4, 2026

User description

We sometimes get test failures if the grid dies and it affects all tests after that. This should isolate the problem to the one place it happens, or if it is expected, will properly refresh.

💥 What does this PR do?

  • Add method to server class to verify process & socket and then ping the status endpoint with 2 second timeout so it won't hang
  • Add #ensure_grid method to test_environment to make sure the grid is good every time a remote driver is created, and before(:suite)

🔧 Implementation Notes

  • This gets called on every #reset_driver! or when driver is used after a #quit_driver; By default it happens at the end of every failing test.

💡 Additional Considerations

I'm seeing a bunch of hard coded driver instantiation in the test code we need to remove

🔄 Types of changes

  • Bug fix (backwards compatible)
  • New feature (non-breaking change which adds functionality and tests!)

PR Type

Bug fix, Enhancement


Description

  • Add status_ok? method to verify grid health via process, socket, and HTTP status endpoint

  • Implement ensure_grid method to automatically restart grid before remote driver creation

  • Improve error handling in reset_remote_server with try-catch-finally pattern

  • Call ensure_grid in remote_driver and before(:suite) to prevent cascading test failures


Diagram Walkthrough

flowchart LR
  A["Test Suite Start"] -->|before:suite| B["ensure_grid"]
  C["Remote Driver Creation"] -->|remote_driver| B
  B -->|check status| D["status_ok?"]
  D -->|unhealthy| E["reset_remote_server"]
  E -->|start| F["Grid Running"]
  D -->|healthy| F
Loading

File Walkthrough

Relevant files
Enhancement
server.rb
Add grid health status verification method                             

rb/lib/selenium/server.rb

  • Add status_ok? method to verify grid health by checking process alive
    status, socket connection, and HTTP status endpoint
  • Includes 2-second timeout on HTTP request to prevent hanging
  • Returns false on any StandardError to handle edge cases gracefully
+11/-0   
spec_helper.rb
Use ensure_grid in test suite setup                                           

rb/spec/integration/selenium/webdriver/spec_helper.rb

  • Replace remote_server.start with ensure_grid call in before(:suite)
    hook
  • Ensures grid is properly initialized and healthy before test suite
    runs
+1/-1     
test_environment.rb
Add grid health checks and auto-restart logic                       

rb/spec/integration/selenium/webdriver/spec_support/test_environment.rb

  • Wrap remote_server.stop in error handling to prevent failures from
    blocking cleanup
  • Change remote_server? to check status_ok? instead of nil check for
    actual health verification
  • Add ensure_grid method that restarts grid if it's not healthy
  • Call ensure_grid in remote_driver method before creating remote driver
    instances
+15/-3   
Documentation
server.rbs
Update type signatures for status_ok?                                       

rb/sig/lib/selenium/server.rbs

  • Add type signature for new status_ok? method returning boolean
+2/-0     

@titusfortner titusfortner requested a review from p0deje January 4, 2026 21:55
@selenium-ci selenium-ci added the C-rb Ruby Bindings label Jan 4, 2026
@titusfortner titusfortner requested a review from aguspe January 4, 2026 21:55
@qodo-code-review
Copy link
Contributor

qodo-code-review bot commented Jan 4, 2026

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢
No security concerns identified No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

🔴
Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Swallowed exception: The new status_ok? method rescues StandardError and returns false without logging any
context, making grid health check failures difficult to diagnose.

Referred Code
def status_ok?
  return false unless @process&.alive? && socket.connected?

  Net::HTTP.start(@host, @port, open_timeout: 2, read_timeout: 2) do |http|
    response = http.get('/wd/hub/status')
    response.is_a?(Net::HTTPSuccess)
  end
rescue StandardError
  false
end

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status:
Log data review: The new warning log interpolates e.message, which may include environment- or
network-specific details, and should be reviewed to ensure it cannot leak sensitive
information in CI logs.

Referred Code
begin
  @remote_server&.stop
rescue StandardError => e
  WebDriver.logger.warn("Remote server stop failed: #{e.class}: #{e.message}")
ensure
  @remote_server = nil
end

Learn more about managing compliance generic rules or creating your own custom rules

  • Update
Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review
Copy link
Contributor

qodo-code-review bot commented Jan 4, 2026

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Use @socket instead of undefined socket

Fix an undefined variable error in the server health check by using the @socket
instance variable instead of the local socket variable.

rb/lib/selenium/server.rb [223-228]

-return false unless @process&.alive? && socket.connected?
+return false unless @process&.alive? && @socket&.connected?
 
 Net::HTTP.start(@host, @port, open_timeout: 2, read_timeout: 2) do |http|
   response = http.get('/wd/hub/status')
   response.is_a?(Net::HTTPSuccess)
 end
  • Apply / Chat
Suggestion importance[1-10]: 9

__

Why: The suggestion correctly identifies a NameError because socket is undefined. The proposed fix to use @socket&.connected? is incorrect as @socket is not an instance variable; however, it correctly points to the line with the bug and the nature of the problem, which is a critical flaw in the new health check feature.

High
Avoid creating an unstarted server instance

Modify reset_remote_server to avoid creating a new, unstarted server instance by
removing the final remote_server call.

rb/spec/integration/selenium/webdriver/spec_support/test_environment.rb [127-136]

 def reset_remote_server
   begin
     @remote_server&.stop
   rescue StandardError => e
     WebDriver.logger.warn("Remote server stop failed: #{e.class}: #{e.message}")
   ensure
     @remote_server = nil
   end
-  remote_server
 end
  • Apply / Chat
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies that reset_remote_server creates a new, unstarted server instance, which is not the method's intended behavior. Removing the final remote_server call fixes this logical bug and improves the test setup's reliability.

Medium
Fix incorrect method chaining on nil

Fix a potential NoMethodError in ensure_grid by separating the server reset and
start calls onto different lines.

rb/spec/integration/selenium/webdriver/spec_support/test_environment.rb [142-146]

 def ensure_grid
   return if remote_server?
 
-  reset_remote_server.start
+  reset_remote_server
+  remote_server.start
 end
  • Apply / Chat
Suggestion importance[1-10]: 8

__

Why: This suggestion correctly identifies a NoMethodError that would occur due to incorrect method chaining. The proposed change to separate the reset_remote_server and remote_server.start calls is essential for the ensure_grid method to function correctly.

Medium
Learned
best practice
Narrow exception handling scope

Rescue only the expected network/HTTP exceptions (timeouts/connection errors) so
real bugs (e.g., NoMethodError) don’t get silently converted into false;
optionally log unexpected exceptions.

rb/lib/selenium/server.rb [222-231]

 def status_ok?
   return false unless @process&.alive? && socket.connected?
 
   Net::HTTP.start(@host, @port, open_timeout: 2, read_timeout: 2) do |http|
     response = http.get('/wd/hub/status')
     response.is_a?(Net::HTTPSuccess)
   end
-rescue StandardError
+rescue Net::OpenTimeout, Net::ReadTimeout, Errno::ECONNREFUSED, Errno::EHOSTUNREACH, SocketError, IOError
   false
 end
  • Apply / Chat
Suggestion importance[1-10]: 6

__

Why:
Relevant best practice - Avoid rescuing broad StandardError at integration boundaries; instead handle expected failures explicitly (and/or log) so unexpected errors aren’t silently masked.

Low
  • Update

@titusfortner titusfortner force-pushed the rb_ensure_grid branch 2 times, most recently from ac7d38d to 56b26d9 Compare January 5, 2026 15:21
@titusfortner titusfortner merged commit 8b5aed0 into trunk Jan 5, 2026
33 checks passed
@titusfortner titusfortner deleted the rb_ensure_grid branch January 5, 2026 18:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants