| layout | default |
|---|---|
| title | Browser Use Tutorial - Chapter 3: Element Selection |
| nav_order | 3 |
| has_children | false |
| parent | Browser Use Tutorial |
Welcome to Chapter 3: Element Selection - Finding and Interacting with Web Elements. In this part of Browser Use Tutorial: AI-Powered Web Automation Agents, you will build an intuitive mental model first, then move into concrete implementation details and practical production tradeoffs.
Master advanced techniques for locating and interacting with web elements using vision, DOM analysis, and intelligent selection strategies.
Element selection is crucial for reliable web automation. Browser Use combines vision analysis, DOM parsing, and intelligent heuristics to find and interact with elements. This chapter covers advanced element selection techniques and strategies.
# element_selection_demo.py
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI
async def element_selection_demo():
"""Demonstrate different element selection approaches"""
agent = Agent(
task="""
Compare vision vs DOM element selection:
1. Go to https://example.com
2. Find the main heading using visual cues (it looks like a large title)
3. Find the same heading using HTML structure (look for <h1> tags)
4. Find a link using text content ("More information")
5. Find the same link using CSS selectors or attributes
Describe which approach works better for each type of element.
""",
llm=ChatOpenAI(model="gpt-4o"),
max_steps=20,
use_vision=True # Enable vision for visual element detection
)
result = await agent.run()
print(f"Element Selection Analysis: {result}")
if __name__ == "__main__":
asyncio.run(element_selection_demo())# selection_strategies.py
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI
async def selection_strategies():
"""Demonstrate different element selection strategies"""
strategies = [
# Text-based selection
"""
Go to https://news.ycombinator.com/
Find the "new" link using the text "new"
Click it and describe what page you're on
""",
# Position-based selection
"""
Go to https://news.ycombinator.com/
Find the submit button (usually at the top-right)
Click it and describe the submission form
""",
# Icon-based selection
"""
Go to https://github.com/
Find the search icon/magnifying glass
Click it and describe the search functionality
""",
# Color/pattern-based selection
"""
Go to https://news.ycombinator.com/
Find orange-colored score numbers
Extract the highest scoring story title and score
"""
]
for i, strategy in enumerate(strategies, 1):
print(f"\n--- Strategy {i} ---")
agent = Agent(
task=strategy,
llm=ChatOpenAI(model="gpt-4o"),
max_steps=15,
use_vision=True
)
result = await agent.run()
print(f"Result: {result}")
if __name__ == "__main__":
asyncio.run(selection_strategies())# complex_elements.py
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI
async def complex_element_location():
"""Handle complex element finding scenarios"""
agent = Agent(
task="""
Navigate to https://httpbin.org/forms/post and handle complex element selection:
1. Find the delivery form (it's the main form on the page)
2. Locate all input fields in the form
3. Identify which fields are required vs optional
4. Find the submit button (it might be styled as a button or input)
5. Fill out a few fields to test form interaction
6. Don't submit - just verify you can interact with all elements
Describe the form structure and your element selection strategy.
""",
llm=ChatOpenAI(model="gpt-4o"),
max_steps=25,
use_vision=True
)
result = await agent.run()
print(f"Complex Element Analysis: {result}")
if __name__ == "__main__":
asyncio.run(complex_element_location())# dynamic_elements.py
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI
async def dynamic_element_handling():
"""Handle dynamically loaded and changing elements"""
agent = Agent(
task="""
Handle dynamic content on https://httpbin.org/:
1. Go to the main page
2. Wait for all links to load (some may load asynchronously)
3. Find and count all HTTP method links (GET, POST, etc.)
4. Click on each one briefly, then go back
5. Keep track of which ones work and which might have issues
This tests handling of dynamic page loading and element stability.
""",
llm=ChatOpenAI(model="gpt-4o"),
max_steps=30,
use_vision=True
)
result = await agent.run()
print(f"Dynamic Element Handling: {result}")
if __name__ == "__main__":
asyncio.run(dynamic_element_handling())# shadow_dom_iframes.py
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI
async def shadow_dom_handling():
"""Handle Shadow DOM and iframe elements"""
agent = Agent(
task="""
Test Shadow DOM and iframe handling:
1. Go to a page with embedded content or complex widgets
2. Look for iframe elements (embedded videos, forms, etc.)
3. Try to interact with content inside iframes
4. Look for Shadow DOM elements (custom web components)
5. Attempt to interact with Shadow DOM content
Report on what you can and cannot access in these scenarios.
""",
llm=ChatOpenai(model="gpt-4o"),
max_steps=25,
use_vision=True
)
result = await agent.run()
print(f"Shadow DOM/iFrame Analysis: {result}")
if __name__ == "__main__":
asyncio.run(shadow_dom_handling())# element_chains.py
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI
async def element_interaction_chains():
"""Handle complex element interaction sequences"""
agent = Agent(
task="""
Perform a complex interaction sequence on GitHub:
1. Go to https://github.com/
2. Find the search input field
3. Type "browser-use" and press Enter
4. Wait for search results to load
5. Find the browser-use repository link
6. Click on it
7. On the repo page, find the "Issues" tab
8. Click on Issues
9. Find the "New issue" button
10. Click it (but don't actually create an issue)
Describe each step and any challenges you encountered.
""",
llm=ChatOpenAI(model="gpt-4o"),
max_steps=40,
use_vision=True
)
result = await agent.run()
print(f"Element Chain Results: {result}")
if __name__ == "__main__":
asyncio.run(element_interaction_chains())# conditional_selection.py
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI
async def conditional_element_selection():
"""Handle conditional element finding and interaction"""
agent = Agent(
task="""
Handle conditional scenarios on a dynamic website:
1. Go to https://news.ycombinator.com/
2. Check if there are any stories with scores over 100
3. If yes, click on the highest-scoring story
4. If not, click on the most recent story (newest)
5. Read the article or discussion
6. Summarize what you found
This demonstrates conditional logic in element selection.
""",
llm=ChatOpenAI(model="gpt-4o"),
max_steps=30,
use_vision=True
)
result = await agent.run()
print(f"Conditional Selection Results: {result}")
if __name__ == "__main__":
asyncio.run(conditional_element_selection())# fallback_strategies.py
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI
async def fallback_element_strategies():
"""Demonstrate robust element selection with fallbacks"""
agent = Agent(
task="""
Use multiple strategies to find and interact with elements:
1. Go to a website with multiple ways to access the same feature
2. Try to find a login button using different approaches:
- Look for text "Login" or "Sign In"
- Look for button styling (common login button appearance)
- Look for URL patterns (/login, /signin)
- Look for form elements that might be login forms
3. Try each approach and note which ones work
4. Successfully access the login functionality
Report on the effectiveness of different element selection strategies.
""",
llm=ChatOpenAI(model="gpt-4o"),
max_steps=25,
use_vision=True
)
result = await agent.run()
print(f"Fallback Strategy Results: {result}")
if __name__ == "__main__":
asyncio.run(fallback_element_strategies())# error_recovery.py
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI
async def element_error_recovery():
"""Handle element selection failures gracefully"""
agent = Agent(
task="""
Test error recovery for element selection:
1. Go to https://example.com
2. Try to find an element that doesn't exist (like a "Download" button)
3. Handle the failure gracefully
4. Try alternative approaches to achieve the same goal
5. Look for similar elements or functionality
6. Adapt your approach based on what's actually available
Demonstrate how to recover from element selection failures.
""",
llm=ChatOpenAI(model="gpt-4o"),
max_steps=20,
retry_on_error=True,
max_retries=2
)
result = await agent.run()
print(f"Error Recovery Results: {result}")
if __name__ == "__main__":
asyncio.run(element_error_recovery())# fast_selection.py
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI
from browser_use.browser.browser import BrowserConfig
async def fast_element_selection():
"""Optimize element selection for speed"""
# Optimized browser configuration
browser_config = BrowserConfig(
headless=True, # Faster without GUI
disable_images=True, # Don't load images
extra_chromium_args=[
"--disable-extensions",
"--disable-plugins",
"--disable-web-security", # For testing only
]
)
# Fast LLM
llm = ChatOpenAI(
model="gpt-4o-mini",
temperature=0.1,
max_tokens=500
)
agent = Agent(
task="""
Perform quick element extraction:
1. Go to https://httpbin.org/html
2. Quickly find the main heading (H1)
3. Extract all paragraph text
4. Find all links and their destinations
5. Return the extracted data efficiently
Focus on speed and accuracy.
""",
llm=llm,
browser=browser_config,
max_steps=15,
use_vision=False, # Rely on DOM for speed
)
import time
start_time = time.time()
result = await agent.run()
end_time = time.time()
print(f"Fast extraction completed in {end_time - start_time:.2f} seconds")
print(f"Results: {result}")
if __name__ == "__main__":
asyncio.run(fast_element_selection())# batch_operations.py
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI
async def batch_element_operations():
"""Handle multiple element operations efficiently"""
agent = Agent(
task="""
Perform batch element operations on a data-rich page:
1. Go to https://news.ycombinator.com/
2. Extract information from multiple stories at once:
- Titles of top 10 stories
- Scores for each story
- Number of comments
- Usernames of submitters
3. Organize the data into a structured format
4. Present the information efficiently
Demonstrate handling multiple similar elements.
""",
llm=ChatOpenAI(model="gpt-4o"),
max_steps=25,
use_vision=True
)
result = await agent.run()
print(f"Batch Operation Results: {result}")
# Process structured data
if hasattr(result, 'extracted_content') and result.extracted_content:
print("\nStructured Data:")
for item in result.extracted_content[:5]: # Show first 5
print(f"- {item.get('title', 'N/A')} (Score: {item.get('score', 'N/A')})")
if __name__ == "__main__":
asyncio.run(batch_element_operations())# css_selectors.py
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI
async def css_selector_integration():
"""Use CSS selectors for precise element targeting"""
agent = Agent(
task="""
Use CSS selectors for precise element selection:
1. Go to https://example.com
2. Find elements using CSS selectors:
- h1 (main heading)
- p (paragraphs)
- a[href] (links with href)
- .class-name (elements with specific class)
- #id-name (element with specific ID)
3. Extract content from elements found with different selectors
4. Compare the effectiveness of different selector strategies
This demonstrates integrating CSS knowledge with AI element selection.
""",
llm=ChatOpenAI(model="gpt-4o"),
max_steps=20
)
result = await agent.run()
print(f"CSS Selector Results: {result}")
if __name__ == "__main__":
asyncio.run(css_selector_integration())# javascript_integration.py
import asyncio
from browser_use import Agent, Controller
from langchain_openai import ChatOpenAI
async def javascript_element_manipulation():
"""Use JavaScript for advanced element manipulation"""
# Custom controller with JavaScript capabilities
controller = Controller()
agent = Agent(
task="""
Use JavaScript for advanced element operations:
1. Go to https://httpbin.org/html
2. Use JavaScript to:
- Count all elements on the page
- Find elements by custom criteria
- Modify element styles or content
- Execute complex DOM queries
3. Extract information that would be difficult with standard methods
4. Demonstrate JavaScript integration capabilities
Note: This requires custom controller implementation for full JS integration.
""",
llm=ChatOpenAI(model="gpt-4o"),
controller=controller,
max_steps=20
)
result = await agent.run()
print(f"JavaScript Integration Results: {result}")
if __name__ == "__main__":
asyncio.run(javascript_element_manipulation())In this chapter, we've covered:
- Element Selection Strategies: Vision vs DOM analysis approaches
- Advanced Location Techniques: Complex element finding scenarios
- Dynamic Content Handling: Dealing with changing and loaded content
- Shadow DOM/iFrames: Handling modern web architectures
- Interaction Patterns: Multi-step and conditional element chains
- Robust Selection: Fallback strategies and error recovery
- Performance Optimization: Fast and batch element operations
- Advanced Techniques: CSS selectors and JavaScript integration
- Multi-Modal Selection: Combines vision, DOM, and text analysis for robust element finding
- Fallback Strategies: Multiple approaches increase success rates
- Dynamic Handling: Adapts to changing page content and loading patterns
- Performance Matters: Optimized selection improves automation speed
- Error Recovery: Graceful handling of selection failures
- Advanced Integration: CSS and JavaScript for precise control
Now that you can reliably select and interact with elements, let's explore form automation techniques for structured data input.
Ready for Chapter 4? Form Automation
Generated for Awesome Code Docs
Most teams struggle here because the hard part is not writing more code, but deciding clear boundaries for element, result, asyncio so behavior stays predictable as complexity grows.
In practical terms, this chapter helps you avoid three common failures:
- coupling core logic too tightly to one implementation path
- missing the handoff boundaries between setup, execution, and validation
- shipping changes without clear rollback or observability strategy
After working through this chapter, you should be able to reason about Chapter 3: Element Selection - Finding and Interacting with Web Elements as an operating subsystem inside Browser Use Tutorial: AI-Powered Web Automation Agents, with explicit contracts for inputs, state transitions, and outputs.
Use the implementation notes around Agent, agent, ChatOpenAI as your checklist when adapting these patterns to your own repository.
Under the hood, Chapter 3: Element Selection - Finding and Interacting with Web Elements usually follows a repeatable control path:
- Context bootstrap: initialize runtime config and prerequisites for
element. - Input normalization: shape incoming data so
resultreceives stable contracts. - Core execution: run the main logic branch and propagate intermediate state through
asyncio. - Policy and safety checks: enforce limits, auth scopes, and failure boundaries.
- Output composition: return canonical result payloads for downstream consumers.
- Operational telemetry: emit logs/metrics needed for debugging and performance tuning.
When debugging, walk this sequence in order and confirm each stage has explicit success/failure conditions.
Use the following upstream sources to verify implementation details while reading this chapter:
- Browser Use Repository
Why it matters: authoritative reference on
Browser Use Repository(github.com). - Browser Use Releases
Why it matters: authoritative reference on
Browser Use Releases(github.com). - Browser Use Docs
Why it matters: authoritative reference on
Browser Use Docs(docs.browser-use.com). - Browser Use Cloud
Why it matters: authoritative reference on
Browser Use Cloud(cloud.browser-use.com).
Suggested trace strategy:
- search upstream code for
elementandresultto map concrete implementation paths - compare docs claims against actual runtime/config code before reusing patterns in production