| layout | default |
|---|---|
| title | OpenHands Tutorial - Chapter 6: Refactoring |
| nav_order | 6 |
| has_children | false |
| parent | OpenHands Tutorial |
Welcome to Chapter 6: Refactoring - Code Structure Improvement and Modernization. In this part of OpenHands Tutorial: Autonomous Software Engineering Workflows, you will build an intuitive mental model first, then move into concrete implementation details and practical production tradeoffs.
Master OpenHands' refactoring capabilities for improving code structure, performance, and maintainability through systematic code transformations.
OpenHands can carry out refactorings that range from simple code cleanups to complex architectural transformations. This chapter covers automated code refactoring, modernization, and optimization techniques.
## Code Structure Refactoring

````python
from openhands import OpenHands

# Refactoring agent
refactor_agent = OpenHands()

# Complex function refactoring
function_refactor = refactor_agent.run("""
Refactor this complex function into smaller, focused functions:

```python
def process_user_orders(user_id, orders_data, payment_info, shipping_address):
    # Validate user exists
    user = get_user_by_id(user_id)
    if not user:
        raise ValueError(f"User {user_id} not found")

    # Process each order
    processed_orders = []
    total_amount = 0

    for order_data in orders_data:
        # Validate order data
        if not validate_order_data(order_data):
            continue

        # Calculate order total
        order_total = calculate_order_total(order_data['items'])
        total_amount += order_total

        # Create order record
        order = create_order_record(
            user_id=user_id,
            items=order_data['items'],
            total=order_total,
            shipping_address=shipping_address
        )
        processed_orders.append(order)

    # Process payment
    if total_amount > 0:
        payment_result = process_payment(
            payment_info=payment_info,
            amount=total_amount,
            orders=processed_orders
        )

        if not payment_result['success']:
            # Rollback orders
            rollback_orders(processed_orders)
            raise PaymentError("Payment processing failed")

    return {
        'processed_orders': processed_orders,
        'total_amount': total_amount,
        'payment_status': payment_result
    }
```

Refactor into:
- Input validation functions
- Order processing pipeline
- Payment handling abstraction
- Error handling and rollback mechanisms
- Result aggregation and reporting

Include proper error handling, logging, and type hints.
""")
````
````python
class_refactor = refactor_agent.run("""
Refactor this large class into smaller, focused classes:

```python
class UserManager:
    def __init__(self, db_connection):
        self.db = db_connection
        self.cache = {}
        self.logger = logging.getLogger(__name__)

    def create_user(self, user_data):
        # Validate input
        self._validate_user_data(user_data)

        # Check if user exists
        if self._user_exists(user_data['email']):
            raise ValueError("User already exists")

        # Hash password
        user_data['password_hash'] = self._hash_password(user_data['password'])

        # Save to database
        user_id = self.db.insert('users', user_data)

        # Send welcome email
        self._send_welcome_email(user_data['email'])

        # Cache user
        self.cache[user_id] = user_data
        return user_id

    def authenticate_user(self, email, password):
        # Get user from cache or DB
        user = self.cache.get(email) or self.db.query('users', {'email': email})
        if not user:
            return None

        # Verify password
        if self._verify_password(password, user['password_hash']):
            return user
        return None

    def update_user_profile(self, user_id, updates):
        # Validate updates
        self._validate_updates(updates)

        # Update in database
        self.db.update('users', user_id, updates)

        # Update cache
        if user_id in self.cache:
            self.cache[user_id].update(updates)

        # Log update
        self.logger.info(f"Updated user {user_id}")

    def delete_user(self, user_id):
        # Remove from cache
        self.cache.pop(user_id, None)

        # Remove from database
        self.db.delete('users', user_id)

        # Log deletion
        self.logger.info(f"Deleted user {user_id}")

    # Private helper methods
    def _validate_user_data(self, data):
        # Validation logic...
        pass

    def _user_exists(self, email):
        # Check existence...
        pass

    def _hash_password(self, password):
        # Hashing logic...
        pass

    def _verify_password(self, password, hash):
        # Verification logic...
        pass

    def _validate_updates(self, updates):
        # Validation logic...
        pass

    def _send_welcome_email(self, email):
        # Email logic...
        pass
```

Split into focused classes:
- UserValidator - Input validation
- UserRepository - Data access
- AuthenticationService - Auth logic
- EmailService - Notifications
- CacheManager - Caching logic
- UserManager - Orchestration
""")
````
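To make the requested class split concrete, here is a minimal sketch of what the decomposed design could look like. It is not output from OpenHands: the in-memory repository, the recording email service, and the PBKDF2-based hashing from the standard library are stand-ins chosen for illustration, and `CacheManager` is omitted for brevity.

```python
import hashlib
import hmac
import os
from typing import Dict, List, Optional


class UserValidator:
    """Input validation only."""

    def validate(self, user_data: dict) -> None:
        if "@" not in user_data.get("email", ""):
            raise ValueError("Invalid email")
        if len(user_data.get("password", "")) < 8:
            raise ValueError("Password must be at least 8 characters")


class InMemoryUserRepository:
    """Data access only (a dict stands in for the database; hypothetical)."""

    def __init__(self) -> None:
        self._users: Dict[str, dict] = {}

    def find_by_email(self, email: str) -> Optional[dict]:
        return self._users.get(email)

    def insert(self, record: dict) -> None:
        self._users[record["email"]] = record


class AuthenticationService:
    """Password hashing and verification only."""

    ITERATIONS = 100_000

    def hash_password(self, password: str) -> str:
        salt = os.urandom(16)
        digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, self.ITERATIONS)
        return f"{salt.hex()}:{digest.hex()}"

    def verify_password(self, password: str, stored: str) -> bool:
        salt_hex, digest_hex = stored.split(":")
        digest = hashlib.pbkdf2_hmac(
            "sha256", password.encode(), bytes.fromhex(salt_hex), self.ITERATIONS
        )
        # Constant-time comparison avoids leaking how many characters matched
        return hmac.compare_digest(digest.hex(), digest_hex)


class RecordingEmailService:
    """Notifications only (records recipients instead of sending mail)."""

    def __init__(self) -> None:
        self.sent: List[str] = []

    def send_welcome(self, email: str) -> None:
        self.sent.append(email)


class UserManager:
    """Orchestration only: delegates each concern to a collaborator."""

    def __init__(self, validator, repository, auth, emails) -> None:
        self.validator = validator
        self.repository = repository
        self.auth = auth
        self.emails = emails

    def create_user(self, user_data: dict) -> str:
        self.validator.validate(user_data)
        if self.repository.find_by_email(user_data["email"]):
            raise ValueError("User already exists")
        record = {
            "email": user_data["email"],
            "password_hash": self.auth.hash_password(user_data["password"]),
        }
        self.repository.insert(record)
        self.emails.send_welcome(record["email"])
        return record["email"]

    def authenticate_user(self, email: str, password: str) -> Optional[dict]:
        user = self.repository.find_by_email(email)
        if user and self.auth.verify_password(password, user["password_hash"]):
            return user
        return None
```

Because every collaborator is passed in through the constructor, each one can be replaced with a test double or a production implementation without touching `UserManager`.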
## Architectural Refactoring
### Monolithic to Microservices
````python
# Architecture refactoring agent
architecture_agent = OpenHands()

# Monolithic to microservices refactoring
microservices_refactor = architecture_agent.run("""
Refactor this monolithic e-commerce application into microservices:

Current Monolithic Structure:
```python
class EcommerceApp:
    def __init__(self):
        self.users = UserManager()
        self.products = ProductManager()
        self.orders = OrderManager()
        self.payments = PaymentManager()
        self.inventory = InventoryManager()
        self.notifications = NotificationManager()

    def place_order(self, user_id, items):
        # Validate user
        user = self.users.get(user_id)

        # Check inventory
        for item in items:
            if not self.inventory.check_stock(item['id'], item['quantity']):
                raise OutOfStockError(f"Item {item['id']} out of stock")

        # Create order
        order = self.orders.create(user_id, items)

        # Process payment
        payment = self.payments.process(order.total)

        if payment.success:
            # Update inventory
            self.inventory.update_stock(items)

            # Send confirmation
            self.notifications.send_order_confirmation(user.email, order)
            return order
        else:
            # Cancel order
            self.orders.cancel(order.id)
            raise PaymentError("Payment failed")
```

Refactor into microservices:
- User Service - User management and authentication
- Product Service - Product catalog and information
- Order Service - Order creation and management
- Payment Service - Payment processing
- Inventory Service - Stock management
- Notification Service - Email and messaging

Include:
- Service boundaries and APIs
- Inter-service communication (REST/gRPC)
- Event-driven architecture
- Saga pattern for distributed transactions
- API Gateway design
- Service discovery and registration
""")
````
````python
database_refactor = architecture_agent.run("""
Refactor database design from monolithic to microservices:

Current Monolithic Schema:
```sql
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE,
    name VARCHAR(255),
    created_at TIMESTAMP
);

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255),
    price DECIMAL(10,2),
    category_id INT,
    stock_quantity INT
);

CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    user_id INT REFERENCES users(id),
    total DECIMAL(10,2),
    status VARCHAR(50),
    created_at TIMESTAMP
);

CREATE TABLE order_items (
    id SERIAL PRIMARY KEY,
    order_id INT REFERENCES orders(id),
    product_id INT REFERENCES products(id),
    quantity INT,
    price DECIMAL(10,2)
);
```

Refactor for microservices:
- User Service Database - User data only
- Product Service Database - Product catalog
- Order Service Database - Orders and items
- Payment Service Database - Payment records
- Inventory Service Database - Stock levels

Include:
- Database per service pattern
- Event sourcing for data consistency
- CQRS pattern implementation
- Data migration strategies
- API composition for complex queries
""")
````
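The saga pattern named in the microservices prompt replaces the monolith's single `place_order` call with a sequence of local steps, each paired with a compensating action that undoes it. The sketch below runs everything in one process to show the control flow only. `SagaStep`, `run_saga`, and `place_order_demo` are hypothetical names, and a real deployment would trigger each step through service calls or events.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SagaStep:
    name: str
    action: Callable[[dict], None]      # performs the local transaction
    compensate: Callable[[dict], None]  # undoes it if a later step fails


def run_saga(steps: List[SagaStep], context: dict) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[SagaStep] = []
    for step in steps:
        try:
            step.action(context)
            completed.append(step)
        except Exception as exc:
            context["error"] = f"{step.name}: {exc}"
            for done in reversed(completed):
                done.compensate(context)
            return False
    return True


def place_order_demo(payment_fails: bool) -> Tuple[bool, List[str]]:
    """Run a three-step order saga and return (succeeded, event log)."""
    events: List[str] = []

    def charge(ctx: dict) -> None:
        if payment_fails:
            raise RuntimeError("card declined")
        events.append("charge")

    steps = [
        SagaStep("reserve_stock",
                 lambda ctx: events.append("reserve"),
                 lambda ctx: events.append("release")),
        SagaStep("create_order",
                 lambda ctx: events.append("create"),
                 lambda ctx: events.append("cancel")),
        SagaStep("charge_payment",
                 charge,
                 lambda ctx: events.append("refund")),
    ]
    succeeded = run_saga(steps, {})
    return succeeded, events
```

When the payment step fails, only the steps that actually completed are compensated, so the order is cancelled and the stock released, but no refund is issued.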
## Performance Optimization Refactoring
### Algorithm and Data Structure Optimization
````python
# Performance optimization agent
perf_optimizer = OpenHands()

# Algorithm optimization
# Raw string: the embedded code contains a '\n' escape that must reach the agent unchanged
algorithm_refactor = perf_optimizer.run(r"""
Optimize these inefficient algorithms:

**Inefficient Search Algorithm:**
```python
def find_user_by_email(users_list, email):
    # O(n) linear search - inefficient for large lists
    for user in users_list:
        if user['email'] == email:
            return user
    return None
```

**Memory-Inefficient Data Processing:**
```python
def process_large_file(file_path):
    # Loads entire file into memory
    with open(file_path, 'r') as f:
        content = f.read()  # Memory intensive

    lines = content.split('\n')
    processed_lines = []

    for line in lines:
        # Accumulates every processed line in a second in-memory list
        processed_lines.append(process_line(line))  # Memory inefficient

    return processed_lines
```

**Recursive Function with Deep Stack:**
```python
def fibonacci_recursive(n):
    if n <= 1:
        return n
    # Exponential number of calls; deep recursion can also exceed the recursion limit
    return fibonacci_recursive(n-1) + fibonacci_recursive(n-2)
```

Optimize to:
- Use appropriate data structures (hash tables, trees, etc.)
- Implement efficient algorithms (binary search, dynamic programming)
- Use streaming processing for large data
- Implement iterative solutions to avoid stack overflow
- Add caching and memoization
- Use concurrent/parallel processing where appropriate
""")
````
````python
memory_refactor = perf_optimizer.run("""
Refactor for memory and resource efficiency:

**Memory Leak in Cache:**
```python
class SimpleCache:
    def __init__(self):
        self.cache = {}

    def set(self, key, value):
        self.cache[key] = value  # No size limits or expiration

    def get(self, key):
        return self.cache.get(key)
```

**Inefficient String Concatenation:**
```python
def build_html_table(rows):
    html = "<table>"
    for row in rows:
        html += "<tr>"  # Creates new string each iteration
        for cell in row:
            html += f"<td>{cell}</td>"  # Memory inefficient
        html += "</tr>"
    html += "</table>"
    return html
```

**Resource-Intensive File Processing:**
```python
def count_words_in_files(file_paths):
    word_counts = {}
    for file_path in file_paths:
        with open(file_path, 'r') as f:
            content = f.read()  # Loads entire file
        words = content.split()
        for word in words:
            word_counts[word] = word_counts.get(word, 0) + 1
    return word_counts
```

Refactor with:
- LRU cache with size limits and TTL
- StringBuilder pattern or join operations
- Streaming file processing with generators
- Memory-mapped files for large data
- Connection pooling for database access
- Lazy loading and pagination
""")
````
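As a point of comparison for whatever the agent produces, the sketch below shows hand-written versions of three of the requested optimizations: an iterative Fibonacci, a `join`-based table builder, and a cache bounded by both size and age. `LRUTTLCache` and its `clock` parameter are hypothetical names; the injectable clock exists only so expiry can be tested without sleeping.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Iterable


def fibonacci_iterative(n: int) -> int:
    """O(n) time, O(1) space, no recursion."""
    if n < 0:
        raise ValueError("n must be non-negative")
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def build_html_table(rows: Iterable[Iterable[Any]]) -> str:
    """Collect fragments in a list and join once instead of repeated +=."""
    parts = ["<table>"]
    for row in rows:
        parts.append("<tr>")
        parts.extend(f"<td>{cell}</td>" for cell in row)
        parts.append("</tr>")
    parts.append("</table>")
    return "".join(parts)


class LRUTTLCache:
    """Cache that evicts the least recently used entry and expires old ones."""

    def __init__(
        self,
        max_size: int = 128,
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._data: "OrderedDict[Hashable, tuple]" = OrderedDict()
        self._max_size = max_size
        self._ttl = ttl_seconds
        self._clock = clock

    def set(self, key: Hashable, value: Any) -> None:
        self._data[key] = (value, self._clock() + self._ttl)
        self._data.move_to_end(key)          # mark as most recently used
        while len(self._data) > self._max_size:
            self._data.popitem(last=False)   # drop least recently used

    def get(self, key: Hashable, default: Any = None) -> Any:
        item = self._data.get(key)
        if item is None:
            return default
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._data[key]              # expired: remove and miss
            return default
        self._data.move_to_end(key)
        return value
```

For pure functions, `functools.lru_cache` from the standard library is usually the simpler choice; a hand-rolled cache like this one is only worth it when entries also need to expire.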
## Code Modernization
### Language Feature Adoption
````python
# Code modernization agent
modernizer = OpenHands()

# Python modernization
python_modernize = modernizer.run("""
Modernize this legacy Python code to use contemporary features:

**Legacy Code to Modernize:**
```python
# Python 2 style
def process_data(data_list):
    result = []
    for item in data_list:
        if item is not None:
            # Old-style formatting
            formatted = "Item: %s, Value: %d" % (item['name'], item['value'])
            result.append(formatted)
    return result

# Old-style class
class DataProcessor:
    def __init__(self, data):
        self.data = data

    def filter_data(self, condition):
        # List comprehension could be more efficient
        filtered = []
        for item in self.data:
            if condition(item):
                filtered.append(item)
        return filtered

# Exception handling without context managers
def read_config(file_path):
    file_handle = open(file_path, 'r')
    try:
        config = {}
        for line in file_handle:
            key, value = line.strip().split('=')
            config[key] = value
        return config
    finally:
        file_handle.close()
```

Modernize to use:
- f-strings instead of % formatting and .format()
- Type hints for better code documentation
- Dataclasses for simple data structures
- Context managers with `with` statements
- Generator expressions for memory efficiency
- Modern exception handling with `raise from`
- Pathlib for file path operations
- Enum for constants
- functools.lru_cache for memoization
- asyncio for concurrent operations
""")
````
````python
js_modernize = modernizer.run("""
Modernize legacy JavaScript code to ES6+ features:

**Legacy Code:**
```javascript
// ES5 style
var UserManager = function() {
    this.users = [];
};

UserManager.prototype.addUser = function(user) {
    // Callback-based async
    fs.readFile('users.json', 'utf8', function(err, data) {
        if (err) throw err;
        var users = JSON.parse(data);
        users.push(user);
        fs.writeFile('users.json', JSON.stringify(users), function(err) {
            if (err) throw err;
            console.log('User added');
        });
    });
};

// Old-style promise handling
function fetchUser(userId) {
    return new Promise(function(resolve, reject) {
        http.get('/api/users/' + userId, function(res) {
            var data = '';
            res.on('data', function(chunk) {
                data += chunk;
            });
            res.on('end', function() {
                resolve(JSON.parse(data));
            });
        }).on('error', reject);
    });
}
```

Modernize to:
- ES6 classes instead of prototype functions
- Arrow functions for concise syntax
- Async/await instead of callbacks and promises
- Template literals for string interpolation
- Destructuring for object/array manipulation
- Modules with import/export
- Optional chaining and nullish coalescing
- Map/Set for collections
- Object spread/rest operators
- Promises with async/await patterns
""")
````
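For the Python half of this section, the sketch below shows roughly what the legacy snippet could look like after applying several items from the list: f-strings, a dataclass, a generator expression, `pathlib`, a `with` block, and `raise ... from`. The `Item` and `ConfigError` names are assumptions introduced here. The config parser also deviates from the legacy behavior in two labeled ways: it skips blank lines and splits on the first `=` only.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


class ConfigError(ValueError):
    """Raised when a config line is not in key=value form (hypothetical name)."""


@dataclass(frozen=True)
class Item:
    name: str
    value: int


def format_items(items: Iterable[Optional[Item]]) -> Iterator[str]:
    """Lazily format items with f-strings, skipping None entries."""
    return (f"Item: {item.name}, Value: {item.value}" for item in items if item is not None)


def filter_data(data: Iterable[T], condition: Callable[[T], bool]) -> List[T]:
    """List comprehension in place of the append loop."""
    return [item for item in data if condition(item)]


def read_config(path: Path) -> Dict[str, str]:
    """Parse key=value lines; the context manager closes the file for us."""
    config: Dict[str, str] = {}
    with path.open("r", encoding="utf-8") as handle:
        for line_number, raw_line in enumerate(handle, start=1):
            line = raw_line.strip()
            if not line:
                continue  # behavior change: blank lines are skipped
            try:
                # behavior change: split once so values may contain '='
                key, value = line.split("=", 1)
            except ValueError as exc:
                raise ConfigError(f"{path}:{line_number}: expected key=value") from exc
            config[key] = value
    return config
```

Chaining with `from exc` keeps the original `ValueError` available as `__cause__`, so the traceback shows both the parsing failure and the domain-level error.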
## Security Refactoring
### Security Vulnerability Fixes
````python
# Security refactoring agent
security_refactor = OpenHands()

# SQL injection and security fixes
# Raw string: the embedded code contains a '\n' escape that must reach the agent unchanged
security_fixes = security_refactor.run(r"""
Fix security vulnerabilities in this code:

**SQL Injection Vulnerable:**
```python
def get_user(username):
    conn = sqlite3.connect('users.db')
    cursor = conn.cursor()
    # VULNERABLE: Direct string interpolation
    query = f"SELECT * FROM users WHERE username = '{username}'"
    cursor.execute(query)
    return cursor.fetchone()
```

**Command Injection:**
```python
import subprocess

def run_command(user_input):
    # VULNERABLE: Direct command execution
    subprocess.call(f"ls -la {user_input}", shell=True)
```

**Insecure Password Storage:**
```python
def create_user(username, password):
    # VULNERABLE: Plain text password storage
    with open('users.txt', 'a') as f:
        f.write(f"{username}:{password}\n")
```

**XSS Vulnerability:**
```javascript
app.get('/user/:id', function(req, res) {
    // VULNERABLE: Direct HTML injection
    var userId = req.params.id;
    res.send('<h1>User: ' + userId + '</h1>');
});
```

**Insecure Random Generation:**
```python
import random

def generate_token():
    # VULNERABLE: Predictable random
    return str(random.randint(100000, 999999))
```

Fix with:
- Parameterized queries for SQL
- Input validation and sanitization
- Password hashing with bcrypt/scrypt
- HTML escaping and Content Security Policy
- Cryptographically secure random generation
- Input validation and whitelisting
- Secure headers and CORS configuration
- Rate limiting and abuse prevention
""")
````
````python
auth_refactor = security_refactor.run("""
Refactor authentication and authorization:

**Weak Authentication:**
```python
def authenticate(username, password):
    users = load_users_from_file()
    for user in users:
        if user['username'] == username and user['password'] == password:
            return user
    return None
```

**Missing Authorization:**
```python
@app.route('/admin/users')
def admin_users():
    # No authorization check
    return get_all_users()
```

**Insecure Session Management:**
```javascript
// Client-side session storage
localStorage.setItem('session_token', token);
```

```python
# Server-side session without expiration
sessions = {}

def create_session(user_id):
    session_id = str(uuid.uuid4())
    sessions[session_id] = {'user_id': user_id}
    return session_id
```

Refactor to:
- Secure password hashing with salt
- JWT tokens with proper expiration
- Role-based access control (RBAC)
- Session management with secure cookies
- Multi-factor authentication
- OAuth 2.0 integration
- API key authentication
- Audit logging for security events
""")
````
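Several of the fixes requested in this section need nothing beyond the Python standard library. The sketch below illustrates four of them under stated assumptions: the `users` table is assumed to have `id` and `username` columns, the connection is passed in rather than opened inside the function, and `list_directory` assumes a Unix-like system with `ls` on the path. It is a hand-written illustration, not agent output.

```python
import html
import secrets
import sqlite3
import subprocess
from typing import Optional, Tuple


def get_user(conn: sqlite3.Connection, username: str) -> Optional[Tuple]:
    """Parameterized query: the driver binds the value, so it is never parsed as SQL."""
    cursor = conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    )
    return cursor.fetchone()


def list_directory(path: str) -> str:
    """Argument list with no shell; '--' stops ls from treating path as an option."""
    result = subprocess.run(
        ["ls", "-la", "--", path], capture_output=True, text=True, check=True
    )
    return result.stdout


def generate_token() -> str:
    """Token from the OS CSPRNG instead of the predictable random module."""
    return secrets.token_urlsafe(32)


def render_user_heading(user_id: str) -> str:
    """Escape user-controlled text before placing it in HTML."""
    return f"<h1>User: {html.escape(user_id)}</h1>"
```

With the parameterized query, a classic payload such as `' OR '1'='1` is treated as a literal username and simply matches no rows.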
## Testing Integration in Refactoring
### Refactoring with Test Coverage
````python
# Refactoring with testing agent
test_refactor = OpenHands()

# Refactoring with test coverage
refactor_with_tests = test_refactor.run("""
Refactor this code while maintaining and improving test coverage:

**Original Code:**
```python
class ShoppingCart:
    def __init__(self):
        self.items = []
        self.total = 0

    def add_item(self, item, price, quantity=1):
        self.items.append({
            'item': item,
            'price': price,
            'quantity': quantity
        })
        self.total += price * quantity

    def remove_item(self, item_name):
        for i, item in enumerate(self.items):
            if item['item'] == item_name:
                self.total -= item['price'] * item['quantity']
                del self.items[i]
                return True
        return False

    def get_total(self):
        return self.total
```

**Existing Tests:**
```python
def test_shopping_cart():
    cart = ShoppingCart()

    # Test adding items
    cart.add_item("Apple", 1.50, 2)
    assert cart.get_total() == 3.00

    # Test removing items
    cart.remove_item("Apple")
    assert cart.get_total() == 0.00
```

Refactor to improve:
- Type safety with dataclasses and type hints
- Error handling for invalid inputs
- Discount system for flexible pricing
- Inventory integration for stock checking
- Persistence layer for saving carts
- Thread safety for concurrent access

Ensure all existing tests pass and add comprehensive new tests for new features.
""")
````
````python
legacy_modernize = test_refactor.run("""
Modernize legacy code while preserving functionality through comprehensive testing:

**Legacy Code:**
```python
# Python 2 style code
def process_orders(orders):
    results = []
    for order in orders:
        if validate_order(order):
            total = calculate_total(order['items'])
            if total > 0:
                results.append({
                    'order_id': order['id'],
                    'total': total,
                    'status': 'processed'
                })
            else:
                results.append({
                    'order_id': order['id'],
                    'error': 'Invalid total'
                })
        else:
            results.append({
                'order_id': order['id'],
                'error': 'Invalid order'
            })
    return results
```

Modernize to:
- Type hints and modern Python syntax
- Exception handling instead of error dictionaries
- Generator functions for memory efficiency
- Dataclasses for data structures
- Async/await for I/O operations
- Logging for debugging and monitoring

Create comprehensive test suite that validates both old and new behavior, then perform the refactoring safely.
""")
````
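The key constraint in this section is that the existing test keeps passing after the refactoring. The sketch below illustrates that for the `ShoppingCart` example, covering only the first two improvement items (type safety and input validation). It also derives the total from the items rather than tracking it separately, and uses `Decimal` for money; both are design choices assumed here, not requirements from the prompt.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List, Union

Number = Union[int, float, str, Decimal]


@dataclass(frozen=True)
class CartItem:
    name: str
    price: Decimal
    quantity: int

    @property
    def subtotal(self) -> Decimal:
        return self.price * self.quantity


@dataclass
class ShoppingCart:
    items: List[CartItem] = field(default_factory=list)

    def add_item(self, item: str, price: Number, quantity: int = 1) -> None:
        unit_price = Decimal(str(price))  # str() avoids binary float artifacts
        if unit_price < 0:
            raise ValueError("price must be non-negative")
        if quantity < 1:
            raise ValueError("quantity must be at least 1")
        self.items.append(CartItem(item, unit_price, quantity))

    def remove_item(self, item_name: str) -> bool:
        for index, entry in enumerate(self.items):
            if entry.name == item_name:
                del self.items[index]
                return True
        return False

    def get_total(self) -> Decimal:
        # Derived on demand, so the total cannot drift out of sync with items
        return sum((entry.subtotal for entry in self.items), Decimal("0"))


def test_shopping_cart():
    """The original test, unchanged: it must still pass after refactoring."""
    cart = ShoppingCart()
    cart.add_item("Apple", 1.50, 2)
    assert cart.get_total() == 3.00
    cart.remove_item("Apple")
    assert cart.get_total() == 0.00


def test_rejects_invalid_quantity():
    """New test for behavior the refactoring added."""
    cart = ShoppingCart()
    try:
        cart.add_item("Apple", 1.50, 0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
    assert cart.get_total() == 0
```

Running the unchanged `test_shopping_cart` first, and only then adding tests such as `test_rejects_invalid_quantity`, keeps preserved behavior and new behavior clearly separated.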
## Automated Refactoring Tools
### Code Analysis and Suggestions
```python
# Automated refactoring tools
auto_refactor = OpenHands()
# Code analysis and automated refactoring
code_analysis = auto_refactor.run("""
Create automated code analysis and refactoring tools:
1. **Code Smell Detection**
- Long methods (>50 lines)
- Large classes (>300 lines)
- High cyclomatic complexity
- Duplicate code blocks
- Unused variables and imports
- Missing documentation
2. **Automated Refactoring Suggestions**
- Extract method refactoring
- Inline method refactoring
- Move method/field refactoring
- Rename refactoring
- Change signature refactoring
- Encapsulate field refactoring
3. **Safety Analysis**
- Impact analysis for refactoring
- Test coverage verification
- Dependency analysis
- Breaking change detection
4. **Batch Refactoring**
- Apply refactoring across multiple files
- Preview changes before applying
- Rollback capability
- Progress tracking and reporting
Include integration with popular IDEs and command-line tools.
""")
# IDE integration for refactoring
ide_integration = auto_refactor.run("""
Create IDE integration for refactoring tools:
1. **Visual Studio Code Extension**
- Refactoring commands in command palette
- Quick fixes for detected issues
- Refactoring preview with diff view
- Multi-file refactoring support
2. **PyCharm Plugin**
- Intention actions for refactoring
- Refactoring templates and presets
- Code inspection integration
- Test generation for refactored code
3. **Language Server Protocol**
- LSP server for refactoring capabilities
- Support for multiple editors
- Standardized refactoring interface
- Extensible refactoring providers
4. **Command-Line Tools**
- CLI interface for batch refactoring
- Configuration file support
- CI/CD pipeline integration
- Automated refactoring workflows
Include documentation, examples, and best practices for each integration.
""")
```
## Summary

In this chapter, we've covered OpenHands' comprehensive refactoring capabilities:

- Code Structure Refactoring: Function/method splitting, class decomposition
- Architectural Refactoring: Monolithic to microservices, database design
- Performance Optimization: Algorithm improvement, memory/resource efficiency
- Code Modernization: Language feature adoption, legacy code updates
- Security Refactoring: Vulnerability fixes, authentication improvements
- Testing Integration: Refactoring with test coverage maintenance
- Automated Tools: Code analysis, IDE integration, batch refactoring

OpenHands can systematically improve codebases while maintaining functionality and adding comprehensive testing.

## Key Takeaways

- Systematic Approach: Follow structured refactoring methodologies
- Safety First: Maintain test coverage and functionality during changes
- Modern Standards: Adopt contemporary language features and patterns
- Performance Focus: Optimize algorithms, memory usage, and resource consumption
- Security Priority: Address vulnerabilities and implement security best practices
- Tool Integration: Leverage automated tools and IDE integrations

## Next Steps

Next, we'll explore integration - OpenHands' ability to connect applications with external APIs, databases, and services.

Ready for the next chapter? Chapter 7: Integration
Generated for Awesome Code Docs
Most teams struggle here because the hard part is not writing more code but deciding clear boundaries for `refactoring`, `Refactor`, and `Service`, so that behavior stays predictable as complexity grows.
In practical terms, this chapter helps you avoid three common failures:
- coupling core logic too tightly to one implementation path
- missing the handoff boundaries between setup, execution, and validation
- shipping changes without clear rollback or observability strategy
After working through this chapter, you should be able to reason about Chapter 6: Refactoring - Code Structure Improvement and Modernization as an operating subsystem inside OpenHands Tutorial: Autonomous Software Engineering Workflows, with explicit contracts for inputs, state transitions, and outputs.
Use the implementation notes around `code`, `Code`, and `OpenHands` as your checklist when adapting these patterns to your own repository.
Under the hood, Chapter 6: Refactoring - Code Structure Improvement and Modernization usually follows a repeatable control path:
- Context bootstrap: initialize runtime config and prerequisites for `refactoring`.
- Input normalization: shape incoming data so `Refactor` receives stable contracts.
- Core execution: run the main logic branch and propagate intermediate state through `Service`.
- Policy and safety checks: enforce limits, auth scopes, and failure boundaries.
- Output composition: return canonical result payloads for downstream consumers.
- Operational telemetry: emit logs/metrics needed for debugging and performance tuning.
When debugging, walk this sequence in order and confirm each stage has explicit success/failure conditions.
Use the following upstream sources to verify implementation details while reading this chapter:
- OpenHands Repository
  Why it matters: authoritative reference on `OpenHands Repository` (github.com).
- OpenHands Docs
  Why it matters: authoritative reference on `OpenHands Docs` (docs.openhands.dev).
- OpenHands Releases
  Why it matters: authoritative reference on `OpenHands Releases` (github.com).

Suggested trace strategy:
- search upstream code for `refactoring` and `Refactor` to map concrete implementation paths
- compare docs claims against actual runtime/config code before reusing patterns in production