Author: Markus van Kempen (markus.van.kempen@gmail.com) Last Updated: January 24, 2026
A comprehensive guide to using the SQL Query Generator system.
- Quick Start
- Using the Streamlit UI
- Programmatic Usage
- Query Examples
- Agent Modes
- Self-Improving System
- Troubleshooting
- Python 3.10+
- IBM WatsonX account with API credentials
- SQLite (included with Python)
# Clone and setup
cd cs-sqltest-agentskills
python3 -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
# Configure credentials
cp .env.example .env
# Edit .env with your WatsonX credentials
# Initialize database
python scripts/init_database.py
# Run the app
streamlit run app.pyCreate a .env file:
# Required
WATSONX_API_KEY=your_api_key_here
WATSONX_URL=https://us-south.ml.cloud.ibm.com
WATSONX_PROJECT_ID=your_project_id_here
# Optional
DATABASE_PATH=./data/database.db
LEARNING_DB_PATH=./data/learning.db
WATSONX_MODEL_ID=ibm/granite-4-h-smallββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β SQL Query Generator β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β Sidebar: Main Area: β
β βββββββββββββββββββ βββββββββββββββββββββββββββββββββ
β β π€ LLM Model β β Chat History ββ
β β [Granite 4 βΌ] β β ββ
β β β β π€ show laptop sales in USA ββ
β β π§ Agent Mode β β ββ
β β [BeeAI Agent βΌ] β β π€ Found 5 results: ββ
β β β β ββββββββββββββββββββββββββ ββ
β β π Debug Info β β β product_name | country β ββ
β β [Expanded] β β β Laptop Pro | USA β ββ
β β β β β ... | ... β ββ
β β Processing: β β ββββββββββββββββββββββββββ ββ
β β β’ Query: 50ms β β ββ
β β β’ LLM: 800ms β β π π Rate this result ββ
β β β’ SQL: 5ms β β ββ
β βββββββββββββββββββ βββββββββββββββββββββββββββββββββ
β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Enter your query... [Send] β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
-
Select a Model (sidebar)
- Granite 4 Small - Recommended for most queries
- Llama 3.3 70B - Best for complex reasoning
- Mistral Large - Good multilingual support
-
Select Agent Mode (sidebar)
- Direct SQL - Fastest, simple queries
- LangChain Agent - Best for complex queries
- BeeAI Agent - Most reliable
- Self-Improving - Learns from feedback
-
Enter Your Query
- Type natural language in the chat input
- Press Enter or click Send
-
Review Results
- View generated SQL in debug panel
- See query results in table format
- Check processing times
-
Provide Feedback
- Click π if results are correct
- Click π if results are wrong
- Feedback improves future queries
The system understands conversational context:
You: show laptop sales
Bot: [Shows laptop sales data]
You: same for USA
Bot: [Shows laptop sales filtered to USA]
You: top 5 by revenue
Bot: [Shows top 5 USA laptop sales by revenue]
from watsonx_client import WatsonXClient
from schema_loader import get_schema_string
# Initialize client
client = WatsonXClient(model_id="ibm/granite-4-h-small")
# Generate SQL
sql = client.generate_sql_query(
user_query="show all products under $100",
schema_info=get_schema_string()
)
print(f"Generated SQL: {sql}")
# Output: SELECT * FROM products WHERE price < 100from beeai_agent import BeeAISQLAgent
# Initialize agent
agent = BeeAISQLAgent(
model_id="ibm/granite-4-h-small",
db_path="./data/database.db"
)
# Run query with execution
result = agent.run_query("show laptop sales by country")
if result['success']:
print(f"SQL: {result['sql']}")
print(f"Rows: {result['row_count']}")
for row in result['result']:
print(row)
else:
print(f"Error: {result['error']}")from learning_store import get_learning_store
store = get_learning_store()
# Find similar patterns
patterns = store.find_similar_patterns("show laptop revenue", limit=3)
for p in patterns:
print(f"Similar: {p['user_query']} (similarity: {p['similarity']:.2f})")
# Store a successful pattern
pattern_id = store.store_success_pattern(
user_query="show laptop sales",
generated_sql="SELECT * FROM sales WHERE product_name LIKE '%laptop%'",
execution_time_ms=50
)
# Record feedback
store.record_feedback(pattern_id, is_positive=True)from query_classifier import classify_query
# Database query
cls, msg = classify_query("show all products")
print(f"Classification: {cls}") # 'database'
# General question
cls, msg = classify_query("what time is it")
print(f"Classification: {cls}") # 'general'
# Unknown entity
cls, msg = classify_query("show BMW sales")
print(f"Classification: {cls}") # 'unknown_entity'
print(f"Message: {msg}") # "I couldn't find 'BMW' in the database..."from context_manager import get_conversation_context
ctx = get_conversation_context()
# First query
ctx.add_query(
query="show laptop sales",
query_type="database",
sql="SELECT * FROM sales WHERE product_name LIKE '%laptop%'",
result_count=10
)
# Follow-up query
expanded = ctx.resolve_follow_up("same for USA")
print(f"Expanded: {expanded}") # "show laptop sales in USA"| Natural Language | Generated SQL |
|---|---|
| "show all products" | SELECT * FROM products |
| "list laptop products" | SELECT * FROM products WHERE product_name LIKE '%laptop%' |
| "products under $100" | SELECT * FROM products WHERE price < 100 |
| "how many products" | SELECT COUNT(*) FROM products |
| Natural Language | Generated SQL |
|---|---|
| "total revenue" | SELECT SUM(total_amount) FROM sales |
| "sales by country" | SELECT country, SUM(total_amount) FROM sales GROUP BY country |
| "laptop sales in USA" | SELECT * FROM sales WHERE product_name LIKE '%laptop%' AND country = 'USA' |
| "top 5 products by sales" | SELECT product_name, SUM(total_amount) FROM sales GROUP BY product_name ORDER BY SUM(total_amount) DESC LIMIT 5 |
| Natural Language | Generated SQL |
|---|---|
| "list all customers" | SELECT * FROM customers |
| "customers from USA" | SELECT * FROM customers WHERE country = 'USA' |
| "how many customers in Europe" | SELECT COUNT(*) FROM customers WHERE region = 'Europe' |
| Natural Language | Generated SQL |
|---|---|
| "revenue per country for laptops" | SELECT country, SUM(total_amount) FROM sales WHERE product_name LIKE '%laptop%' GROUP BY country |
| "customers who bought smartphones" | SELECT DISTINCT c.* FROM customers c JOIN orders o ON c.customer_id = o.customer_id JOIN products p ON o.product_id = p.product_id WHERE p.product_name LIKE '%smartphone%' |
| Mode | Speed | Accuracy | Learning | Best For |
|---|---|---|---|---|
| Direct SQL | β‘β‘β‘ | ββ | β | Simple queries, speed-critical |
| LangChain | β‘β‘ | βββ | β | Complex queries, auto-retry |
| BeeAI | β‘β‘ | βββ | β | Reliability, native WatsonX |
| Self-Improving | β‘ | ββββ | β β β | Continuous improvement |
Direct SQL (watsonx_client.py)
- Dashboard queries with known patterns
- High-volume, simple queries
- When you need fastest response
LangChain Agent (langchain_agent.py)
- Complex joins and aggregations
- Queries that might need retry
- When schema changes frequently
BeeAI Agent (beeai_agent.py)
- Production deployments
- When using other BeeAI features
- Best balance of speed/accuracy
Self-Improving (self_improving_agent.py)
- Learning from user patterns
- Building a knowledge base
- When feedback is available
- Query Embedding: Your query is converted to a 384-dimensional vector
- Pattern Search: Find similar successful queries in the learning store
- Few-Shot Prompting: Include similar examples in the LLM prompt
- Error Avoidance: Include common mistakes to avoid
- Result Storage: Store successful patterns for future use
- Feedback Loop: User π/π improves pattern quality
from learning_store import get_learning_store
store = get_learning_store()
stats = store.get_stats()
print(f"Successful patterns: {stats['total_patterns']}")
print(f"Error patterns: {stats['total_errors']}")
print(f"Average rating: {stats['avg_rating']:.2f}")import os
os.remove("./data/learning.db")
# The store will reinitialize on next useSolution: Check your .env file contains:
- WATSONX_API_KEY
- WATSONX_URL
- WATSONX_PROJECT_ID
# Reinitialize the database
python scripts/init_database.pyThis usually means the LLM used a wrong column name.
Solution: Try rephrasing your query or use a more capable model (Llama 70B)
The system will analyze why:
- Check if the entity exists: "show BMW sales" β "BMW not in database"
- Check filters: The data might not have matches for your criteria
Solutions:
1. Use a faster model (Granite 4 Small vs Llama 70B)
2. Use Direct SQL mode for simple queries
3. Check if embedding cache is working (see debug panel)
Enable detailed logging:
from logging_config import setup_logging
setup_logging(level="DEBUG")View processing steps in UI:
- Expand "π Debug Info" in sidebar
- See timing breakdown for each step
- View the actual LLM prompt sent
- See similar patterns found
- Check ARCHITECTURE.md for system design
- Check API.md for module documentation
- Check COMPARISON_RESULTS.md for test results