
Usage Guide

Author: Markus van Kempen (markus.van.kempen@gmail.com)
Last Updated: January 24, 2026

A comprehensive guide to using the SQL Query Generator system.


Table of Contents

  1. Quick Start
  2. Using the Streamlit UI
  3. Programmatic Usage
  4. Query Examples
  5. Agent Modes
  6. Self-Improving System
  7. Troubleshooting

Quick Start

Prerequisites

  • Python 3.10+
  • IBM WatsonX account with API credentials
  • SQLite (included with Python)

Installation

# Clone and setup
cd cs-sqltest-agentskills
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

# Configure credentials
cp .env.example .env
# Edit .env with your WatsonX credentials

# Initialize database
python scripts/init_database.py

# Run the app
streamlit run app.py

Environment Configuration

Create a .env file:

# Required
WATSONX_API_KEY=your_api_key_here
WATSONX_URL=https://us-south.ml.cloud.ibm.com
WATSONX_PROJECT_ID=your_project_id_here

# Optional
DATABASE_PATH=./data/database.db
LEARNING_DB_PATH=./data/learning.db
WATSONX_MODEL_ID=ibm/granite-4-h-small
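If you want the app to fail fast when credentials are missing, a minimal startup check can run before any client is initialized. This is an illustrative sketch, not part of the project; the variable names come from the .env example above:

```python
import os

# Variables the guide lists as required
REQUIRED_VARS = ["WATSONX_API_KEY", "WATSONX_URL", "WATSONX_PROJECT_ID"]

def missing_credentials(env=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_credentials()` at startup and raising if the list is non-empty turns a confusing mid-request failure into a clear configuration error.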

Using the Streamlit UI

Interface Overview

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  SQL Query Generator                                                β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  Sidebar:                           Main Area:                      β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”               β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”β”‚
β”‚  β”‚ πŸ€– LLM Model    β”‚               β”‚ Chat History                 β”‚β”‚
β”‚  β”‚ [Granite 4 β–Ό]   β”‚               β”‚                              β”‚β”‚
β”‚  β”‚                 β”‚               β”‚ πŸ‘€ show laptop sales in USA  β”‚β”‚
β”‚  β”‚ πŸ”§ Agent Mode   β”‚               β”‚                              β”‚β”‚
β”‚  β”‚ [BeeAI Agent β–Ό] β”‚               β”‚ πŸ€– Found 5 results:          β”‚β”‚
β”‚  β”‚                 β”‚               β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚β”‚
β”‚  β”‚ πŸ” Debug Info   β”‚               β”‚ β”‚ product_name | country β”‚   β”‚β”‚
β”‚  β”‚ [Expanded]      β”‚               β”‚ β”‚ Laptop Pro   | USA     β”‚   β”‚β”‚
β”‚  β”‚                 β”‚               β”‚ β”‚ ...          | ...     β”‚   β”‚β”‚
β”‚  β”‚ Processing:     β”‚               β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚β”‚
β”‚  β”‚ β€’ Query: 50ms   β”‚               β”‚                              β”‚β”‚
β”‚  β”‚ β€’ LLM: 800ms    β”‚               β”‚ πŸ‘ πŸ‘Ž  Rate this result      β”‚β”‚
β”‚  β”‚ β€’ SQL: 5ms      β”‚               β”‚                              β”‚β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜               β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜β”‚
β”‚                                                                     β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚  β”‚ Enter your query...                                    [Send] β”‚ β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Step-by-Step Usage

  1. Select a Model (sidebar)

    • Granite 4 Small - Recommended for most queries
    • Llama 3.3 70B - Best for complex reasoning
    • Mistral Large - Good multilingual support
  2. Select Agent Mode (sidebar)

    • Direct SQL - Fastest, simple queries
    • LangChain Agent - Best for complex queries
    • BeeAI Agent - Most reliable
    • Self-Improving - Learns from feedback
  3. Enter Your Query

    • Type natural language in the chat input
    • Press Enter or click Send
  4. Review Results

    • View generated SQL in debug panel
    • See query results in table format
    • Check processing times
  5. Provide Feedback

    • Click πŸ‘ if results are correct
    • Click πŸ‘Ž if results are wrong
    • Feedback improves future queries

Follow-up Queries

The system understands conversational context:

You: show laptop sales
Bot: [Shows laptop sales data]

You: same for USA
Bot: [Shows laptop sales filtered to USA]

You: top 5 by revenue
Bot: [Shows top 5 USA laptop sales by revenue]
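At its simplest, this kind of resolution splices the new qualifier into the previous query. A toy sketch of the idea (the real logic lives in context_manager.py and handles far more phrasings):

```python
def resolve_follow_up(previous_query: str, follow_up: str) -> str:
    """Naively expand follow-ups like 'same for USA' against the prior query."""
    if follow_up.strip().lower().startswith("same for "):
        qualifier = follow_up.strip()[len("same for "):]
        return f"{previous_query} in {qualifier}"
    # Not a recognized follow-up pattern: treat it as a fresh query
    return follow_up
```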

Programmatic Usage

Basic SQL Generation

from watsonx_client import WatsonXClient
from schema_loader import get_schema_string

# Initialize client
client = WatsonXClient(model_id="ibm/granite-4-h-small")

# Generate SQL
sql = client.generate_sql_query(
    user_query="show all products under $100",
    schema_info=get_schema_string()
)

print(f"Generated SQL: {sql}")
# Output: SELECT * FROM products WHERE price < 100

Using Agents with Execution

from beeai_agent import BeeAISQLAgent

# Initialize agent
agent = BeeAISQLAgent(
    model_id="ibm/granite-4-h-small",
    db_path="./data/database.db"
)

# Run query with execution
result = agent.run_query("show laptop sales by country")

if result['success']:
    print(f"SQL: {result['sql']}")
    print(f"Rows: {result['row_count']}")
    for row in result['result']:
        print(row)
else:
    print(f"Error: {result['error']}")

Using the Learning Store

from learning_store import get_learning_store

store = get_learning_store()

# Find similar patterns
patterns = store.find_similar_patterns("show laptop revenue", limit=3)
for p in patterns:
    print(f"Similar: {p['user_query']} (similarity: {p['similarity']:.2f})")

# Store a successful pattern
pattern_id = store.store_success_pattern(
    user_query="show laptop sales",
    generated_sql="SELECT * FROM sales WHERE product_name LIKE '%laptop%'",
    execution_time_ms=50
)

# Record feedback
store.record_feedback(pattern_id, is_positive=True)

Query Classification

from query_classifier import classify_query

# Database query
cls, msg = classify_query("show all products")
print(f"Classification: {cls}")  # 'database'

# General question
cls, msg = classify_query("what time is it")
print(f"Classification: {cls}")  # 'general'

# Unknown entity
cls, msg = classify_query("show BMW sales")
print(f"Classification: {cls}")  # 'unknown_entity'
print(f"Message: {msg}")  # "I couldn't find 'BMW' in the database..."

Context-Aware Queries

from context_manager import get_conversation_context

ctx = get_conversation_context()

# First query
ctx.add_query(
    query="show laptop sales",
    query_type="database",
    sql="SELECT * FROM sales WHERE product_name LIKE '%laptop%'",
    result_count=10
)

# Follow-up query
expanded = ctx.resolve_follow_up("same for USA")
print(f"Expanded: {expanded}")  # "show laptop sales in USA"

Query Examples

Product Queries

| Natural Language | Generated SQL |
|---|---|
| "show all products" | `SELECT * FROM products` |
| "list laptop products" | `SELECT * FROM products WHERE product_name LIKE '%laptop%'` |
| "products under $100" | `SELECT * FROM products WHERE price < 100` |
| "how many products" | `SELECT COUNT(*) FROM products` |

Sales Queries

| Natural Language | Generated SQL |
|---|---|
| "total revenue" | `SELECT SUM(total_amount) FROM sales` |
| "sales by country" | `SELECT country, SUM(total_amount) FROM sales GROUP BY country` |
| "laptop sales in USA" | `SELECT * FROM sales WHERE product_name LIKE '%laptop%' AND country = 'USA'` |
| "top 5 products by sales" | `SELECT product_name, SUM(total_amount) FROM sales GROUP BY product_name ORDER BY SUM(total_amount) DESC LIMIT 5` |

Customer Queries

| Natural Language | Generated SQL |
|---|---|
| "list all customers" | `SELECT * FROM customers` |
| "customers from USA" | `SELECT * FROM customers WHERE country = 'USA'` |
| "how many customers in Europe" | `SELECT COUNT(*) FROM customers WHERE region = 'Europe'` |

Complex Queries

| Natural Language | Generated SQL |
|---|---|
| "revenue per country for laptops" | `SELECT country, SUM(total_amount) FROM sales WHERE product_name LIKE '%laptop%' GROUP BY country` |
| "customers who bought smartphones" | `SELECT DISTINCT c.* FROM customers c JOIN orders o ON c.customer_id = o.customer_id JOIN products p ON o.product_id = p.product_id WHERE p.product_name LIKE '%smartphone%'` |

Agent Modes

Mode Comparison

| Mode | Speed | Accuracy | Learning | Best For |
|---|---|---|---|---|
| Direct SQL | ⚡⚡⚡ | ⭐⭐ | ✅ | Simple queries, speed-critical |
| LangChain | ⚡⚡ | ⭐⭐⭐ | ✅ | Complex queries, auto-retry |
| BeeAI | ⚡⚡ | ⭐⭐⭐ | ✅ | Reliability, native WatsonX |
| Self-Improving | ⚡ | ⭐⭐⭐⭐ | ✅✅✅ | Continuous improvement |

When to Use Each Mode

Direct SQL (watsonx_client.py)

  • Dashboard queries with known patterns
  • High-volume, simple queries
  • When you need fastest response

LangChain Agent (langchain_agent.py)

  • Complex joins and aggregations
  • Queries that might need retry
  • When schema changes frequently

BeeAI Agent (beeai_agent.py)

  • Production deployments
  • When using other BeeAI features
  • Best balance of speed/accuracy

Self-Improving (self_improving_agent.py)

  • Learning from user patterns
  • Building a knowledge base
  • When feedback is available

Self-Improving System

How It Works

  1. Query Embedding: Your query is converted to a 384-dimensional vector
  2. Pattern Search: Find similar successful queries in the learning store
  3. Few-Shot Prompting: Include similar examples in the LLM prompt
  4. Error Avoidance: Include common mistakes to avoid
  5. Result Storage: Store successful patterns for future use
  6. Feedback Loop: User πŸ‘/πŸ‘Ž improves pattern quality
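Step 2's pattern search amounts to a nearest-neighbour lookup over the stored embeddings. A minimal cosine-similarity sketch of that lookup (the project's learning store presumably uses a real embedding model; short plain lists stand in for the 384-dimensional vectors here):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def find_similar(query_vec, stored, limit=3):
    """Rank stored (pattern, vector) pairs by similarity to the query vector."""
    ranked = sorted(stored, key=lambda p: cosine_similarity(query_vec, p[1]), reverse=True)
    return ranked[:limit]
```

The top-ranked patterns are what get spliced into the few-shot prompt in step 3.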

Viewing Learning Statistics

from learning_store import get_learning_store

store = get_learning_store()
stats = store.get_stats()

print(f"Successful patterns: {stats['total_patterns']}")
print(f"Error patterns: {stats['total_errors']}")
print(f"Average rating: {stats['avg_rating']:.2f}")

Resetting the Learning Store

import os

if os.path.exists("./data/learning.db"):
    os.remove("./data/learning.db")
# The store will reinitialize on next use

Troubleshooting

Common Issues

"Missing WatsonX credentials"

Solution: Check your .env file contains:
- WATSONX_API_KEY
- WATSONX_URL
- WATSONX_PROJECT_ID

"Database not found"

# Reinitialize the database
python scripts/init_database.py

"SQL Error: no such column"

This usually means the LLM generated a column name that does not exist in the schema.
Solution: Rephrase your query, or switch to a more capable model (e.g. Llama 3.3 70B)

"Query returns no results"

The system will analyze why:
- Check if the entity exists: "show BMW sales" β†’ "BMW not in database"
- Check filters: The data might not have matches for your criteria
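To run the entity check yourself, you can probe the database directly. This is a sketch against the sample schema; the table and column names are assumed from the query examples earlier in this guide:

```python
import sqlite3

def entity_exists(conn, entity):
    """Check whether any sales row mentions the entity (case-insensitive LIKE)."""
    row = conn.execute(
        "SELECT COUNT(*) FROM sales WHERE product_name LIKE ?", (f"%{entity}%",)
    ).fetchone()
    return row[0] > 0
```

A `False` result for `entity_exists(conn, "BMW")` is exactly the "BMW not in database" case above: the query was fine, the entity just is not in the data.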

Slow Response Times

Solutions:
1. Use a faster model (Granite 4 Small vs Llama 70B)
2. Use Direct SQL mode for simple queries
3. Check if embedding cache is working (see debug panel)

Debug Information

Enable detailed logging:

from logging_config import setup_logging
setup_logging(level="DEBUG")

View processing steps in UI:

  1. Expand "πŸ” Debug Info" in sidebar
  2. See timing breakdown for each step
  3. View the actual LLM prompt sent
  4. See similar patterns found

Getting Help