Author: Markus van Kempen (markus.van.kempen@gmail.com) Last Updated: January 24, 2026
This document provides detailed API documentation for all modules in the SQL Query Generator.
Module: watsonx_client.py
Direct WatsonX LLM client for SQL generation with optional learning capabilities.
from watsonx_client import WatsonXClient
client = WatsonXClient(
model_id="ibm/granite-4-h-small", # Optional: defaults to env or granite-4
use_learning=True # Optional: enable pattern learning
)| Attribute | Type | Description |
|---|---|---|
AVAILABLE_MODELS |
dict |
Dictionary of supported WatsonX models |
current_model_id |
str |
Currently active model ID |
Generate SQL from natural language.
Parameters:
| Parameter | Type | Description |
|---|---|---|
user_query |
str |
Natural language query |
schema_info |
str |
Database schema description |
product_mapping |
dict |
Optional product name→code mapping |
Returns: str - Generated SQL query
Example:
from schema_loader import get_schema_string
from product_mapper import get_product_mapping_dict
sql = client.generate_sql_query(
user_query="show laptop sales in USA",
schema_info=get_schema_string(),
product_mapping=get_product_mapping_dict()
)
# Returns: "SELECT * FROM sales WHERE product_name LIKE '%laptop%' AND country = 'USA'"Switch to a different LLM model.
Parameters:
| Parameter | Type | Description |
|---|---|---|
model_id |
str |
Model ID from AVAILABLE_MODELS |
Returns: bool - True if switch successful
Record user feedback for a query pattern.
Parameters:
| Parameter | Type | Description |
|---|---|---|
pattern_id |
int |
ID from learning store |
is_positive |
bool |
True for thumbs up, False for thumbs down |
Get debug information about the last query.
Returns: list[str] - Processing steps with timing
Module: langchain_agent.py
LangChain-based SQL agent with automatic schema discovery and retry logic.
from langchain_agent import LangChainSQLAgent
agent = LangChainSQLAgent(
model_id="ibm/granite-4-h-small",
db_path="./data/database.db",
use_learning=True
)Execute a natural language query and return results.
Parameters:
| Parameter | Type | Description |
|---|---|---|
user_query |
str |
Natural language query |
schema_info |
str |
Optional schema (auto-loaded if None) |
product_mapping |
dict |
Optional product mapping |
Returns: dict with keys:
| Key | Type | Description |
|---|---|---|
success |
bool |
Whether query executed successfully |
sql |
str |
Generated SQL |
result |
list |
Query results as list of dicts |
error |
str |
Error message if failed |
columns |
list |
Column names |
row_count |
int |
Number of rows returned |
Example:
result = agent.run_query("show all products under $100")
if result['success']:
print(f"Found {result['row_count']} products")
for row in result['result']:
print(row)
else:
print(f"Error: {result['error']}")Get debug information including timing breakdown.
Returns: list[str] - Detailed processing steps
Module: beeai_agent.py
IBM BeeAI Framework-based agent with native WatsonX integration.
from beeai_agent import BeeAISQLAgent
agent = BeeAISQLAgent(
model_id="ibm/granite-4-h-small",
db_path="./data/database.db",
use_learning=True
)Execute a query using BeeAI Framework.
Parameters: Same as LangChainSQLAgent
Returns: Same structure as LangChainSQLAgent
Switch to a different model.
Returns: bool - Success status
Record user feedback for learning.
Get processing debug info with timing.
Module: self_improving_agent.py
Full self-improving agent with maximum learning capabilities.
from self_improving_agent import SelfImprovingAgent
agent = SelfImprovingAgent(
model_id="ibm/granite-4-h-small",
db_path="./data/database.db"
)Execute query with full learning pipeline.
Returns: Same structure as other agents, plus:
| Key | Type | Description |
|---|---|---|
similar_patterns |
list |
Similar patterns found |
errors_to_avoid |
list |
Error patterns avoided |
Get statistics about learned patterns.
Returns: dict with pattern counts and success rates
Module: learning_store.py
SQLite-based pattern storage with embedding-based similarity search.
from learning_store import get_learning_store
store = get_learning_store() # Singleton instanceFind semantically similar successful patterns.
Parameters:
| Parameter | Type | Description |
|---|---|---|
query |
str |
User query to match |
limit |
int |
Max patterns to return |
Returns: list[dict] with keys:
| Key | Type | Description |
|---|---|---|
id |
int |
Pattern ID |
user_query |
str |
Original query |
generated_sql |
str |
Successful SQL |
similarity |
float |
Cosine similarity (0-1) |
thumbs_up |
int |
Positive feedback count |
Example:
patterns = store.find_similar_patterns("show laptop revenue")
for p in patterns:
print(f"{p['similarity']:.2f}: {p['user_query']} → {p['generated_sql']}")Store a successful query pattern.
Returns: int - Pattern ID for feedback tracking
Find error patterns to avoid.
Returns: list[dict] with error details
Store a failed query for learning.
Record user feedback (thumbs up/down).
Get learning store statistics.
Returns: dict with counts and success rates
Module: semantic_classifier.py
Embedding-based query classifier with caching.
from semantic_classifier import get_semantic_classifier
classifier = get_semantic_classifier() # Singleton with cacheClassify query intent using embeddings.
Parameters:
| Parameter | Type | Description |
|---|---|---|
query |
str |
User query to classify |
Returns: tuple[str, float] - (classification, confidence)
Classifications:
'database'- Needs SQL generation'general'- Conversational response'help'- System guidance
Example:
classification, confidence = classifier.classify("show all products")
# Returns: ('database', 0.92)
classification, confidence = classifier.classify("hello")
# Returns: ('general', 0.88)Get embedding cache statistics.
Returns: dict with memory/disk cache sizes and hit rates
Module: query_classifier.py
Hybrid classification combining semantic, keyword, and schema validation.
Main classification function using all layers.
Parameters:
| Parameter | Type | Description |
|---|---|---|
query |
str |
User query |
Returns: tuple[str, str] - (classification, message)
Classifications:
| Value | Description |
|---|---|
'database' |
Query needs SQL |
'general' |
General conversation |
'help' |
Help request |
'unknown_entity' |
Entity not in database |
Example:
from query_classifier import classify_query
cls, msg = classify_query("show all products")
# Returns: ('database', '')
cls, msg = classify_query("what time is it")
# Returns: ('general', '')
cls, msg = classify_query("show BMW sales")
# Returns: ('unknown_entity', "I couldn't find 'BMW' in the database...")Explain why a query returned no results.
Returns: str - Human-readable explanation
Extract searchable terms from a query.
Returns: list[str] - Terms for database lookup
Module: context_manager.py
Manages conversation history for follow-up queries.
from context_manager import get_conversation_context, reset_conversation_context
ctx = get_conversation_context() # Singleton instanceAdd a query to conversation history.
Parameters:
| Parameter | Type | Description |
|---|---|---|
query |
str |
User query |
query_type |
str |
'database', 'general', 'help' |
sql |
str |
Generated SQL (if applicable) |
result_count |
int |
Number of results |
Check if query is a follow-up to previous.
Returns: tuple[bool, str] - (is_follow_up, follow_up_type)
Follow-up types: 'expand', 'similar', 'filter', 'sort', 'limit', 'reference'
Expand follow-up query with context.
Returns: str - Expanded query
Example:
ctx.add_query("show laptop sales", "database", "SELECT...", 10)
# Later...
expanded = ctx.resolve_follow_up("same for USA")
# Returns: "show laptop sales in USA"Get the most recent query context.
Returns: QueryContext or None
Clear conversation history.
Module: schema_loader.py
Database schema utilities.
Get detailed schema as dictionary.
Returns: dict with tables, columns, relationships
Get schema as formatted string for LLM prompts.
Returns: str - Human-readable schema description
Example:
from schema_loader import get_schema_string
schema = get_schema_string()
# Returns:
# """
# Tables:
# - customers: customer_id, customer_name, email, city, country, region
# - products: product_id, product_code, product_name, category, price
# - orders: order_id, customer_id, product_id, order_date, quantity, total_amount
#
# Views:
# - sales: Combined view with all customer, product, and order information
# """Get schema with semantic descriptions and sample values for better LLM accuracy.
Returns: str - Enriched schema string
Features:
- Column descriptions (e.g., "status: 1=New")
- Dynamic sample values (e.g., "country: 'USA', 'UK'")
- Join hints and View recommendations
Example:
from schema_loader import get_enriched_schema_string
schema = get_enriched_schema_string()
# Returns:
# """
# ### Table: customers
# | Column | Type | Description | Sample Values |
# |--------|------|-------------|---------------|
# | country| TEXT | Billing... | 'USA', 'UK' |
# """Module: product_mapper.py
Product name disambiguation utilities.
Get product name to code mapping.
Returns: dict[str, str] - name → product_code mapping
Check if query matches multiple products.
Returns: tuple[bool, list, str] - (is_ambiguous, matches, message)
Example:
from product_mapper import check_query_ambiguity
is_ambiguous, matches, msg = check_query_ambiguity("show wireless mouse sales")
if is_ambiguous:
print(msg)
# "Found multiple products: Wireless Mouse, Wireless Mouse Pro"Extract product references from query.
Returns: list[dict] - Product matches with codes
All modules use these environment variables:
| Variable | Required | Description |
|---|---|---|
WATSONX_API_KEY |
Yes | IBM WatsonX API key |
WATSONX_URL |
Yes | WatsonX endpoint URL |
WATSONX_PROJECT_ID |
Yes | WatsonX project ID |
DATABASE_PATH |
No | Path to SQLite database (default: ./data/database.db) |
LEARNING_DB_PATH |
No | Path to learning store (default: ./data/learning.db) |
WATSONX_MODEL_ID |
No | Default model ID |
All agents return structured error information:
result = agent.run_query("invalid query")
if not result['success']:
print(f"Error type: {result.get('error_type', 'unknown')}")
print(f"Error message: {result['error']}")
print(f"Attempted SQL: {result.get('sql', 'None')}")Common error types:
sql_syntax- Invalid SQL generatedno_results- Query executed but returned emptyexecution_error- Database error during executionllm_error- LLM generation failed