
# API Reference

Author: Markus van Kempen (markus.van.kempen@gmail.com)
Last Updated: January 24, 2026

This document provides detailed API documentation for all modules in the SQL Query Generator.


## Table of Contents

1. SQL Generation Agents
2. Learning & Classification
3. Context & Utilities

## SQL Generation Agents

### WatsonXClient

Module: `watsonx_client.py`

Direct WatsonX LLM client for SQL generation with optional learning capabilities.

```python
from watsonx_client import WatsonXClient

client = WatsonXClient(
    model_id="ibm/granite-4-h-small",  # Optional: defaults to env or granite-4
    use_learning=True                  # Optional: enable pattern learning
)
```

#### Class Attributes

| Attribute | Type | Description |
|-----------|------|-------------|
| `AVAILABLE_MODELS` | dict | Dictionary of supported WatsonX models |
| `current_model_id` | str | Currently active model ID |

#### Methods

##### generate_sql_query(user_query, schema_info, product_mapping=None)

Generate SQL from natural language.

Parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| `user_query` | str | Natural language query |
| `schema_info` | str | Database schema description |
| `product_mapping` | dict | Optional product name→code mapping |

Returns: `str` - Generated SQL query

Example:

```python
from schema_loader import get_schema_string
from product_mapper import get_product_mapping_dict

sql = client.generate_sql_query(
    user_query="show laptop sales in USA",
    schema_info=get_schema_string(),
    product_mapping=get_product_mapping_dict()
)
# Returns: "SELECT * FROM sales WHERE product_name LIKE '%laptop%' AND country = 'USA'"
```
##### switch_model(model_id)

Switch to a different LLM model.

Parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| `model_id` | str | Model ID from `AVAILABLE_MODELS` |

Returns: `bool` - True if the switch succeeded

##### record_feedback(pattern_id, is_positive)

Record user feedback for a query pattern.

Parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| `pattern_id` | int | ID from the learning store |
| `is_positive` | bool | True for thumbs up, False for thumbs down |

##### get_processing_steps()

Get debug information about the last query.

Returns: `list[str]` - Processing steps with timing
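The generate-then-feedback loop looks like this in practice. The sketch below is a minimal illustration that runs offline: `StubClient` and its canned pattern ID are stand-ins, not part of the real module, but the method names and call order match the API above.

```python
# Minimal sketch of the generate -> feedback loop. StubClient stands in
# for WatsonXClient so the example runs without WatsonX credentials.
class StubClient:
    def generate_sql_query(self, user_query, schema_info, product_mapping=None):
        # The real client calls the WatsonX LLM here and records the
        # pattern in the learning store; we return a canned query.
        self.last_pattern_id = 1  # hypothetical ID, for illustration only
        return "SELECT * FROM sales WHERE country = 'USA'"

    def record_feedback(self, pattern_id, is_positive):
        self.feedback = (pattern_id, is_positive)

client = StubClient()
sql = client.generate_sql_query("show USA sales",
                                schema_info="sales: country, total_amount")
client.record_feedback(client.last_pattern_id, is_positive=True)  # thumbs up
```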


### LangChainSQLAgent

Module: `langchain_agent.py`

LangChain-based SQL agent with automatic schema discovery and retry logic.

```python
from langchain_agent import LangChainSQLAgent

agent = LangChainSQLAgent(
    model_id="ibm/granite-4-h-small",
    db_path="./data/database.db",
    use_learning=True
)
```

#### Methods

##### run_query(user_query, schema_info=None, product_mapping=None)

Execute a natural language query and return results.

Parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| `user_query` | str | Natural language query |
| `schema_info` | str | Optional schema (auto-loaded if None) |
| `product_mapping` | dict | Optional product mapping |

Returns: `dict` with keys:

| Key | Type | Description |
|-----|------|-------------|
| `success` | bool | Whether the query executed successfully |
| `sql` | str | Generated SQL |
| `result` | list | Query results as a list of dicts |
| `error` | str | Error message if the query failed |
| `columns` | list | Column names |
| `row_count` | int | Number of rows returned |

Example:

```python
result = agent.run_query("show all products under $100")

if result['success']:
    print(f"Found {result['row_count']} products")
    for row in result['result']:
        print(row)
else:
    print(f"Error: {result['error']}")
```

##### get_processing_steps()

Get debug information including a timing breakdown.

Returns: `list[str]` - Detailed processing steps


### BeeAISQLAgent

Module: `beeai_agent.py`

IBM BeeAI Framework-based agent with native WatsonX integration.

```python
from beeai_agent import BeeAISQLAgent

agent = BeeAISQLAgent(
    model_id="ibm/granite-4-h-small",
    db_path="./data/database.db",
    use_learning=True
)
```

#### Methods

##### run_query(user_query, schema_info=None, product_mapping=None)

Execute a query using the BeeAI Framework.

Parameters: same as `LangChainSQLAgent.run_query`

Returns: same structure as `LangChainSQLAgent.run_query`

##### switch_model(model_id)

Switch to a different model.

Returns: `bool` - Success status

##### record_feedback(pattern_id, is_positive)

Record user feedback for learning.

##### get_processing_steps()

Get processing debug info with timing.


### SelfImprovingAgent

Module: `self_improving_agent.py`

Full self-improving agent with maximum learning capabilities.

```python
from self_improving_agent import SelfImprovingAgent

agent = SelfImprovingAgent(
    model_id="ibm/granite-4-h-small",
    db_path="./data/database.db"
)
```

#### Methods

##### run_query(user_query, schema_info=None, product_mapping=None)

Execute a query with the full learning pipeline.

Returns: same structure as the other agents, plus:

| Key | Type | Description |
|-----|------|-------------|
| `similar_patterns` | list | Similar patterns found |
| `errors_to_avoid` | list | Error patterns avoided |

##### get_learning_stats()

Get statistics about learned patterns.

Returns: `dict` with pattern counts and success rates


## Learning & Classification

### LearningStore

Module: `learning_store.py`

SQLite-based pattern storage with embedding-based similarity search.

```python
from learning_store import get_learning_store

store = get_learning_store()  # Singleton instance
```

#### Methods

##### find_similar_patterns(query, limit=3)

Find semantically similar successful patterns.

Parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| `query` | str | User query to match |
| `limit` | int | Maximum number of patterns to return |

Returns: `list[dict]` with keys:

| Key | Type | Description |
|-----|------|-------------|
| `id` | int | Pattern ID |
| `user_query` | str | Original query |
| `generated_sql` | str | Successful SQL |
| `similarity` | float | Cosine similarity (0-1) |
| `thumbs_up` | int | Positive feedback count |

Example:

```python
patterns = store.find_similar_patterns("show laptop revenue")
for p in patterns:
    print(f"{p['similarity']:.2f}: {p['user_query']} -> {p['generated_sql']}")
```

##### store_success_pattern(user_query, generated_sql, execution_time_ms=None)

Store a successful query pattern.

Returns: `int` - Pattern ID for feedback tracking

##### find_common_errors(query, limit=3)

Find error patterns to avoid.

Returns: `list[dict]` with error details

##### store_error_pattern(user_query, attempted_sql, error_message)

Store a failed query for learning.

##### record_feedback(pattern_id, is_positive)

Record user feedback (thumbs up/down).

##### get_stats()

Get learning store statistics.

Returns: `dict` with counts and success rates
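The success/error/feedback flow across these methods can be sketched with an in-memory stand-in. The real `LearningStore` persists patterns to SQLite and uses embeddings for similarity search; `StubStore` below only mirrors the documented method signatures and is not the actual implementation.

```python
# Stand-in store illustrating store_success_pattern -> record_feedback
# and store_error_pattern. In-memory only; signatures match the docs.
class StubStore:
    def __init__(self):
        self.patterns = []
        self.errors = []

    def store_success_pattern(self, user_query, generated_sql, execution_time_ms=None):
        pattern_id = len(self.patterns) + 1
        self.patterns.append({"id": pattern_id, "user_query": user_query,
                              "generated_sql": generated_sql, "thumbs_up": 0})
        return pattern_id  # ID used later for feedback tracking

    def store_error_pattern(self, user_query, attempted_sql, error_message):
        self.errors.append((user_query, attempted_sql, error_message))

    def record_feedback(self, pattern_id, is_positive):
        if is_positive:
            self.patterns[pattern_id - 1]["thumbs_up"] += 1

store = StubStore()
pid = store.store_success_pattern("show laptop sales", "SELECT * FROM sales WHERE ...")
store.record_feedback(pid, is_positive=True)
```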


### SemanticClassifier

Module: `semantic_classifier.py`

Embedding-based query classifier with caching.

```python
from semantic_classifier import get_semantic_classifier

classifier = get_semantic_classifier()  # Singleton with cache
```

#### Methods

##### classify(query)

Classify query intent using embeddings.

Parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| `query` | str | User query to classify |

Returns: `tuple[str, float]` - (classification, confidence)

Classifications:

- `'database'` - needs SQL generation
- `'general'` - conversational response
- `'help'` - system guidance

Example:

```python
classification, confidence = classifier.classify("show all products")
# Returns: ('database', 0.92)

classification, confidence = classifier.classify("hello")
# Returns: ('general', 0.88)
```

##### get_cache_stats()

Get embedding cache statistics.

Returns: `dict` with memory/disk cache sizes and hit rates
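A typical consumer of `classify()` gates on the confidence score before trusting the label. The sketch below shows that pattern; the 0.7 threshold is an illustrative choice, not a value taken from `semantic_classifier.py`.

```python
# Route on the (classification, confidence) tuple returned by classify().
# The threshold value is an assumption for illustration only.
def route(classification: str, confidence: float, threshold: float = 0.7) -> str:
    if confidence < threshold:
        return "fallback"  # low confidence: defer to keyword/schema layers
    return classification  # 'database', 'general', or 'help'

print(route("database", 0.92))  # high confidence: trust the label
print(route("general", 0.55))   # low confidence: fall back
```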


### Query Classifier Functions

Module: `query_classifier.py`

Hybrid classification combining semantic, keyword, and schema validation.

#### classify_query(query)

Main classification function using all layers.

Parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| `query` | str | User query |

Returns: `tuple[str, str]` - (classification, message)

Classifications:

| Value | Description |
|-------|-------------|
| `'database'` | Query needs SQL |
| `'general'` | General conversation |
| `'help'` | Help request |
| `'unknown_entity'` | Entity not in database |

Example:

```python
from query_classifier import classify_query

cls, msg = classify_query("show all products")
# Returns: ('database', '')

cls, msg = classify_query("what time is it")
# Returns: ('general', '')

cls, msg = classify_query("show BMW sales")
# Returns: ('unknown_entity', "I couldn't find 'BMW' in the database...")
```

#### analyze_empty_result(query, sql)

Explain why a query returned no results.

Returns: `str` - Human-readable explanation

#### extract_search_terms(query)

Extract searchable terms from a query.

Returns: `list[str]` - Terms for database lookup
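Term extraction of this kind can be approximated as stop-word filtering. The sketch below is a naive stand-in: the stop-word list, lowercasing, and the `extract_terms` name are all assumptions, and the real implementation in `query_classifier.py` may be more sophisticated.

```python
# Naive term extraction: drop filler words, keep candidate entity terms.
# STOP_WORDS is a hypothetical list chosen for this example.
STOP_WORDS = {"show", "all", "the", "in", "for", "of", "list", "me", "sales"}

def extract_terms(query: str) -> list[str]:
    return [w for w in query.lower().split() if w not in STOP_WORDS]

terms = extract_terms("show BMW sales")
# 'bmw' would then be looked up against the database to detect
# unknown entities like the 'show BMW sales' example above
```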


## Context & Utilities

### ConversationContext

Module: `context_manager.py`

Manages conversation history for follow-up queries.

```python
from context_manager import get_conversation_context, reset_conversation_context

ctx = get_conversation_context()  # Singleton instance
```

#### Methods

##### add_query(query, query_type, sql=None, result_count=None)

Add a query to the conversation history.

Parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| `query` | str | User query |
| `query_type` | str | `'database'`, `'general'`, or `'help'` |
| `sql` | str | Generated SQL (if applicable) |
| `result_count` | int | Number of results |

##### is_follow_up(query)

Check whether a query is a follow-up to the previous one.

Returns: `tuple[bool, str]` - (is_follow_up, follow_up_type)

Follow-up types: `'expand'`, `'similar'`, `'filter'`, `'sort'`, `'limit'`, `'reference'`

##### resolve_follow_up(query)

Expand a follow-up query with context.

Returns: `str` - Expanded query

Example:

```python
ctx.add_query("show laptop sales", "database", "SELECT...", 10)

# Later...
expanded = ctx.resolve_follow_up("same for USA")
# Returns: "show laptop sales in USA"
```

##### get_last_context()

Get the most recent query context.

Returns: `QueryContext` or None

##### clear()

Clear the conversation history.
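The follow-up flow can be sketched end to end with a stand-in context object. The real `ConversationContext` in `context_manager.py` detects many follow-up types and holds richer state; `StubContext` below handles only the "same for X" pattern and is purely illustrative.

```python
# Stand-in for ConversationContext showing add_query -> resolve_follow_up.
class StubContext:
    def __init__(self):
        self.last_query = None

    def add_query(self, query, query_type, sql=None, result_count=None):
        self.last_query = query

    def resolve_follow_up(self, query):
        # "same for X" reuses the previous query with the new filter.
        prefix = "same for "
        if self.last_query and query.lower().startswith(prefix):
            return f"{self.last_query} in {query[len(prefix):]}"
        return query

ctx = StubContext()
ctx.add_query("show laptop sales", "database", "SELECT ...", 10)
expanded = ctx.resolve_follow_up("same for USA")
```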


### Schema Loader

Module: `schema_loader.py`

Database schema utilities.

#### get_schema()

Get the detailed schema as a dictionary.

Returns: `dict` with tables, columns, and relationships

#### get_schema_string()

Get the schema as a formatted string for LLM prompts.

Returns: `str` - Human-readable schema description

Example:

```python
from schema_loader import get_schema_string

schema = get_schema_string()
# Returns:
# """
# Tables:
# - customers: customer_id, customer_name, email, city, country, region
# - products: product_id, product_code, product_name, category, price
# - orders: order_id, customer_id, product_id, order_date, quantity, total_amount
#
# Views:
# - sales: Combined view with all customer, product, and order information
# """
```

#### get_enriched_schema_string()

Get the schema with semantic descriptions and sample values for better LLM accuracy.

Returns: `str` - Enriched schema string

Features:

- Column descriptions (e.g., "status: 1=New")
- Dynamic sample values (e.g., "country: 'USA', 'UK'")
- Join hints and view recommendations

Example:

```python
from schema_loader import get_enriched_schema_string

schema = get_enriched_schema_string()
# Returns:
# """
# ### Table: customers
# | Column | Type | Description | Sample Values |
# |--------|------|-------------|---------------|
# | country| TEXT | Billing...  | 'USA', 'UK'   |
# """
```

### Product Mapper

Module: `product_mapper.py`

Product name disambiguation utilities.

#### get_product_mapping_dict()

Get the product name to product code mapping.

Returns: `dict[str, str]` - name → product_code mapping

#### check_query_ambiguity(query)

Check whether a query matches multiple products.

Returns: `tuple[bool, list, str]` - (is_ambiguous, matches, message)

Example:

```python
from product_mapper import check_query_ambiguity

is_ambiguous, matches, msg = check_query_ambiguity("show wireless mouse sales")
if is_ambiguous:
    print(msg)
    # "Found multiple products: Wireless Mouse, Wireless Mouse Pro"
```

#### analyze_query_for_products(query)

Extract product references from a query.

Returns: `list[dict]` - Product matches with codes
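The ambiguity check can be approximated with simple name matching: a query is ambiguous when a product name it mentions is also a prefix of other product names. The product table and matching rule below are illustrative assumptions, not the actual logic in `product_mapper.py`.

```python
# Approximate ambiguity check. PRODUCTS is a hypothetical name->code map.
PRODUCTS = {"Wireless Mouse": "P-001", "Wireless Mouse Pro": "P-002", "Laptop": "P-003"}

def check_ambiguity(query: str):
    q = query.lower()
    # Product names mentioned verbatim in the query...
    mentioned = [n for n in PRODUCTS if n.lower() in q]
    # ...expanded to every product whose name contains a mentioned name.
    matches = sorted({n for n in PRODUCTS for t in mentioned if t.lower() in n.lower()})
    if len(matches) > 1:
        return True, matches, "Found multiple products: " + ", ".join(matches)
    return False, matches, ""

is_ambiguous, matches, msg = check_ambiguity("show wireless mouse sales")
# 'wireless mouse' matches both 'Wireless Mouse' and 'Wireless Mouse Pro'
```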


## Environment Variables

All modules use these environment variables:

| Variable | Required | Description |
|----------|----------|-------------|
| `WATSONX_API_KEY` | Yes | IBM WatsonX API key |
| `WATSONX_URL` | Yes | WatsonX endpoint URL |
| `WATSONX_PROJECT_ID` | Yes | WatsonX project ID |
| `DATABASE_PATH` | No | Path to SQLite database (default: `./data/database.db`) |
| `LEARNING_DB_PATH` | No | Path to learning store (default: `./data/learning.db`) |
| `WATSONX_MODEL_ID` | No | Default model ID |
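A startup check for the required variables can be as simple as the snippet below (pure standard library; the helper name is ours, and the defaults mirror the table above):

```python
import os

REQUIRED_VARS = ["WATSONX_API_KEY", "WATSONX_URL", "WATSONX_PROJECT_ID"]

def missing_env(env=None) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]

# Optional variables fall back to their documented defaults when unset.
db_path = os.environ.get("DATABASE_PATH", "./data/database.db")
learning_db_path = os.environ.get("LEARNING_DB_PATH", "./data/learning.db")
```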

## Error Handling

All agents return structured error information:

```python
result = agent.run_query("invalid query")

if not result['success']:
    print(f"Error type: {result.get('error_type', 'unknown')}")
    print(f"Error message: {result['error']}")
    print(f"Attempted SQL: {result.get('sql', 'None')}")
```

Common error types:

- `sql_syntax` - invalid SQL was generated
- `no_results` - the query executed but returned no rows
- `execution_error` - database error during execution
- `llm_error` - LLM generation failed
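These error types lend themselves to a simple dispatch. The retry policy below is an illustrative sketch, not behavior built into the agents; which types are worth retrying is a policy choice for the caller.

```python
# Dispatch on the structured error types listed above. The mapping of
# error type to action is only an example policy.
def handle_result(result: dict, retry):
    if result.get("success"):
        return result["result"]
    error_type = result.get("error_type", "unknown")
    if error_type in ("sql_syntax", "llm_error"):
        return retry()  # regenerating the SQL may fix a bad generation
    if error_type == "no_results":
        return []       # valid query, empty answer
    raise RuntimeError(result.get("error", "unknown error"))

rows = handle_result({"success": True, "result": [{"id": 1}]}, retry=lambda: [])
```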