
# API Reference

Author: Markus van Kempen (markus.van.kempen@gmail.com)
Last Updated: January 24, 2026

This document provides detailed API documentation for all modules in the SQL Query Generator.


## Table of Contents

1. SQL Generation Agents
2. Learning & Classification
3. Context & Utilities

## SQL Generation Agents

### WatsonXClient

Module: `watsonx_client.py`

Direct WatsonX LLM client for SQL generation with optional learning capabilities.

```python
from watsonx_client import WatsonXClient

client = WatsonXClient(
    model_id="ibm/granite-4-h-small",  # Optional: defaults to env or granite-4
    use_learning=True                  # Optional: enable pattern learning
)
```

#### Class Attributes

| Attribute | Type | Description |
|-----------|------|-------------|
| `AVAILABLE_MODELS` | dict | Dictionary of supported WatsonX models |
| `current_model_id` | str | Currently active model ID |

#### Methods

##### generate_sql_query(user_query, schema_info, product_mapping=None)

Generate SQL from natural language.

Parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| `user_query` | str | Natural language query |
| `schema_info` | str | Database schema description |
| `product_mapping` | dict | Optional product name→code mapping |

Returns: `str` - Generated SQL query

Example:

```python
from schema_loader import get_schema_string
from product_mapper import get_product_mapping_dict

sql = client.generate_sql_query(
    user_query="show laptop sales in USA",
    schema_info=get_schema_string(),
    product_mapping=get_product_mapping_dict()
)
# Returns: "SELECT * FROM sales WHERE product_name LIKE '%laptop%' AND country = 'USA'"
```
##### switch_model(model_id)

Switch to a different LLM model.

Parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| `model_id` | str | Model ID from `AVAILABLE_MODELS` |

Returns: `bool` - True if the switch succeeded

##### record_feedback(pattern_id, is_positive)

Record user feedback for a query pattern.

Parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| `pattern_id` | int | ID from the learning store |
| `is_positive` | bool | True for thumbs up, False for thumbs down |

##### get_processing_steps()

Get debug information about the last query.

Returns: `list[str]` - Processing steps with timing
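The generate-then-feedback loop looks like this in practice. The sketch below is a minimal illustration that runs offline: `StubClient` and its canned pattern ID are stand-ins, not part of the real module, but the method names and call order match the API above.

```python
# Minimal sketch of the generate -> feedback loop. StubClient stands in
# for WatsonXClient so the example runs without WatsonX credentials.
class StubClient:
    def generate_sql_query(self, user_query, schema_info, product_mapping=None):
        # The real client calls the WatsonX LLM here and records the
        # pattern in the learning store; we return a canned query.
        self.last_pattern_id = 1  # hypothetical ID, for illustration only
        return "SELECT * FROM sales WHERE country = 'USA'"

    def record_feedback(self, pattern_id, is_positive):
        self.feedback = (pattern_id, is_positive)

client = StubClient()
sql = client.generate_sql_query("show USA sales",
                                schema_info="sales: country, total_amount")
client.record_feedback(client.last_pattern_id, is_positive=True)  # thumbs up
```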


### LangChainSQLAgent

Module: `langchain_agent.py`

LangChain-based SQL agent with automatic schema discovery and retry logic.

```python
from langchain_agent import LangChainSQLAgent

agent = LangChainSQLAgent(
    model_id="ibm/granite-4-h-small",
    db_path="./data/database.db",
    use_learning=True
)
```

#### Methods

##### run_query(user_query, schema_info=None, product_mapping=None)

Execute a natural language query and return results.

Parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| `user_query` | str | Natural language query |
| `schema_info` | str | Optional schema (auto-loaded if None) |
| `product_mapping` | dict | Optional product mapping |

Returns: `dict` with keys:

| Key | Type | Description |
|-----|------|-------------|
| `success` | bool | Whether the query executed successfully |
| `sql` | str | Generated SQL |
| `result` | list | Query results as a list of dicts |
| `error` | str | Error message if the query failed |
| `columns` | list | Column names |
| `row_count` | int | Number of rows returned |

Example:

```python
result = agent.run_query("show all products under $100")

if result['success']:
    print(f"Found {result['row_count']} products")
    for row in result['result']:
        print(row)
else:
    print(f"Error: {result['error']}")
```

##### get_processing_steps()

Get debug information including a timing breakdown.

Returns: `list[str]` - Detailed processing steps


### BeeAISQLAgent

Module: `beeai_agent.py`

IBM BeeAI Framework-based agent with native WatsonX integration.

```python
from beeai_agent import BeeAISQLAgent

agent = BeeAISQLAgent(
    model_id="ibm/granite-4-h-small",
    db_path="./data/database.db",
    use_learning=True
)
```

#### Methods

##### run_query(user_query, schema_info=None, product_mapping=None)

Execute a query using the BeeAI Framework.

Parameters: same as `LangChainSQLAgent.run_query`

Returns: same structure as `LangChainSQLAgent.run_query`

##### switch_model(model_id)

Switch to a different model.

Returns: `bool` - Success status

##### record_feedback(pattern_id, is_positive)

Record user feedback for learning.

##### get_processing_steps()

Get processing debug info with timing.


### SelfImprovingAgent

Module: `self_improving_agent.py`

Full self-improving agent with maximum learning capabilities.

```python
from self_improving_agent import SelfImprovingAgent

agent = SelfImprovingAgent(
    model_id="ibm/granite-4-h-small",
    db_path="./data/database.db"
)
```

#### Methods

##### run_query(user_query, schema_info=None, product_mapping=None)

Execute a query with the full learning pipeline.

Returns: same structure as the other agents, plus:

| Key | Type | Description |
|-----|------|-------------|
| `similar_patterns` | list | Similar patterns found |
| `errors_to_avoid` | list | Error patterns avoided |

##### get_learning_stats()

Get statistics about learned patterns.

Returns: `dict` with pattern counts and success rates


## Learning & Classification

### LearningStore

Module: `learning_store.py`

SQLite-based pattern storage with embedding-based similarity search.

```python
from learning_store import get_learning_store

store = get_learning_store()  # Singleton instance
```

#### Methods

##### find_similar_patterns(query, limit=3)

Find semantically similar successful patterns.

Parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| `query` | str | User query to match |
| `limit` | int | Maximum number of patterns to return |

Returns: `list[dict]` with keys:

| Key | Type | Description |
|-----|------|-------------|
| `id` | int | Pattern ID |
| `user_query` | str | Original query |
| `generated_sql` | str | Successful SQL |
| `similarity` | float | Cosine similarity (0-1) |
| `thumbs_up` | int | Positive feedback count |

Example:

```python
patterns = store.find_similar_patterns("show laptop revenue")
for p in patterns:
    print(f"{p['similarity']:.2f}: {p['user_query']} -> {p['generated_sql']}")
```

##### store_success_pattern(user_query, generated_sql, execution_time_ms=None)

Store a successful query pattern.

Returns: `int` - Pattern ID for feedback tracking

##### find_common_errors(query, limit=3)

Find error patterns to avoid.

Returns: `list[dict]` with error details

##### store_error_pattern(user_query, attempted_sql, error_message)

Store a failed query for learning.

##### record_feedback(pattern_id, is_positive)

Record user feedback (thumbs up/down).

##### get_stats()

Get learning store statistics.

Returns: `dict` with counts and success rates
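The success/error/feedback flow across these methods can be sketched with an in-memory stand-in. The real `LearningStore` persists patterns to SQLite and uses embeddings for similarity search; `StubStore` below only mirrors the documented method signatures and is not the actual implementation.

```python
# Stand-in store illustrating store_success_pattern -> record_feedback
# and store_error_pattern. In-memory only; signatures match the docs.
class StubStore:
    def __init__(self):
        self.patterns = []
        self.errors = []

    def store_success_pattern(self, user_query, generated_sql, execution_time_ms=None):
        pattern_id = len(self.patterns) + 1
        self.patterns.append({"id": pattern_id, "user_query": user_query,
                              "generated_sql": generated_sql, "thumbs_up": 0})
        return pattern_id  # ID used later for feedback tracking

    def store_error_pattern(self, user_query, attempted_sql, error_message):
        self.errors.append((user_query, attempted_sql, error_message))

    def record_feedback(self, pattern_id, is_positive):
        if is_positive:
            self.patterns[pattern_id - 1]["thumbs_up"] += 1

store = StubStore()
pid = store.store_success_pattern("show laptop sales", "SELECT * FROM sales WHERE ...")
store.record_feedback(pid, is_positive=True)
```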


### SemanticClassifier

Module: `semantic_classifier.py`

Embedding-based query classifier with caching.

```python
from semantic_classifier import get_semantic_classifier

classifier = get_semantic_classifier()  # Singleton with cache
```

#### Methods

##### classify(query)

Classify query intent using embeddings.

Parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| `query` | str | User query to classify |

Returns: `tuple[str, float]` - (classification, confidence)

Classifications:

- `'database'` - needs SQL generation
- `'general'` - conversational response
- `'help'` - system guidance

Example:

```python
classification, confidence = classifier.classify("show all products")
# Returns: ('database', 0.92)

classification, confidence = classifier.classify("hello")
# Returns: ('general', 0.88)
```

##### get_cache_stats()

Get embedding cache statistics.

Returns: `dict` with memory/disk cache sizes and hit rates
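A typical consumer of `classify()` gates on the confidence score before trusting the label. The sketch below shows that pattern; the 0.7 threshold is an illustrative choice, not a value taken from `semantic_classifier.py`.

```python
# Route on the (classification, confidence) tuple returned by classify().
# The threshold value is an assumption for illustration only.
def route(classification: str, confidence: float, threshold: float = 0.7) -> str:
    if confidence < threshold:
        return "fallback"  # low confidence: defer to keyword/schema layers
    return classification  # 'database', 'general', or 'help'

print(route("database", 0.92))  # high confidence: trust the label
print(route("general", 0.55))   # low confidence: fall back
```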


### Query Classifier Functions

Module: `query_classifier.py`

Hybrid classification combining semantic, keyword, and schema validation.

#### classify_query(query)

Main classification function using all layers.

Parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| `query` | str | User query |

Returns: `tuple[str, str]` - (classification, message)

Classifications:

| Value | Description |
|-------|-------------|
| `'database'` | Query needs SQL |
| `'general'` | General conversation |
| `'help'` | Help request |
| `'unknown_entity'` | Entity not in database |

Example:

```python
from query_classifier import classify_query

cls, msg = classify_query("show all products")
# Returns: ('database', '')

cls, msg = classify_query("what time is it")
# Returns: ('general', '')

cls, msg = classify_query("show BMW sales")
# Returns: ('unknown_entity', "I couldn't find 'BMW' in the database...")
```

#### analyze_empty_result(query, sql)

Explain why a query returned no results.

Returns: `str` - Human-readable explanation

#### extract_search_terms(query)

Extract searchable terms from a query.

Returns: `list[str]` - Terms for database lookup
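Term extraction of this kind can be approximated as stop-word filtering. The sketch below is a naive stand-in: the stop-word list, lowercasing, and the `extract_terms` name are all assumptions, and the real implementation in `query_classifier.py` may be more sophisticated.

```python
# Naive term extraction: drop filler words, keep candidate entity terms.
# STOP_WORDS is a hypothetical list chosen for this example.
STOP_WORDS = {"show", "all", "the", "in", "for", "of", "list", "me", "sales"}

def extract_terms(query: str) -> list[str]:
    return [w for w in query.lower().split() if w not in STOP_WORDS]

terms = extract_terms("show BMW sales")
# 'bmw' would then be looked up against the database to detect
# unknown entities like the 'show BMW sales' example above
```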


## Context & Utilities

### ConversationContext

Module: `context_manager.py`

Manages conversation history for follow-up queries.

```python
from context_manager import get_conversation_context, reset_conversation_context

ctx = get_conversation_context()  # Singleton instance
```

#### Methods

##### add_query(query, query_type, sql=None, result_count=None)

Add a query to the conversation history.

Parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| `query` | str | User query |
| `query_type` | str | `'database'`, `'general'`, or `'help'` |
| `sql` | str | Generated SQL (if applicable) |
| `result_count` | int | Number of results |

##### is_follow_up(query)

Check whether a query is a follow-up to the previous one.

Returns: `tuple[bool, str]` - (is_follow_up, follow_up_type)

Follow-up types: `'expand'`, `'similar'`, `'filter'`, `'sort'`, `'limit'`, `'reference'`

##### resolve_follow_up(query)

Expand a follow-up query with context.

Returns: `str` - Expanded query

Example:

```python
ctx.add_query("show laptop sales", "database", "SELECT...", 10)

# Later...
expanded = ctx.resolve_follow_up("same for USA")
# Returns: "show laptop sales in USA"
```

##### get_last_context()

Get the most recent query context.

Returns: `QueryContext` or None

##### clear()

Clear the conversation history.
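The follow-up flow can be sketched end to end with a stand-in context object. The real `ConversationContext` in `context_manager.py` detects many follow-up types and holds richer state; `StubContext` below handles only the "same for X" pattern and is purely illustrative.

```python
# Stand-in for ConversationContext showing add_query -> resolve_follow_up.
class StubContext:
    def __init__(self):
        self.last_query = None

    def add_query(self, query, query_type, sql=None, result_count=None):
        self.last_query = query

    def resolve_follow_up(self, query):
        # "same for X" reuses the previous query with the new filter.
        prefix = "same for "
        if self.last_query and query.lower().startswith(prefix):
            return f"{self.last_query} in {query[len(prefix):]}"
        return query

ctx = StubContext()
ctx.add_query("show laptop sales", "database", "SELECT ...", 10)
expanded = ctx.resolve_follow_up("same for USA")
```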


### Schema Loader

Module: `schema_loader.py`

Database schema utilities.

#### get_schema()

Get the detailed schema as a dictionary.

Returns: `dict` with tables, columns, and relationships

#### get_schema_string()

Get the schema as a formatted string for LLM prompts.

Returns: `str` - Human-readable schema description

Example:

```python
from schema_loader import get_schema_string

schema = get_schema_string()
# Returns:
# """
# Tables:
# - customers: customer_id, customer_name, email, city, country, region
# - products: product_id, product_code, product_name, category, price
# - orders: order_id, customer_id, product_id, order_date, quantity, total_amount
#
# Views:
# - sales: Combined view with all customer, product, and order information
# """
```

#### get_enriched_schema_string()

Get the schema with semantic descriptions and sample values for better LLM accuracy.

Returns: `str` - Enriched schema string

Features:

- Column descriptions (e.g., "status: 1=New")
- Dynamic sample values (e.g., "country: 'USA', 'UK'")
- Join hints and view recommendations

Example:

```python
from schema_loader import get_enriched_schema_string

schema = get_enriched_schema_string()
# Returns:
# """
# ### Table: customers
# | Column | Type | Description | Sample Values |
# |--------|------|-------------|---------------|
# | country| TEXT | Billing...  | 'USA', 'UK'   |
# """
```

### Product Mapper

Module: `product_mapper.py`

Product name disambiguation utilities.

#### get_product_mapping_dict()

Get the product name to product code mapping.

Returns: `dict[str, str]` - name → product_code mapping

#### check_query_ambiguity(query)

Check whether a query matches multiple products.

Returns: `tuple[bool, list, str]` - (is_ambiguous, matches, message)

Example:

```python
from product_mapper import check_query_ambiguity

is_ambiguous, matches, msg = check_query_ambiguity("show wireless mouse sales")
if is_ambiguous:
    print(msg)
    # "Found multiple products: Wireless Mouse, Wireless Mouse Pro"
```

#### analyze_query_for_products(query)

Extract product references from a query.

Returns: `list[dict]` - Product matches with codes
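The ambiguity check can be approximated with simple name matching: a query is ambiguous when a product name it mentions is also a prefix of other product names. The product table and matching rule below are illustrative assumptions, not the actual logic in `product_mapper.py`.

```python
# Approximate ambiguity check. PRODUCTS is a hypothetical name->code map.
PRODUCTS = {"Wireless Mouse": "P-001", "Wireless Mouse Pro": "P-002", "Laptop": "P-003"}

def check_ambiguity(query: str):
    q = query.lower()
    # Product names mentioned verbatim in the query...
    mentioned = [n for n in PRODUCTS if n.lower() in q]
    # ...expanded to every product whose name contains a mentioned name.
    matches = sorted({n for n in PRODUCTS for t in mentioned if t.lower() in n.lower()})
    if len(matches) > 1:
        return True, matches, "Found multiple products: " + ", ".join(matches)
    return False, matches, ""

is_ambiguous, matches, msg = check_ambiguity("show wireless mouse sales")
# 'wireless mouse' matches both 'Wireless Mouse' and 'Wireless Mouse Pro'
```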


## Environment Variables

All modules use these environment variables:

| Variable | Required | Description |
|----------|----------|-------------|
| `WATSONX_API_KEY` | Yes | IBM WatsonX API key |
| `WATSONX_URL` | Yes | WatsonX endpoint URL |
| `WATSONX_PROJECT_ID` | Yes | WatsonX project ID |
| `DATABASE_PATH` | No | Path to SQLite database (default: `./data/database.db`) |
| `LEARNING_DB_PATH` | No | Path to learning store (default: `./data/learning.db`) |
| `WATSONX_MODEL_ID` | No | Default model ID |
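A startup check for the required variables can be as simple as the snippet below (pure standard library; the helper name is ours, and the defaults mirror the table above):

```python
import os

REQUIRED_VARS = ["WATSONX_API_KEY", "WATSONX_URL", "WATSONX_PROJECT_ID"]

def missing_env(env=None) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]

# Optional variables fall back to their documented defaults when unset.
db_path = os.environ.get("DATABASE_PATH", "./data/database.db")
learning_db_path = os.environ.get("LEARNING_DB_PATH", "./data/learning.db")
```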

## Error Handling

All agents return structured error information:

```python
result = agent.run_query("invalid query")

if not result['success']:
    print(f"Error type: {result.get('error_type', 'unknown')}")
    print(f"Error message: {result['error']}")
    print(f"Attempted SQL: {result.get('sql', 'None')}")
```

Common error types:

- `sql_syntax` - invalid SQL was generated
- `no_results` - the query executed but returned no rows
- `execution_error` - database error during execution
- `llm_error` - LLM generation failed
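These error types lend themselves to a simple dispatch. The retry policy below is an illustrative sketch, not behavior built into the agents; which types are worth retrying is a policy choice for the caller.

```python
# Dispatch on the structured error types listed above. The mapping of
# error type to action is only an example policy.
def handle_result(result: dict, retry):
    if result.get("success"):
        return result["result"]
    error_type = result.get("error_type", "unknown")
    if error_type in ("sql_syntax", "llm_error"):
        return retry()  # regenerating the SQL may fix a bad generation
    if error_type == "no_results":
        return []       # valid query, empty answer
    raise RuntimeError(result.get("error", "unknown error"))

rows = handle_result({"success": True, "result": [{"id": 1}]}, retry=lambda: [])
```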