This document covers how to integrate the search system into your agents, configure search behavior, use metadata and tags for filtering, tune search quality, customize response formatting, and deploy multi-collection search architectures. For background on building search indexes, see search_indexing.md. For deployment patterns, see search_deployment.md. For general search system architecture, see search_overview.md.
The native_vector_search skill provides search functionality to agents. The simplest integration requires only a tool name, description, and index path:
from signalwire import AgentBase

class DocsAgent(AgentBase):
    def __init__(self):
        super().__init__(
            name="DocsAgent",
            route="/docs"
        )
        # Add search skill
        self.add_skill("native_vector_search", {
            "tool_name": "search_docs",
            "description": "Search the documentation for information",
            "index_path": "./knowledge.swsearch"
        })

The agent automatically exposes this as a SWAIG (SignalWire AI Gateway) function that the LLM can invoke when it determines a search is needed. SWAIG is the platform's AI tool-calling system.
tool_name (string)
- Name of the SWAIG function exposed to the LLM.
- Should be descriptive: search_docs, search_api, search_knowledge.
description (string)
- Tells the LLM when to use this function.
- Be specific: "Search the API documentation for information about endpoints, authentication, and request formats".
count (integer, default: 5)
- Number of results to return.
- More results provide more context but consume more tokens.
{
"count": 3 # Return top 3 results
}

distance_threshold (float, default: 0.5)
- Minimum similarity score (0.0 to 1.0).
- Higher values require closer matches (stricter); lower values are more permissive.
{
"distance_threshold": 0.4 # Only results with similarity > 0.4
}

tags (list, optional)
- Filter results by tags.
- Only chunks with these tags are returned.
{
"tags": ["api", "reference"] # Only API reference chunks
}

no_results_message (string, optional)
- Message returned when no results are found.
- Use {query} as a placeholder for the search query.
{
"no_results_message": "I couldn't find information about '{query}' in the documentation. Try rephrasing your question."
}

max_content_length (integer, default: 32768)
- Maximum total characters in the search response.
- Prevents LLM context exhaustion by truncating oversized results to fit the budget.
- Budget is distributed across results.
{
"max_content_length": 16384 # 16KB total
}

response_prefix (string, optional)
- Text prepended to the search response.
response_postfix (string, optional)
- Text appended to the search response.
{
"response_prefix": "Based on the documentation, here's what I found:",
"response_postfix": "Would you like me to search for more specific information?"
}

swaig_fields (dict, optional)
- SWAIG-specific configuration, including function fillers for better voice UX.
{
"swaig_fields": {
"fillers": {
"en-US": [
"Let me search the documentation...",
"I'm looking through the docs...",
"Searching for that information..."
]
}
}
}

Choose between NLTK (fast) and spaCy (better quality) for query processing:
# Fast NLTK processing (default)
{
"nlp_backend": "nltk" # ~50-100ms query processing
}
# Better quality spaCy processing
{
"nlp_backend": "spacy" # ~150-300ms query processing, requires model download
}

When using build_index: True for auto-building indexes:
{
"model_name": "mini" # Fast (about 5x faster), good for most use cases
# "model_name": "base" # Balanced, better quality
# "model_name": "large" # Best quality
}See Search Indexing for detailed model comparisons.
The search system supports multiple storage backends. Choose one per skill instance.
Option 1: Local .swsearch file (SQLite)
{
"index_path": "./knowledge.swsearch"
}

Best for single-agent deployments, development, and small to medium datasets. Portable single-file storage.
Option 2: Remote search server
{
"remote_url": "http://localhost:8001",
"index_name": "docs"
}

Best for centralized index management and lower per-agent memory usage.
Option 3: pgvector database
{
"backend": "pgvector",
"connection_string": "postgresql://user:pass@localhost:5432/db",
"collection_name": "docs",
"model_name": "mini" # Must match model used during indexing
}

Best for production deployments, multi-agent systems, and large datasets. Supports concurrent access and real-time updates.
| Feature | SQLite | pgvector |
|---|---|---|
| Setup complexity | None | Requires PostgreSQL |
| Scalability | Limited | Excellent |
| Concurrent access | Poor | Excellent |
| Update capability | Rebuild required | Real-time |
| Performance (small datasets) | Excellent | Good |
| Performance (large datasets) | Poor | Excellent |
| Deployment | File copy | Database connection |
| Multi-agent sharing | Separate copies | Shared knowledge |
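In practice, the trade-offs above often reduce to "use pgvector when it is available, otherwise fall back to a local file." A minimal sketch of that selection — the helper name and the `PGVECTOR_CONNECTION` environment variable are illustrative, not part of the SDK:

```python
import os

def search_backend_config(collection: str) -> dict:
    """Pick a search backend from the environment (illustrative helper).

    Uses pgvector when PGVECTOR_CONNECTION is set, otherwise falls back
    to a portable local .swsearch file, per the comparison table above.
    """
    conn = os.environ.get("PGVECTOR_CONNECTION")
    if conn:
        return {
            "backend": "pgvector",
            "connection_string": conn,
            "collection_name": collection,
            "model_name": "mini",  # must match the model used at indexing time
        }
    return {"index_path": f"./{collection}.swsearch"}
```

The returned dict can be merged into the skill configuration with `**search_backend_config("docs")`.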
A fully configured search skill using pgvector:
import os

class ProductAgent(AgentBase):
    def __init__(self):
        super().__init__(
            name="ProductAgent",
            route="/product"
        )
        # Configure search with all options
        self.add_skill("native_vector_search", {
            # Identity
            "tool_name": "search_product_docs",
            "description": "Search comprehensive product documentation including features, configuration, troubleshooting, and API references",
            # Backend (pgvector)
            "backend": "pgvector",
            "connection_string": os.getenv("PGVECTOR_CONNECTION"),
            "collection_name": "product_docs",
            "model_name": "mini",
            # Search behavior
            "count": 5,
            "distance_threshold": 0.4,
            # User experience
            "no_results_message": "I couldn't find information about '{query}' in our documentation. Could you rephrase or ask about a different topic?",
            "max_content_length": 32768,
            # SWAIG configuration
            "swaig_fields": {
                "fillers": {
                    "en-US": [
                        "Let me check our documentation...",
                        "I'm searching for that information...",
                        "Looking through the product docs...",
                        "One moment while I find that..."
                    ]
                }
            }
        })

When the LLM invokes a search function, the system performs these steps internally:
- The query is preprocessed using the selected NLP backend (synonym expansion, keyword extraction, optional POS tagging).
- A vector embedding is generated for the query.
- Vector search retrieves 3x the requested count as candidates.
- Hybrid scoring combines vector similarity with keyword matching and metadata signals.
- The top count results are selected and formatted.
- Results are returned to the LLM as the function response, which the LLM uses to formulate its answer to the user.
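The retrieval steps can be sketched in miniature. This toy version covers only the over-fetch, hybrid-scoring, and top-count selection steps, using a hand-rolled cosine similarity and a flat 15%-per-keyword boost (the per-match rate shown in the debug output later in this document); the real preprocessing, embeddings, and weights live inside the SDK:

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def run_search(query_vec, keywords, candidates, count=5):
    """Toy pipeline: over-fetch 3x count, hybrid-score, return top count.

    Each candidate is a dict with 'vector' and 'tags' keys (an assumed
    shape for illustration, not the SDK's internal representation).
    """
    pool = candidates[: count * 3]  # over-fetch 3x the requested count
    scored = []
    for c in pool:
        score = cosine(query_vec, c["vector"])
        matches = sum(1 for k in keywords if k in c["tags"])
        score *= 1 + 0.15 * matches  # multiplicative keyword boost
        scored.append((score, c))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:count]]
```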
The max_content_length parameter controls the total response size. The budget is distributed evenly across results with room for per-result overhead:
overhead_per_result = 300 chars # Metadata, formatting
total_overhead = count * 300
available_for_content = max_content_length - total_overhead
per_result_limit = available_for_content / count
For example, with count=5 and max_content_length=32768:
- Each result gets approximately 6,253 characters of content.
This budgeting prevents LLM context exhaustion. If a result exceeds its budget, it is truncated. To detect this, use a format callback (see Custom Response Formatting).
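The budget arithmetic above can be checked directly (the 300-character overhead is the figure quoted in the formula):

```python
def per_result_limit(count: int, max_content_length: int,
                     overhead_per_result: int = 300) -> int:
    """Characters of content each result receives after overhead."""
    available = max_content_length - count * overhead_per_result
    return available // count

# The worked example from the text: count=5, max_content_length=32768
limit = per_result_limit(5, 32768)
```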
Tuning content length by use case:
| Use Case | Recommended max_content_length | Rationale |
|---|---|---|
| Voice agents | 16384 (16KB) | Shorter content produces faster, more focused spoken responses |
| Chat agents | 32768 (32KB, default) | Can handle more detailed information in text |
| Deep research queries | 65536 (64KB) | Comprehensive answers requiring lots of context |
Voice and chat modes have different requirements for how search results should be presented to the LLM:
- Voice mode: Responses should be conversational. URLs, code snippets, and markdown formatting should not be read aloud. Technical concepts should be summarized clearly.
- Chat mode: Responses can include URLs, formatted code blocks, links, and comprehensive technical details.
Use the response_format_callback parameter (see Custom Response Formatting) to adapt output based on mode. Additionally, prompt the agent to handle each mode appropriately.
Every chunk stored in a search index includes metadata -- structured information beyond the text content itself. Metadata enables filtering, boosting, organization, and provenance tracking.
The SDK adds metadata automatically based on the chunking strategy used during indexing.
Sentence strategy:
{
"chunk_method": "sentence",
"chunk_index": 5,
"sentence_count": 8,
"filename": "docs.md"
}

Markdown strategy:
{
"chunk_method": "markdown",
"chunk_index": 3,
"h1": "Getting Started",
"h2": "Installation",
"h3": "Python Setup",
"depth": 3,
"has_code": True,
"code_languages": ["python", "bash"],
"tags": ["code", "code:python", "code:bash"],
"filename": "installation.md"
}

QA strategy:
{
"chunk_method": "qa_optimized",
"chunk_index": 2,
"has_question": True,
"has_process": True,
"sentence_count": 6
}

Add tags when building the index:
sw-search ./docs \
--tags documentation,api,v2,production \
--output docs.swsearch

All chunks in the build receive these tags.
When using the JSON chunking strategy, add any metadata fields:
{
"chunks": [
{
"content": "API authentication requires a Bearer token...",
"metadata": {
"category": "security",
"priority": "high",
"difficulty": "beginner",
"tags": ["authentication", "security", "api"],
"last_updated": "2025-01-15",
"author": "security-team",
"related_topics": ["authorization", "tokens"],
"estimated_time": "5 minutes"
}
}
]
}

Generate metadata when creating JSON programmatically:
import json
from datetime import datetime

def create_chunk_with_metadata(content, category, difficulty):
    """Create a chunk with rich metadata"""
    return {
        "content": content,
        "metadata": {
            "category": category,
            "difficulty": difficulty,
            "tags": [category, difficulty, "generated"],
            "created_at": datetime.now().isoformat(),
            "word_count": len(content.split()),
            "has_url": "http" in content,
            "has_code": "```" in content or "def " in content
        }
    }

chunks = {
    "chunks": [
        create_chunk_with_metadata(
            "The AgentBase class provides...",
            category="api-reference",
            difficulty="intermediate"
        ),
        create_chunk_with_metadata(
            "To get started, install the package...",
            category="getting-started",
            difficulty="beginner"
        )
    ]
}

with open("chunks.json", "w") as f:
    json.dump(chunks, f, indent=2)

Tags enable precise filtering during search. Specify tags in the skill configuration to restrict which chunks are searched:
# Only search API documentation
self.add_skill("native_vector_search", {
    "tool_name": "search_api",
    "description": "Search API documentation",
    "index_path": "./docs.swsearch",
    "tags": ["api", "reference"]
})

Multiple search skills can use the same index with different tag filters:
class DocumentationAgent(AgentBase):
    def __init__(self):
        super().__init__(name="DocsAgent")
        # Beginner-friendly docs
        self.add_skill("native_vector_search", {
            "tool_name": "search_getting_started",
            "description": "Search beginner guides and tutorials",
            "index_path": "./docs.swsearch",
            "tags": ["beginner", "tutorial", "getting-started"]
        })
        # Advanced technical docs
        self.add_skill("native_vector_search", {
            "tool_name": "search_advanced",
            "description": "Search advanced documentation and technical details",
            "index_path": "./docs.swsearch",
            "tags": ["advanced", "technical"]
        })
        # API reference only
        self.add_skill("native_vector_search", {
            "tool_name": "search_api_reference",
            "description": "Search API documentation for classes, methods, and parameters",
            "index_path": "./docs.swsearch",
            "tags": ["api", "reference", "code"]
        })

The LLM selects the appropriate search function based on the user's question.
In hybrid search mode, metadata matching provides confirmation signals that boost relevance scores. The boost is applied multiplicatively on top of the vector similarity score.
Example scenario: User searches for "python authentication example"
Chunk A:
{
"content": "Here's a Python authentication example...",
"metadata": {
"tags": ["python", "authentication", "example", "code"],
"code_languages": ["python"]
}
}

- Vector similarity: 0.75
- Metadata matches: "python", "authentication", "example" (3 matches at 15% each) -- boost +45%
- Has "code" tag with keywords matched -- boost +20%
- Final score: 0.75 x 1.45 x 1.20 = 1.31
Chunk B:
{
"content": "Authentication is important for security...",
"metadata": {
"tags": ["security", "authentication"]
}
}

- Vector similarity: 0.82 (higher raw score)
- Metadata matches: "authentication" (1 match) -- boost +15%
- Final score: 0.82 x 1.15 = 0.94
Result: Chunk A ranks higher despite lower vector similarity because metadata confirmed it matches the user's intent more precisely.
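The boost arithmetic can be reproduced in a few lines, assuming a flat 15% boost per metadata match and a 20% bonus for a keyword-matched "code" tag (the rates shown in the debug output later in this document); the engine's actual weights may differ:

```python
def hybrid_score(vector_sim, metadata_matches, code_boost=False,
                 per_match=0.15, code_bonus=0.20):
    """Multiplicative boosting: similarity x match boost x code bonus.

    The per_match and code_bonus rates are illustrative defaults taken
    from this document's examples, not authoritative SDK constants.
    """
    score = vector_sim * (1 + per_match * metadata_matches)
    if code_boost:
        score *= 1 + code_bonus
    return score

chunk_a = hybrid_score(0.75, 3, code_boost=True)  # 0.75 x 1.45 x 1.20
chunk_b = hybrid_score(0.82, 1)                   # 0.82 x 1.15
```

Despite chunk B's higher raw similarity, chunk A's metadata confirmation wins.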
Use categories and subcategories to structure large knowledge bases:
{
"chunks": [
{
"content": "...",
"metadata": {
"category": "getting-started",
"subcategory": "installation",
"tags": ["beginner", "setup"]
}
},
{
"content": "...",
"metadata": {
"category": "api-reference",
"subcategory": "core-classes",
"tags": ["api", "reference", "advanced"]
}
},
{
"content": "...",
"metadata": {
"category": "troubleshooting",
"subcategory": "errors",
"tags": ["troubleshooting", "debugging", "errors"]
}
}
]
}

Then create category-specific search tools:
self.add_skill("native_vector_search", {
    "tool_name": "search_troubleshooting",
    "description": "Search troubleshooting guides for error solutions",
    "index_path": "./docs.swsearch",
    "tags": ["troubleshooting", "errors"]
})

Mark important content with priority levels:
{
"content": "Critical security notice: Always validate input...",
"metadata": {
"priority": "critical",
"category": "security",
"tags": ["security", "important", "critical"]
}
}

Use a custom formatter to surface priority in the response:
def _format_with_priority(self, response, agent, query, results, **kwargs):
    """Highlight high-priority results"""
    formatted = ""
    for result in results:
        priority = result.get('metadata', {}).get('priority', 'normal')
        if priority == 'critical':
            formatted += "CRITICAL: "
        elif priority == 'high':
            formatted += "HIGH PRIORITY: "
        formatted += result['content'] + "\n\n"
    return formatted

Tag content by target audience:
{
"content": "Advanced memory optimization techniques...",
"metadata": {
"audience": "expert",
"difficulty": "advanced",
"tags": ["expert", "performance", "optimization"]
}
}

Track content freshness:
{
"content": "New feature in v2.0: async support...",
"metadata": {
"version": "2.0",
"created_at": "2025-01-15",
"last_updated": "2025-01-20",
"tags": ["new", "v2", "async"]
}
}

Link related chunks for learning paths:
{
"chunk_id": "auth_overview",
"content": "Authentication overview...",
"metadata": {
"related_chunks": ["auth_examples", "auth_errors"],
"prerequisite": "installation",
"next_topic": "authorization"
}
}

For multilingual content, tag by language:
{
"content": "Le SDK SignalWire permet...",
"metadata": {
"language": "fr",
"translated_from": "en",
"tags": ["french", "documentation"]
}
}

Filter by language in the skill configuration:
self.add_skill("native_vector_search", {
    "tool_name": "search_french_docs",
    "description": "Rechercher la documentation en francais",
    "index_path": "./docs.swsearch",
    "tags": ["french"]
})

Metadata best practices:

- Be consistent. Use the same metadata field names across your knowledge base. Avoid mixing category and type, or difficulty and level.
- Use tags liberally. More tags create more opportunities for hybrid search boosting: { "tags": ["authentication", "security", "api", "bearer-token", "auth", "login", "credentials"] }
- Include synonyms. Users search with different terms: { "tags": ["installation", "setup", "getting-started", "install", "configure"] }
- Structure hierarchically: { "category": "development", "subcategory": "testing", "topic": "unit-tests" }
- Track provenance: { "source": "official-docs", "author": "signalwire-team", "verified": true, "last_reviewed": "2025-01-15" }
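A pre-indexing lint pass can enforce the consistency rule before chunks are built. This sketch only knows about two alias pairs; the alias map is an assumption to extend for your own schema:

```python
def check_metadata_consistency(chunks):
    """Flag metadata field names that drift from the canonical ones.

    The alias map below is illustrative -- it catches drift like 'type'
    vs 'category' or 'level' vs 'difficulty' before an index is built.
    """
    aliases = {"type": "category", "level": "difficulty"}
    problems = []
    for i, chunk in enumerate(chunks):
        for field in chunk.get("metadata", {}):
            if field in aliases:
                problems.append(
                    f"chunk {i}: prefer '{aliases[field]}' over '{field}'")
    return problems
```

Run it over the `chunks` list of your JSON file and fail the build if any problems are reported.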
Three parameters have the most significant impact on search quality.
Controls how similar results must be to the query. Values range from 0.0 to 1.0, where higher means more similar:
| Score Range | Meaning |
|---|---|
| 1.0 | Identical vectors (perfect match) |
| 0.8 | Very similar |
| 0.5 | Somewhat similar |
| 0.2 | Barely related |
| 0.0 | Completely different |
Recommended thresholds by content type:
| Content Type | Threshold | Rationale |
|---|---|---|
| Technical documentation (code, APIs) | 0.4 | Terse, code-heavy queries score lower on raw similarity; a moderate threshold keeps relevant matches without admitting off-topic ones |
| General knowledge base (FAQs, guides) | 0.5 | Conversational questions closely mirror FAQ phrasing, so the default threshold works well |
| Creative content (blogs, articles) | 0.6 | Varied language produces many loosely related matches; a stricter threshold filters tangents |
| Precise lookups (error codes, model names) | 0.3 | Exact tokens score low on vector similarity; a permissive threshold lets keyword boosting surface them |
Testing threshold values with the CLI:
sw-search search ./docs.swsearch "your query" --threshold 0.3 --verbose
sw-search search ./docs.swsearch "your query" --threshold 0.4 --verbose
sw-search search ./docs.swsearch "your query" --threshold 0.5 --verbose

The verbose output displays similarity scores for each result. If the threshold excludes results that appear relevant, lower it. If irrelevant results appear, raise it.
Dynamic threshold strategy:
For agents that need reliability across diverse queries, implement fallback logic:
def search_with_fallback(self, query):
    """Search with fallback to lower threshold"""
    # Try strict first
    results = self.search(query, threshold=0.5)
    if len(results) < 2:
        # Not enough results, try more permissive
        results = self.search(query, threshold=0.4)
    if len(results) < 1:
        # Still nothing, try very permissive
        results = self.search(query, threshold=0.3)
    return results

Determines how many results to return. The hybrid search engine internally retrieves 3x the requested count, scores all candidates, and returns the top results.
Trade-offs by count range:
| Range | Advantages | Disadvantages |
|---|---|---|
| 1-3 | Precise answers, less noise, faster LLM processing, lower token costs | Might miss context, limited perspective |
| 5-7 | More context, multiple perspectives, better coverage | More noise, slower processing, higher token costs |
| 10+ | Maximum coverage | Information overload, diminishing returns, truncation risk |
Recommendations by query type:
- Specific questions ("How do I authenticate?"): count: 3
- Exploratory questions ("What are the authentication options?"): count: 5
- Research queries ("Tell me everything about authentication"): count: 7
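Those recommendations can be folded into a small heuristic if you route queries programmatically. The trigger phrases below are illustrative guesses, not SDK behavior:

```python
def count_for_query(query: str) -> int:
    """Pick a result count from the query's apparent breadth.

    The phrase lists are assumptions for illustration; tune them
    against your actual traffic.
    """
    q = query.lower()
    if any(w in q for w in ("everything", "all about", "overview", "research")):
        return 7  # research queries
    if any(w in q for w in ("options", "ways to", "compare", "what are")):
        return 5  # exploratory questions
    return 3      # specific questions
```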
Controls total response size. See Content Length Budgeting for details on how the budget is distributed.
Enable verbose logging to see hybrid scoring details:
import os
os.environ['SEARCH_DEBUG'] = '1'

Debug output shows the scoring breakdown for each candidate:
Query: "python authentication examples"
Candidate pool: 15 chunks
Chunk 1:
Vector score: 0.82
Keyword matches: ["python", "authentication", "examples"]
Keyword boost: +0.45 (3 matches x 0.15)
Has 'code' tag: +0.20
Final score: 0.82 x 1.45 x 1.20 = 1.43
Chunk 2:
Vector score: 0.75
Keyword matches: ["authentication"]
Keyword boost: +0.15 (1 match x 0.15)
Final score: 0.75 x 1.15 = 0.86
Returning top 5 results...
This reveals why certain results rank higher or lower than expected.
Additionally, enable general verbose logging for the agent:
import os
os.environ['SIGNALWIRE_LOG_LEVEL'] = 'DEBUG'

Symptoms: Search returns empty results for queries that should match content.
Possible causes and fixes:
- Threshold too strict. Lower distance_threshold: 0.5 to 0.4 to 0.3.
- Query phrasing mismatch. Test variations: "authentication setup", "configuring auth", "setting up authentication".
- Content gap. Verify the topic exists in the index: sw-search search ./docs.swsearch "test query".
- Index not loaded. Check agent logs for errors during skill initialization.
Symptoms: Results do not match the query intent.
Possible causes and fixes:
- Threshold too permissive. Raise distance_threshold: 0.4 to 0.5.
- Poor metadata tagging. Add tags for filtering: tags=["python", "code"].
- Chunking mixed unrelated content. Use the markdown chunking strategy for code documentation.
Symptoms: A clearly relevant result appears below less relevant ones.
Possible causes and fixes:
- Missing metadata on the best result. Add relevant tags: ["voice", "configuration"].
- Hybrid scoring not boosting correctly. Verify metadata terms match query keywords.
- Code chunks missing the "code" tag. Use the markdown chunking strategy to add automatic code tags.
Symptoms: Results appear cut off mid-sentence.
Possible causes and fixes:
- max_content_length too low. Increase: 32768 to 65536.
- Too many results consuming the budget. Reduce count: 7 to 5.
- Individual chunks too long. Improve chunking to produce shorter chunks.
Test different configurations in production to find optimal settings:
import logging
import random

logger = logging.getLogger(__name__)

class ABTestAgent(AgentBase):
    def __init__(self):
        super().__init__(name="ABTest")
        # Randomly assign configuration
        config_a = {"distance_threshold": 0.4, "count": 5}
        config_b = {"distance_threshold": 0.5, "count": 3}
        config = config_a if random.random() < 0.5 else config_b
        self.add_skill("native_vector_search", {
            "tool_name": "search_docs",
            "description": "Search documentation",
            "index_path": "./docs.swsearch",
            **config
        })
        # Log which config was used
        logger.info(f"Using config: {config}")

Metrics to track:
- User satisfaction ratings
- Follow-up question rate (lower is better -- the first answer sufficed)
- Query success rate (non-empty, relevant results)
- Search latency
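A minimal sketch of tracking those metrics per configuration — the class and counter names are illustrative, not part of the SDK:

```python
from collections import Counter

class SearchMetrics:
    """Per-configuration counters for the A/B metrics listed above."""

    def __init__(self):
        self.counts = Counter()
        self.latencies = []

    def record(self, config_name, n_results, latency_s, followup=False):
        """Log one search: result count, latency, and follow-up flag."""
        self.counts[f"{config_name}:queries"] += 1
        if n_results == 0:
            self.counts[f"{config_name}:empty"] += 1
        if followup:
            self.counts[f"{config_name}:followups"] += 1
        self.latencies.append(latency_s)

    def success_rate(self, config_name):
        """Fraction of queries that returned at least one result."""
        total = self.counts[f"{config_name}:queries"]
        empty = self.counts[f"{config_name}:empty"]
        return (total - empty) / total if total else 0.0
```

Call `record(...)` from a response format callback (shown later in this document) to capture the data in production.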
Systematic evaluation approach:
test_queries = [
"how to create an agent",
"authentication methods",
"error handling",
"voice configuration",
"deployment options",
"python code examples",
"troubleshooting connection issues"
]
configs = [
{"threshold": 0.3, "count": 3},
{"threshold": 0.4, "count": 3},
{"threshold": 0.4, "count": 5},
{"threshold": 0.5, "count": 5},
]
for config in configs:
    print(f"\nTesting: {config}")
    for query in test_queries:
        results = search(query, **config)
        print(f"  {query}: {len(results)} results")

For each configuration, evaluate:
- Precision: Are all returned results relevant?
- Recall: Were all relevant chunks found?
- Coverage: Do results answer the question completely?
- Diversity: Do results cover different aspects of the topic?
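Precision and recall can be computed against a hand-labeled relevance set with two small helpers (illustrative, assuming each chunk has a stable id you can label):

```python
def precision_at_k(returned_ids, relevant_ids):
    """Fraction of returned results that are judged relevant."""
    if not returned_ids:
        return 0.0
    return sum(1 for r in returned_ids if r in relevant_ids) / len(returned_ids)

def recall_at_k(returned_ids, relevant_ids):
    """Fraction of relevant chunks that were actually returned."""
    if not relevant_ids:
        return 1.0
    return sum(1 for r in relevant_ids if r in returned_ids) / len(relevant_ids)
```

Coverage and diversity are harder to automate; for those, a manual review of a sample of queries per configuration is usually more reliable.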
The response_format_callback parameter accepts a function that customizes how search results are formatted before being sent to the LLM:
class CustomAgent(AgentBase):
    def __init__(self):
        super().__init__(name="CustomAgent")
        self.add_skill("native_vector_search", {
            "tool_name": "search_docs",
            "description": "Search documentation",
            "index_path": "./docs.swsearch",
            "response_format_callback": self._format_search_results
        })

    def _format_search_results(self, response, agent, query, results, **kwargs):
        """Custom formatter for search results"""
        if not results:
            return response  # Use default no_results_message
        # Add custom instructions
        formatted = "Documentation Search Results:\n\n"
        formatted += f"Query: {query}\n"
        formatted += f"Found {len(results)} relevant sections:\n\n"
        # Include the default-formatted response
        formatted += response
        # Add footer instructions for the LLM
        formatted += "\n\nBased on these results, provide a clear and accurate answer."
        return formatted

Callback parameters:
| Parameter | Type | Description |
|---|---|---|
| response | string | The default formatted response |
| agent | AgentBase | The agent instance |
| query | string | The search query |
| results | list[dict] | List of result dictionaries with content and metadata keys |
| **kwargs | dict | Additional metadata (may include truncated, start_time, etc.) |
For agents that handle both voice and chat, use the callback to adapt formatting:
class MultiModalAgent(AgentBase):
    def __init__(self):
        super().__init__(name="MultiModalAgent")
        self.add_skill("native_vector_search", {
            "tool_name": "search_docs",
            "description": "Search documentation",
            "index_path": "./docs.swsearch",
            "response_format_callback": self._adaptive_format
        })

    def _adaptive_format(self, response, agent, query, results, **kwargs):
        """Format differently for voice vs chat"""
        if not results:
            return response
        is_voice = getattr(agent, 'is_voice', False)
        if is_voice:
            instructions = (
                "Voice Mode:\n"
                "Use these search results to answer naturally. "
                "Do not read URLs or code verbatim. "
                "Summarize technical concepts clearly. "
                "Mention that detailed documentation is available online.\n\n"
            )
        else:
            instructions = (
                "Chat Mode:\n"
                "Use these search results to answer. "
                "Include relevant URLs from results. "
                "Format code with markdown code blocks. "
                "Provide comprehensive technical details.\n\n"
            )
        return instructions + response

Use the callback to log search performance metrics:
import logging
import time

logger = logging.getLogger(__name__)

class MonitoredAgent(AgentBase):
    def __init__(self):
        super().__init__(name="MonitoredAgent")
        self.add_skill("native_vector_search", {
            "tool_name": "search_docs",
            "description": "Search documentation",
            "index_path": "./docs.swsearch",
            "response_format_callback": self._monitored_format
        })

    def _monitored_format(self, response, agent, query, results, **kwargs):
        """Monitor search performance"""
        start_time = kwargs.get('start_time', time.time())
        search_time = time.time() - start_time
        logger.info(f"Search query: {query}")
        logger.info(f"Results: {len(results)}")
        logger.info(f"Search time: {search_time:.3f}s")
        return response

An agent can have multiple native_vector_search skill instances, each pointing to a different index or collection. The LLM selects which function to invoke based on the description of each tool.
class SupportAgent(AgentBase):
    def __init__(self):
        super().__init__(name="SupportAgent")
        # General documentation
        self.add_skill("native_vector_search", {
            "tool_name": "search_docs",
            "description": "Search general product documentation",
            "index_path": "./docs.swsearch",
            "count": 5
        })
        # API reference
        self.add_skill("native_vector_search", {
            "tool_name": "search_api",
            "description": "Search API documentation for endpoints, parameters, and examples",
            "index_path": "./api.swsearch",
            "tags": ["api"],
            "count": 3
        })
        # Troubleshooting guide
        self.add_skill("native_vector_search", {
            "tool_name": "search_troubleshooting",
            "description": "Search troubleshooting guides for error messages and solutions",
            "index_path": "./troubleshooting.swsearch",
            "tags": ["troubleshooting", "errors"],
            "count": 3
        })

Combine tag filtering with custom formatters to create specialized search tools from a single index:
class SpecializedAgent(AgentBase):
    def __init__(self):
        super().__init__(name="SpecializedAgent")
        # Code examples only
        self.add_skill("native_vector_search", {
            "tool_name": "find_code_examples",
            "description": "Find code examples and implementation samples",
            "index_path": "./docs.swsearch",
            "tags": ["code", "example"],
            "response_format_callback": self._format_code_examples
        })
        # Error solutions only
        self.add_skill("native_vector_search", {
            "tool_name": "find_error_solutions",
            "description": "Find solutions to error messages and problems",
            "index_path": "./docs.swsearch",
            "tags": ["troubleshooting", "errors", "solutions"]
        })
        # Beginner tutorials only
        self.add_skill("native_vector_search", {
            "tool_name": "find_tutorials",
            "description": "Find beginner-friendly tutorials and guides",
            "index_path": "./docs.swsearch",
            "tags": ["tutorial", "beginner", "guide"]
        })

    def _format_code_examples(self, response, agent, query, results, **kwargs):
        """Format code examples with language annotations"""
        formatted = "Code Examples Found:\n\n"
        for result in results:
            languages = result.get('metadata', {}).get('code_languages', [])
            if languages:
                formatted += f"Languages: {', '.join(languages)}\n"
        formatted += response
        return formatted

When using multiple search tools, instruct the agent on which tool to use for different query types:
class SmartAgent(AgentBase):
    def __init__(self):
        super().__init__(name="SmartAgent")
        self.add_skill("native_vector_search", {
            "tool_name": "search_knowledge",
            "description": "Search our knowledge base",
            "index_path": "./knowledge.swsearch"
        })
        # Instruct agent on search usage
        self.prompt_add_section(
            "Using Search",
            bullets=[
                "ALWAYS search the knowledge base before answering technical questions",
                "Use search_knowledge for questions about features, APIs, or how-to topics",
                "Base your answers on search results, not general knowledge",
                "If search returns no results, tell the user you don't have that information",
                "Don't make up answers - search first, then respond based on results"
            ]
        )

Sigmond is SignalWire's production demo agent -- a multi-collection AI assistant that answers questions about SignalWire products, pricing, and FreeSWITCH telephony. It demonstrates how the search system scales from single-agent development to production multi-agent deployment.
Sigmond uses three separate knowledge bases stored in a single PostgreSQL database with the pgvector extension:
PostgreSQL (pgvector)
+-- signalwire_unified (5,000+ chunks)
| +-- SDK documentation
| +-- Developer guides
| +-- API references
| +-- Platform features
+-- pricing (500+ chunks)
| +-- Pricing pages
| +-- Plan comparisons
| +-- Billing information
+-- freeswitch (2,000+ chunks)
+-- FreeSWITCH documentation
+-- Telephony concepts
+-- SIP configuration
Each collection uses a chunking strategy optimized for its content type:
SignalWire unified documentation -- built with the markdown strategy to preserve code blocks and header hierarchy:
sw-search \
./signalwire-docs \
./sdk-docs \
./api-docs \
--chunking-strategy markdown \
--model mini \
--backend pgvector \
--connection-string "$PGVECTOR_CONNECTION" \
--collection-name signalwire_unified \
--tags documentation,signalwire,api,sdk

Pricing collection -- built with the JSON strategy for precise control over chunk boundaries (each plan, feature, or price point as its own chunk):
sw-search \
./pricing.json \
--chunking-strategy json \
--model mini \
--backend pgvector \
--connection-string "$PGVECTOR_CONNECTION" \
--collection-name pricing \
--tags pricing,plans,costs

FreeSWITCH documentation -- built with the markdown strategy for technical content:
sw-search \
./freeswitch-docs \
--chunking-strategy markdown \
--model mini \
--backend pgvector \
--connection-string "$PGVECTOR_CONNECTION" \
--collection-name freeswitch \
--tags freeswitch,telephony,sip

Agent configuration with all three search skills:
import os

class SigmondAgent(AgentBase):
    def __init__(self):
        super().__init__(
            name="Sigmond",
            route="/sigmond",
            port=3000
        )
        # Build pgvector connection
        pg_user = os.getenv("PGVECTOR_DB_USER", "signalwire")
        pg_pass = os.getenv("PGVECTOR_DB_PASSWORD")
        pg_host = os.getenv("PGVECTOR_HOST", "localhost")
        pg_port = os.getenv("PGVECTOR_PORT", "5432")
        pg_db = os.getenv("PGVECTOR_DB_NAME", "knowledge")
        connection_string = f"postgresql://{pg_user}:{pg_pass}@{pg_host}:{pg_port}/{pg_db}"
        self._add_search_skills(connection_string)

    def _add_search_skills(self, connection_string):
        """Add three specialized search skills"""
        # 1. SignalWire unified documentation
        self.add_skill("native_vector_search", {
            "tool_name": "search_signalwire_knowledge",
            "description": "Search all SignalWire knowledge including SDK documentation, developer docs, API references, and general platform information",
            "backend": "pgvector",
            "connection_string": connection_string,
            "collection_name": "signalwire_unified",
            "model_name": "mini",
            "count": 5,
            "distance_threshold": 0.4,
            "response_format_callback": self._format_search_results,
            "no_results_message": "I couldn't find information about '{query}' in the SignalWire knowledge base.",
            "swaig_fields": {
                "fillers": {
                    "en-US": [
                        "Let me search the SignalWire knowledge base...",
                        "I'm looking through the documentation...",
                        "Searching for SignalWire information...",
                        "Let me check the technical documentation..."
                    ]
                }
            }
        })
        # 2. Pricing information
        self.add_skill("native_vector_search", {
            "tool_name": "search_pricing",
            "description": "Search for SignalWire pricing information, plans, costs, and billing details",
            "backend": "pgvector",
            "connection_string": connection_string,
            "collection_name": "pricing",
            "model_name": "mini",
            "count": 3,
            "distance_threshold": 0.4,
            "response_format_callback": self._format_search_results,
            "no_results_message": "I couldn't find specific pricing information for '{query}'. Please check signalwire.com/pricing or contact sales@signalwire.com.",
            "swaig_fields": {
                "fillers": {
                    "en-US": [
                        "Let me check the pricing information...",
                        "Looking up pricing details...",
                        "Searching pricing data..."
                    ]
                }
            }
        })
        # 3. FreeSWITCH documentation
        self.add_skill("native_vector_search", {
            "tool_name": "search_freeswitch_knowledge",
            "description": "Search for knowledge about FreeSWITCH telephony system",
            "backend": "pgvector",
            "connection_string": connection_string,
            "collection_name": "freeswitch",
            "model_name": "mini",
            "count": 3,
            "distance_threshold": 0.4,
            "response_format_callback": self._format_search_results,
            "no_results_message": "I couldn't find information about '{query}' in the FreeSWITCH documentation.",
            "swaig_fields": {
                "fillers": {
                    "en-US": [
                        "Let me search the FreeSWITCH documentation...",
                        "Looking through FreeSWITCH knowledge...",
                        "Searching FreeSWITCH information..."
                    ]
                }
            }
        })

    def _format_search_results(self, response, agent, query, results, **kwargs):
        """Custom formatter that adapts to voice vs chat mode"""
        if not results:
            return response  # Use default no_results_message
        is_voice = getattr(agent, 'is_voice', False)
        if is_voice:
            instructions = (
                "Voice Mode Instructions:\n"
                "Use the following search results to answer the user's question. "
                "Since this is a voice conversation:\n"
                "- Provide a natural, conversational response\n"
                "- Do not read URLs or code snippets verbatim\n"
                "- Summarize technical concepts clearly\n"
                "- Mention that code examples and links are available in the developer docs\n"
                "- Keep responses concise and easy to follow by ear\n"
                "- If there is not enough info in the response, try searching the web\n\n"
            )
        else:
            instructions = (
                "Chat Mode Instructions:\n"
                "Use the following search results to answer the user's question. "
                "Since this is a text chat:\n"
                "- Include relevant URLs from the results in your response\n"
                "- Format all code examples with markdown code blocks\n"
                "- Scrape any relevant URLs to get more detailed information\n"
                "- Provide comprehensive technical details when appropriate\n"
                "- Use markdown formatting for better readability\n"
"- If there is not enough info in the response, try searching the web\n\n"
)
return instructions + responseSigmond's prompt explicitly maps query types to search tools:
```python
self.prompt_add_section(
    "Using Your Tools",
    body="Match the right tool to each question:",
    bullets=[
        "SignalWire technical/SDK/API questions -> search_signalwire_knowledge",
        "Pricing/costs questions -> search_pricing",
        "FreeSWITCH/telephony questions -> search_freeswitch_knowledge",
        "Current events/general info -> web_search",
        "Specific URLs -> scrape_url or crawl_site",
        "ALWAYS search before answering technical questions."
    ]
)

self.prompt_add_section(
    "Your Mission",
    bullets=[
        "For SignalWire questions: ALWAYS search_signalwire_knowledge first, then answer.",
        "For pricing: ALWAYS search_pricing first. Mention transparent developer pricing and sales@signalwire.com.",
        "You showcase the AI Kernel - fast, native infrastructure without latency."
    ]
)
```

This explicit mapping increased search tool usage from 60% to 95% of queries.
Query performance (production):
| Collection | Chunk Count | Avg Query Time |
|---|---|---|
| signalwire_unified | 5,000+ | ~30ms |
| pricing | 500+ | ~10ms |
| freeswitch | 2,000+ | ~20ms |
Success rates:
- 95% of technical questions answered from search
- 98% of pricing questions answered from search
- 5% fallback to web search for current information
Deployment configuration:
All pods share the same pgvector database, eliminating index duplication:
```
Kubernetes Deployment
    +-- sigmond-pod-1 --+
    +-- sigmond-pod-2 --+--> PostgreSQL (pgvector)
    +-- sigmond-pod-3 --+
    +-- sigmond-pod-4 --+
```
Production Dockerfile using the lightweight search-queryonly installation (pods do not need ML models; they query pre-built indexes):

```dockerfile
FROM python:3.11-slim
RUN pip install signalwire-sdk[search-queryonly]
COPY sigmond.py /app/
CMD ["python", "/app/sigmond.py"]
```

Environment variables:
```bash
export PGVECTOR_DB_USER=signalwire
export PGVECTOR_DB_PASSWORD=<secure-password>
export PGVECTOR_DB_NAME=sigmond_knowledge
export PGVECTOR_HOST=postgres.production.local
export PGVECTOR_PORT=5432
```
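Database passwords often contain characters that are special in URLs (`@`, `:`, `/`), so it can be worth URL-encoding the password when assembling the DSN. A minimal standard-library sketch; the helper name `build_pgvector_dsn` is illustrative, not part of the SDK:

```python
import os
from urllib.parse import quote_plus

def build_pgvector_dsn(env=os.environ):
    """Build a postgresql:// DSN from the PGVECTOR_* variables,
    URL-encoding the password so special characters stay safe."""
    user = env.get("PGVECTOR_DB_USER", "signalwire")
    password = quote_plus(env.get("PGVECTOR_DB_PASSWORD", ""))
    host = env.get("PGVECTOR_HOST", "localhost")
    port = env.get("PGVECTOR_PORT", "5432")
    db = env.get("PGVECTOR_DB_NAME", "knowledge")
    return f"postgresql://{user}:{password}@{host}:{port}/{db}"
```

The defaults mirror those used in the agent above; only the password has no default.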
- Multiple collections outperform a single index. A single large index mixed SDK docs with pricing and FreeSWITCH content, producing confused results. Separate collections allow the LLM to choose the right search and return focused results.
- Markdown strategy is essential for technical documentation. Sentence chunking split code blocks mid-code and lost header context. The markdown strategy keeps code blocks intact, preserves header hierarchy, and adds automatic "code" tags that boost code example results.
- Voice and chat require different formatting. Without adaptive formatting, voice mode read URLs aloud and chat mode lacked links and code blocks. The `response_format_callback` solves this cleanly.
- Explicit prompt instructions drive search usage. Vague instructions like "You have search functions" resulted in only 60% search usage. Explicit tool-to-query-type mapping raised it to 95%.
- The mini embedding model is sufficient for most use cases. Testing mini vs base showed approximately 2% quality difference but 2x speed improvement and 50% smaller index size.
- A `distance_threshold` of 0.4 is the production sweet spot for technical documentation. Testing across 0.3 to 0.6 showed 0.3 was too strict (frequent zero results), 0.5 included some irrelevant results, and 0.6 was too permissive.
- Rolling updates enable zero-downtime collection changes. Build the new collection under a new name, test it, switch agents via environment variable, then delete the old collection.
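The rolling-update pattern hinges on the collection name being configuration rather than code. A small sketch of that idea; the `SIGMOND_COLLECTION` variable name is an assumption for illustration, not a real SDK setting:

```python
import os

def active_collection(env=os.environ):
    """Resolve the pgvector collection to query from configuration,
    so a rebuilt collection can roll out without a code change."""
    return env.get("SIGMOND_COLLECTION", "signalwire_unified")

# The skill config would then read:
#   "collection_name": active_collection()
```

Setting `SIGMOND_COLLECTION=signalwire_unified_v2` on new pods during a rolling restart switches traffic to the rebuilt collection; the old one can be deleted once all pods have cycled.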
Cost analysis (pgvector):
- Database hosting: ~$100/month for 20GB
- Supports 10+ concurrent agents, 1,000+ queries/day
- More cost-effective than managed vector database alternatives at this scale
Scalability path:
- Add read replicas for increased throughput
- Partition collections for millions of chunks
- Connection pooling handles 100+ agents
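One common way to reach 100+ agents against a single database is an external pooler such as PgBouncer in front of PostgreSQL. A minimal sketch; the host, database name, and pool sizes here are illustrative:

```ini
[databases]
; route the knowledge database through the pooler
sigmond_knowledge = host=postgres.production.local port=5432 dbname=sigmond_knowledge

[pgbouncer]
listen_port = 6432
pool_mode = transaction
max_client_conn = 200
default_pool_size = 20
```

Agents would then point `PGVECTOR_HOST` at the pooler and set `PGVECTOR_PORT=6432`, leaving the rest of the configuration unchanged.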
For more information on deployment patterns, see search_deployment.md. For troubleshooting production issues, see search_troubleshooting.md.
Use swaig-test to verify search works before deployment:

```bash
# List available tools
swaig-test agent.py --list-tools

# Test search function
swaig-test agent.py --exec search_docs --query "how to create an agent"
```

Verify the index directly:

```bash
sw-search search ./knowledge.swsearch "test query"
```

For comprehensive testing approaches including A/B testing, see the Tuning Search Quality section.
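For quick regression checks while tuning `count` and `distance_threshold`, a backend-agnostic recall metric over known query/document pairs goes a long way. This sketch assumes you supply a search function that returns a list of document ids; all names here are illustrative:

```python
def recall_at_k(search_fn, cases, k=5):
    """Fraction of (query, expected_doc_id) cases where the expected
    document appears in the top-k results returned by search_fn."""
    hits = sum(
        1 for query, expected in cases
        if expected in search_fn(query)[:k]
    )
    return hits / len(cases)

# Usage with a stubbed search function:
cases = [
    ("how to create an agent", "agent_guide"),
    ("pricing for voice calls", "pricing_voice"),
]
stub = lambda query: ["agent_guide", "other_doc"]
# recall_at_k(stub, cases) -> 0.5 (only the first case hits)
```

Running the same case set before and after a threshold or chunking change gives a concrete number to compare instead of eyeballing individual queries.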
- `examples/search_with_custom_formatter.py` - Custom response formatter callback for search results
- `examples/sigmond_simple.py` - Simple agent with local `.swsearch` file-based knowledge search
- `examples/sigmond_native_search.py` - Native vector search skill with local search index
- `examples/pgvector_search_agent.py` - PGVector backend for document search