Bridge multiple enterprise data sources via Model Context Protocol (MCP) for intelligent context retrieval with LLM-powered understanding and ranking.
Context Bridge is a Python library for intelligent context retrieval from enterprise data sources. It bridges multiple systems (Jira, Confluence, GitHub) via Model Context Protocol (MCP) servers, providing LLM-powered query analysis, source routing, and relevance ranking.
Most RAG implementations retrieve from vector stores of pre-indexed documents. This works for static knowledge bases but falls short for enterprise data that lives in dynamic systems like Jira and Confluence, where content changes daily and relationships between items matter.
Context Bridge takes a different approach:
- **Live enterprise retrieval** - Queries actual enterprise APIs (via MCP servers) rather than stale vector indices. Results reflect current state, not last-indexed state.
- **Agentic search pipeline** - Multiple LLM calls work together: one analyzes the query, another routes to appropriate sources, another translates to native query languages (JQL/CQL), and another ranks results by relevance. Each step can be evaluated and improved independently.
- **Structured source data** - Enterprise tools have rich metadata (issue status, page hierarchy, assignees, labels). The system leverages this structure rather than treating everything as flat text chunks.
- **Grounded retrieval** - Returns actual content with source links. No synthesis or generation at the retrieval layer - that's left to downstream consumers who can reason over the context.
The architecture is designed for iterative improvement: routing decisions can be evaluated against ground truth, ranking quality can be measured, and feedback loops can tune the system over time.
- Intelligent Query Understanding: LLM analyzes queries to understand intent and expand keywords
- Smart Source Routing: Automatically determines which sources (Jira, Confluence) to query
- Query Translation: Converts natural language to JQL/CQL automatically
- Parallel Execution: Queries multiple sources concurrently for speed
- LLM-Based Ranking: Results ranked by relevance using LLM
- Feedback Learning: System improves over time based on user feedback
- Configurable LLM: Supports Anthropic Claude and OpenAI GPT
```bash
# Using uv (recommended)
uv add context-bridge

# Or using pip
pip install context-bridge
```

Create a `config.yaml` file:
```yaml
llm:
  provider: anthropic  # or "openai"
  model: claude-3-5-sonnet-20241022
  api_key: ${ANTHROPIC_API_KEY}  # Use environment variable
  temperature: 0.1

mcp_sources:
  servers:
    atlassian:
      command: npx
      args:
        - "-y"
        - "@modelcontextprotocol/server-atlassian"
      env:
        JIRA_URL: "https://your-company.atlassian.net"
        JIRA_EMAIL: "${JIRA_EMAIL}"
        JIRA_API_TOKEN: "${JIRA_API_TOKEN}"
        CONFLUENCE_URL: "https://your-company.atlassian.net/wiki"
  max_results_per_source: 50

search:
  top_k_results: 20
  enable_query_expansion: true

feedback:
  database_path: "~/.context_bridge/feedback.db"
  enable_feedback: true
```

```python
import asyncio

from context_bridge import ContextBridge


async def main():
    # Load from config file
    searcher = ContextBridge.from_yaml("config.yaml")

    # Execute search
    response = await searcher.search(
        "How do we handle user authentication?"
    )

    # Display results
    print(f"Found {len(response.results)} results in {response.execution_time_ms}ms")
    for result in response.results:
        print(f"\n{result.title}")
        print(f"Source: {result.source_type.value}")
        print(f"URL: {result.source_url}")
        print(f"Relevance: {result.relevance_score:.2f}")
        print(f"Content: {result.content[:200]}...")

    # Submit feedback
    searcher.submit_feedback(
        search_id=response.search_id,
        result_id=response.results[0].id,
        feedback="positive",
        comment="Very helpful!",
    )


if __name__ == "__main__":
    asyncio.run(main())
```

For more comprehensive results, use `deep_search()`, which provides:
- Search modes: Choose between recall (broad), precision (focused), or balanced
- Token tracking: See exactly how many tokens and cost for each search
- Progressive fetching: Get quick initial results, then fetch more with expansion
- Content expansion: Includes comments, subtasks, linked issues, and child pages
```python
import asyncio

from context_bridge import ContextBridge


async def main():
    searcher = ContextBridge.from_yaml("config.yaml")

    # Deep search with recall mode for broader results
    response = await searcher.deep_search(
        "authentication implementation details",
        search_mode="recall",  # or "precision" or "balanced"
    )

    # Display initial results with token usage
    print(f"Found {len(response.results)} results in {response.execution_time_ms}ms")
    print(f"Token usage: {response.total_tokens} tokens "
          f"(in: {response.total_input_tokens}, out: {response.total_output_tokens})")
    for result in response.results[:5]:
        print(f"\n{result.title}")
        print(f"  Relevance: {result.relevance_score:.2f}")

    # Fetch more results with content expansion
    if response.has_more:
        print("\nFetching expanded content...")
        more_results = await response.fetch_more()
        print(f"Found {len(more_results)} additional results with expanded content")
        for result in more_results[:3]:
            print(f"\n{result.title}")
            print(f"  Type: {result.metadata.get('type', 'main')}")

    # Feedback works the same way
    searcher.submit_feedback(
        search_id=response.search_id,
        result_id=response.results[0].id,
        feedback="positive",
    )


if __name__ == "__main__":
    asyncio.run(main())
```

`fetch_more()` now includes powerful content expansion capabilities:
- 📎 Attachment Processing: Automatically downloads and extracts text from PDFs and documents
- 🖼️ Vision Analysis: Analyzes images using Claude/GPT-4V vision models
- 📊 Epic Details: Fetches related Epic information for Jira issues
- 🔗 Confluence Links: Follows and fetches Confluence pages linked from Jira
- 🧠 Smart Content Fetching: LLM decides which results need full content (not just previews)
- ⚡ Parallel Processing: All operations run in parallel for maximum speed
- 🛡️ Safety Features: Timeout protection, token budgets, and graceful error handling
All of these features are configurable in the `deep_search` section of your configuration. See `config.example.yaml` for all available options, including:
- Token limits and warnings
- Attachment size limits
- PDF extraction settings
- Vision analysis toggle
- Timeout configuration
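As an illustration, such a section might look like the following. The key names here are hypothetical sketches, not the library's actual option names; check `config.example.yaml` for the real ones:

```yaml
deep_search:
  max_total_tokens: 50000          # hypothetical key: overall token budget per search
  warn_tokens: 30000               # hypothetical key: log a warning past this point
  attachment_max_size_mb: 10       # hypothetical key: skip larger attachments
  enable_pdf_extraction: true      # hypothetical key: extract text from PDFs
  enable_vision_analysis: true     # hypothetical key: analyze images with vision models
  fetch_timeout_seconds: 30        # hypothetical key: per-fetch timeout
```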
| Mode | Use When | Keyword Count | Best For |
|---|---|---|---|
| recall | You want comprehensive coverage | 15 keywords | Exploratory searches, research |
| precision | You know exactly what you need | 5 keywords | Specific queries, known items |
| balanced | General use (default) | 10 keywords | Most searches |
| Search Type | Input Tokens | Output Tokens | Total Tokens | Time |
|---|---|---|---|---|
| Standard search | N/A | N/A | N/A | Fast |
| Deep search (initial) | ~2,700 | ~1,400 | ~4,100 | Fast |
| Deep search + fetch_more() | ~8,000 | ~4,200 | ~12,200 | Slower but comprehensive |
Note: Token counts vary based on query complexity and result set size. Calculate your costs based on your LLM provider's pricing (typically per million tokens).
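To make the table concrete, a per-search cost estimate can be derived from these counts. The prices below are placeholders, not any provider's actual rates:

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate LLM spend in dollars given per-million-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# The "Deep search + fetch_more()" row above, with placeholder prices
# of $3/M input tokens and $15/M output tokens
print(f"${estimate_cost_usd(8_000, 4_200, 3.0, 15.0):.4f}")  # $0.0870
```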
Extract relevant context chunks from search results for tasks like brainstorming MVP features, backlog preparation, or strategic planning:
```python
import asyncio

from context_bridge import ContextBridge


async def main():
    searcher = ContextBridge.from_yaml("config.yaml")

    # Quick workflow: search + extract relevant context
    results = await searcher.search("help me prepare a backlog for November")
    context = await searcher.extract_relevant_context(
        results,
        expand_content=True,  # Fetches full content + attachments
    )

    # Display extracted context grouped by source
    print(f"Found {len(context.sources)} sources with relevant content")
    for source in context.sources:
        print(f"\n{source.title} ({source.relevance_score:.2f})")
        print(f"  URL: {source.url}")
        for chunk in source.chunks:
            print(f"  - [{chunk.chunk_type}] {chunk.text[:100]}...")
            print(f"    Reason: {chunk.relevance_reason}")
    print(f"\nTokens: {context.total_input_tokens} in, {context.total_output_tokens} out")


if __name__ == "__main__":
    asyncio.run(main())
```

Relevant context extraction:
- Uses LLM to identify only relevant content chunks (not synthesis)
- Returns content as-is with reasoning for relevance
- Query-aware image analysis: Extracts specific portions from images (e.g., "November" from a roadmap with Oct/Nov/Dec)
- Smart text chunking: Evaluates paragraphs and sections for relevance
- Groups results by source (Jira issue or Confluence page)
- Preserves references to original sources for traceability
Why not synthesis? Context Bridge provides relevant context, letting you (or downstream LLMs) decide how to reason over it. This is more flexible than pre-synthesized answers and better suited for AI-assisted analysis workflows.
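Since extraction returns chunks rather than an answer, a downstream consumer typically assembles them into a grounded prompt itself. A minimal sketch, using stand-in types that mirror the fields printed above (`title`, `url`, `chunks[].text`), not the library's actual classes:

```python
from dataclasses import dataclass

# Stand-in types mirroring the fields shown above; the library's
# actual result classes may differ.
@dataclass
class Chunk:
    text: str
    relevance_reason: str

@dataclass
class Source:
    title: str
    url: str
    chunks: list

def build_grounded_prompt(question: str, sources: list) -> str:
    """Assemble extracted chunks into a prompt for a downstream LLM,
    keeping source URLs so answers stay traceable."""
    lines = [f"Question: {question}", "", "Context:"]
    for src in sources:
        lines.append(f"## {src.title} ({src.url})")
        lines.extend(f"- {chunk.text}" for chunk in src.chunks)
    return "\n".join(lines)

sources = [Source("Auth ADR", "https://example.atlassian.net/wiki/ADR-12",
                  [Chunk("We use OAuth 2.0 with PKCE.", "mentions auth flow")])]
print(build_grounded_prompt("How do we handle user authentication?", sources))
```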
Supported Workflows:
- `search()` → `extract_relevant_context(expand_content=True)` - Single step with expansion
- `deep_search()` → `fetch_more()` → `extract_relevant_context()` - Avoids redundant expansion
Deprecated: `extract_knowledge()` is still available for backward compatibility but will be removed in a future version. Use `extract_relevant_context()` instead.
Query: "How have we implemented user notifications in the past?"
Finds:
- Confluence architecture docs
- Related Jira epics/stories
- Implementation details
Query: "Why did we choose PostgreSQL over MongoDB?"
Finds:
- Architecture Decision Records (ADRs)
- Discussion threads
- Comparative analysis docs
Query: "What features were included in the mobile app MVP?"
Finds:
- MVP planning documents
- Prioritized epics
- Launch checklists
Query: "What's blocking the payment integration feature?"
Finds:
- Blocked Jira issues
- Dependency chains
- Discussion in Confluence
Query: "How do competitors handle subscription cancellation?"
Finds:
- Competitive analysis docs
- UX research notes
- Feature comparisons
```mermaid
flowchart TD
    A[User Query] --> B[QueryAnalyzer]
    B --> C[SourceRouter]
    C --> D[QueryTranslator]
    D --> E[MCPExecutor]
    E --> F[ResultRanker]
    F --> G[SearchResponse]
    G --> H[FeedbackStore]

    B -.-> B1["LLM: Understand intent,\nexpand keywords"]
    C -.-> C1["LLM: Route to Jira,\nConfluence, or both"]
    D -.-> D1["LLM: Convert to\nJQL/CQL queries"]
    E -.-> E1["MCP: Query sources\nin parallel"]
    F -.-> F1["LLM: Score and rank\nby relevance"]
```
Pipeline stages:
- QueryAnalyzer - LLM understands intent and expands keywords
- SourceRouter - LLM determines which sources to query (Jira, Confluence, or both)
- QueryTranslator - LLM converts natural language to native query languages (JQL/CQL)
- MCPExecutor - Executes queries against MCP servers in parallel
- ResultRanker - LLM ranks results by relevance, boosted by feedback history
- FeedbackStore - Stores user feedback for continuous improvement
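The flow above can be sketched as composed async stages. The function bodies below are stand-ins for the LLM and MCP calls, not the library's implementation:

```python
import asyncio

# Stand-in stages mirroring the pipeline diagram; each body is a
# placeholder where the library would make an LLM or MCP call.

async def analyze_query(query: str) -> dict:
    return {"intent": "how-to", "keywords": [query]}   # LLM call in practice

async def route_sources(analysis: dict) -> list[str]:
    return ["jira", "confluence"]                      # LLM call in practice

async def translate(analysis: dict, source: str) -> str:
    return f'text ~ "{analysis["keywords"][0]}"'       # JQL/CQL translation in practice

async def execute(source: str, native_query: str) -> list[dict]:
    return [{"source": source, "query": native_query}]  # MCP server call in practice

async def search(query: str) -> list[dict]:
    analysis = await analyze_query(query)
    sources = await route_sources(analysis)
    queries = [await translate(analysis, s) for s in sources]
    # Sources are queried concurrently, as in the MCPExecutor stage
    batches = await asyncio.gather(*(execute(s, q) for s, q in zip(sources, queries)))
    results = [r for batch in batches for r in batch]
    return results  # ResultRanker would reorder these by LLM-scored relevance

print(asyncio.run(search("user authentication")))
```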
Context Bridge includes comprehensive logging for debugging and monitoring:

```python
from context_bridge import setup_logging

# Development - detailed logging
setup_logging(level="DEBUG")

# Production - structured JSON logs
setup_logging(level="INFO", format_json=True)
```

Logs are written to `context_bridge.log` by default. Set `level="DEBUG"` for detailed debugging output.
Anthropic Claude:

```yaml
llm:
  provider: anthropic
  model: claude-3-5-sonnet-20241022
  api_key: ${ANTHROPIC_API_KEY}
```

OpenAI GPT:

```yaml
llm:
  provider: openai
  model: gpt-4o  # or gpt-4-turbo
  api_key: ${OPENAI_API_KEY}
```

Note: If you enable result evaluation (`agentic.enable_evaluation: true`), also set `agentic.evaluation_model` to an OpenAI model (e.g., `gpt-4o-mini`). The default evaluation model is configured for Anthropic Claude.
Currently supports:
- Atlassian (Jira + Confluence): `@modelcontextprotocol/server-atlassian`
To add more sources, configure additional MCP servers in `mcp_sources.servers`.
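For example, a GitHub server could sit alongside the Atlassian one. The package name and environment variable below are illustrative; check the MCP server's own documentation for the actual values:

```yaml
mcp_sources:
  servers:
    atlassian:
      # ... as configured above ...
    github:
      command: npx
      args:
        - "-y"
        - "@modelcontextprotocol/server-github"   # illustrative package name
      env:
        GITHUB_PERSONAL_ACCESS_TOKEN: "${GITHUB_TOKEN}"
```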
```bash
# Clone repository
git clone https://github.com/YOUR_ORG/context-bridge.git
cd context-bridge

# Install dependencies
uv sync

# Run tests
uv run pytest
```

```bash
# All tests
uv run pytest

# Unit tests only
uv run pytest tests/unit/

# Integration tests
uv run pytest tests/integration/

# With coverage
uv run pytest --cov=context_bridge
```

The feedback system enables continuous improvement:
```python
# After a search, submit feedback
searcher.submit_feedback(
    search_id=response.search_id,
    result_id=result.id,
    feedback="positive",  # or "negative"
    comment="Exactly what I needed!",  # optional
)

# View search history
recent = searcher.get_recent_searches(limit=10)

# Get a specific search
search = searcher.get_search_by_id(search_id)
```

Over time, sources with positive feedback get boosted in rankings.
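One illustrative way such a boost could work (a sketch, not the library's actual formula) is to nudge the LLM relevance score by net feedback and clamp it to the [0, 1] range:

```python
def boosted_score(base: float, positives: int, negatives: int,
                  weight: float = 0.05) -> float:
    """Illustrative feedback boost: shift the LLM relevance score by
    net feedback, clamped to [0, 1]. Not the library's actual formula."""
    return max(0.0, min(1.0, base + weight * (positives - negatives)))

# A source ranked 0.80 by the LLM, with 3 positive and 1 negative votes
print(round(boosted_score(0.80, positives=3, negatives=1), 2))  # 0.9
```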
- ✅ Core search with Jira & Confluence
- ✅ LLM-based query understanding
- ✅ Feedback system
- ✅ Query expansion
- Semantic search with vector embeddings
- Query caching
- GitHub MCP integration
- Internet search integration
- Result clustering
- Export to reports
- Additional MCP integrations - GitHub, Google Drive, and other enterprise sources
- Local LLM support - Integration with locally-hosted models for privacy-sensitive deployments
- Task-specific LLM routing - Use different models for different pipeline stages (e.g., fast model for routing, capable model for ranking)
- Self-evaluation loop - Automatically assess result quality and refine searches when needed
Contributions welcome! Please open an issue or pull request on GitHub.
MIT License - see LICENSE file for details.
- Documentation: See this README and inline code documentation
- Issues: Open an issue on GitHub
- Discussions: Start a discussion on GitHub
If you use this in research, please cite:
```bibtex
@software{context_bridge,
  title = {Context Bridge: Intelligent Multi-Source Retrieval via MCP},
  year = {2025},
  url = {https://github.com/YOUR_ORG/context-bridge}
}
```