karrtikiyer/context_bridge

Context Bridge

Bridge multiple enterprise data sources via Model Context Protocol (MCP) for intelligent context retrieval with LLM-powered understanding and ranking.

Overview

Context Bridge is a Python library for intelligent context retrieval from enterprise data sources. It bridges multiple systems (Jira, Confluence, GitHub) via Model Context Protocol (MCP) servers, providing LLM-powered query analysis, source routing, and relevance ranking.

Why Context Bridge?

Most RAG implementations retrieve from vector stores of pre-indexed documents. This works for static knowledge bases but falls short for enterprise data that lives in dynamic systems like Jira and Confluence, where content changes daily and relationships between items matter.

Context Bridge takes a different approach:

  1. Live enterprise retrieval - Queries actual enterprise APIs (via MCP servers) rather than stale vector indices. Results reflect current state, not last-indexed state.

  2. Agentic search pipeline - Multiple LLM calls work together: one analyzes the query, another routes to appropriate sources, another translates to native query languages (JQL/CQL), and another ranks results by relevance. Each step can be evaluated and improved independently.

  3. Structured source data - Enterprise tools have rich metadata (issue status, page hierarchy, assignees, labels). The system leverages this structure rather than treating everything as flat text chunks.

  4. Grounded retrieval - Returns actual content with source links. No synthesis or generation at the retrieval layer - that's left to downstream consumers who can reason over the context.

The architecture is designed for iterative improvement: routing decisions can be evaluated against ground truth, ranking quality can be measured, and feedback loops can tune the system over time.

Features

  • Intelligent Query Understanding: LLM analyzes queries to understand intent and expand keywords
  • Smart Source Routing: Automatically determines which sources (Jira, Confluence) to query
  • Query Translation: Converts natural language to JQL/CQL automatically
  • Parallel Execution: Queries multiple sources concurrently for speed
  • LLM-Based Ranking: Results ranked by relevance using LLM
  • Feedback Learning: System improves over time based on user feedback
  • Configurable LLM: Supports Anthropic Claude and OpenAI GPT

Installation

# Using uv (recommended)
uv add context-bridge

# Or using pip
pip install context-bridge

Quick Start

1. Set up configuration

Create a config.yaml file:

llm:
  provider: anthropic  # or "openai"
  model: claude-3-5-sonnet-20241022
  api_key: ${ANTHROPIC_API_KEY}  # Use environment variable
  temperature: 0.1

mcp_sources:
  servers:
    atlassian:
      command: npx
      args:
        - "-y"
        - "@modelcontextprotocol/server-atlassian"
      env:
        JIRA_URL: "https://your-company.atlassian.net"
        JIRA_EMAIL: "${JIRA_EMAIL}"
        JIRA_API_TOKEN: "${JIRA_API_TOKEN}"
        CONFLUENCE_URL: "https://your-company.atlassian.net/wiki"
  max_results_per_source: 50

search:
  top_k_results: 20
  enable_query_expansion: true

feedback:
  database_path: "~/.context_bridge/feedback.db"
  enable_feedback: true

2. Basic Usage

import asyncio
from context_bridge import ContextBridge

async def main():
    # Load from config file
    searcher = ContextBridge.from_yaml("config.yaml")
    
    # Execute search
    response = await searcher.search(
        "How do we handle user authentication?"
    )
    
    # Display results
    print(f"Found {len(response.results)} results in {response.execution_time_ms}ms")
    
    for result in response.results:
        print(f"\n{result.title}")
        print(f"Source: {result.source_type.value}")
        print(f"URL: {result.source_url}")
        print(f"Relevance: {result.relevance_score:.2f}")
        print(f"Content: {result.content[:200]}...")
    
    # Submit feedback
    searcher.submit_feedback(
        search_id=response.search_id,
        result_id=response.results[0].id,
        feedback="positive",
        comment="Very helpful!"
    )

if __name__ == "__main__":
    asyncio.run(main())

3. Deep Search with Content Expansion

For more comprehensive results, use deep_search(), which provides:

  • Search modes: Choose between recall (broad), precision (focused), or balanced
  • Token tracking: See exactly how many tokens (and what cost) each search incurs
  • Progressive fetching: Get quick initial results, then fetch more with expansion
  • Content expansion: Includes comments, subtasks, linked issues, and child pages

import asyncio
from context_bridge import ContextBridge

async def main():
    searcher = ContextBridge.from_yaml("config.yaml")
    
    # Deep search with recall mode for broader results
    response = await searcher.deep_search(
        "authentication implementation details",
        search_mode="recall",  # or "precision" or "balanced"
    )
    
    # Display initial results with token usage
    print(f"Found {len(response.results)} results in {response.execution_time_ms}ms")
    print(f"Token usage: {response.total_tokens} tokens (in: {response.total_input_tokens}, out: {response.total_output_tokens})")
    
    for result in response.results[:5]:
        print(f"\n{result.title}")
        print(f"  Relevance: {result.relevance_score:.2f}")
    
    # Fetch more results with content expansion
    if response.has_more:
        print("\nFetching expanded content...")
        more_results = await response.fetch_more()
        print(f"Found {len(more_results)} additional results with expanded content")
        
        for result in more_results[:3]:
            print(f"\n{result.title}")
            print(f"  Type: {result.metadata.get('type', 'main')}")
    
    # Feedback works the same way
    searcher.submit_feedback(
        search_id=response.search_id,
        result_id=response.results[0].id,
        feedback="positive",
    )

if __name__ == "__main__":
    asyncio.run(main())

Enhanced Deep Search Features

fetch_more() includes powerful content expansion capabilities:

  • 📎 Attachment Processing: Automatically downloads and extracts text from PDFs and documents
  • 🖼️ Vision Analysis: Analyzes images using Claude/GPT-4V vision models
  • 📊 Epic Details: Fetches related Epic information for Jira issues
  • 🔗 Confluence Links: Follows and fetches Confluence pages linked from Jira
  • 🧠 Smart Content Fetching: LLM decides which results need full content (not just previews)
  • ⚡ Parallel Processing: All operations run in parallel for maximum speed
  • 🛡️ Safety Features: Timeout protection, token budgets, and graceful error handling

All of these features are configurable in your deep_search configuration section. See config.example.yaml for all available options including:

  • Token limits and warnings
  • Attachment size limits
  • PDF extraction settings
  • Vision analysis toggle
  • Timeout configuration
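As a hedged sketch, a deep_search configuration section might look like the following; the key names here are illustrative, so confirm the exact options against config.example.yaml:

```yaml
deep_search:
  max_total_tokens: 50000         # hard token budget per search (illustrative)
  token_warning_threshold: 30000  # warn when a search passes this point
  attachments:
    max_size_mb: 10               # skip attachments larger than this
    extract_pdf_text: true        # pull text out of PDF attachments
  vision:
    enabled: true                 # analyze images with a vision model
  timeouts:
    fetch_seconds: 30             # per-fetch timeout
```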

Search Modes Comparison

| Mode      | Use When                        | Keyword Count | Best For                       |
|-----------|---------------------------------|---------------|--------------------------------|
| recall    | You want comprehensive coverage | 15 keywords   | Exploratory searches, research |
| precision | You know exactly what you need  | 5 keywords    | Specific queries, known items  |
| balanced  | General use (default)           | 10 keywords   | Most searches                  |

Token Usage Comparison

| Search Type                | Input Tokens | Output Tokens | Total Tokens | Time                     |
|----------------------------|--------------|---------------|--------------|--------------------------|
| Standard search            | N/A          | N/A           | N/A          | Fast                     |
| Deep search (initial)      | ~2,700       | ~1,400        | ~4,100       | Fast                     |
| Deep search + fetch_more() | ~8,000       | ~4,200        | ~12,200      | Slower but comprehensive |

Note: Token counts vary based on query complexity and result set size. Calculate your costs based on your LLM provider's pricing (typically per million tokens).
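For example, a rough per-search cost estimate can be computed from the token counts above; the prices below are placeholders, so substitute your provider's actual per-million-token rates:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate LLM cost in dollars from token counts and per-million-token prices."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

# Deep search + fetch_more() figures from the table above, with placeholder
# prices of $3/M input tokens and $15/M output tokens:
cost = estimate_cost(8_000, 4_200, 3.0, 15.0)
print(f"${cost:.4f}")  # → $0.0870
```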

4. Relevant Context Extraction (AI-Assisted Analysis)

Extract relevant context chunks from search results for tasks like brainstorming MVP features, backlog preparation, or strategic planning:

import asyncio
from context_bridge import ContextBridge

async def main():
    searcher = ContextBridge.from_yaml("config.yaml")
    
    # Quick workflow: search + extract relevant context
    results = await searcher.search("help me prepare a backlog for November")
    context = await searcher.extract_relevant_context(
        results,
        expand_content=True,  # Fetches full content + attachments
    )
    
    # Display extracted context grouped by source
    print(f"Found {len(context.sources)} sources with relevant content")
    for source in context.sources:
        print(f"\n{source.title} ({source.relevance_score:.2f})")
        print(f"  URL: {source.url}")
        for chunk in source.chunks:
            print(f"  - [{chunk.chunk_type}] {chunk.text[:100]}...")
            print(f"    Reason: {chunk.relevance_reason}")
    
    print(f"\nTokens: {context.total_input_tokens} in, {context.total_output_tokens} out")

if __name__ == "__main__":
    asyncio.run(main())

Relevant context extraction:

  • Uses LLM to identify only relevant content chunks (not synthesis)
  • Returns content as-is with reasoning for relevance
  • Query-aware image analysis: Extracts specific portions from images (e.g., "November" from a roadmap with Oct/Nov/Dec)
  • Smart text chunking: Evaluates paragraphs and sections for relevance
  • Groups results by source (Jira issue or Confluence page)
  • Preserves references to original sources for traceability

Why not synthesis? Context Bridge provides relevant context, letting you (or downstream LLMs) decide how to reason over it. This is more flexible than pre-synthesized answers and better suited for AI-assisted analysis workflows.

Supported Workflows:

  • search() → extract_relevant_context(expand_content=True) - Single step with expansion
  • deep_search() → fetch_more() → extract_relevant_context() - Avoids redundant expansion

Deprecated: extract_knowledge() is still available for backward compatibility but will be removed in a future version. Use extract_relevant_context() instead.

Use Cases

1. Feature Discovery

Query: "How have we implemented user notifications in the past?"

Finds:

  • Confluence architecture docs
  • Related Jira epics/stories
  • Implementation details

2. Decision Context

Query: "Why did we choose PostgreSQL over MongoDB?"

Finds:

  • Architecture Decision Records (ADRs)
  • Discussion threads
  • Comparative analysis docs

3. MVP Scoping

Query: "What features were included in the mobile app MVP?"

Finds:

  • MVP planning documents
  • Prioritized epics
  • Launch checklists

4. Blocker Investigation

Query: "What's blocking the payment integration feature?"

Finds:

  • Blocked Jira issues
  • Dependency chains
  • Discussion in Confluence

5. Competitor Analysis

Query: "How do competitors handle subscription cancellation?"

Finds:

  • Competitive analysis docs
  • UX research notes
  • Feature comparisons

Architecture

flowchart TD
    A[User Query] --> B[QueryAnalyzer]
    B --> C[SourceRouter]
    C --> D[QueryTranslator]
    D --> E[MCPExecutor]
    E --> F[ResultRanker]
    F --> G[SearchResponse]
    G --> H[FeedbackStore]
    
    B -.-> B1["LLM: Understand intent,\nexpand keywords"]
    C -.-> C1["LLM: Route to Jira,\nConfluence, or both"]
    D -.-> D1["LLM: Convert to\nJQL/CQL queries"]
    E -.-> E1["MCP: Query sources\nin parallel"]
    F -.-> F1["LLM: Score and rank\nby relevance"]

Pipeline stages:

  1. QueryAnalyzer - LLM understands intent and expands keywords
  2. SourceRouter - LLM determines which sources to query (Jira, Confluence, or both)
  3. QueryTranslator - LLM converts natural language to native query languages (JQL/CQL)
  4. MCPExecutor - Executes queries against MCP servers in parallel
  5. ResultRanker - LLM ranks results by relevance, boosted by feedback history
  6. FeedbackStore - Stores user feedback for continuous improvement
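Conceptually, the six stages compose into a simple async pipeline. This is a simplified sketch only; the real class and method names in the library may differ:

```python
import asyncio

async def run_pipeline(query: str, analyzer, router, translator, executor, ranker):
    """Simplified flow of the pipeline stages described above (names illustrative)."""
    analysis = await analyzer.analyze(query)                # intent + expanded keywords
    sources = await router.route(analysis)                  # e.g. ["jira", "confluence"]
    native = await translator.translate(analysis, sources)  # JQL/CQL per source
    raw = await executor.execute(native)                    # parallel MCP queries
    ranked = await ranker.rank(query, raw)                  # LLM relevance scoring
    return ranked                                           # feedback is stored downstream
```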

Logging & Debugging

Context Bridge includes comprehensive logging for debugging and monitoring:

from context_bridge import setup_logging

# Development - detailed logging
setup_logging(level="DEBUG")

# Production - structured JSON logs
setup_logging(level="INFO", format_json=True)

Logs are written to context_bridge.log by default. Set level="DEBUG" for detailed debugging output.

Configuration

LLM Providers

Anthropic Claude:

llm:
  provider: anthropic
  model: claude-3-5-sonnet-20241022
  api_key: ${ANTHROPIC_API_KEY}

OpenAI GPT:

llm:
  provider: openai
  model: gpt-4o  # or gpt-4-turbo
  api_key: ${OPENAI_API_KEY}

Note: If enabling result evaluation (agentic.enable_evaluation: true), also set agentic.evaluation_model to an OpenAI model (e.g., gpt-4o-mini). The default evaluation model is configured for Anthropic Claude.
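A hedged sketch of that combination, assuming the agentic section accepts these keys as the note describes:

```yaml
llm:
  provider: openai
  model: gpt-4o
  api_key: ${OPENAI_API_KEY}

agentic:
  enable_evaluation: true
  evaluation_model: gpt-4o-mini  # override the Claude default when using OpenAI
```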

MCP Servers

Currently supports:

  • Atlassian (Jira + Confluence): @modelcontextprotocol/server-atlassian

To add more sources, configure additional MCP servers in mcp_sources.servers.
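For instance, an additional server entry might look like this; the GitHub package name and environment variable are illustrative, so check the MCP server's own documentation for the exact values:

```yaml
mcp_sources:
  servers:
    atlassian:
      # ... existing Atlassian config ...
    github:
      command: npx
      args:
        - "-y"
        - "@modelcontextprotocol/server-github"  # illustrative package name
      env:
        GITHUB_PERSONAL_ACCESS_TOKEN: "${GITHUB_TOKEN}"
```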

Development

Setup

# Clone repository
git clone https://github.com/YOUR_ORG/context-bridge.git
cd context-bridge

# Install dependencies
uv sync

# Run tests
uv run pytest

Running Tests

# All tests
uv run pytest

# Unit tests only
uv run pytest tests/unit/

# Integration tests
uv run pytest tests/integration/

# With coverage
uv run pytest --cov=context_bridge

Feedback System

The feedback system enables continuous improvement:

# After search, submit feedback
searcher.submit_feedback(
    search_id=response.search_id,
    result_id=result.id,
    feedback="positive",  # or "negative"
    comment="Exactly what I needed!"  # optional
)

# View search history
recent = searcher.get_recent_searches(limit=10)

# Get specific search
search = searcher.get_search_by_id(search_id)

Over time, sources with positive feedback get boosted in rankings.

Roadmap

Phase 1 (Current - MVP)

  • ✅ Core search with Jira & Confluence
  • ✅ LLM-based query understanding
  • ✅ Feedback system
  • ✅ Query expansion

Phase 2 (Planned)

  • Semantic search with vector embeddings
  • Query caching
  • GitHub MCP integration
  • Internet search integration
  • Result clustering
  • Export to reports

Future Directions

  • Additional MCP integrations - GitHub, Google Drive, and other enterprise sources
  • Local LLM support - Integration with locally-hosted models for privacy-sensitive deployments
  • Task-specific LLM routing - Use different models for different pipeline stages (e.g., fast model for routing, capable model for ranking)
  • Self-evaluation loop - Automatically assess result quality and refine searches when needed

Contributing

Contributions welcome! Please open an issue or pull request on GitHub.

License

MIT License - see LICENSE file for details.

Support

  • Documentation: See this README and inline code documentation
  • Issues: Open an issue on GitHub
  • Discussions: Start a discussion on GitHub

Citation

If you use this in research, please cite:

@software{context_bridge,
  title = {Context Bridge: Intelligent Multi-Source Retrieval via MCP},
  year = {2025},
  url = {https://github.com/YOUR_ORG/context-bridge}
}
