karrtikiyer/context_bridge

Context Bridge

Bridge multiple enterprise data sources via Model Context Protocol (MCP) for intelligent context retrieval with LLM-powered understanding and ranking.

Overview

Context Bridge is a Python library for intelligent context retrieval from enterprise data sources. It bridges multiple systems (Jira, Confluence, GitHub) via Model Context Protocol (MCP) servers, providing LLM-powered query analysis, source routing, and relevance ranking.

Why Context Bridge?

Most RAG implementations retrieve from vector stores of pre-indexed documents. This works for static knowledge bases but falls short for enterprise data that lives in dynamic systems like Jira and Confluence, where content changes daily and relationships between items matter.

Context Bridge takes a different approach:

  1. Live enterprise retrieval - Queries actual enterprise APIs (via MCP servers) rather than stale vector indices. Results reflect current state, not last-indexed state.

  2. Agentic search pipeline - Multiple LLM calls work together: one analyzes the query, another routes to appropriate sources, another translates to native query languages (JQL/CQL), and another ranks results by relevance. Each step can be evaluated and improved independently.

  3. Structured source data - Enterprise tools have rich metadata (issue status, page hierarchy, assignees, labels). The system leverages this structure rather than treating everything as flat text chunks.

  4. Grounded retrieval - Returns actual content with source links. No synthesis or generation at the retrieval layer - that's left to downstream consumers who can reason over the context.

The architecture is designed for iterative improvement: routing decisions can be evaluated against ground truth, ranking quality can be measured, and feedback loops can tune the system over time.

Features

  • Intelligent Query Understanding: LLM analyzes queries to understand intent and expand keywords
  • Smart Source Routing: Automatically determines which sources (Jira, Confluence) to query
  • Query Translation: Converts natural language to JQL/CQL automatically
  • Parallel Execution: Queries multiple sources concurrently for speed
  • LLM-Based Ranking: Results ranked by relevance using LLM
  • Feedback Learning: System improves over time based on user feedback
  • Configurable LLM: Supports Anthropic Claude and OpenAI GPT

Installation

# Using uv (recommended)
uv add context-bridge

# Or using pip
pip install context-bridge

Quick Start

1. Set up configuration

Create a config.yaml file:

llm:
  provider: anthropic  # or "openai"
  model: claude-3-5-sonnet-20241022
  api_key: ${ANTHROPIC_API_KEY}  # Use environment variable
  temperature: 0.1

mcp_sources:
  servers:
    atlassian:
      command: npx
      args:
        - "-y"
        - "@modelcontextprotocol/server-atlassian"
      env:
        JIRA_URL: "https://your-company.atlassian.net"
        JIRA_EMAIL: "${JIRA_EMAIL}"
        JIRA_API_TOKEN: "${JIRA_API_TOKEN}"
        CONFLUENCE_URL: "https://your-company.atlassian.net/wiki"
  max_results_per_source: 50

search:
  top_k_results: 20
  enable_query_expansion: true

feedback:
  database_path: "~/.context_bridge/feedback.db"
  enable_feedback: true

2. Basic Usage

import asyncio
from context_bridge import ContextBridge

async def main():
    # Load from config file
    searcher = ContextBridge.from_yaml("config.yaml")
    
    # Execute search
    response = await searcher.search(
        "How do we handle user authentication?"
    )
    
    # Display results
    print(f"Found {len(response.results)} results in {response.execution_time_ms}ms")
    
    for result in response.results:
        print(f"\n{result.title}")
        print(f"Source: {result.source_type.value}")
        print(f"URL: {result.source_url}")
        print(f"Relevance: {result.relevance_score:.2f}")
        print(f"Content: {result.content[:200]}...")
    
    # Submit feedback
    searcher.submit_feedback(
        search_id=response.search_id,
        result_id=response.results[0].id,
        feedback="positive",
        comment="Very helpful!"
    )

if __name__ == "__main__":
    asyncio.run(main())

3. Deep Search with Content Expansion

For more comprehensive results, use deep_search(), which provides:

  • Search modes: Choose between recall (broad), precision (focused), or balanced
  • Token tracking: See exactly how many tokens (and what cost) each search incurs
  • Progressive fetching: Get quick initial results, then fetch more with expansion
  • Content expansion: Includes comments, subtasks, linked issues, and child pages

import asyncio
from context_bridge import ContextBridge

async def main():
    searcher = ContextBridge.from_yaml("config.yaml")
    
    # Deep search with recall mode for broader results
    response = await searcher.deep_search(
        "authentication implementation details",
        search_mode="recall",  # or "precision" or "balanced"
    )
    
    # Display initial results with token usage
    print(f"Found {len(response.results)} results in {response.execution_time_ms}ms")
    print(f"Token usage: {response.total_tokens} tokens (in: {response.total_input_tokens}, out: {response.total_output_tokens})")
    
    for result in response.results[:5]:
        print(f"\n{result.title}")
        print(f"  Relevance: {result.relevance_score:.2f}")
    
    # Fetch more results with content expansion
    if response.has_more:
        print("\nFetching expanded content...")
        more_results = await response.fetch_more()
        print(f"Found {len(more_results)} additional results with expanded content")
        
        for result in more_results[:3]:
            print(f"\n{result.title}")
            print(f"  Type: {result.metadata.get('type', 'main')}")
    
    # Feedback works the same way
    searcher.submit_feedback(
        search_id=response.search_id,
        result_id=response.results[0].id,
        feedback="positive",
    )

if __name__ == "__main__":
    asyncio.run(main())

Enhanced Deep Search Features

fetch_more() includes powerful content expansion capabilities:

  • 📎 Attachment Processing: Automatically downloads and extracts text from PDFs and documents
  • 🖼️ Vision Analysis: Analyzes images using Claude/GPT-4V vision models
  • 📊 Epic Details: Fetches related Epic information for Jira issues
  • 🔗 Confluence Links: Follows and fetches Confluence pages linked from Jira
  • 🧠 Smart Content Fetching: LLM decides which results need full content (not just previews)
  • ⚡ Parallel Processing: All operations run in parallel for maximum speed
  • 🛡️ Safety Features: Timeout protection, token budgets, and graceful error handling

All of these features are configurable in your deep_search configuration section. See config.example.yaml for all available options including:

  • Token limits and warnings
  • Attachment size limits
  • PDF extraction settings
  • Vision analysis toggle
  • Timeout configuration
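As a hedged sketch, a deep_search configuration section might look like the following; the key names here are illustrative, so confirm the exact options against config.example.yaml:

```yaml
deep_search:
  max_total_tokens: 50000         # hard token budget per search (illustrative)
  token_warning_threshold: 30000  # warn when a search passes this point
  attachments:
    max_size_mb: 10               # skip attachments larger than this
    extract_pdf_text: true        # pull text out of PDF attachments
  vision:
    enabled: true                 # analyze images with a vision model
  timeouts:
    fetch_seconds: 30             # per-fetch timeout
```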

Search Modes Comparison

| Mode      | Use When                        | Keyword Count | Best For                       |
|-----------|---------------------------------|---------------|--------------------------------|
| recall    | You want comprehensive coverage | 15 keywords   | Exploratory searches, research |
| precision | You know exactly what you need  | 5 keywords    | Specific queries, known items  |
| balanced  | General use (default)           | 10 keywords   | Most searches                  |

Token Usage Comparison

| Search Type                | Input Tokens | Output Tokens | Total Tokens | Time                     |
|----------------------------|--------------|---------------|--------------|--------------------------|
| Standard search            | N/A          | N/A           | N/A          | Fast                     |
| Deep search (initial)      | ~2,700       | ~1,400        | ~4,100       | Fast                     |
| Deep search + fetch_more() | ~8,000       | ~4,200        | ~12,200      | Slower but comprehensive |

Note: Token counts vary based on query complexity and result set size. Calculate your costs based on your LLM provider's pricing (typically per million tokens).
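For example, a rough per-search cost estimate can be computed from the token counts above; the prices below are placeholders, so substitute your provider's actual per-million-token rates:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate LLM cost in dollars from token counts and per-million-token prices."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

# Deep search + fetch_more() figures from the table above, with placeholder
# prices of $3/M input tokens and $15/M output tokens:
cost = estimate_cost(8_000, 4_200, 3.0, 15.0)
print(f"${cost:.4f}")  # → $0.0870
```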

4. Relevant Context Extraction (AI-Assisted Analysis)

Extract relevant context chunks from search results for tasks like brainstorming MVP features, backlog preparation, or strategic planning:

import asyncio
from context_bridge import ContextBridge

async def main():
    searcher = ContextBridge.from_yaml("config.yaml")
    
    # Quick workflow: search + extract relevant context
    results = await searcher.search("help me prepare a backlog for November")
    context = await searcher.extract_relevant_context(
        results,
        expand_content=True,  # Fetches full content + attachments
    )
    
    # Display extracted context grouped by source
    print(f"Found {len(context.sources)} sources with relevant content")
    for source in context.sources:
        print(f"\n{source.title} ({source.relevance_score:.2f})")
        print(f"  URL: {source.url}")
        for chunk in source.chunks:
            print(f"  - [{chunk.chunk_type}] {chunk.text[:100]}...")
            print(f"    Reason: {chunk.relevance_reason}")
    
    print(f"\nTokens: {context.total_input_tokens} in, {context.total_output_tokens} out")

if __name__ == "__main__":
    asyncio.run(main())

Relevant context extraction:

  • Uses LLM to identify only relevant content chunks (not synthesis)
  • Returns content as-is with reasoning for relevance
  • Query-aware image analysis: Extracts specific portions from images (e.g., "November" from a roadmap with Oct/Nov/Dec)
  • Smart text chunking: Evaluates paragraphs and sections for relevance
  • Groups results by source (Jira issue or Confluence page)
  • Preserves references to original sources for traceability

Why not synthesis? Context Bridge provides relevant context, letting you (or downstream LLMs) decide how to reason over it. This is more flexible than pre-synthesized answers and better suited for AI-assisted analysis workflows.

Supported Workflows:

  • search() → extract_relevant_context(expand_content=True) - Single step with expansion
  • deep_search() → fetch_more() → extract_relevant_context() - Avoids redundant expansion

Deprecated: extract_knowledge() is still available for backward compatibility but will be removed in a future version. Use extract_relevant_context() instead.

Use Cases

1. Feature Discovery

Query: "How have we implemented user notifications in the past?"

Finds:

  • Confluence architecture docs
  • Related Jira epics/stories
  • Implementation details

2. Decision Context

Query: "Why did we choose PostgreSQL over MongoDB?"

Finds:

  • Architecture Decision Records (ADRs)
  • Discussion threads
  • Comparative analysis docs

3. MVP Scoping

Query: "What features were included in the mobile app MVP?"

Finds:

  • MVP planning documents
  • Prioritized epics
  • Launch checklists

4. Blocker Investigation

Query: "What's blocking the payment integration feature?"

Finds:

  • Blocked Jira issues
  • Dependency chains
  • Discussion in Confluence

5. Competitor Analysis

Query: "How do competitors handle subscription cancellation?"

Finds:

  • Competitive analysis docs
  • UX research notes
  • Feature comparisons

Architecture

flowchart TD
    A[User Query] --> B[QueryAnalyzer]
    B --> C[SourceRouter]
    C --> D[QueryTranslator]
    D --> E[MCPExecutor]
    E --> F[ResultRanker]
    F --> G[SearchResponse]
    G --> H[FeedbackStore]
    
    B -.-> B1["LLM: Understand intent,\nexpand keywords"]
    C -.-> C1["LLM: Route to Jira,\nConfluence, or both"]
    D -.-> D1["LLM: Convert to\nJQL/CQL queries"]
    E -.-> E1["MCP: Query sources\nin parallel"]
    F -.-> F1["LLM: Score and rank\nby relevance"]

Pipeline stages:

  1. QueryAnalyzer - LLM understands intent and expands keywords
  2. SourceRouter - LLM determines which sources to query (Jira, Confluence, or both)
  3. QueryTranslator - LLM converts natural language to native query languages (JQL/CQL)
  4. MCPExecutor - Executes queries against MCP servers in parallel
  5. ResultRanker - LLM ranks results by relevance, boosted by feedback history
  6. FeedbackStore - Stores user feedback for continuous improvement
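Conceptually, the six stages compose into a simple async pipeline. This is a simplified sketch only; the real class and method names in the library may differ:

```python
import asyncio

async def run_pipeline(query: str, analyzer, router, translator, executor, ranker):
    """Simplified flow of the pipeline stages described above (names illustrative)."""
    analysis = await analyzer.analyze(query)                # intent + expanded keywords
    sources = await router.route(analysis)                  # e.g. ["jira", "confluence"]
    native = await translator.translate(analysis, sources)  # JQL/CQL per source
    raw = await executor.execute(native)                    # parallel MCP queries
    ranked = await ranker.rank(query, raw)                  # LLM relevance scoring
    return ranked                                           # feedback is stored downstream
```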

Logging & Debugging

Context Bridge includes comprehensive logging for debugging and monitoring:

from context_bridge import setup_logging

# Development - detailed logging
setup_logging(level="DEBUG")

# Production - structured JSON logs
setup_logging(level="INFO", format_json=True)

Logs are written to context_bridge.log by default. Set level="DEBUG" for detailed debugging output.

Configuration

LLM Providers

Anthropic Claude:

llm:
  provider: anthropic
  model: claude-3-5-sonnet-20241022
  api_key: ${ANTHROPIC_API_KEY}

OpenAI GPT:

llm:
  provider: openai
  model: gpt-4o  # or gpt-4-turbo
  api_key: ${OPENAI_API_KEY}

Note: If enabling result evaluation (agentic.enable_evaluation: true), also set agentic.evaluation_model to an OpenAI model (e.g., gpt-4o-mini). The default evaluation model is configured for Anthropic Claude.
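A hedged sketch of that combination, assuming the agentic section accepts these keys as the note describes:

```yaml
llm:
  provider: openai
  model: gpt-4o
  api_key: ${OPENAI_API_KEY}

agentic:
  enable_evaluation: true
  evaluation_model: gpt-4o-mini  # override the Claude default when using OpenAI
```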

MCP Servers

Currently supports:

  • Atlassian (Jira + Confluence): @modelcontextprotocol/server-atlassian

To add more sources, configure additional MCP servers in mcp_sources.servers.
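For instance, an additional server entry might look like this; the GitHub package name and environment variable are illustrative, so check the MCP server's own documentation for the exact values:

```yaml
mcp_sources:
  servers:
    atlassian:
      # ... existing Atlassian config ...
    github:
      command: npx
      args:
        - "-y"
        - "@modelcontextprotocol/server-github"  # illustrative package name
      env:
        GITHUB_PERSONAL_ACCESS_TOKEN: "${GITHUB_TOKEN}"
```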

Development

Setup

# Clone repository
git clone https://github.com/YOUR_ORG/context-bridge.git
cd context-bridge

# Install dependencies
uv sync

# Run tests
uv run pytest

Running Tests

# All tests
uv run pytest

# Unit tests only
uv run pytest tests/unit/

# Integration tests
uv run pytest tests/integration/

# With coverage
uv run pytest --cov=context_bridge

Feedback System

The feedback system enables continuous improvement:

# After search, submit feedback
searcher.submit_feedback(
    search_id=response.search_id,
    result_id=result.id,
    feedback="positive",  # or "negative"
    comment="Exactly what I needed!"  # optional
)

# View search history
recent = searcher.get_recent_searches(limit=10)

# Get specific search
search = searcher.get_search_by_id(search_id)

Over time, sources with positive feedback get boosted in rankings.

Roadmap

Phase 1 (Current - MVP)

  • ✅ Core search with Jira & Confluence
  • ✅ LLM-based query understanding
  • ✅ Feedback system
  • ✅ Query expansion

Phase 2 (Planned)

  • Semantic search with vector embeddings
  • Query caching
  • GitHub MCP integration
  • Internet search integration
  • Result clustering
  • Export to reports

Future Directions

  • Additional MCP integrations - GitHub, Google Drive, and other enterprise sources
  • Local LLM support - Integration with locally-hosted models for privacy-sensitive deployments
  • Task-specific LLM routing - Use different models for different pipeline stages (e.g., fast model for routing, capable model for ranking)
  • Self-evaluation loop - Automatically assess result quality and refine searches when needed

Contributing

Contributions welcome! Please open an issue or pull request on GitHub.

License

MIT License - see LICENSE file for details.

Support

  • Documentation: See this README and inline code documentation
  • Issues: Open an issue on GitHub
  • Discussions: Start a discussion on GitHub

Citation

If you use this in research, please cite:

@software{context_bridge,
  title = {Context Bridge: Intelligent Multi-Source Retrieval via MCP},
  year = {2025},
  url = {https://github.com/YOUR_ORG/context-bridge}
}
