5 changes: 3 additions & 2 deletions .env.example
@@ -1,2 +1,3 @@
# Copy this file to .env and add your actual API key
ANTHROPIC_API_KEY=your-anthropic-api-key-here
# Ollama settings (free local LLM - no API key needed)
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.1:8b
2 changes: 1 addition & 1 deletion .gitignore
@@ -28,4 +28,4 @@ uploads/

# OS
.DS_Store
Thumbs.db
OllamaSetup.exe
53 changes: 53 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,53 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Running the Application

```bash
# Install dependencies
uv sync

# Start the server (from project root)
cd backend && uv run uvicorn app:app --reload --port 8000

# Or use the shell script
./run.sh
```

Requires Ollama running locally with a model installed (configured in `.env`).
The app is served at http://localhost:8000 with API docs at http://localhost:8000/docs.
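
To verify that prerequisite, a minimal check with `httpx` (the client the backend already uses) against Ollama's model-listing endpoint; the base URL and model name below are the defaults from `.env.example`:

```python
import httpx

# Ask Ollama which models are installed locally (GET /api/tags).
base_url = "http://localhost:11434"  # default OLLAMA_BASE_URL
resp = httpx.get(f"{base_url}/api/tags", timeout=5.0)
resp.raise_for_status()
models = [m["name"] for m in resp.json().get("models", [])]
print("llama3.1:8b installed:", "llama3.1:8b" in models)
```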

## Environment

- Python 3.13+ with `uv` package manager
- **Always use `uv` to run commands and manage dependencies. Never use `pip` directly.**
- `.env` file in project root with `OLLAMA_BASE_URL` and `OLLAMA_MODEL`
- No test suite exists currently

## Architecture

This is a RAG (Retrieval-Augmented Generation) chatbot for course materials. The FastAPI backend serves both the API and the frontend static files.

**Query flow:** Frontend → `app.py` (FastAPI) → `rag_system.py` (orchestrator) → searches `vector_store.py` (ChromaDB) → sends query + retrieved chunks to `ai_generator.py` (Ollama LLM) → response back to frontend.
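
A minimal sketch of that flow's shape (the real `rag_system.py` is not part of this diff, so the function and parameter names here are illustrative only):

```python
from typing import Callable, List

def answer_query(
    query: str,
    search: Callable[[str], List[str]],   # vector_store.py: ChromaDB semantic search
    generate: Callable[[str, str], str],  # ai_generator.py: Ollama call with context
) -> str:
    chunks = search(query)             # search-first: ChromaDB is always queried
    context = "\n\n".join(chunks)      # retrieved chunks become the LLM context
    return generate(query, context)    # query + context in, answer out
```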

**Key design decisions:**
- Search-first approach: every query searches ChromaDB before calling the LLM; `ai_generator.py` also keeps an optional tool-calling loop (capped at `MAX_TOOL_ROUNDS = 3`, with a plain-text fallback because small local models don't emit structured tool calls reliably)
- Two ChromaDB collections: `course_catalog` (metadata for fuzzy course name matching) and `course_content` (chunked text for semantic search)
- Embeddings via `all-MiniLM-L6-v2` sentence-transformer
- Session history is in-memory only (lost on restart), capped at 2 exchanges per session
- Documents are chunked at 800 chars with 100 char overlap (see the sketch after this list)
- Course documents in `docs/` are auto-loaded on server startup
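
The chunking and embedding decisions above combine into a pipeline roughly like the sketch below. This is illustrative only, assuming naive fixed-size chunking; the real `document_processor.py` is not shown in this diff and likely also handles sentence boundaries and course metadata.

```python
from sentence_transformers import SentenceTransformer

def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Fixed-size chunks with overlap, per the parameters above."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim embeddings
chunks = chunk_text("lorem ipsum " * 300)        # 3600 chars -> 5 overlapping chunks
embeddings = model.encode(chunks)                # one vector per chunk
print(len(chunks), embeddings.shape)
```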

**Backend modules:**
- `app.py` — API endpoints and static file serving
- `rag_system.py` — orchestrates search → LLM → session flow
- `vector_store.py` — ChromaDB wrapper with semantic search and course/lesson filtering
- `ai_generator.py` — Ollama API client
- `document_processor.py` — parses course .txt/.pdf/.docx files into structured chunks
- `search_tools.py` — search tool abstraction with source tracking
- `session_manager.py` — per-session conversation memory
- `models.py` — `Course`, `Lesson`, `CourseChunk` dataclasses
- `config.py` — loads settings from `.env`

**Frontend:** plain HTML/JS/CSS in `frontend/`, uses `marked.js` for markdown rendering. No build step.
272 changes: 164 additions & 108 deletions backend/ai_generator.py
@@ -1,25 +1,31 @@
import anthropic
from typing import List, Optional, Dict, Any
import httpx
import json
from typing import List, Optional, Dict, Any, Callable, Tuple


class AIGenerator:
"""Handles interactions with Anthropic's Claude API for generating responses"""

# Static system prompt to avoid rebuilding on each call
SYSTEM_PROMPT = """ You are an AI assistant specialized in course materials and educational content with access to a comprehensive search tool for course information.
"""Handles interactions with Ollama's local LLM API for generating responses"""

MAX_TOOL_ROUNDS = 3

Search Tool Usage:
- Use the search tool **only** for questions about specific course content or detailed educational materials
- **One search per query maximum**
- Synthesize search results into accurate, fact-based responses
- If search yields no results, state this clearly without offering alternatives
# Static system prompt to avoid rebuilding on each call
SYSTEM_PROMPT = """ You are an AI assistant specialized in course materials and educational content.

Response Protocol:
- **General knowledge questions**: Answer using existing knowledge without searching
- **Course-specific questions**: Search first, then answer
- **No meta-commentary**:
- Provide direct answers only — no reasoning process, search explanations, or question-type analysis
- Do not mention "based on the search results"
- Answer questions using the provided course context
- If the context doesn't contain relevant information, say so clearly
- Do not mention "based on the context" or "based on the search results"
- For outline, structure, or "what lessons" queries: You MUST list EVERY lesson exactly as provided in the context using markdown bullet points. Output the course title, course link, then each lesson as a bullet point "- **Lesson N:** Title". Do NOT summarize, group, or skip any lessons.

Tool Usage:
- You have access to search tools for finding course content. Use them when the user asks a question.
- When the user asks about a SPECIFIC lesson (e.g. "lesson 5 of MCP"), use search_course_content with course_name AND lesson_number to get that lesson's actual content. Do NOT just return the course outline.
- Use get_course_outline only when the user asks to LIST or OVERVIEW all lessons in a course.
- For complex queries, break them into steps: first find the relevant course/lesson, then search for specific content.
- After receiving tool results, synthesize them into a well-structured answer. Do NOT dump raw tool output.
- Structure your final answer using markdown: use headings, bullet points, and bold for key terms.
- Only include information that directly answers the user's question — ignore irrelevant tool results.
- If tool results are empty or unhelpful, say so honestly instead of fabricating an answer.

All responses must be:
1. **Brief, Concise and focused** - Get to the point quickly
@@ -28,108 +34,158 @@ class AIGenerator:
4. **Example-supported** - Include relevant examples when they aid understanding
Provide only the direct answer to what was asked.
"""
def __init__(self, api_key: str, model: str):
self.client = anthropic.Anthropic(api_key=api_key)

def __init__(self, base_url: str, model: str):
self.base_url = base_url.rstrip("/")
self.model = model

# Pre-build base API parameters
self.base_params = {
# 120s timeout to handle first-request model loading
self.http_client = httpx.Client(timeout=120.0)

def _call_ollama(self, messages: List[Dict], tools: Optional[List] = None) -> Dict:
"""Make a POST request to the Ollama chat API."""
payload = {
"model": self.model,
"temperature": 0,
"max_tokens": 800
"messages": messages,
"stream": False,
"options": {
"temperature": 0,
"num_predict": 800
}
}

if tools is not None:
payload["tools"] = tools

resp = self.http_client.post(f"{self.base_url}/api/chat", json=payload)
resp.raise_for_status()
return resp.json()

def _execute_tool_call(self, tool_call: dict, tool_executor: Callable) -> Tuple[str, str]:
"""Extract tool name/args from a tool call and execute via tool_executor."""
func = tool_call.get("function", {})
name = func.get("name", "")
arguments = func.get("arguments", {})
try:
result = tool_executor(name, **arguments)
return name, str(result)
except Exception as e:
return name, f"Tool error: {e}"

def _parse_tool_call_from_text(self, content: str) -> Optional[dict]:
"""Try to recover a tool call that the LLM wrote as plain text.

Small models sometimes emit JSON-like text instead of using the
structured tool_calls field. We attempt a best-effort parse.
"""
import re
if not content:
return None

# Try to find JSON-ish blob with "name" and "parameters"
# Handle unquoted identifiers like: {"name": get_course_outline, ...}
text = content.strip()
# Quick gate: must look like it's trying to be a tool call
if '"name"' not in text and "'name'" not in text:
return None

# Fix common malformed JSON: unquoted values after "name":
text = re.sub(r'"name"\s*:\s*([a-zA-Z_]\w*)', r'"name": "\1"', text)
# Fix single quotes to double quotes
text = text.replace("'", '"')

try:
obj = json.loads(text)
except json.JSONDecodeError:
# Try to extract just the JSON object
match = re.search(r'\{.*\}', text, re.DOTALL)
if not match:
return None
try:
obj = json.loads(match.group())
except json.JSONDecodeError:
return None

name = obj.get("name")
params = obj.get("parameters") or obj.get("arguments") or {}
if not name:
return None

return {"function": {"name": name, "arguments": params}}

def _run_tool_round(self, messages: List[Dict], tools: List,
tool_executor: Callable, remaining_rounds: int) -> str:
"""Recursively call Ollama, executing tool calls until the LLM produces text."""
# If rounds remain, offer tools; otherwise force a text-only response
data = self._call_ollama(
messages, tools=tools if remaining_rounds > 0 else None
)

assistant_msg = data.get("message", {})
tool_calls = assistant_msg.get("tool_calls")
content = assistant_msg.get("content", "")

# Fallback: if the LLM wrote the tool call as plain text, parse it
if not tool_calls and content and remaining_rounds > 0:
recovered = self._parse_tool_call_from_text(content)
if recovered:
tool_calls = [recovered]
# Rewrite assistant_msg so context stays consistent
assistant_msg = {"role": "assistant", "content": "", "tool_calls": tool_calls}

# Base cases: no tool calls or no remaining rounds
if not tool_calls or remaining_rounds <= 0:
return content

# Append the assistant message (with its tool_calls) to context
messages.append(assistant_msg)

# Execute each tool call and append results
for tc in tool_calls:
name, result = self._execute_tool_call(tc, tool_executor)
print(f"[Tool Call] {name}({tc.get('function', {}).get('arguments', {})}) -> {len(result)} chars")
messages.append({"role": "tool", "content": result})

return self._run_tool_round(messages, tools, tool_executor, remaining_rounds - 1)

def generate_response(self, query: str,
context: str = "",
conversation_history: Optional[str] = None,
tools: Optional[List] = None,
tool_manager=None) -> str:
tool_executor: Optional[Callable] = None) -> str:
"""
Generate AI response with optional tool usage and conversation context.
Generate AI response using provided context.

Args:
query: The user's question or request
context: Retrieved course content to answer from
conversation_history: Previous messages for context
tools: Available tools the AI can use
tool_manager: Manager to execute tools
tools: Optional tool definitions for Ollama function calling
tool_executor: Callable(name, **kwargs) to execute tool calls

Returns:
Generated response as string
"""

# Build system content efficiently - avoid string ops when possible
system_content = (
f"{self.SYSTEM_PROMPT}\n\nPrevious conversation:\n{conversation_history}"
if conversation_history
else self.SYSTEM_PROMPT
)

# Prepare API call parameters efficiently
api_params = {
**self.base_params,
"messages": [{"role": "user", "content": query}],
"system": system_content
}

# Add tools if available
if tools:
api_params["tools"] = tools
api_params["tool_choice"] = {"type": "auto"}

# Get response from Claude
response = self.client.messages.create(**api_params)

# Handle tool execution if needed
if response.stop_reason == "tool_use" and tool_manager:
return self._handle_tool_execution(response, api_params, tool_manager)

# Return direct response
return response.content[0].text

def _handle_tool_execution(self, initial_response, base_params: Dict[str, Any], tool_manager):
"""
Handle execution of tool calls and get follow-up response.

Args:
initial_response: The response containing tool use requests
base_params: Base API parameters
tool_manager: Manager to execute tools

Returns:
Final response text after tool execution
"""
# Start with existing messages
messages = base_params["messages"].copy()

# Add AI's tool use response
messages.append({"role": "assistant", "content": initial_response.content})

# Execute all tool calls and collect results
tool_results = []
for content_block in initial_response.content:
if content_block.type == "tool_use":
tool_result = tool_manager.execute_tool(
content_block.name,
**content_block.input
)

tool_results.append({
"type": "tool_result",
"tool_use_id": content_block.id,
"content": tool_result
})

# Add tool results as single message
if tool_results:
messages.append({"role": "user", "content": tool_results})

# Prepare final API call without tools
final_params = {
**self.base_params,
"messages": messages,
"system": base_params["system"]
}

# Get final response
final_response = self.client.messages.create(**final_params)
return final_response.content[0].text

# Build system content
system_content = self.SYSTEM_PROMPT
if conversation_history:
system_content += f"\n\nPrevious conversation:\n{conversation_history}"

# Build user message with context
if context:
user_content = f"Course context:\n{context}\n\nQuestion: {query}"
else:
user_content = query

# Build messages list
messages = [
{"role": "system", "content": system_content},
{"role": "user", "content": user_content}
]

# Tool-calling path: recursive loop
if tools is not None and tool_executor is not None:
return self._run_tool_round(messages, tools, tool_executor, self.MAX_TOOL_ROUNDS)

# Default path: single call, no tools
data = self._call_ollama(messages)
return data.get("message", {}).get("content", "")
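
For orientation, calling the rewritten class end to end looks roughly like this. The constructor and `generate_response` signatures come from the diff above; the executor below is a placeholder (the real one lives behind `search_tools.py`), and a reachable Ollama server is assumed.

```python
from ai_generator import AIGenerator  # assuming backend/ is on sys.path

generator = AIGenerator(base_url="http://localhost:11434", model="llama3.1:8b")

# Context path: the caller has already retrieved chunks from ChromaDB.
answer = generator.generate_response(
    query="What does lesson 5 cover?",
    context="Lesson 5: ...retrieved chunk text...",
)

# Tool path: Ollama-format tool schemas plus an executor callback.
def execute_tool(name: str, **kwargs) -> str:
    return f"placeholder results for {name}({kwargs})"

answer = generator.generate_response(
    query="List the lessons in the MCP course",
    tools=[{
        "type": "function",
        "function": {
            "name": "get_course_outline",  # tool name taken from the system prompt above
            "description": "List a course's lessons",
            "parameters": {
                "type": "object",
                "properties": {"course_name": {"type": "string"}},
            },
        },
    }],
    tool_executor=execute_tool,
)
```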