5 changes: 3 additions & 2 deletions .env.example
@@ -1,2 +1,3 @@
# Copy this file to .env and add your actual API key
ANTHROPIC_API_KEY=your-anthropic-api-key-here
# Ollama settings (free local LLM - no API key needed)
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.1:8b
2 changes: 1 addition & 1 deletion .gitignore
@@ -28,4 +28,4 @@ uploads/

# OS
.DS_Store
Thumbs.db
OllamaSetup.exe
53 changes: 53 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,53 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Running the Application

```bash
# Install dependencies
uv sync

# Start the server (from project root)
cd backend && uv run uvicorn app:app --reload --port 8000

# Or use the shell script
./run.sh
```

Requires Ollama running locally with a model installed (configured in `.env`).
The app is served at http://localhost:8000 with API docs at http://localhost:8000/docs.
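
To verify that prerequisite, a minimal check with `httpx` (the client the backend already uses) against Ollama's model-listing endpoint; the base URL and model name below are the defaults from `.env.example`:

```python
import httpx

# Ask Ollama which models are installed locally (GET /api/tags).
base_url = "http://localhost:11434"  # default OLLAMA_BASE_URL
resp = httpx.get(f"{base_url}/api/tags", timeout=5.0)
resp.raise_for_status()
models = [m["name"] for m in resp.json().get("models", [])]
print("llama3.1:8b installed:", "llama3.1:8b" in models)
```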

## Environment

- Python 3.13+ with `uv` package manager
- **Always use `uv` to run commands and manage dependencies. Never use `pip` directly.**
- `.env` file in project root with `OLLAMA_BASE_URL` and `OLLAMA_MODEL`
- No test suite exists currently

## Architecture

This is a RAG (Retrieval-Augmented Generation) chatbot for course materials. The FastAPI backend serves both the API and the frontend static files.

**Query flow:** Frontend → `app.py` (FastAPI) → `rag_system.py` (orchestrator) → searches `vector_store.py` (ChromaDB) → sends query + retrieved chunks to `ai_generator.py` (Ollama LLM) → response back to frontend.
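
A minimal sketch of that flow's shape (the real `rag_system.py` is not part of this diff, so the function and parameter names here are illustrative only):

```python
from typing import Callable, List

def answer_query(
    query: str,
    search: Callable[[str], List[str]],   # vector_store.py: ChromaDB semantic search
    generate: Callable[[str, str], str],  # ai_generator.py: Ollama call with context
) -> str:
    chunks = search(query)             # search-first: ChromaDB is always queried
    context = "\n\n".join(chunks)      # retrieved chunks become the LLM context
    return generate(query, context)    # query + context in, answer out
```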

**Key design decisions:**
- Search-first approach: every query searches ChromaDB before calling the LLM; `ai_generator.py` also keeps an optional tool-calling loop (capped at `MAX_TOOL_ROUNDS = 3`, with a plain-text fallback because small local models don't emit structured tool calls reliably)
- Two ChromaDB collections: `course_catalog` (metadata for fuzzy course name matching) and `course_content` (chunked text for semantic search)
- Embeddings via `all-MiniLM-L6-v2` sentence-transformer
- Session history is in-memory only (lost on restart), capped at 2 exchanges per session
- Documents are chunked at 800 chars with 100 char overlap (see the sketch after this list)
- Course documents in `docs/` are auto-loaded on server startup
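
The chunking and embedding decisions above combine into a pipeline roughly like the sketch below. This is illustrative only, assuming naive fixed-size chunking; the real `document_processor.py` is not shown in this diff and likely also handles sentence boundaries and course metadata.

```python
from sentence_transformers import SentenceTransformer

def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Fixed-size chunks with overlap, per the parameters above."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim embeddings
chunks = chunk_text("lorem ipsum " * 300)        # 3600 chars -> 5 overlapping chunks
embeddings = model.encode(chunks)                # one vector per chunk
print(len(chunks), embeddings.shape)
```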

**Backend modules:**
- `app.py` — API endpoints and static file serving
- `rag_system.py` — orchestrates search → LLM → session flow
- `vector_store.py` — ChromaDB wrapper with semantic search and course/lesson filtering
- `ai_generator.py` — Ollama API client
- `document_processor.py` — parses course .txt/.pdf/.docx files into structured chunks
- `search_tools.py` — search tool abstraction with source tracking
- `session_manager.py` — per-session conversation memory
- `models.py` — `Course`, `Lesson`, `CourseChunk` dataclasses
- `config.py` — loads settings from `.env`

**Frontend:** plain HTML/JS/CSS in `frontend/`, uses `marked.js` for markdown rendering. No build step.
272 changes: 164 additions & 108 deletions backend/ai_generator.py
@@ -1,25 +1,31 @@
import anthropic
from typing import List, Optional, Dict, Any
import httpx
import json
from typing import List, Optional, Dict, Any, Callable, Tuple


class AIGenerator:
"""Handles interactions with Anthropic's Claude API for generating responses"""

# Static system prompt to avoid rebuilding on each call
SYSTEM_PROMPT = """ You are an AI assistant specialized in course materials and educational content with access to a comprehensive search tool for course information.
"""Handles interactions with Ollama's local LLM API for generating responses"""

MAX_TOOL_ROUNDS = 3

Search Tool Usage:
- Use the search tool **only** for questions about specific course content or detailed educational materials
- **One search per query maximum**
- Synthesize search results into accurate, fact-based responses
- If search yields no results, state this clearly without offering alternatives
# Static system prompt to avoid rebuilding on each call
SYSTEM_PROMPT = """ You are an AI assistant specialized in course materials and educational content.

Response Protocol:
- **General knowledge questions**: Answer using existing knowledge without searching
- **Course-specific questions**: Search first, then answer
- **No meta-commentary**:
- Provide direct answers only — no reasoning process, search explanations, or question-type analysis
- Do not mention "based on the search results"
- Answer questions using the provided course context
- If the context doesn't contain relevant information, say so clearly
- Do not mention "based on the context" or "based on the search results"
- For outline, structure, or "what lessons" queries: You MUST list EVERY lesson exactly as provided in the context using markdown bullet points. Output the course title, course link, then each lesson as a bullet point "- **Lesson N:** Title". Do NOT summarize, group, or skip any lessons.

Tool Usage:
- You have access to search tools for finding course content. Use them when the user asks a question.
- When the user asks about a SPECIFIC lesson (e.g. "lesson 5 of MCP"), use search_course_content with course_name AND lesson_number to get that lesson's actual content. Do NOT just return the course outline.
- Use get_course_outline only when the user asks to LIST or OVERVIEW all lessons in a course.
- For complex queries, break them into steps: first find the relevant course/lesson, then search for specific content.
- After receiving tool results, synthesize them into a well-structured answer. Do NOT dump raw tool output.
- Structure your final answer using markdown: use headings, bullet points, and bold for key terms.
- Only include information that directly answers the user's question — ignore irrelevant tool results.
- If tool results are empty or unhelpful, say so honestly instead of fabricating an answer.

All responses must be:
1. **Brief, Concise and focused** - Get to the point quickly
@@ -28,108 +34,158 @@ class AIGenerator:
4. **Example-supported** - Include relevant examples when they aid understanding
Provide only the direct answer to what was asked.
"""
def __init__(self, api_key: str, model: str):
self.client = anthropic.Anthropic(api_key=api_key)

def __init__(self, base_url: str, model: str):
self.base_url = base_url.rstrip("/")
self.model = model

# Pre-build base API parameters
self.base_params = {
# 120s timeout to handle first-request model loading
self.http_client = httpx.Client(timeout=120.0)

def _call_ollama(self, messages: List[Dict], tools: Optional[List] = None) -> Dict:
"""Make a POST request to the Ollama chat API."""
payload = {
"model": self.model,
"temperature": 0,
"max_tokens": 800
"messages": messages,
"stream": False,
"options": {
"temperature": 0,
"num_predict": 800
}
}

if tools is not None:
payload["tools"] = tools

resp = self.http_client.post(f"{self.base_url}/api/chat", json=payload)
resp.raise_for_status()
return resp.json()

def _execute_tool_call(self, tool_call: dict, tool_executor: Callable) -> Tuple[str, str]:
"""Extract tool name/args from a tool call and execute via tool_executor."""
func = tool_call.get("function", {})
name = func.get("name", "")
arguments = func.get("arguments", {})
try:
result = tool_executor(name, **arguments)
return name, str(result)
except Exception as e:
return name, f"Tool error: {e}"

def _parse_tool_call_from_text(self, content: str) -> Optional[dict]:
"""Try to recover a tool call that the LLM wrote as plain text.

Small models sometimes emit JSON-like text instead of using the
structured tool_calls field. We attempt a best-effort parse.
"""
import re
if not content:
return None

# Try to find JSON-ish blob with "name" and "parameters"
# Handle unquoted identifiers like: {"name": get_course_outline, ...}
text = content.strip()
# Quick gate: must look like it's trying to be a tool call
if '"name"' not in text and "'name'" not in text:
return None

# Fix common malformed JSON: unquoted values after "name":
text = re.sub(r'"name"\s*:\s*([a-zA-Z_]\w*)', r'"name": "\1"', text)
# Fix single quotes to double quotes
text = text.replace("'", '"')

try:
obj = json.loads(text)
except json.JSONDecodeError:
# Try to extract just the JSON object
match = re.search(r'\{.*\}', text, re.DOTALL)
if not match:
return None
try:
obj = json.loads(match.group())
except json.JSONDecodeError:
return None

name = obj.get("name")
params = obj.get("parameters") or obj.get("arguments") or {}
if not name:
return None

return {"function": {"name": name, "arguments": params}}

def _run_tool_round(self, messages: List[Dict], tools: List,
tool_executor: Callable, remaining_rounds: int) -> str:
"""Recursively call Ollama, executing tool calls until the LLM produces text."""
# If rounds remain, offer tools; otherwise force a text-only response
data = self._call_ollama(
messages, tools=tools if remaining_rounds > 0 else None
)

assistant_msg = data.get("message", {})
tool_calls = assistant_msg.get("tool_calls")
content = assistant_msg.get("content", "")

# Fallback: if the LLM wrote the tool call as plain text, parse it
if not tool_calls and content and remaining_rounds > 0:
recovered = self._parse_tool_call_from_text(content)
if recovered:
tool_calls = [recovered]
# Rewrite assistant_msg so context stays consistent
assistant_msg = {"role": "assistant", "content": "", "tool_calls": tool_calls}

# Base cases: no tool calls or no remaining rounds
if not tool_calls or remaining_rounds <= 0:
return content

# Append the assistant message (with its tool_calls) to context
messages.append(assistant_msg)

# Execute each tool call and append results
for tc in tool_calls:
name, result = self._execute_tool_call(tc, tool_executor)
print(f"[Tool Call] {name}({tc.get('function', {}).get('arguments', {})}) -> {len(result)} chars")
messages.append({"role": "tool", "content": result})

return self._run_tool_round(messages, tools, tool_executor, remaining_rounds - 1)

def generate_response(self, query: str,
context: str = "",
conversation_history: Optional[str] = None,
tools: Optional[List] = None,
tool_manager=None) -> str:
tool_executor: Optional[Callable] = None) -> str:
"""
Generate AI response with optional tool usage and conversation context.
Generate AI response using provided context.

Args:
query: The user's question or request
context: Retrieved course content to answer from
conversation_history: Previous messages for context
tools: Available tools the AI can use
tool_manager: Manager to execute tools
tools: Optional tool definitions for Ollama function calling
tool_executor: Callable(name, **kwargs) to execute tool calls

Returns:
Generated response as string
"""

# Build system content efficiently - avoid string ops when possible
system_content = (
f"{self.SYSTEM_PROMPT}\n\nPrevious conversation:\n{conversation_history}"
if conversation_history
else self.SYSTEM_PROMPT
)

# Prepare API call parameters efficiently
api_params = {
**self.base_params,
"messages": [{"role": "user", "content": query}],
"system": system_content
}

# Add tools if available
if tools:
api_params["tools"] = tools
api_params["tool_choice"] = {"type": "auto"}

# Get response from Claude
response = self.client.messages.create(**api_params)

# Handle tool execution if needed
if response.stop_reason == "tool_use" and tool_manager:
return self._handle_tool_execution(response, api_params, tool_manager)

# Return direct response
return response.content[0].text

def _handle_tool_execution(self, initial_response, base_params: Dict[str, Any], tool_manager):
"""
Handle execution of tool calls and get follow-up response.

Args:
initial_response: The response containing tool use requests
base_params: Base API parameters
tool_manager: Manager to execute tools

Returns:
Final response text after tool execution
"""
# Start with existing messages
messages = base_params["messages"].copy()

# Add AI's tool use response
messages.append({"role": "assistant", "content": initial_response.content})

# Execute all tool calls and collect results
tool_results = []
for content_block in initial_response.content:
if content_block.type == "tool_use":
tool_result = tool_manager.execute_tool(
content_block.name,
**content_block.input
)

tool_results.append({
"type": "tool_result",
"tool_use_id": content_block.id,
"content": tool_result
})

# Add tool results as single message
if tool_results:
messages.append({"role": "user", "content": tool_results})

# Prepare final API call without tools
final_params = {
**self.base_params,
"messages": messages,
"system": base_params["system"]
}

# Get final response
final_response = self.client.messages.create(**final_params)
return final_response.content[0].text

# Build system content
system_content = self.SYSTEM_PROMPT
if conversation_history:
system_content += f"\n\nPrevious conversation:\n{conversation_history}"

# Build user message with context
if context:
user_content = f"Course context:\n{context}\n\nQuestion: {query}"
else:
user_content = query

# Build messages list
messages = [
{"role": "system", "content": system_content},
{"role": "user", "content": user_content}
]

# Tool-calling path: recursive loop
if tools is not None and tool_executor is not None:
return self._run_tool_round(messages, tools, tool_executor, self.MAX_TOOL_ROUNDS)

# Default path: single call, no tools
data = self._call_ollama(messages)
return data.get("message", {}).get("content", "")
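
For orientation, calling the rewritten class end to end looks roughly like this. The constructor and `generate_response` signatures come from the diff above; the executor below is a placeholder (the real one lives behind `search_tools.py`), and a reachable Ollama server is assumed.

```python
from ai_generator import AIGenerator  # assuming backend/ is on sys.path

generator = AIGenerator(base_url="http://localhost:11434", model="llama3.1:8b")

# Context path: the caller has already retrieved chunks from ChromaDB.
answer = generator.generate_response(
    query="What does lesson 5 cover?",
    context="Lesson 5: ...retrieved chunk text...",
)

# Tool path: Ollama-format tool schemas plus an executor callback.
def execute_tool(name: str, **kwargs) -> str:
    return f"placeholder results for {name}({kwargs})"

answer = generator.generate_response(
    query="List the lessons in the MCP course",
    tools=[{
        "type": "function",
        "function": {
            "name": "get_course_outline",  # tool name taken from the system prompt above
            "description": "List a course's lessons",
            "parameters": {
                "type": "object",
                "properties": {"course_name": {"type": "string"}},
            },
        },
    }],
    tool_executor=execute_tool,
)
```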