An end-to-end intelligent Q&A system built on two research papers:
| Paper | Role in this system |
|---|---|
| SocraticKG (Choi et al., 2026) | Builds a Knowledge Graph from your documents via 5W1H QA-guided triple extraction |
| GraphRAG-R1 (Yu et al., 2026) | Answers complex multi-hop questions by iteratively reasoning over the KG |
Powered by Google Gemini.
```
           Documents (PDF / TXT)
                     │
                     ▼
┌─────────────────────────────────────────┐
│           SocraticKG Pipeline           │
│                                         │
│  1. 5W1H QA Generation (per chunk)      │
│  2. Triple Extraction from QA pairs     │
│  3. Canonicalization (dedup synonyms)   │
└────────────────────┬────────────────────┘
                     │ triples
                     ▼
         Knowledge Graph (NetworkX)
                     │
                     │ semantic retrieval
                     ▼
┌─────────────────────────────────────────┐
│           GraphRAG-R1 Reasoner          │
│                                         │
│   Think → Query KG → Retrieve → Think   │
│   (with PRA + CAF reward tracking)      │
└────────────────────┬────────────────────┘
                     │
                     ▼
                Final Answer
```
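The KG layer above can be sketched in a few lines. This is a minimal illustration of how triples become a NetworkX graph and how hop-based retrieval works; the real `knowledge_graph.py` also does semantic (embedding) search, which this sketch omits, and the example triples are made up for illustration:

```python
import networkx as nx

def build_kg(triples):
    # Each (subject, relation, object) triple becomes a directed, labeled edge.
    g = nx.DiGraph()
    for subj, rel, obj in triples:
        g.add_edge(subj, obj, relation=rel)
    return g

def retrieve(g, node, hops=2):
    # Return all triples within `hops` of `node`, ignoring edge direction
    # (HOP_DEPTH defaults to 2 in this project's config).
    nearby = nx.ego_graph(g.to_undirected(), node, radius=hops).nodes
    return [(u, d["relation"], v)
            for u, v, d in g.edges(data=True)
            if u in nearby and v in nearby]

kg = build_kg([("girl", "found", "flower"), ("flower", "has", "seven petals")])
print(retrieve(kg, "flower", hops=1))
```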
This project uses uv for fast dependency management.
```shell
# Clone the repository and navigate to it
cd socratic_rl

# Create a virtual environment and install dependencies
uv venv
source .venv/bin/activate
uv pip install -e .
```

Create a `.env` file:

```
GEMINI_API_KEY=your_key_here
GEMINI_MODEL=gemini-3.1-flash-lite-preview
```
Get your key at: https://aistudio.google.com/app/apikey
```shell
python app.py
```

Then open your browser at http://localhost:8000.
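Once the server is up, you can sanity-check it against the `/api/status` endpoint. A stdlib-only sketch (the exact fields in the response depend on `app.py`):

```python
import json
from urllib import request

def status(base: str = "http://localhost:8000") -> dict:
    # GET /api/status returns KG statistics as JSON.
    with request.urlopen(f"{base}/api/status") as resp:
        return json.loads(resp.read())

# print(status())  # requires the server to be running
```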
Here is an example of the system in action, tested on the story "The Flower with 7 colors":
- **Upload a document** — click "Upload Document" and choose a PDF or text file.
  - The SocraticKG pipeline runs automatically: QA generation → triple extraction → canonicalization.
- **Explore the KG** — the Knowledge Graph renders in the left panel.
  - Click any node to see its connections.
  - Use the search bar to find nodes semantically.
  - Toggle physics simulation on/off.
- **Ask questions** — type any question (simple or multi-hop) in the chat panel.
  - GraphRAG-R1 will think, query the KG, and reason step-by-step.
  - Click "Show reasoning" to see each Think / Query / Retrieve step.
  - Metrics (retrieval count, PRA reward, CAF score) are shown per answer.
- **Upload multiple documents** — re-upload with the `append=true` query param (via API) to grow the KG.
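The append upload can be driven from the standard library alone. In this sketch the multipart field name `file` and the JSON response are assumptions — check `app.py` for the actual contract:

```python
import json
import uuid
from urllib import request

BASE = "http://localhost:8000"

def multipart_body(filename: str, data: bytes) -> tuple[bytes, str]:
    # Hand-rolled multipart/form-data body (stdlib only); the field name
    # "file" is an assumption about the FastAPI endpoint's parameter name.
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"

def upload(path: str, append: bool = False) -> dict:
    # append=true grows the existing KG instead of replacing it.
    with open(path, "rb") as fh:
        body, ctype = multipart_body(path, fh.read())
    url = f"{BASE}/api/upload?append={'true' if append else 'false'}"
    req = request.Request(url, data=body, headers={"Content-Type": ctype})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

# upload("second_story.txt", append=True)  # requires the server to be running
```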
| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/api/upload` | Upload PDF/TXT, build KG |
| `POST` | `/api/query` | Multi-hop Q&A |
| `GET` | `/api/graph` | Full KG (vis.js format) |
| `GET` | `/api/graph/search?q=...` | Semantic node search |
| `GET` | `/api/graph/node?name=...` | Node details |
| `GET` | `/api/status` | KG statistics |
| `DELETE` | `/api/graph` | Clear KG |
| Parameter | Default | Description |
|---|---|---|
| `CHUNK_SIZE` | 3000 | Characters per document chunk |
| `MAX_QA_PER_CHUNK` | 25 | QA pairs generated per chunk |
| `MAX_RETRIEVAL_CALLS` | 8 | Max KG queries per question |
| `HOP_DEPTH` | 2 | Neighbourhood depth for retrieval |
| `PRA_DECAY_FACTOR` | 0.5 | PRA reward decay (paper §3.2.2) |
| `CAF_B` | 0.1 | CAF efficiency coefficient (paper §3.2.3) |
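These parameters live in `config.py`; an illustrative fragment with the defaults from the table (the exact layout of the real file may differ):

```python
# config.py — hyperparameter defaults (values from the table above)
CHUNK_SIZE = 3000          # characters per document chunk
MAX_QA_PER_CHUNK = 25      # QA pairs generated per chunk
MAX_RETRIEVAL_CALLS = 8    # max KG queries per question
HOP_DEPTH = 2              # neighbourhood depth for retrieval
PRA_DECAY_FACTOR = 0.5     # PRA reward decay (paper §3.2.2)
CAF_B = 0.1                # CAF efficiency coefficient (paper §3.2.3)
```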
```
socratic_rl/
├── app.py                 # FastAPI server
├── config.py              # All hyperparameters
├── gemini_client.py       # Gemini API wrapper
├── document_processor.py  # PDF / text ingestion
├── socratic_kg.py         # SocraticKG pipeline (Stages 1-3)
├── knowledge_graph.py     # NetworkX KG + retrieval
├── graphrag_r1.py         # GraphRAG-R1 reasoning loop
├── pyproject.toml         # uv project configuration
└── frontend/
    └── index.html         # KG visualization + Q&A UI
```
