Webuddhist AI

A FastAPI-based AI application for searching and chatting with Buddhist texts using retrieval-augmented generation (RAG).

Overview

Webuddhist AI provides a search and chat interface for Buddhist texts. It supports four search types: hybrid, semantic, BM25, and exact matching. The API uses LangGraph for agentic workflows and retrieves text from a Milvus vector database.

API Routes

Root & Health Endpoints

`GET /`

Returns the HTML chat interface.

Response: HTML content (text/html)

`GET /health`

Health check endpoint that verifies environment variables and service status.

Response:

{
  "status": "healthy"
}

Error Response (500):

{
  "detail": "Missing environment variables for Milvus or Gemini."
}
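
A client can use this endpoint as a readiness probe. The following is a minimal sketch using only the Python standard library. It assumes the server is reachable at http://localhost:8000 (the default port listed in this README); the helper names `interpret_health` and `check_health` are illustrative and not part of the project.

```python
import json
import urllib.error
import urllib.request


def interpret_health(status_code: int, body: str) -> tuple[bool, str]:
    """Map a /health response to (is_healthy, message)."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False, f"unparseable response (HTTP {status_code})"
    if not isinstance(payload, dict):
        return False, f"unexpected response (HTTP {status_code})"
    if status_code == 200 and payload.get("status") == "healthy":
        return True, "healthy"
    # Error responses carry the reason in "detail", as documented above.
    return False, str(payload.get("detail", f"HTTP {status_code}"))


def check_health(base_url: str = "http://localhost:8000") -> tuple[bool, str]:
    """Call GET /health; base_url is an assumption about where the server runs."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
            return interpret_health(resp.status, resp.read().decode("utf-8"))
    except urllib.error.HTTPError as exc:
        # urllib raises on 4xx/5xx, but the JSON error body is still readable.
        return interpret_health(exc.code, exc.read().decode("utf-8"))
    except urllib.error.URLError as exc:
        return False, f"server unreachable: {exc.reason}"
```

Calling `check_health()` against a running server returns a `(bool, str)` pair, which is convenient for startup scripts or container health checks.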

Chat Endpoints

`POST /api/chat/stream`

Streaming chat endpoint using Server-Sent Events (SSE). This endpoint processes chat messages through a LangGraph workflow and streams responses in real time.

Request Body:

{
  "messages": [
    {
      "role": "user",
      "content": "What is the meaning of compassion?"
    }
  ]
}

Response: Server-Sent Events stream with the following event types:

search_results: Search results from hybrid search tool
token: Streaming text tokens from the AI model
done: Indicates completion
error: Error information if something goes wrong

Example Events:

data: {"type": "token", "data": "Compassion is..."}

data: {"type": "search_results", "data": [...], "queries": {...}}

data: {"type": "done", "data": {}}
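
The event stream above can be consumed line by line. The sketch below is one possible client-side parser, assuming each event arrives as a single `data:` line containing one JSON object (as in the examples) and that `token` events carry a string in `data`. The function names are illustrative, not part of the API.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each `data:` line in an SSE stream."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data fields
        yield json.loads(line[len("data:"):].strip())


def collect_answer(lines: Iterable[str]) -> tuple[str, list]:
    """Concatenate token events and keep search results until `done`."""
    answer_parts: list[str] = []
    results: list = []
    for event in iter_sse_events(lines):
        kind = event.get("type")
        if kind == "token":
            answer_parts.append(event["data"])
        elif kind == "search_results":
            results = event["data"]
        elif kind == "error":
            raise RuntimeError(f"stream error: {event.get('data')}")
        elif kind == "done":
            break  # ignore anything the server might send afterwards
    return "".join(answer_parts), results
```

`collect_answer` accepts any iterable of decoded text lines, so it can be fed from an HTTP client's line iterator or from a recorded stream in tests. A UI would typically render each `token` as it arrives instead of joining them at the end.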

Search Endpoints

All search endpoints are prefixed with /search.

`GET /search/info`

Returns API information and available search types.

Response:

{
  "message": "OpenPecha Search API",
  "version": "1.0.0",
  "endpoints": {
    "search": "/search"
  },
  "search_types": {
    "hybrid": "Combined BM25 + semantic search (default)",
    "bm25": "Keyword-based search",
    "semantic": "Meaning-based search",
    "exact": "Exact phrase matching"
  },
  "docs": "/docs"
}

`GET /search/debug`

Debug endpoint to test basic search functionality.

Response:

{
  "status": "success",
  "raw_results": "...",
  "results_type": "...",
  "results_length": 5,
  "first_result": "..."
}

`POST /search`

Unified search endpoint supporting multiple search types with filtering and hierarchical search capabilities.

Request Body:

{
  "query": "དེ་ལ་མི་དགར་ཅི་ཞིག་ཡོད། །",
  "search_type": "hybrid",
  "limit": 10,
  "return_text": true,
  "hierarchical": false,
  "parent_limit": null,
  "filter": {
    "title": "Some Title",
    "language": "bo"
  }
}

Request Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `query` | string | Yes | - | The search query text (min length: 1) |
| `search_type` | string | No | "hybrid" | Type of search: `hybrid`, `bm25`, `semantic`, or `exact` |
| `limit` | integer | No | 10 | Maximum number of results (1-100) |
| `return_text` | boolean | No | true | If true, return full text in results |
| `hierarchical` | boolean | No | false | If true, perform parent->children two-stage search |
| `parent_limit` | integer | No | null | Max parents to retrieve when hierarchical=true (1-200) |
| `filter` | object | No | null | Optional filters (title, language) |

Filter Object (each field accepts either a single string or a list of strings):

{
  "title": "Title Name" | ["Title1", "Title2"],
  "language": "bo" | ["bo", "en"]
}
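
To catch invalid requests before they reach the server, a client can enforce the documented constraints locally. This is a sketch only: `build_search_payload` is a hypothetical helper, and the checks mirror the parameter table above without being taken from the server's own validation code.

```python
from typing import Optional, Union

SEARCH_TYPES = {"hybrid", "bm25", "semantic", "exact"}
StrOrList = Union[str, list[str]]


def build_search_payload(
    query: str,
    search_type: str = "hybrid",
    limit: int = 10,
    return_text: bool = True,
    hierarchical: bool = False,
    parent_limit: Optional[int] = None,
    title: Optional[StrOrList] = None,
    language: Optional[StrOrList] = None,
) -> dict:
    """Build a POST /search body, enforcing the documented ranges client-side."""
    if len(query) < 1:
        raise ValueError("query must not be empty")
    if search_type not in SEARCH_TYPES:
        raise ValueError(f"unknown search_type: {search_type!r}")
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    if parent_limit is not None and not 1 <= parent_limit <= 200:
        raise ValueError("parent_limit must be between 1 and 200")

    # Only include filter keys that were actually supplied.
    supplied = {"title": title, "language": language}
    filters = {key: value for key, value in supplied.items() if value is not None}
    return {
        "query": query,
        "search_type": search_type,
        "limit": limit,
        "return_text": return_text,
        "hierarchical": hierarchical,
        "parent_limit": parent_limit,
        "filter": filters or None,
    }
```

For example, `build_search_payload("compassion", language=["bo", "en"])` produces a body whose `filter` is `{"language": ["bo", "en"]}`, while omitting both filters yields `"filter": None` (serialized as `null`).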

Response:

{
  "query": "དེ་ལ་མི་དགར་ཅི་ཞིག་ཡོད། །",
  "search_type": "hybrid",
  "results": [
    {
      "id": "449691587532670411",
      "distance": 0.95,
      "entity": {
        "text": "དེ་ལ་མི་དགར་ཅི་ཞིག་ཡོད། །གང་ཕྱིར་འདི་དག་རང་བཞིན་མེད།"
      }
    }
  ],
  "count": 1
}
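
The sketch below shows one way to send a search request and flatten the response into tuples. It uses only the standard library and assumes the server runs at http://localhost:8000. `post_search` and `extract_hits` are hypothetical helper names, and treating a missing `entity.text` as an empty string (for instance when `return_text` is false) is an assumption about the response shape.

```python
import json
import urllib.request


def extract_hits(response: dict) -> list[tuple[str, float, str]]:
    """Return (id, distance, text) per result; text is "" when absent."""
    hits = []
    for item in response.get("results", []):
        text = item.get("entity", {}).get("text", "")
        hits.append((item["id"], item["distance"], text))
    return hits


def post_search(payload: dict, base_url: str = "http://localhost:8000") -> dict:
    """POST the payload to /search and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{base_url}/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)
```

With a server running, `extract_hits(post_search({"query": "compassion"}))` returns the results in the order the API returned them; the sketch deliberately does not re-sort by `distance`.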

Search Types:

hybrid (default): Combines BM25 keyword search with semantic vector search for best results
bm25: Keyword-based search using BM25 algorithm
semantic: Meaning-based search using vector embeddings
exact: Exact phrase matching

Hierarchical Search:

When hierarchical: true, the search performs a two-stage process:

First searches for parent documents matching the query
Then searches for children of those parents
Returns only the children results

This is useful for structured documents with parent-child relationships.
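
The two-stage flow can be illustrated with a toy in-memory example. This is not the project's Milvus-backed implementation: the `Doc` class, the `parent_id` field, and the word-overlap `score` function are all assumptions made purely to show the control flow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Doc:
    id: str
    text: str
    parent_id: Optional[str] = None  # None marks a parent document


def score(query: str, text: str) -> int:
    """Toy relevance: number of distinct query words that appear in the text."""
    words = set(text.lower().split())
    return sum(1 for word in set(query.lower().split()) if word in words)


def hierarchical_search(
    query: str, docs: list[Doc], parent_limit: int = 2, limit: int = 3
) -> list[Doc]:
    # Stage 1: rank parent documents against the query.
    parents = [d for d in docs if d.parent_id is None and score(query, d.text) > 0]
    parents.sort(key=lambda d: score(query, d.text), reverse=True)
    parent_ids = {d.id for d in parents[:parent_limit]}

    # Stage 2: rank only the children of the selected parents.
    children = [
        d for d in docs if d.parent_id in parent_ids and score(query, d.text) > 0
    ]
    children.sort(key=lambda d: score(query, d.text), reverse=True)

    # Only children are returned, never the parents themselves.
    return children[:limit]
```

Note the consequence of this design: a child that matches the query is still excluded if its parent was not selected in the first stage.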

Environment Variables

The application is configured through the following environment variables. Variables with a listed default are optional:

GEMINI_API_KEY or GOOGLE_API_KEY: Google Gemini API key for LLM
MILVUS_URI: Milvus vector database URI
MILVUS_TOKEN: Milvus authentication token
MILVUS_COLLECTION_NAME: Name of the Milvus collection (default: "test_kangyur_tengyur")
PORT: Server port (default: 8000)
ENV: Environment mode (set to "development" for auto-reload)
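
As a sketch of how these variables might be read at startup, the loader below applies the defaults listed above and reports missing values. The `Settings` class and `load_settings` function are hypothetical, not the project's code, and preferring `GEMINI_API_KEY` over `GOOGLE_API_KEY` when both are set is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    gemini_api_key: str
    milvus_uri: str
    milvus_token: str
    milvus_collection_name: str
    port: int
    auto_reload: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from env, applying the defaults listed above."""
    # Assumption: GEMINI_API_KEY takes precedence when both keys are set.
    api_key = env.get("GEMINI_API_KEY") or env.get("GOOGLE_API_KEY")
    required = (
        ("GEMINI_API_KEY or GOOGLE_API_KEY", api_key),
        ("MILVUS_URI", env.get("MILVUS_URI")),
        ("MILVUS_TOKEN", env.get("MILVUS_TOKEN")),
    )
    missing = [name for name, value in required if not value]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        gemini_api_key=api_key,
        milvus_uri=env["MILVUS_URI"],
        milvus_token=env["MILVUS_TOKEN"],
        milvus_collection_name=env.get("MILVUS_COLLECTION_NAME", "test_kangyur_tengyur"),
        port=int(env.get("PORT", "8000")),
        auto_reload=env.get("ENV") == "development",
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests without touching the real environment.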

Running the Application

# Install dependencies
pip install -r requirements.txt

# Run the server
python main.py

The API will be available at http://localhost:8000 (or the port specified in PORT).

Interactive API documentation is available at:

Swagger UI: http://localhost:8000/docs
ReDoc: http://localhost:8000/redoc

Technology Stack

FastAPI: Web framework
LangGraph: Agentic workflow orchestration
LangChain: LLM integration
Google Gemini: Language model
Milvus: Vector database for semantic search
Server-Sent Events (SSE): Real-time streaming

License

See LICENSE file for details.
