Lioness100/decimeta

decimeta

Decimeta finds the Dewey Decimal Classification (DDC) number for a given query. Because the DDC is proprietary and copyrighted, this project scrapes data from the Melvil Decimal System (MDS), which is the next best thing.

How It Works

Decimeta uses two complementary approaches to classify queries:

  1. Compares the query's embedding against a vector database of MDS classifications, using OpenAI's text-embedding-3-small model and Pinecone. This search works well when the query closely matches a classification's topic, but it may miss nuance or context.

  2. Uses OpenAI's GPT-4.1 to navigate the classification hierarchy step by step, narrowing from the hundreds place to the tens to the ones (and into decimals if needed). GPT is prone to hallucinating DDC numbers, so it is given a concrete list of options at each step. GPT-4.1 was picked because it is a fast, inexpensive non-reasoning model.
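
The second approach can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `ClassNode` shape, the `narrow` function, the toy tree, and the `scripted` stand-in for the model are hypothetical and are not taken from the project's code. The point is only that each answer is checked against the options that were actually offered, so an invented number stops the descent instead of being accepted. The sketch is synchronous for brevity; a real model call would be asynchronous.

```typescript
// Hypothetical node shape; the real layout of mds.json may differ.
interface ClassNode {
  code: string; // e.g. "500", "510", "516"
  label: string;
  children: ClassNode[];
}

// Stand-in for the model call: given the query and the offered options,
// return a classification code as text.
type Chooser = (query: string, options: ClassNode[]) => string;

// Walk down the hierarchy one level at a time. An answer that is not among
// the offered options is rejected and the walk stops at the last valid node.
function narrow(query: string, root: ClassNode, choose: Chooser): ClassNode[] {
  const path: ClassNode[] = [];
  let current = root;
  while (current.children.length > 0) {
    const answer = choose(query, current.children).trim();
    const picked = current.children.find((c) => c.code === answer);
    if (picked === undefined) break;
    path.push(picked);
    current = picked;
  }
  return path;
}

const leaf = (code: string, label: string): ClassNode => ({ code, label, children: [] });

// Toy data for the example only.
const toyTree: ClassNode = {
  code: "",
  label: "root",
  children: [
    {
      code: "500",
      label: "Science",
      children: [
        {
          code: "510",
          label: "Mathematics",
          children: [leaf("512", "Algebra"), leaf("516", "Geometry")],
        },
        leaf("520", "Astronomy"),
      ],
    },
    leaf("700", "Arts"),
  ],
};

// A chooser that replays fixed answers, standing in for the model.
function scripted(answers: string[]): Chooser {
  let i = 0;
  return () => answers[i++] ?? "";
}

const good = narrow("geometry of triangles", toyTree, scripted(["500", "510", "516"]));
console.log(good.map((n) => n.code).join(" > ")); // 500 > 510 > 516

// "999" was never offered, so the walk stops after "500".
const bad = narrow("geometry of triangles", toyTree, scripted(["500", "999"]));
console.log(bad.map((n) => n.code).join(" > ")); // 500
```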

Setup

Install dependencies:

bun install

Set environment variables:

OPENAI_API_KEY=your_key_here
PINECONE_API_KEY=your_key_here
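
A missing key tends to surface later as an opaque API error, so a small startup check can help. The sketch below is hypothetical (the `missingKeys` helper is not part of the project, and how the project itself reads these variables is not shown here); it only reports which of the two keys above are absent or blank.

```typescript
// The two keys named in this README.
const REQUIRED_KEYS = ["OPENAI_API_KEY", "PINECONE_API_KEY"] as const;

// Return the names of required keys that are undefined or blank in `env`.
function missingKeys(env: Record<string, string | undefined>): string[] {
  return REQUIRED_KEYS.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}

// Example with an explicit object; at startup one would pass process.env instead.
console.log(missingKeys({ OPENAI_API_KEY: "your_key_here" })); // [ "PINECONE_API_KEY" ]
```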

Usage

Scrape MDS data. This shouldn't be necessary, as the scraped data is already provided in mds.json. Scraping is also slow and puts a heavy load on LibraryThing, so please use it sparingly:

bun run scrape

Generate and store embeddings (this will also take a while):

bun run vectorize
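
For orientation, here is a rough idea of what the stored embeddings are used for at query time. In the project the nearest-neighbor search is delegated to Pinecone; the sketch below does the same kind of ranking in memory with cosine similarity. The `Entry` shape, the `topK` helper, the three-dimensional toy vectors, and the choice of cosine as the metric are all assumptions for illustration (real text-embedding-3-small vectors are far larger).

```typescript
// Hypothetical record: one classification with its embedding vector.
interface Entry {
  code: string;
  label: string;
  vector: number[];
}

// Cosine similarity: dot product divided by the product of the vector norms.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Rank entries by similarity to the query vector and keep the best k.
function topK(query: number[], entries: Entry[], k: number): Entry[] {
  return entries
    .map((entry) => ({ entry, score: cosine(query, entry.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((scored) => scored.entry);
}

// Toy vectors for the example only.
const entries: Entry[] = [
  { code: "780", label: "Music", vector: [0, 0, 1] },
  { code: "520", label: "Astronomy", vector: [0, 1, 0] },
  { code: "516", label: "Geometry", vector: [1, 0, 0] },
];

const matches = topK([0.9, 0.1, 0], entries, 2);
console.log(matches.map((e) => e.code).join(", ")); // 516, 520
```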

Start the server:

bun start

The web interface will be available at http://localhost:3000. It calls the API routes /api/classify/gpt?query=your_query and /api/classify/embeddings?query=your_query, which can also be requested directly.
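
The routes above can be called from a script as well. The following is a minimal sketch: `classifyUrl` and `classify` are hypothetical helper names, and since the response format is not documented here, the code assumes a JSON body and returns it untyped.

```typescript
type Method = "gpt" | "embeddings";

// Build the request URL for one of the two routes, encoding the query safely.
function classifyUrl(method: Method, query: string, base = "http://localhost:3000"): string {
  const url = new URL(`/api/classify/${method}`, base);
  url.searchParams.set("query", query);
  return url.toString();
}

// Call the route. Assumes the server is running and responds with JSON;
// the shape of that JSON is unknown here, hence `unknown`.
async function classify(method: Method, query: string): Promise<unknown> {
  const res = await fetch(classifyUrl(method, query));
  if (!res.ok) throw new Error(`Request failed with status ${res.status}`);
  return res.json();
}

console.log(classifyUrl("gpt", "history of jazz"));
// http://localhost:3000/api/classify/gpt?query=history+of+jazz
```

With the server started, `classify("embeddings", "history of jazz")` would resolve to whatever that route returns.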

About

A website to help you find the correct Dewey Decimal number for any subject using AI.
