Evaluates the Dewey Decimal Classification (DDC) for a given query. Since the DDC is closed-source and copyrighted, this project scrapes data from LibraryThing's Melvil Decimal System (MDS), which is the next best thing.
Decimeta uses two complementary approaches to classify queries:
- Embeddings: Compares query embeddings against a vector database of MDS classifications, using OpenAI's text-embedding-3-small model and Pinecone. This search works well when the query directly names or describes its topic, but it may miss nuance or context.
- GPT: Uses OpenAI's GPT-4.1 to navigate the classification hierarchy step by step, narrowing from the hundreds place to the tens to the ones (and into decimals if needed). GPT is prone to hallucinating DDC numbers, so at each step it is given a concrete list of options to choose from. GPT-4.1 was picked because it is the cheapest and fastest non-reasoning model.
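The two approaches can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `MdsNode` shape, `rankBySimilarity`, `navigate`, and the `Chooser` callback are hypothetical names, the in-memory array stands in for the Pinecone index, and `choose` stands in for the GPT-4.1 call. The point of the second function is the guard: an answer that is not one of the offered codes is discarded.

```typescript
// Hypothetical node shape; the real layout of mds.json may differ.
interface MdsNode {
  code: string;         // e.g. "5", "51", "516"
  label: string;        // human-readable class name
  vector: number[];     // embedding of the label (stand-in for a Pinecone record)
  children: MdsNode[];  // next level down in the hierarchy
}

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Approach 1: rank candidate classes by similarity to the query embedding.
function rankBySimilarity(queryVector: number[], candidates: MdsNode[], topK = 3): MdsNode[] {
  return [...candidates]
    .sort((x, y) => cosine(queryVector, y.vector) - cosine(queryVector, x.vector))
    .slice(0, topK);
}

// Approach 2: descend one level at a time. `choose` stands in for the model
// call: it sees only the current options and answers with one of their codes.
type Chooser = (query: string, options: MdsNode[]) => Promise<string>;

async function navigate(query: string, roots: MdsNode[], choose: Chooser): Promise<MdsNode | null> {
  let current: MdsNode | null = null;
  let options = roots;
  while (options.length > 0) {
    const answer = (await choose(query, options)).trim();
    const picked = options.find((option) => option.code === answer);
    // An answer that is not one of the offered codes (for example a
    // hallucinated number) is rejected, and the last valid class is kept.
    if (!picked) break;
    current = picked;
    options = picked.children;
  }
  return current;
}
```

Passing the chooser in as a callback keeps the traversal testable without an API key; a real chooser would presumably format the options into a prompt and return the model's reply.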
Install dependencies:
bun install
Set environment variables:
OPENAI_API_KEY=your_key_here
PINECONE_API_KEY=your_key_here
Scrape MDS data. This shouldn't be necessary, as scraped data is already provided in mds.json. It's also slow and intensive for LibraryThing, so please use it sparingly:
bun run scrape
Generate and store embeddings (this will also take a while):
bun run vectorize
Start the server:
bun start
The web interface will be available at http://localhost:3000. It will use the API routes /api/classify/gpt?query=your_query and /api/classify/embeddings?query=your_query.
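The API routes can also be called directly from a script. The sketch below assumes the server is running locally on port 3000; `buildClassifyUrl` and `classify` are hypothetical helper names, and because the response format is not described here, the body is returned as raw text rather than parsed.

```typescript
// The two route names come from the README; everything else is illustrative.
type Method = "gpt" | "embeddings";

// Build the request URL for one of the two classification routes.
function buildClassifyUrl(method: Method, query: string, base = "http://localhost:3000"): string {
  const url = new URL(`/api/classify/${method}`, base);
  url.searchParams.set("query", query); // URL-encodes the query as needed
  return url.toString();
}

// Fetch a classification and return the raw response body, since the
// response format is not documented here.
async function classify(method: Method, query: string): Promise<string> {
  const response = await fetch(buildClassifyUrl(method, query));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.text();
}
```

For example, `await classify("embeddings", "history of the Roman Empire")` would request the embeddings route with the query encoded into the URL.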

