
# OntoLearner: A Modular Python Library for Ontology Learning with LLMs



OntoLearner is a modular and extensible Python library for ontology learning powered by Large Language Models (LLMs). It provides a unified framework covering the full workflow — from loading and modularizing ontologies to training, predicting, and evaluating learner models across multiple ontology learning tasks.

The framework is built around three core components:

- 🧩 **Ontologizers** — load, parse, and modularize ontologies from 150+ ready-to-use sources across 20+ domains.
- 📋 **Learning Tasks** — support for Term Typing, Taxonomy Discovery, Non-Taxonomic Relation Extraction, and Text2Onto.
- 🤖 **Learner Models** — plug-and-play LLM, Retriever, and RAG-based learners with a consistent fit → predict → evaluate interface.

## 🧪 Installation

OntoLearner is available on PyPI and can be installed with pip:

```shell
pip install ontolearner
```

Verify the installation:

```python
import ontolearner

print(ontolearner.__version__)
```

For additional installation options (e.g., from source, with optional dependencies), see the Installation Guide.


## 🔗 Essential Resources

| Resource | Description |
|----------|-------------|
| 📚 Documentation | Full documentation website. |
| 🤗 Datasets on Hugging Face | Curated, machine-readable ontology datasets. |
| 🚀 Quickstart | Get started in minutes. |
| 🕸️ Learning Tasks | Term Typing, Taxonomy Discovery, Relation Extraction, and Text2Onto. |
| 🧠 Learner Models | LLM, Retriever, and RAG-based learner models. |
| 📖 Ontologies Documentation | Browse 150+ benchmark ontologies across 20+ domains. |
| 🧩 Ontologizer Guide | How to modularize and preprocess ontologies. |
| 📊 Metrics Dashboard | Explore benchmark ontology metrics and complexity scores. |

## ✨ Key Features

- **150+ Ontologizers** across 20+ domains (biology, medicine, agriculture, chemistry, law, finance, and more).
- **Multiple learning tasks**: Term Typing, Taxonomy Discovery, Non-Taxonomic Relation Extraction, and Text2Onto.
- **Three learner paradigms**: LLM-based, Retriever-based, and Retrieval-Augmented Generation (RAG).
- **Hugging Face integration**: auto-download ontologies and models directly from the Hub.
- **Unified API**: consistent fit → predict → evaluate interface across all learners.
- **LearnerPipeline**: end-to-end pipeline in a single call.
- **Extensible**: easily plug in custom ontologies, learners, or retrievers.
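The unified API means every learner, whatever its paradigm, is driven the same way. A toy sketch of what that interface shape looks like (illustrative only — the class names and the trivial learner below are not OntoLearner's actual classes, and the `"term-typing"` task string is an assumption):

```python
from abc import ABC, abstractmethod
from collections import Counter


class Learner(ABC):
    """Minimal sketch of a unified learner interface (illustrative, not OntoLearner's real base class)."""

    @abstractmethod
    def fit(self, train_data, task):
        ...

    @abstractmethod
    def predict(self, test_data, task):
        ...


class MajorityTypeLearner(Learner):
    """Toy learner: always predicts the most frequent type seen during fit."""

    def fit(self, train_data, task):
        # train_data: (term, type) pairs; remember the majority type
        self.most_common = Counter(label for _, label in train_data).most_common(1)[0][0]

    def predict(self, test_data, task):
        return [self.most_common for _ in test_data]


learner = MajorityTypeLearner()
learner.fit([("wine", "Beverage"), ("port", "Beverage"), ("oak", "Material")], task="term-typing")
preds = learner.predict(["sherry", "cork"], task="term-typing")
```

Because every learner honors the same contract, swapping paradigms is a one-line change rather than a rewrite of the experiment.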

## 🚀 Quick Tour

### Loading an Ontology

Load any of the 150+ built-in ontologies and extract task datasets in just a few lines:

```python
from ontolearner import Wine

# Initialize an ontologizer
ontology = Wine()

# Auto-download from Hugging Face and load
ontology.load()

# Extract learning task datasets
data = ontology.extract()

# Inspect ontology metadata
print(ontology)
```

Explore 150+ ready-to-use ontologies or learn how to work with ontologizers.


### Retriever-Based Learner

Use a dense retriever model to perform non-taxonomic relation extraction:

```python
from ontolearner import AutoRetrieverLearner, AgrO, train_test_split, evaluation_report

# Load and extract ontology data
ontology = AgrO()
ontology.load()
ontological_data = ontology.extract()

# Split into train and test sets
train_data, test_data = train_test_split(ontological_data, test_size=0.2, random_state=42)

# Initialize and load a retriever-based learner
task = 'non-taxonomic-re'
ret_learner = AutoRetrieverLearner(top_k=5)
ret_learner.load(model_id='sentence-transformers/all-MiniLM-L6-v2')

# Fit on training data and predict on test data
ret_learner.fit(train_data, task=task)
predictions = ret_learner.predict(test_data, task=task)

# Evaluate predictions
truth = ret_learner.tasks_ground_truth_former(data=test_data, task=task)
metrics = evaluation_report(y_true=truth, y_pred=predictions, task=task)
print(metrics)
```
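At its core, a dense retriever ranks candidates by embedding similarity to a query. The following is a conceptual sketch of that top-k ranking idea using toy 3-dimensional vectors — it is not OntoLearner's implementation, which uses real sentence-transformer embeddings:

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k(query_vec, candidates, k=2):
    """Return the labels of the k candidates most similar to the query."""
    ranked = sorted(candidates, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [label for label, _ in ranked[:k]]


# Toy "embeddings" for candidate relation labels (hypothetical values)
candidates = [
    ("has_trait", [0.9, 0.1, 0.0]),
    ("grown_in",  [0.1, 0.9, 0.0]),
    ("part_of",   [0.0, 0.1, 0.9]),
]

print(top_k([1.0, 0.0, 0.0], candidates, k=2))
```

The `top_k` parameter of `AutoRetrieverLearner` plays the same role: it bounds how many nearest candidates are retrieved per query.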

Other learner paradigms (LLM-based and RAG-based) follow the same fit → predict → evaluate interface; see the Learner Models documentation.


### LearnerPipeline

LearnerPipeline consolidates the entire workflow — initialization, training, prediction, and evaluation — into a single call:

```python
from ontolearner import LearnerPipeline, AgrO, train_test_split

# Load ontology and extract data
ontology = AgrO()
ontology.load()

train_data, test_data = train_test_split(
    ontology.extract(),
    test_size=0.2,
    random_state=42
)

# Initialize the pipeline with a dense retriever
pipeline = LearnerPipeline(
    retriever_id='sentence-transformers/all-MiniLM-L6-v2',
    batch_size=10,
    top_k=5
)

# Run: fit → predict → evaluate
outputs = pipeline(
    train_data=train_data,
    test_data=test_data,
    evaluate=True,
    task='non-taxonomic-re'
)

print("Metrics:", outputs['metrics'])
print("Elapsed time:", outputs['elapsed_time'])
```
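Conceptually, the pipeline just chains the fit, predict, and evaluate steps behind one call. A toy sketch of that consolidation (illustrative only — OntoLearner's `LearnerPipeline` additionally handles model loading, batching, and task-specific evaluation; the classes below are hypothetical):

```python
import time


class MiniPipeline:
    """Toy pipeline: wraps fit -> predict -> evaluate in a single call."""

    def __init__(self, learner):
        self.learner = learner

    def __call__(self, train_data, test_data, truth, evaluate=True):
        start = time.perf_counter()
        self.learner.fit(train_data)
        preds = self.learner.predict(test_data)
        outputs = {"predictions": preds, "elapsed_time": time.perf_counter() - start}
        if evaluate:
            correct = sum(p == t for p, t in zip(preds, truth))
            outputs["metrics"] = {"accuracy": correct / len(truth)}
        return outputs


class EchoLearner:
    """Toy learner: memorizes (term, type) pairs and looks them up at predict time."""

    def fit(self, train_data):
        self.memory = dict(train_data)

    def predict(self, test_data):
        return [self.memory.get(x, "unknown") for x in test_data]


pipe = MiniPipeline(EchoLearner())
out = pipe([("wine", "Beverage")], ["wine", "oak"], truth=["Beverage", "Material"])
```

Bundling the steps this way keeps experiment scripts short and guarantees that timing and evaluation are applied uniformly across learners.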

## ⭐ Contribution

We welcome contributions of all kinds — bug reports, new features, documentation improvements, or new ontologies!

Please review our contribution guidelines before getting started.

For bugs or questions, please open an issue in the GitHub Issue Tracker.


## 💡 Acknowledgements

If OntoLearner is useful in your research or work, please consider citing one of our publications:

```bibtex
@inproceedings{babaei2023llms4ol,
  title        = {LLMs4OL: Large Language Models for Ontology Learning},
  author       = {Babaei Giglou, Hamed and D'Souza, Jennifer and Auer, S{\"o}ren},
  booktitle    = {International Semantic Web Conference},
  pages        = {408--427},
  year         = {2023},
  organization = {Springer}
}

@software{babaei_giglou_2025_15399783,
  author    = {Babaei Giglou, Hamed and D'Souza, Jennifer and Aioanei, Andrei
               and Mihindukulasooriya, Nandana and Auer, Sören},
  title     = {OntoLearner: A Modular Python Library for Ontology Learning with LLMs},
  month     = may,
  year      = 2025,
  publisher = {Zenodo},
  version   = {v1.3.0},
  doi       = {10.5281/zenodo.15399783},
  url       = {https://doi.org/10.5281/zenodo.15399783}
}
```

This software is archived on Zenodo under DOI 10.5281/zenodo.15399783 and is licensed under the MIT License.