
Evaluation notebooks for NLP and Ecology domains [#23] #24

Open
MikeACedric wants to merge 2 commits into main from feature/restructure-llm-as-a-judge

Conversation

@MikeACedric (Collaborator) commented Mar 27, 2026

This PR references issue [#23] and includes the following changes:

Repository Cleanup:

  • Removed unused files and folders to streamline the project structure.

New Notebooks:
Added 4 new notebooks (2 per domain) covering single-report and batch evaluation (a minimal batch-flow sketch follows this description):

  • NLP — single report & batch evaluation notebooks
  • Ecology — single report & batch evaluation notebooks

Documentation:

  • Updated docs to include step-by-step instructions for running the new notebooks.
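
As a rough orientation for the single-report vs. batch distinction above, here is a minimal sketch of the batch flow. It is not the actual notebook code: the `evaluate_report` stub, the directory layout, and the score keys are all illustrative assumptions.

```python
# Hypothetical batch-evaluation driver; `evaluate_report` stands in for
# whatever single-report evaluation the notebooks actually perform.
import json
from pathlib import Path

def evaluate_report(text: str) -> dict:
    # Placeholder: a real implementation would call the judge model and
    # return per-dimension scores such as {"breadth": 4, "depth": 3, ...}.
    return {"breadth": 0, "depth": 0, "rigor": 0, "gap": 0, "innovation": 0}

def batch_evaluate(report_dir: str) -> list[dict]:
    """Evaluate every Markdown report in a directory and collect the scores."""
    results = []
    for path in sorted(Path(report_dir).glob("*.md")):
        scores = evaluate_report(path.read_text(encoding="utf-8"))
        results.append({"report": path.name, **scores})
    return results

if __name__ == "__main__":
    # The single-report notebooks correspond to one evaluate_report call;
    # the batch notebooks map it over a whole directory of reports.
    print(json.dumps(batch_evaluate("reports/nlp"), indent=2))
```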

@MikeACedric self-assigned this Mar 27, 2026

<h2 align="center">LLM-as-a-Judge (LLMJ) Framework</h2>

The **LLM-as-a-Judge** is a robust, Python-based evaluation toolkit designed for objective and configurable analysis of Large Language Model (LLM)-generated research reports. It enables researchers to quantify report quality across key dimensions like `breadth`, `depth`, `rigor`, `gap`, and `innovation`, with pointwise rubrics implemented via **[YESciEval](https://yescieval.readthedocs.io/)**, a framework for scientific evaluation of LLM outputs.
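
As a rough illustration of the pointwise setup described above, the sketch below scores a single report once per dimension. This is a minimal sketch, not the framework's code: the rubric wording, the `call_llm` callable, and the 1-5 scale are assumptions made for this example, while the real rubric definitions come from YESciEval.

```python
# Illustrative pointwise-rubric judge. The rubric texts and the 1-5 scale
# below are assumptions for this sketch, not the YESciEval definitions.
from typing import Callable

RUBRICS = {
    "breadth": "Does the report cover the relevant subtopics of the area?",
    "depth": "Does the report analyse each subtopic in sufficient detail?",
    "rigor": "Are claims supported by evidence and sound reasoning?",
    "gap": "Does the report identify open problems in the literature?",
    "innovation": "Does the report propose novel directions or ideas?",
}

def judge_report(report: str, call_llm: Callable[[str], str]) -> dict[str, int]:
    """Score one report pointwise: one judge-model call per rubric dimension."""
    scores = {}
    for dimension, question in RUBRICS.items():
        prompt = (
            f"You are an expert reviewer. {question}\n"
            "Answer with a single integer from 1 (poor) to 5 (excellent).\n\n"
            f"Report:\n{report}"
        )
        scores[dimension] = int(call_llm(prompt).strip())
    return scores
```

Any chat-completion client can be passed in as `call_llm`; "pointwise" here means each dimension is scored independently for one report, rather than by comparing two reports against each other.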
Member

@MikeACedric Please note: LLM-as-a-judge is a general term for an evaluation paradigm that is predominant in the AI community these days. It is not a toolkit! The framework here is YESciEval. I would ask you to give some thought to this description text and avoid copying descriptions from earlier versions of this repository that do not make sense.

Member

Here the description should say what is being evaluated and how it is being evaluated. For the latter part, it can read as follows: "we adopt the LLM-as-a-judge paradigm w.r.t. the x implemented evaluation categories in the YESciEval library, which includes y evaluation rubrics."

