## Description
The [`scripts/`](https://github.com/sciknoworg/deep-research/tree/main/scripts) directory currently contains legacy LLM-as-a-Judge evaluation code that has already been consolidated and extended in YESciEval. To avoid duplication and ensure consistency across projects, this repository should be refactored to depend on YESciEval for all evaluation logic.
## Tasks

- Replace the legacy LLM-as-a-Judge code in [`scripts/`](https://github.com/sciknoworg/deep-research/tree/main/scripts) with calls to YESciEval
## Acceptance Criteria
- No duplicated LLM-as-a-Judge logic remains in this repository
- Evaluation scripts cleanly depend on YESciEval as an external library
- Report-level and collection-level evaluation workflows remain functional
- Codebase is easier to maintain and aligned with the shared evaluation framework
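As a rough illustration of the second criterion, a refactored evaluation script would import judging logic from YESciEval rather than defining it locally. The names below (`yescieval`, `judge_report`) are illustrative assumptions, not the library's confirmed API; the actual entry points should be taken from the YESciEval documentation.

```python
# Hypothetical sketch: a thin entry point that delegates report-level
# evaluation to YESciEval instead of duplicating judge logic here.
# `yescieval.judge_report` is an assumed name for illustration only.

def evaluate_report(report_text: str) -> dict:
    """Delegate LLM-as-a-Judge scoring of a single report to YESciEval."""
    from yescieval import judge_report  # assumed import; use the real API
    return judge_report(report_text)
```

Collection-level workflows would follow the same pattern, looping over reports and delegating each judgment, so no rubric or judging logic remains in this repository.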
## Rationale
YESciEval is the canonical location for evaluation rubrics and LLM-as-a-Judge logic. This cleanup improves maintainability, avoids divergence, and strengthens reuse across projects.