## Description
The [`scripts/`](https://github.com/sciknoworg/deep-research/tree/main/scripts) directory currently contains legacy LLM-as-a-Judge evaluation code that has already been consolidated and extended in YESciEval. To avoid duplication and ensure consistency across projects, this repository should be refactored to depend on YESciEval for all evaluation logic.
## Tasks

- Replace the legacy LLM-as-a-Judge code in [`scripts/`](https://github.com/sciknoworg/deep-research/tree/main/scripts) with calls to YESciEval
## Acceptance Criteria
- No duplicated LLM-as-a-Judge logic remains in this repository
- Evaluation scripts cleanly depend on YESciEval as an external library
- Report-level and collection-level evaluation workflows remain functional
- Codebase is easier to maintain and aligned with the shared evaluation framework
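As a rough illustration of the second criterion, a refactored evaluation script would import judging logic from YESciEval rather than defining it locally. The names below (`yescieval`, `judge_report`) are illustrative assumptions, not the library's confirmed API; the actual entry points should be taken from the YESciEval documentation.

```python
# Hypothetical sketch: a thin entry point that delegates report-level
# evaluation to YESciEval instead of duplicating judge logic here.
# `yescieval.judge_report` is an assumed name for illustration only.

def evaluate_report(report_text: str) -> dict:
    """Delegate LLM-as-a-Judge scoring of a single report to YESciEval."""
    from yescieval import judge_report  # assumed import; use the real API
    return judge_report(report_text)
```

Collection-level workflows would follow the same pattern, looping over reports and delegating each judgment, so no rubric or judging logic remains in this repository.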
## Rationale
YESciEval is the canonical location for evaluation rubrics and LLM-as-a-Judge logic. This cleanup improves maintainability, avoids divergence, and strengthens reuse across projects.