This repository contains schema.json, which defines a structured representation of ecological hypotheses, experiments, and evidence for evaluating LLMs on causal reasoning and evidence synthesis.
Hypothesis:
{
  "hypothesis": {
    "causal_graph": {
      "nodes": ["enemy_pressure", "plant_damage", "invasion_success"],
      "edges": [
        {
          "source": "enemy_pressure",
          "target": "plant_damage",
          "direction": "negative"
        },
        {
          "source": "plant_damage",
          "target": "invasion_success",
          "direction": "negative"
        }
      ]
    }
  }
}
In Bayesian terms:
- This defines the structure of the DAG.
- It acts as a prior over causal structure.
Important insight:
In the benchmark, hypotheses can be treated as candidate causal graphs.
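To make "candidate causal graph" concrete, here is a minimal Python sketch that checks whether a hypothesis is a valid DAG and compares it with a rival. The nodes and edges of `h1` are copied from the example above (edge signs omitted). The rival `h2` and the helper names `topological_order` and `edge_set` are assumptions introduced only for illustration; they are not part of schema.json.

```python
from collections import deque

# Nodes and edges copied from the hypothesis example above (signs omitted).
h1 = {
    "nodes": ["enemy_pressure", "plant_damage", "invasion_success"],
    "edges": [
        {"source": "enemy_pressure", "target": "plant_damage"},
        {"source": "plant_damage", "target": "invasion_success"},
    ],
}

# Hypothetical rival hypothesis, invented for illustration: a direct effect only.
h2 = {
    "nodes": ["enemy_pressure", "plant_damage", "invasion_success"],
    "edges": [{"source": "enemy_pressure", "target": "invasion_success"}],
}


def topological_order(graph):
    """Return the nodes in topological order, or None if the graph has a cycle."""
    indegree = {n: 0 for n in graph["nodes"]}
    children = {n: [] for n in graph["nodes"]}
    for edge in graph["edges"]:
        children[edge["source"]].append(edge["target"])
        indegree[edge["target"]] += 1
    queue = deque(n for n in graph["nodes"] if indegree[n] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    # A cycle leaves some nodes with nonzero indegree, so they never get emitted.
    return order if len(order) == len(graph["nodes"]) else None


def edge_set(graph):
    """Represent a graph's structure as a set of (source, target) pairs."""
    return {(e["source"], e["target"]) for e in graph["edges"]}


if __name__ == "__main__":
    print(topological_order(h1))
    # Edges on which the two candidate structures disagree.
    print(edge_set(h1) ^ edge_set(h2))
```

The symmetric difference of the edge sets is one simple way to see where two candidate structures make different claims.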
Causal model:
{
  "causal_model": {
    "variables": [
      { "name": "invader_enemy_pressure", "type": "predictor" },
      { "name": "native_species_abundance", "type": "outcome" },
      { "name": "invasion_success", "type": "outcome" }
    ],
    "relationships": [
      {
        "source": "invader_enemy_pressure",
        "target": "invasion_success",
        "effect_direction": "negative",
        "description": "Lower enemy pressure is associated with higher invasion success."
      }
    ],
    "moderators": ["ecosystem_type", "invasion_stage"]
  }
}
Maps to:
- Nodes = variables
- Edges = causal links
- Moderators = interaction terms / conditional dependencies
In SCM terms:
Y = f(X, Z, ε)
Here Y is an outcome variable, X the predictors, Z the moderators, and ε unobserved noise.
This is the data-generating mechanism.
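As a rough illustration of Y = f(X, Z, ε), the Python sketch below writes one structural equation in which a binary moderator changes the slope of the predictor. The linear form and the constants `BASELINE`, `SLOPE`, and `MODERATION` are assumptions chosen for readability; the schema itself does not prescribe a functional form.

```python
import random

# Assumed constants, for illustration only.
BASELINE = 1.0      # outcome when the predictor is zero
SLOPE = -0.5        # negative effect of enemy pressure, as in the example relationship
MODERATION = 0.3    # how much the moderator weakens that effect


def f(x, z, eps):
    """Structural equation Y = f(X, Z, eps) with an X-by-Z interaction."""
    return BASELINE + (SLOPE + MODERATION * z) * x + eps


def simulate(x, z, n, seed=0):
    """Draw n outcomes for fixed x and z, with Gaussian noise as eps."""
    rng = random.Random(seed)
    return [f(x, z, rng.gauss(0.0, 0.1)) for _ in range(n)]


if __name__ == "__main__":
    low = simulate(x=0.0, z=0, n=200)
    high = simulate(x=1.0, z=0, n=200)
    # Under the assumed negative slope, higher pressure lowers the mean outcome.
    print(sum(low) / len(low), sum(high) / len(high))
```

Treating a moderator as an interaction term, as here, is one common reading of "conditional dependency"; other encodings are possible.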
Experiment:
{
  "experiment": {
    "interventions": [
      { "treatment": "seed addition" },
      { "treatment": "invader removal" }
    ]
  }
}
In causal modeling, this corresponds to:
do(X = x)
- Interventions cut incoming edges to a node and set its value.
Key point:
Restoration ecology contributes rich interventional data.
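The graph surgery behind do(X = x) can be sketched in a few lines of Python. The graph reuses the node names from the hypothesis example; intervening on `plant_damage` is a hypothetical choice made only because that node has an incoming edge to cut. How a schema treatment such as "invader removal" maps onto a graph node is not specified in the source and is left open here.

```python
# Node names taken from the hypothesis example (edge signs omitted).
graph = {
    "nodes": ["enemy_pressure", "plant_damage", "invasion_success"],
    "edges": [
        {"source": "enemy_pressure", "target": "plant_damage"},
        {"source": "plant_damage", "target": "invasion_success"},
    ],
}


def do(graph, node, value):
    """Apply do(node = value): drop edges into `node` and record its fixed value.

    Returns a new (mutilated) graph and a dict of clamped values; the input
    graph is left unchanged.
    """
    mutilated = {
        "nodes": list(graph["nodes"]),
        "edges": [dict(e) for e in graph["edges"] if e["target"] != node],
    }
    return mutilated, {node: value}


if __name__ == "__main__":
    mutilated, fixed = do(graph, "plant_damage", 0.0)
    print(mutilated["edges"])  # only the outgoing edge of plant_damage remains
    print(fixed)
```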
Evidence:
{
  "evidence": {
    "type": "experimental",
    "effect_size": {
      "metric": "Hedges_g",
      "value": 0.42,
      "confidence_interval": [0.10, 0.74]
    }
  }
}
In Bayesian terms:
- Evidence = observed data
- Used to update:
P(model | data)
Critical point:
Each paper contributes one likelihood update.
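One way to picture a single likelihood update is a conjugate normal-normal update on the effect size. The sketch below uses the example values (0.42 with interval [0.10, 0.74]). It assumes the interval is a symmetric 95% confidence interval under a normal approximation, which the example does not state, and it uses a hypothetical N(0, 1) prior. `normal_update` is an illustrative helper, not part of the repository.

```python
import math

# Values from the evidence example above.
effect = 0.42
ci_low, ci_high = 0.10, 0.74

# Assumption: symmetric 95% CI under a normal approximation.
se = (ci_high - ci_low) / (2 * 1.96)


def normal_update(prior_mean, prior_sd, obs, obs_se):
    """Posterior mean and sd for a normal prior and a normal likelihood."""
    prior_prec = 1.0 / prior_sd**2
    obs_prec = 1.0 / obs_se**2
    post_prec = prior_prec + obs_prec
    post_mean = (prior_prec * prior_mean + obs_prec * obs) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)


# Hypothetical weakly informative prior on the true effect: N(0, 1).
post_mean, post_sd = normal_update(0.0, 1.0, effect, se)

if __name__ == "__main__":
    print(f"se={se:.3f} posterior mean={post_mean:.3f} sd={post_sd:.3f}")
```

Because the study is far more precise than the assumed prior, the posterior mean stays close to the reported effect and the posterior standard deviation ends up slightly below the study's standard error.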
Context:
{
  "context": {
    "ecosystem": "grassland",
    "disturbance": "grazing",
    "scale": "plot",
    "stress_level": "moderate"
  }
}
In Bayesian causal models, these are:
- covariates
- stratification variables
- moderators
They define:
P(Y | X, C)
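Conditioning on context can be illustrated by stratifying evidence records on one context field. In the Python sketch below, every record and every effect value is made up for illustration, and pairing each effect with a `context` dict in one record is an assumed layout rather than the schema's. The unweighted mean per stratum is only a crude stand-in for an estimate conditional on C.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: context fields follow the example above, values are invented.
records = [
    {"context": {"ecosystem": "grassland", "scale": "plot"}, "effect": 0.5},
    {"context": {"ecosystem": "grassland", "scale": "landscape"}, "effect": 0.25},
    {"context": {"ecosystem": "forest", "scale": "plot"}, "effect": -0.25},
]


def mean_effect_by(records, key):
    """Group records by one context field and average the effect in each group."""
    groups = defaultdict(list)
    for record in records:
        groups[record["context"][key]].append(record["effect"])
    return {level: mean(values) for level, values in groups.items()}


if __name__ == "__main__":
    print(mean_effect_by(records, "ecosystem"))
    print(mean_effect_by(records, "scale"))
```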
Intervention outcome:
{
  "intervention_outcome": {
    "action": "remove invasive species",
    "observed_effect": [
      {
        "outcome": "native biodiversity",
        "direction": "mixed"
      }
    ]
  }
}
This corresponds to:
P(Y | do(X))
Core point:
This is the central object of causal inference.
Links:
{
  "links": {
    "tests_hypothesis": ["H1", "H1.1", "H1.1.a"],
    "uses_design": ["E1"],
    "produces_evidence": ["EV1"],
    "reported_in_paper": ["P1"]
  }
}
This enables:
- pooling evidence across studies
- building:
P(hypothesis | all studies)
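A minimal Python sketch of P(hypothesis | all studies) follows, assuming studies are independent and each hypothesis predicts a single effect size. The label `H0`, the predicted effects, the uniform prior, and the second and third studies are all hypothetical. The first study reuses the evidence example, with its standard error derived under an assumed 95% interval.

```python
import math


def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd**2) - (x - mean) ** 2 / (2 * sd**2)


def posterior_over_hypotheses(predicted, prior, studies):
    """P(H | studies) proportional to P(H) times the product of P(study | H)."""
    log_post = {
        h: math.log(prior[h]) + sum(normal_logpdf(g, mu, se) for g, se in studies)
        for h, mu in predicted.items()
    }
    # Subtract the maximum before exponentiating to avoid underflow.
    peak = max(log_post.values())
    weights = {h: math.exp(v - peak) for h, v in log_post.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


# Hypothetical predictions: H1 expects a moderate effect, H0 expects none.
predicted = {"H1": 0.4, "H0": 0.0}
prior = {"H1": 0.5, "H0": 0.5}

# (effect size, standard error) per study; only the first comes from the example.
studies = [(0.42, 0.64 / 3.92), (0.30, 0.20), (0.55, 0.25)]

posterior = posterior_over_hypotheses(predicted, prior, studies)

if __name__ == "__main__":
    print(posterior)
```

Working in log space keeps the product of many small likelihoods numerically stable as the number of linked studies grows.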
Hypothesis (DAG prior)
        │
        ▼
Causal Model (variables, relations, moderators)
        │
        ▼
Experiment / Intervention ───────► Intervention–Outcome Query
        │                                        │
        ▼                                        ▼
Evidence (study findings) ───────────────► Posterior Update
                                                 ▲
                                                 │
                          Context (ecosystem, scale, disturbance, stress)
Links connect hypotheses, experiments, evidence, and papers across studies.
The schema can be interpreted as a structured, multi-study Bayesian causal model where:
- hypotheses define candidate causal structures,
- causal models specify variables and dependencies,
- experiments define interventions,
- evidence provides likelihood contributions,
- context conditions inference, and
- links enable aggregation across studies.
This supports evaluation of LLMs not only on extraction, but also on causal reasoning and evidence synthesis.