Hypothesis Benchmark Schema

This repository stores schema.json, which defines a structured representation of ecological hypotheses, experiments, and evidence. The schema is intended for evaluating LLMs on causal reasoning and evidence synthesis.

Mapping to Bayesian Causal Modeling

(A) Hypothesis → model structure (DAG prior)

Schema fragment:

{
  "hypothesis": {
    "causal_graph": {
      "nodes": ["enemy_pressure", "plant_damage", "invasion_success"],
      "edges": [
        {
          "source": "enemy_pressure",
          "target": "plant_damage",
          "direction": "positive"
        },
        {
          "source": "plant_damage",
          "target": "invasion_success",
          "direction": "negative"
        }
      ]
    }
  }
}

In Bayesian terms:

This defines the structure of one candidate DAG.
A prior over causal structure is then a distribution over such candidate graphs.

Important insight:
In the benchmark, each hypothesis can be treated as one candidate causal graph.
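
A candidate graph is only usable as a DAG if it is acyclic and every edge refers to a declared node. The following is a minimal sketch of such a check, not part of the repository: the function name `topological_order` is an assumption, and the edges carry only `source` and `target` because the sign of an edge does not matter for acyclicity.

```python
import json
from graphlib import CycleError, TopologicalSorter

# Trimmed copy of the hypothesis fragment above (edge directions omitted).
HYPOTHESIS_JSON = """
{
  "hypothesis": {
    "causal_graph": {
      "nodes": ["enemy_pressure", "plant_damage", "invasion_success"],
      "edges": [
        {"source": "enemy_pressure", "target": "plant_damage"},
        {"source": "plant_damage", "target": "invasion_success"}
      ]
    }
  }
}
"""


def topological_order(causal_graph):
    """Return the nodes in causal order; raise ValueError if this is not a DAG."""
    nodes = set(causal_graph["nodes"])
    parents = {node: set() for node in nodes}
    for edge in causal_graph["edges"]:
        if edge["source"] not in nodes or edge["target"] not in nodes:
            raise ValueError(f"edge references an undeclared node: {edge}")
        parents[edge["target"]].add(edge["source"])
    try:
        # TopologicalSorter takes a mapping node -> predecessors.
        return list(TopologicalSorter(parents).static_order())
    except CycleError as err:
        raise ValueError("causal_graph contains a cycle") from err


graph = json.loads(HYPOTHESIS_JSON)["hypothesis"]["causal_graph"]
print(topological_order(graph))
```

Because the example graph is a simple chain, its causal order is unique, which makes it a convenient smoke test for a validator like this.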

(B) Causal model → variables + structural equations

Schema fragment:

{
  "causal_model": {
    "variables": [
      { "name": "invader_enemy_pressure", "type": "predictor" },
      { "name": "native_species_abundance", "type": "outcome" },
      { "name": "invasion_success", "type": "outcome" }
    ],
    "relationships": [
      {
        "source": "invader_enemy_pressure",
        "target": "invasion_success",
        "effect_direction": "negative",
        "description": "Lower enemy pressure is associated with higher invasion success."
      }
    ],
    "moderators": ["ecosystem_type", "invasion_stage"]
  }
}

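This maps to:

Nodes = variables
Edges = causal links
Moderators = interaction terms / conditional dependencies

In SCM terms, each outcome is generated by a structural equation:

Y = f(X, Z, ε)

where Y is an outcome, X its causal parents (predictors), Z the moderators, and ε unobserved noise. This is the data-generating mechanism.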
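
To make the role of a moderator concrete, here is a minimal sketch of one possible structural equation, assuming a linear form in which the moderator changes the slope of the predictor. The schema does not prescribe a functional form; the function name and the coefficients `beta` and `gamma` are hypothetical and chosen only for illustration.

```python
import random


def invasion_success(enemy_pressure, moderator, noise, beta=-0.8, gamma=0.3):
    """Hypothetical linear structural equation Y = (beta + gamma * Z) * X + eps.

    beta  : assumed baseline effect of enemy pressure (negative, as in the schema)
    gamma : assumed interaction coefficient, so the moderator Z rescales the slope
    """
    return (beta + gamma * moderator) * enemy_pressure + noise


rng = random.Random(0)  # fixed seed so the sketch is reproducible
for z in (0.0, 1.0):    # e.g. two coded levels of invasion_stage
    y = invasion_success(enemy_pressure=1.0, moderator=z, noise=rng.gauss(0.0, 0.1))
    print(f"moderator={z}: invasion_success={y:.3f}")
```

With the noise set to zero, the effect of enemy pressure is `beta` when the moderator is 0 and `beta + gamma` when it is 1, which is what "interaction term" means in the mapping above.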

(C) Experiment → interventions (do-operator)

Schema fragment:

{
  "experiment": {
    "interventions": [
      { "treatment": "seed addition" },
      { "treatment": "invader removal" }
    ]
  }
}

In causal modeling, each treatment corresponds to an intervention:

do(X = x)

Interventions cut incoming edges to a node and set its value.
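
The edge-cutting step can be sketched as a small function over the schema's edge list. This is an illustrative sketch only: the name `do` and the `assignments` dictionary of fixed values are assumptions, not fields of the schema.

```python
def do(edges, assignments, node, value):
    """Apply do(node = value) by graph surgery.

    Returns a new edge list without edges pointing into `node`, and a new
    dictionary of fixed values that includes `node`. Inputs are not mutated.
    """
    mutilated = [edge for edge in edges if edge["target"] != node]
    fixed = {**assignments, node: value}
    return mutilated, fixed


edges = [
    {"source": "enemy_pressure", "target": "plant_damage"},
    {"source": "plant_damage", "target": "invasion_success"},
]

# Intervening on plant_damage removes the influence of enemy_pressure on it,
# while the downstream edge into invasion_success is left intact.
mutilated_edges, fixed_values = do(edges, {}, "plant_damage", 0.0)
print(mutilated_edges)
print(fixed_values)
```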

Key point:
Restoration ecology contributes rich interventional data.

(D) Evidence → likelihood + posterior update

Schema fragment:

{
  "evidence": {
    "type": "experimental",
    "effect_size": {
      "metric": "Hedges_g",
      "value": 0.42,
      "confidence_interval": [0.10, 0.74]
    }
  }
}

In Bayesian terms:

Evidence = observed data
It is used to update the posterior over models:

P(model | data) ∝ P(data | model) · P(model)

Critical point:
Each paper contributes one likelihood update.
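
As a worked illustration of a single update, the sketch below turns the reported effect size into a normal likelihood and combines it with a normal prior. Several things here are assumptions rather than part of the schema: that the interval is a 95% confidence interval, that the estimate is approximately normal, and that the prior on the true effect is N(0, 1). The function names are hypothetical.

```python
from statistics import NormalDist


def standard_error_from_ci(lower, upper, level=0.95):
    """Recover a standard error from a symmetric normal-theory interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    return (upper - lower) / (2 * z)


def normal_update(prior_mean, prior_sd, estimate, se):
    """Conjugate normal-normal update; returns (posterior_mean, posterior_sd)."""
    prior_precision = 1 / prior_sd**2
    data_precision = 1 / se**2
    posterior_precision = prior_precision + data_precision
    posterior_mean = (
        prior_precision * prior_mean + data_precision * estimate
    ) / posterior_precision
    return posterior_mean, posterior_precision**-0.5


# Values taken from the evidence fragment above.
se = standard_error_from_ci(0.10, 0.74)
post_mean, post_sd = normal_update(prior_mean=0.0, prior_sd=1.0, estimate=0.42, se=se)
print(f"se={se:.3f}, posterior mean={post_mean:.3f}, posterior sd={post_sd:.3f}")
```

The posterior mean is a precision-weighted average, so it sits between the prior mean and the reported estimate, and the posterior standard deviation is smaller than the study's standard error.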

(E) Context → conditioning / covariates

Schema fragment:

{
  "context": {
    "ecosystem": "grassland",
    "disturbance": "grazing",
    "scale": "plot",
    "stress_level": "moderate"
  }
}

In Bayesian causal models, these are:

covariates
stratification variables
moderators

They define the conditional distribution:

P(Y | X, C)

where C denotes the context variables.
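
Conditioning on context can be approximated, in its simplest form, by stratifying records on a context field. The sketch below assumes each record carries a `context` object like the one above plus a numeric `effect`; the record layout, the function name `stratified_mean`, and all numbers are hypothetical placeholders, not data from any study.

```python
from collections import defaultdict


def stratified_mean(records, key):
    """Average the effect within each level of one context variable."""
    strata = defaultdict(list)
    for record in records:
        strata[record["context"][key]].append(record["effect"])
    return {level: sum(values) / len(values) for level, values in strata.items()}


# Hypothetical records, for illustration only.
records = [
    {"context": {"ecosystem": "grassland", "scale": "plot"}, "effect": 0.50},
    {"context": {"ecosystem": "grassland", "scale": "landscape"}, "effect": 0.25},
    {"context": {"ecosystem": "forest", "scale": "plot"}, "effect": -0.10},
]

by_ecosystem = stratified_mean(records, "ecosystem")
print(by_ecosystem)
```

A full analysis would model context jointly with the treatment rather than averaging within strata, but the grouping step is the same.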

(F) Intervention–outcome → causal query

Schema fragment:

{
  "intervention_outcome": {
    "action": "remove invasive species",
    "observed_effect": [
      {
        "outcome": "native biodiversity",
        "direction": "mixed"
      }
    ]
  }
}

The recorded direction is a qualitative summary of the interventional distribution:

P(Y | do(X))

Core point:
This is the central object of causal inference.

(G) Links → multi-study Bayesian updating

Schema fragment:

{
  "links": {
    "tests_hypothesis": ["H1", "H1.1", "H1.1.a"],
    "uses_design": ["E1"],
    "produces_evidence": ["EV1"],
    "reported_in_paper": ["P1"]
  }
}

This enables pooling evidence across studies and building the multi-study posterior:

P(hypothesis | all studies)
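
One simple way to pool across studies is fixed-effect inverse-variance weighting, which coincides with the posterior mean of a common effect under a normal likelihood and a flat prior. The sketch below follows the `links` fields to collect the evidence for one hypothesis and then pools it. The study list, the evidence IDs other than EV1, the standard errors, and the function names are all hypothetical; the schema does not mandate this pooling model.

```python
def evidence_for(hypothesis_id, studies, evidence):
    """Collect (effect, standard_error) pairs from studies that test a hypothesis."""
    return [
        evidence[evidence_id]
        for study in studies
        if hypothesis_id in study["links"]["tests_hypothesis"]
        for evidence_id in study["links"]["produces_evidence"]
    ]


def pool_fixed_effect(estimates):
    """Inverse-variance weighted mean and its standard error."""
    weights = [1 / se**2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * effect for w, (effect, _) in zip(weights, estimates)) / total
    return pooled, total**-0.5


# Hypothetical studies and effect sizes, for illustration only.
studies = [
    {"links": {"tests_hypothesis": ["H1"], "produces_evidence": ["EV1"]}},
    {"links": {"tests_hypothesis": ["H1", "H2"], "produces_evidence": ["EV2"]}},
    {"links": {"tests_hypothesis": ["H2"], "produces_evidence": ["EV3"]}},
]
evidence = {"EV1": (0.42, 0.16), "EV2": (0.30, 0.20), "EV3": (-0.10, 0.25)}

h1_estimates = evidence_for("H1", studies, evidence)
pooled_effect, pooled_se = pool_fixed_effect(h1_estimates)
print(f"H1 pooled effect={pooled_effect:.3f} (se={pooled_se:.3f})")
```

A random-effects or hierarchical model would usually be preferable when studies differ in context, since it allows the true effect to vary between them.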

Small DAG-Style Diagram

Hypothesis (DAG prior)
        │
        ▼
Causal Model (variables, relations, moderators)
        │
        ▼
Experiment / Intervention  ───────►  Intervention–Outcome Query
        │                                      │
        ▼                                      ▼
Evidence (study findings)  ───────────────►  Posterior Update
        ▲
        │
Context (ecosystem, scale, disturbance, stress)

Links connect hypotheses, experiments, evidence, and papers across studies.

Summary

The schema can be interpreted as a structured, multi-study Bayesian causal model where:

hypotheses define candidate causal structures,
causal models specify variables and dependencies,
experiments define interventions,
evidence provides likelihood contributions,
context conditions inference, and
links enable aggregation across studies.

This supports evaluation of LLMs not only on extraction, but also on causal reasoning and evidence synthesis.
