Automated Scientific Discovery via Bayesian Surprise & Monte Carlo Tree Search — NLS Socioeconomic Status Dataset
Upload a CSV file to preview its structure, then run AutoDiscovery on it for free.
Briefly describe what each column represents. This helps the AI generate better hypotheses.
Aggregate statistics from the MCTS exploration run.
Hypotheses ranked by self-reward score. Each card shows the experimental finding, belief update, and information-theoretic metrics.
Prior mean belief P(H) vs. posterior mean belief P(H | data) for each tested hypothesis, illustrating how the evidence shifted confidence.
KL Divergence — bits of information gained from each experiment, measuring how much the posterior diverged from the prior.
Detailed prior and posterior belief sample distributions across the five confidence categories.
AutoDiscovery is an AI-driven scientific discovery framework developed by the Allen Institute for AI (AI2). It uses a Monte Carlo Tree Search (MCTS) strategy to explore a combinatorial space of hypotheses and experiments, selecting the most informative path at each step via the UCB1 bandit algorithm.
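As a rough illustration of the selection step, UCB1 picks the child node that balances its average reward against how rarely it has been visited. The node fields and function names below are hypothetical, not the framework's actual API; this is a minimal sketch of the bandit rule, not AI2's implementation.

```python
import math

def ucb1_select(children, total_visits, c=math.sqrt(2)):
    """Pick the child node with the highest UCB1 score.

    Each child is assumed to expose `visits` and `total_reward`.
    Unvisited children are tried first; the exploration constant `c`
    trades off exploiting high-reward nodes against exploring new ones.
    """
    def score(node):
        if node.visits == 0:
            return float("inf")  # always try unvisited nodes first
        exploit = node.total_reward / node.visits
        explore = c * math.sqrt(math.log(total_visits) / node.visits)
        return exploit + explore

    return max(children, key=score)
```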
At each node, a large language model proposes a hypothesis and designs an experiment (including executable code). After execution, a separate Bayesian Belief Update module evaluates the result: an LLM jury samples categorical beliefs on a five-point scale from "definitely true" to "definitely false", both before (prior) and after (posterior) seeing the experimental evidence, with n = 30 independent samples per distribution.
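The belief-update step can be pictured as turning the jury's categorical votes into a probability vector. Below is a minimal sketch: only the two endpoint labels are stated above, so the three intermediate category names are assumptions, and the small smoothing term (also an assumption) just keeps the later divergence computation finite.

```python
from collections import Counter

CATEGORIES = [
    "definitely true", "probably true", "uncertain",
    "probably false", "definitely false",
]

def belief_distribution(jury_samples, smoothing=1e-3):
    """Turn n categorical jury votes into a probability distribution.

    `jury_samples` is a list of category strings (e.g. 30 independent
    LLM samples). Smoothing keeps every category nonzero so that
    KL divergence against this distribution is well defined.
    """
    counts = Counter(jury_samples)
    raw = [counts.get(c, 0) + smoothing for c in CATEGORIES]
    total = sum(raw)
    return [x / total for x in raw]
```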
The reward signal is derived from the KL divergence between the prior and posterior belief distributions, scaled by a configurable factor (here κ = 5). Larger divergence means the experiment was more informative: it changed the model's beliefs substantially. A "surprise" flag (not triggered in this run) would indicate findings that contradicted the prior expectation.
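In that setup, the information gain and reward could be computed roughly as follows. The bit-based KL divergence matches the chart description above; the exact transform AutoDiscovery applies (clipping, normalization, etc.) is not stated, so the reward here is simply the divergence multiplied by κ.

```python
import math

def kl_divergence(posterior, prior):
    """D_KL(posterior || prior) in bits: information gained from the experiment."""
    return sum(p * math.log2(p / q) for p, q in zip(posterior, prior) if p > 0)

def reward(posterior, prior, kappa=5.0):
    """Reward signal: KL divergence scaled by the configurable factor kappa."""
    return kappa * kl_divergence(posterior, prior)
```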
The tree is grown iteratively: each round selects a parent node, generates candidate experiments, runs the most promising one, updates beliefs, and back-propagates the reward. This demo ran 4 MCTS iterations with 8 candidate experiments per round, using GPT-4o as both the hypothesis-generating and belief-evaluating model.
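Putting the pieces together, one round of the loop described above might look like the following skeleton. It reuses the ucb1_select and kl_divergence sketches from earlier; the Node class and the injected callables (propose_fn, execute_fn, belief_fn) are placeholders standing in for the LLM and experiment runner, not AutoDiscovery's real interfaces.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One hypothesis/experiment node in the search tree (illustrative only)."""
    hypothesis: str = ""
    parent: "Node | None" = None
    children: list = field(default_factory=list)
    visits: int = 0
    total_reward: float = 0.0

def run_round(root, propose_fn, execute_fn, belief_fn, n_candidates=8, kappa=5.0):
    """One illustrative MCTS round; all callables are hypothetical placeholders.

    propose_fn(node)  -> ranked list of candidate hypothesis strings
    execute_fn(h)     -> experimental result / evidence for hypothesis h
    belief_fn(h, evidence=None) -> probability vector over belief categories
    """
    # 1. Selection: descend via UCB1 until reaching a leaf.
    node = root
    while node.children:
        node = ucb1_select(node.children, total_visits=max(node.visits, 1))

    # 2. Expansion: the LLM proposes candidates; take the top-ranked one.
    candidates = propose_fn(node)[:n_candidates]
    hypothesis = candidates[0]

    # 3. Execution and belief update: prior vs. posterior jury distributions.
    result = execute_fn(hypothesis)
    prior = belief_fn(hypothesis)
    posterior = belief_fn(hypothesis, evidence=result)
    r = kappa * kl_divergence(posterior, prior)

    # 4. Back-propagation: credit the reward to every ancestor up to the root.
    child = Node(hypothesis=hypothesis, parent=node)
    node.children.append(child)
    walker = child
    while walker is not None:
        walker.visits += 1
        walker.total_reward += r
        walker = walker.parent
    return child
```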