AutoDiscovery

Automated Scientific Discovery via Bayesian Surprise & Monte Carlo Tree Search — NLS Socioeconomic Status Dataset

Model: gpt-4o
Experiments: 4 rounds × 8 candidates
Dataset: nls_ses (DiscoveryBench)
Belief Mode: boolean_cat
Reward: KL Divergence
MCTS Selection: UCB1 (recursive)
Belief Samples: 30
Run Time: 2026-02-21 12:20

Demo Results (NLS SES Dataset)

Exploration Summary

Aggregate statistics from the MCTS exploration run.

Hypotheses Tested: 4
Surprising Discoveries: 0
Mean KL Divergence: 2.20
Mean Reward: 0.440

Discovery Feed

Hypotheses ranked by self-reward score. Each card shows the experimental finding, belief update, and information-theoretic metrics.

1. Socioeconomic status (SES) is a significant predictor of the composite ASVAB ability score among respondents in the NLS dataset.
   Surprise: ✗ Expected · Belief: 0.742 → 0.907 (Δ +0.165) · KL Divergence: 3.591 · Reward: 0.718

2. Black respondents have significantly different class percentiles compared to their White and Hispanic counterparts.
   Surprise: ✗ Expected · Belief: 0.726 → 0.890 (Δ +0.164) · KL Divergence: 3.258 · Reward: 0.652

3. The effect of race on ASVAB-derived abilities is mediated by socioeconomic status, indicating an indirect effect of race on ability through SES.
   Surprise: ✗ Expected · Belief: 0.734 → 0.838 (Δ +0.104) · KL Divergence: 1.192 · Reward: 0.238

4. There is a significant difference in ability scores between males and females within each racial group.
   Surprise: ✗ Expected · Belief: 0.629 → 0.538 (Δ −0.091) · KL Divergence: 0.759 · Reward: 0.152

Belief Distribution

Prior vs. posterior mean belief (P(H|data)) for each tested hypothesis, illustrating how evidence shifted confidence.

Information Gain

KL Divergence — bits of information gained from each experiment, measuring how much the posterior diverged from the prior.

Belief Category Breakdown

Detailed prior and posterior belief sample distributions across the five confidence categories.

Methodology

AutoDiscovery is an AI-driven scientific discovery framework developed by the Allen Institute for AI (AI2). It uses a Monte Carlo Tree Search (MCTS) strategy to explore a combinatorial space of hypotheses and experiments, selecting the most informative path at each step via the UCB1 bandit algorithm.
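The UCB1 score balances exploitation (mean reward so far) against exploration (an uncertainty bonus for rarely visited nodes). A minimal sketch of the formula is below; the exploration constant `c` is an assumption (√2 is the textbook default; AutoDiscovery's actual value is not stated here).

```python
import math

def ucb1(value_sum, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score: mean reward plus an exploration bonus.

    `c` trades off exploration vs. exploitation; sqrt(2) is the
    standard default, assumed here rather than taken from the source.
    """
    if visits == 0:
        return float("inf")  # unvisited candidates are always tried first
    mean = value_sum / visits
    bonus = c * math.sqrt(math.log(parent_visits) / visits)
    return mean + bonus
```

At each selection step, the child with the highest `ucb1` score is followed, which is what makes the search favor informative branches without starving untried ones.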

At each node, a large language model proposes a hypothesis and designs an experiment (including executable code). After execution, a separate Bayesian Belief Update module evaluates the result: an LLM jury samples categorical beliefs ("definitely true" → "definitely false") both before (prior) and after (posterior) seeing the experimental evidence, using n = 30 independent samples per distribution.
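The jury's 30 categorical samples can be turned into a probability distribution by counting votes per category and normalizing. A minimal sketch, assuming five categories (only the endpoint labels are named in the text; the intermediate labels here are placeholders) and a small smoothing term so the later KL computation stays finite:

```python
from collections import Counter

# Five belief categories. Only the endpoints ("definitely true",
# "definitely false") appear in the source; the middle labels are assumed.
CATEGORIES = ["definitely false", "probably false", "uncertain",
              "probably true", "definitely true"]

def belief_distribution(samples, smoothing=1e-6):
    """Convert n jury samples (category strings) into a normalized
    distribution over CATEGORIES. Smoothing keeps empty categories
    nonzero so KL divergence against this distribution is finite."""
    counts = Counter(samples)
    raw = [counts.get(cat, 0) + smoothing for cat in CATEGORIES]
    total = sum(raw)
    return [x / total for x in raw]
```

Running this once on the pre-evidence samples and once on the post-evidence samples yields the prior and posterior distributions compared in the next step.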

The reward signal is the KL Divergence between the prior and posterior belief distributions, divided by a configurable scaling factor (here κ = 5; e.g., the top hypothesis's KL of 3.591 yields a reward of 3.591 / 5 ≈ 0.718). Larger divergence means the experiment was more informative: it shifted the model's beliefs substantially. A "surprise" flag (not triggered in this run) would indicate findings that contradicted the prior expectation.
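A minimal sketch of this reward computation, assuming base-2 logs (the charts report KL in bits) and prior/posterior given as equal-length probability lists. Dividing the KL by κ = 5 reproduces the reported numbers (e.g., KL 3.591 → reward 0.718):

```python
import math

def kl_divergence_bits(posterior, prior):
    """KL(posterior || prior) in bits (log base 2).

    Assumes both inputs are normalized distributions over the same
    categories, with prior entries nonzero (e.g., via smoothing)."""
    return sum(q * math.log2(q / p)
               for q, p in zip(posterior, prior) if q > 0)

def reward(posterior, prior, kappa=5.0):
    """Reward = KL divergence scaled down by kappa (kappa = 5 in this run)."""
    return kl_divergence_bits(posterior, prior) / kappa
```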

The tree is grown iteratively: each round selects a parent node, generates candidate experiments, runs the most promising one, updates beliefs, and back-propagates the reward. This demo ran 4 MCTS iterations with 8 candidate experiments per round, using GPT-4o as both the hypothesis-generating and belief-evaluating model.
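The round structure described above can be sketched as a self-contained toy loop. Everything below the selection step is a stand-in: the real system has an LLM propose and rank candidate experiments and derives the reward from the belief update, whereas this sketch uses plain nodes and a random placeholder reward.

```python
import math
import random

class Node:
    """A hypothesis/experiment node in the search tree."""
    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value_sum = 0.0

def ucb1_score(node, c=math.sqrt(2)):
    if node.visits == 0:
        return float("inf")  # always try unvisited candidates first
    mean = node.value_sum / node.visits
    return mean + c * math.sqrt(math.log(node.parent.visits) / node.visits)

def run_round(root, k=8):
    # 1. Selection: descend via UCB1 to a promising leaf.
    node = root
    while node.children:
        node = max(node.children, key=ucb1_score)
    # 2. Expansion: AutoDiscovery asks an LLM for k candidate
    #    experiments; plain child nodes stand in here.
    node.children = [Node(parent=node) for _ in range(k)]
    chosen = node.children[0]  # stand-in for "most promising candidate"
    # 3. Execution + belief update: the real system runs the experiment
    #    and scores it by scaled KL divergence; random placeholder here.
    r = random.random()
    # 4. Backpropagation: push the reward up to the root.
    n = chosen
    while n is not None:
        n.visits += 1
        n.value_sum += r
        n = n.parent

root = Node()
for _ in range(4):  # 4 MCTS rounds, as in this demo
    run_round(root)
```

After four rounds the root has been visited four times and holds eight candidate children from the first expansion, mirroring the 4 × 8 configuration of this run.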