Frontier Model Evaluation for Scientific Reasoning

What is true?

We synthesize answers from every leading AI model on the hardest problems in physics. Decomposing claims. Surfacing convergence and disagreement. Showing our work.


The Last Question · Isaac Asimov · 1956
"How may entropy be reversed?" Insufficient data for meaningful answer.

One problem. Every frontier model. Full transparency.

A single hard physics problem — open or partially open — attacked by the synthesis protocol in real time. Five frontier models generate independent answers. Claims are decomposed into atomic, verifiable units. Convergence and divergence are made visible. The adversarial layer stress-tests every derivation.

This is not a benchmark. It is a live, evolving demonstration of what AI can and cannot do when confronted with problems that matter.
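As a rough illustration of how convergence and divergence across independently generated answers could be surfaced, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the crude text normalization, the 0.6 agreement threshold, and the hand-written example claims. It is not the protocol's actual implementation, and matching real claims would need semantic comparison rather than string comparison.

```python
from collections import defaultdict


def normalize(claim: str) -> str:
    """Crude normalization (assumption): lowercase, collapse whitespace, drop trailing period."""
    return " ".join(claim.lower().split()).rstrip(".")


def convergence_table(claims_by_model: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each normalized claim to the set of models that asserted it."""
    supporters: dict[str, set[str]] = defaultdict(set)
    for model, claims in claims_by_model.items():
        for claim in claims:
            supporters[normalize(claim)].add(model)
    return dict(supporters)


def split_by_agreement(table: dict[str, set[str]], n_models: int, threshold: float = 0.6):
    """Split claims into convergent (share of models >= threshold) and divergent."""
    convergent: dict[str, float] = {}
    divergent: dict[str, float] = {}
    for claim, models in table.items():
        share = len(models) / n_models
        (convergent if share >= threshold else divergent)[claim] = share
    return convergent, divergent


# Hypothetical, hand-written claims; not real model output.
example = {
    "model_a": ["Entropy of an isolated system never decreases.", "The process is reversible"],
    "model_b": ["entropy of an isolated system never decreases"],
    "model_c": ["Entropy of an isolated system never decreases.", "The process is irreversible."],
}

table = convergence_table(example)
convergent, divergent = split_by_agreement(table, n_models=len(example))
print("convergent:", convergent)
print("divergent:", divergent)
```

In this toy input all three models agree on the entropy claim, while the two claims about reversibility are each held by a single model and land in the divergent set.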

Synthesis Protocol v0
Status: Initializing
Models: Claude · GPT · Gemini · DeepSeek · Qwen (one per family)
Layers: Independent generation → Claim decomposition → Adversarial stress test
Output: Structured claims with provenance, dependency graphs, confidence scores
Validation: Closed problems with known answers; the protocol must beat majority vote
First Problem: Coming Q2 2026
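To make "structured claims with provenance, dependency graphs, confidence scores" more concrete, the sketch below shows one possible shape for such a record in Python. The `Claim` fields, the example claims, and the rule that a claim is never more confident than the weakest claim it depends on are all illustrative assumptions, not the protocol's published schema.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Claim:
    claim_id: str
    text: str
    provenance: tuple[str, ...]        # models that asserted the claim
    confidence: float                  # score in [0, 1]
    depends_on: tuple[str, ...] = ()   # ids of claims this one relies on


def dependency_order(claims: list[Claim]) -> list[str]:
    """Return claim ids so that every claim appears after the claims it depends on."""
    graph = {c.claim_id: set(c.depends_on) for c in claims}
    return list(TopologicalSorter(graph).static_order())


def propagated_confidence(claims: list[Claim]) -> dict[str, float]:
    """Assumed rule: cap each claim's confidence by its weakest upstream claim."""
    by_id = {c.claim_id: c for c in claims}
    result: dict[str, float] = {}
    for cid in dependency_order(claims):
        claim = by_id[cid]
        upstream = [result[dep] for dep in claim.depends_on]
        result[cid] = min([claim.confidence, *upstream])
    return result


# Hypothetical claims for illustration only.
claims = [
    Claim("c1", "Kinetic energy is conserved in the collision.", ("model_a", "model_b"), 0.90),
    Claim("c2", "Momentum is conserved in the collision.", ("model_a", "model_c"), 0.95),
    Claim("c3", "The struck ball leaves with the incoming speed.", ("model_a",), 0.99,
          depends_on=("c1", "c2")),
]

order = dependency_order(claims)
scores = propagated_confidence(claims)
print(order)
print(scores)
```

Ordering claims by dependency means a derived claim such as `c3` is scored only after the claims it rests on, so a weak premise visibly limits the conclusions built on it.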

State of Physics AI

Quarterly synthesis report evaluating every frontier and leading open-weight model on graduate-level physics across six domains: mechanics, electromagnetism, quantum mechanics, thermodynamics, relativity, and statistical mechanics. Closed-problem validation with ground truth. Free to read.

State of Physics AI — Q1 2026

12–15 frontier and open-weight models. 30 closed problems across 6 physics domains. Synthesis protocol v0 compared against majority vote. Which models reason about physics, and which merely pattern-match?
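The majority-vote baseline on closed problems can be sketched in a few lines of Python. The problem ids, answers, and the `protocol_answers` mapping below are placeholder values invented to exercise the code; they are not evaluation results, and the function names are assumptions.

```python
from collections import Counter


def majority_vote(answers: list[str]) -> str:
    """Return the most common answer; on a tie, the answer encountered first wins."""
    return Counter(answers).most_common(1)[0][0]


def accuracy(predictions: dict[str, str], truth: dict[str, str]) -> float:
    """Fraction of problems whose predicted answer matches the ground truth."""
    return sum(predictions[pid] == truth[pid] for pid in truth) / len(truth)


# Placeholder data: four closed problems, five model answers each.
truth = {"p1": "A", "p2": "B", "p3": "C", "p4": "D"}
model_answers = {
    "p1": ["A", "A", "B", "A", "A"],
    "p2": ["B", "C", "C", "B", "C"],
    "p3": ["C", "C", "C", "D", "C"],
    "p4": ["D", "A", "D", "D", "A"],
}
# Placeholder standing in for whatever the synthesis protocol would output.
protocol_answers = {"p1": "A", "p2": "B", "p3": "C", "p4": "D"}

baseline_answers = {pid: majority_vote(ans) for pid, ans in model_answers.items()}
baseline_acc = accuracy(baseline_answers, truth)
protocol_acc = accuracy(protocol_answers, truth)
beats_baseline = protocol_acc > baseline_acc

print(f"majority vote: {baseline_acc:.2f}, protocol: {protocol_acc:.2f}, beats: {beats_baseline}")
```

Problem `p2` shows why the baseline is worth beating: three of five placeholder models agree on a wrong answer, so majority vote inherits the error.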

Q2 2026