Frontier Model Evaluation for Scientific Reasoning

What is true?

We synthesize answers from every leading AI model on the hardest problems in physics. Decomposing claims. Surfacing convergence and disagreement. Showing our work.

The Last Question

One problem. Every frontier model. Full transparency.

A single hard physics problem — open or partially open — attacked by the synthesis protocol in real time. Five frontier models generate independent answers. Claims are decomposed into atomic, verifiable units. Convergence and divergence are made visible. The adversarial layer stress-tests every derivation.

This is not a benchmark. It is a live, evolving demonstration of what AI can and cannot do when confronted with problems that matter.

Synthesis Protocol v0

Models

Claude · GPT · Gemini · DeepSeek · Qwen — one per family

Layers

Independent generation → Claim decomposition → Adversarial stress test
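As a rough illustration of how convergence and divergence might be surfaced once claims have been decomposed, the sketch below groups claims by the models that assert them. It assumes claims have already been normalized to exactly matching strings, which is a strong simplification; the function names and the toy data are hypothetical and not part of the protocol.

```python
from collections import defaultdict


def group_by_support(claims_by_model: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each claim to the set of models that assert it.

    Assumes claims are pre-normalized strings that match exactly
    (an illustrative simplification, not the protocol's matching rule).
    """
    support: dict[str, set[str]] = defaultdict(set)
    for model, claims in claims_by_model.items():
        for claim in claims:
            support[claim].add(model)
    return dict(support)


def split_convergent(
    support: dict[str, set[str]], n_models: int
) -> tuple[set[str], set[str]]:
    """Split claims into those asserted by every model and the rest."""
    convergent = {c for c, models in support.items() if len(models) == n_models}
    divergent = set(support) - convergent
    return convergent, divergent


# Toy data, invented for illustration only.
toy = {
    "Claude": {"energy is conserved", "the orbit is closed"},
    "GPT": {"energy is conserved", "the orbit precesses"},
    "Gemini": {"energy is conserved", "the orbit is closed"},
}
convergent, divergent = split_convergent(group_by_support(toy), n_models=len(toy))
print(sorted(convergent))
print(sorted(divergent))
```

In this toy case the shared claim lands in the convergent set and the two competing orbit claims land in the divergent set, which is where an adversarial stress test would presumably focus.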

Output

Structured claims with provenance, dependency graphs, confidence scores
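One possible shape for such an output record is sketched below. The field names, the 0-to-1 confidence range, and the `dependency_order` helper are assumptions made for illustration; the protocol's actual schema is not described here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Claim:
    """One atomic claim (hypothetical schema, not the protocol's own)."""

    claim_id: str
    text: str
    source_models: list[str]  # provenance: which models asserted the claim
    depends_on: list[str] = field(default_factory=list)  # ids of prerequisite claims
    confidence: float = 0.0  # assumed to lie in [0, 1]; scoring rule unspecified


def dependency_order(claims: list[Claim]) -> list[str]:
    """Return claim ids so that each claim follows everything it depends on.

    Depth-first traversal of the dependency graph; raises ValueError on a cycle.
    """
    by_id = {c.claim_id: c for c in claims}
    ordered: list[str] = []
    state: dict[str, str] = {}  # claim_id -> "visiting" or "done"

    def visit(cid: str) -> None:
        if state.get(cid) == "done":
            return
        if state.get(cid) == "visiting":
            raise ValueError(f"circular dependency at {cid}")
        state[cid] = "visiting"
        for dep in by_id[cid].depends_on:
            visit(dep)
        state[cid] = "done"
        ordered.append(cid)

    for c in claims:
        visit(c.claim_id)
    return ordered
```

Ordering claims this way means a derivation can be checked from its premises upward, and a claim that fails can be traced to everything that depends on it.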

Validation

Closed problems with known answers. To pass, the protocol must outperform a simple majority vote over the same models' answers.
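A minimal sketch of the majority-vote baseline and the comparison against it follows. It assumes each answer has been reduced to a directly comparable string and that accuracy on the closed problems is the metric; both are assumptions, and all function names are hypothetical.

```python
from collections import Counter


def majority_vote(answers: list[str]) -> str:
    """Return the most common answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def accuracy(predictions: list[str], ground_truth: list[str]) -> float:
    """Fraction of problems where the prediction equals the known answer."""
    correct = sum(p == g for p, g in zip(predictions, ground_truth))
    return correct / len(ground_truth)


def beats_majority_vote(
    protocol_answers: list[str],
    model_answers_per_problem: list[list[str]],
    ground_truth: list[str],
) -> bool:
    """True if the protocol is strictly more accurate than the baseline."""
    baseline = [majority_vote(answers) for answers in model_answers_per_problem]
    return accuracy(protocol_answers, ground_truth) > accuracy(baseline, ground_truth)
```

The strict inequality is a design choice in this sketch: matching the baseline would not count as a pass, since a protocol that merely reproduces the majority adds nothing.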

First Problem

Coming Q2 2026

Reports

State of Physics AI

Quarterly synthesis report evaluating every frontier and leading open-weight model on graduate-level physics across six domains: mechanics, electromagnetism, quantum mechanics, thermodynamics, relativity, and statistical mechanics. Closed-problem validation with ground truth. Free to read.

State of Physics AI — Q1 2026

12–15 frontier and open-weight models. 30 closed problems across 6 physics domains. Synthesis protocol v0 compared against majority vote. Which models reason about physics, and which merely pattern-match?

Q2 2026