Q3 2026

Interp Olympics

Monthly feature-hunting challenges with leaderboards and prizes: Kaggle for mechanistic interpretability. An example challenge: 'Find the feature that makes Qwen3.6 hallucinate on medical QA. $5k to the best discovery.'

Season 1 challenges (Q3 2026)

Month 1: Hallucination hunter (Qwen3.6-27B medical)
Month 2: Jailbreak fingerprint (Gemma-2-9B)
Month 3: Reasoning-loop detector (open thinking model)
Month 4: Sycophancy eliminator (cross-model, composable recipe)

Judging

Submissions are scored automatically on causal-validation metrics computed on held-out data: AUROC, robustness across seeds, and semantic coherence as rated by an LLM judge. Ties are broken by public vote and an expert panel.
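As a rough illustration of two of these metrics, the sketch below computes AUROC for a candidate feature and summarizes how stable that score is across seeds. It assumes a feature's activation on each held-out example is used as a detection score against a binary label (for instance, whether the answer was a hallucination). The function names `auroc` and `seed_robustness` and the choice of summary statistics are hypothetical and are not the official scoring code.

```python
from statistics import mean, pstdev


def auroc(labels, scores):
    """Rank-based AUROC (Mann-Whitney U); tied scores share an average rank.

    labels: 0/1 ints, 1 = behavior present (assumed labeling scheme).
    scores: feature activations used as detection scores (assumption).
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the end of the group of examples with the same score.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative example")

    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def seed_robustness(per_seed_aurocs):
    """Summarize AUROC across seeds: mean, spread, and worst case (hypothetical summary)."""
    return {
        "mean": mean(per_seed_aurocs),
        "std": pstdev(per_seed_aurocs),
        "worst": min(per_seed_aurocs),
    }


if __name__ == "__main__":
    held_out_labels = [0, 0, 1, 1]
    feature_scores = [0.1, 0.4, 0.35, 0.8]
    print(auroc(held_out_labels, feature_scores))
    print(seed_robustness([0.90, 0.80, 0.85]))
```

A feature that scores well on one seed but has a low worst-case AUROC across seeds would rank below a slightly weaker but more stable one under this kind of summary. The semantic-coherence component is left out here because it depends on the judge model and prompt, which are not specified.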

Prizes

Sponsored by participating labs and safety teams. Prize pools start at $5k per challenge and scale with sponsor participation. Winners get public Atlas entries with their name attached, a citable credit for their CV.

Request early access

We prioritize researchers, educators, and safety teams who will use Interp Olympics publicly. Tell us what you want to build; we'll reach out when the beta opens.