MANIFESTO · 2026

Interpretability should feel like video games, not archaeology.

Neuronpedia gave the world its first sparse autoencoder (SAE) encyclopedia, a massive and essential contribution. Gemma Scope followed with an industrial-grade suite of dictionaries. The Anthropic Transformer Circuits thread laid the intellectual foundation. We stand on their shoulders.

But an encyclopedia is where you look things up. It is not where discovery happens, and it is not where a student becomes a contributor. Today, running your first end-to-end interpretability experiment — find a feature, validate it, steer it, publish it — takes a PhD advisor, a GPU cluster, and three months. That is a failure mode for the field.

The five gaps

The gaps that no current tool fills:

Narrative / trace
Features shown in isolation, never the full journey of a prompt. A model's thought is a sequence, not a dictionary entry.
Comparison
"Why did model A answer X but model B answer Y?" is the question that matters in a reasoning-model world. No tool diffs activations side-by-side.
Circuits as UI
Every beautiful circuit figure in the 2024–2026 literature was hand-drawn in Figma. Circuits live in papers, not tools.
Onboarding
UX assumes PhD-level familiarity. Students bounce in 90 seconds. The field grows slower than the problem.
Failure archaeology
"My model hallucinated, which features fired?" Today: write a notebook. Tomorrow: upload a failure dataset, get a ranked feature list back automatically.
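The automated failure-archaeology step above (upload failures, get a ranked feature list back) can be approximated by contrasting failures against matched controls. A minimal sketch, assuming you already have one SAE activation vector per example (for instance each feature's max over the example's tokens) for a failure set and a control set; `rank_failure_features` and its standardized mean-difference score are hypothetical choices, not OpenInterp's actual ranking.

```python
from statistics import mean, pstdev


def rank_failure_features(failure_acts, control_acts, top_k=5):
    """Rank SAE features by how much more they fire on failures than on controls.

    failure_acts / control_acts: lists of per-example activation vectors, one
    float per feature (e.g. the feature's max activation over the example's tokens).
    Returns up to top_k (feature_index, score) pairs, highest score first.
    """
    n_features = len(failure_acts[0])
    scores = []
    for f in range(n_features):
        fail = [row[f] for row in failure_acts]
        ctrl = [row[f] for row in control_acts]
        # Standardize by the pooled spread so features with different activation
        # scales are comparable; a constant feature falls back to a spread of 1.0.
        spread = pstdev(fail + ctrl) or 1.0
        scores.append((f, (mean(fail) - mean(ctrl)) / spread))
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_k]


# Toy data: 3 features, 2 failures vs. 2 controls. Feature 1 fires mainly on failures.
failures = [[0.0, 2.0, 0.1], [0.0, 3.0, 0.2]]
controls = [[0.5, 0.0, 0.1], [0.4, 0.1, 0.2]]
print(rank_failure_features(failures, controls, top_k=2))
```

A production version would also need significance testing and some deduplication of features that always fire together; this sketch only shows the ranking idea.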

The four pillars

OpenInterp is built as four complementary platforms — the minimum set that fills all five gaps and also finances itself:

Observatory is the microscope — watch the model. Laboratory is the workbench — edit the model. Watchtower is the gantry — monitor the model. Academy is the school — teach the world to use the other three. One platform. Four ways in.

Three structural bets

01

Cross-model feature graph

A feature-equivalence graph across Qwen, Gemma, Llama, Claude, Mistral. "Feature 2503 in Qwen ≈ feature 8901 in Gemma" — rendered, searchable, citable. Grows with every community-submitted SAE; the dataset gets more useful the more people contribute.

Q2 2026: first rendering with 2 models. Q4: 5+ models.
02

Revenue that funds the free tier

A paid monitoring API (Watchtower) is designed to subsidize the OSS tier long-term — so students, researchers, and contributors never hit a paywall. Paid where it sustains (safety teams, compliance, vendor integration); free where it matters.

Target: first design partner Q3, first revenue Q4. Not yet proven.
03

Model partnerships

Working with model vendors and research labs to ship SAEs alongside model releases. When an open-source SAE lands on the same day as the model, interpretability becomes part of the release process rather than an afterthought — a better default for everyone.

First partnership conversation active. Nothing signed yet.

What we uniquely bring

Every claim below is grounded in a shipped, public, Apache-2.0-licensed artifact:

  • First public SAE on the Qwen3.6 family: dense (27B) and triple-hybrid MoE (35B-A3B). Checked against Hugging Face at shipping time; no competing public SAE existed.
  • Hybrid-architecture expertise: the first SAEs on Gated Delta Networks (Qwen3.5-4B), ensemble MoE (Gemma-4 E4B), and triple-hybrid (Qwen3.6-35B-A3B). This landscape was previously uninterpretable.
  • mechreward (features as RL rewards): +19 pp on GSM8K (Qwen3.5-4B) in 168 effective training steps via per-token, SAE-sparse rewards; ρ = 0.52 cross-architecture on SuperGPQA. Install with pip install mechreward.
  • Stage Gate protocol: correlation pre-test (G1) → three-way ablation (G2) → ceiling-breaking full RL (G3). Don't burn GPU hours until the signal predicts the outcome.
  • Honest negatives: we published the feature-circuits result that failed replication. Trust comes from admitting what broke.

The first-minute experience

A student in Mumbai, on a phone at 2:00 am, spends two minutes and discovers a hallucination feature in GPT-5 that nobody has seen before. They publish a mini-paper embedded in the platform and have three DeepMind researchers commenting in real time before breakfast.

That is the north star. Everything — the hero animation, the mobile-first layout, the zero-login Trace Theater, the shareable trace URLs, the Expedition that validates your work in 15 minutes instead of 15 weeks — is optimized toward that one scene.

Neuronpedia is a tab you consult. OpenInterp is a tab you leave open.

How to get involved

Manifesto last revised 2026-04-23. Build in public. Amend in public.