Q3 2026

Auto-Interp Engine

Upload your failure dataset. We run your model through an SAE, compute per-token feature activations on every prompt, correlate them with your error labels, and return the 50 features most predictive of failure, each with a semantic description and suggested interventions.

The grail of LLM debugging

Today, debugging an LLM regression is notebook archaeology: check logs, run ablations, read papers, get unlucky. Auto-Interp does that archaeology automatically and returns a ranked list you can act on. It works on any HuggingFace model we have an SAE for.

How it works

Given a dataset of (prompt, label) pairs, we compute L0/L31 features per token, then fit a regularized logistic regression or compute a mutual-information score for each feature against the label. We keep the features with AUROC > 0.6, annotate each one with an LLM judge using its top-activating examples, and return the ranked list.
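The scoring step can be sketched in a few lines. This is a minimal illustration, not the production pipeline: it assumes you already have a max-pooled activation matrix (prompts × features) and binary failure labels, and it uses the logistic-regression variant with an AUROC filter.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def rank_features(acts, labels, auroc_min=0.6):
    """Score each SAE feature against binary failure labels.

    acts:   (n_prompts, n_features) pooled per-prompt feature activations
    labels: (n_prompts,) with 1 = failure, 0 = success
    Returns (feature_index, auroc) pairs sorted by AUROC, descending,
    keeping only features with AUROC above auroc_min.
    """
    scored = []
    for j in range(acts.shape[1]):
        x = acts[:, j].reshape(-1, 1)
        # L2-regularized logistic regression on a single feature
        clf = LogisticRegression(C=1.0).fit(x, labels)
        probs = clf.predict_proba(x)[:, 1]
        auroc = roc_auc_score(labels, probs)
        if auroc > auroc_min:
            scored.append((j, auroc))
    return sorted(scored, key=lambda t: -t[1])

# Toy example: feature 0 tracks the label, feature 1 is pure noise.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, 200)
acts = rng.normal(size=(200, 2))
acts[:, 0] += 3.0 * labels
top = rank_features(acts, labels)
```

In the toy run, feature 0 comes back with a high AUROC and the noise feature is filtered out or ranked below it.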

Output

A notebook-ready JSON report plus an interactive web dashboard. Every feature links to its Atlas entry, and Sandbox recipes come pre-generated for the top 5.
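To make the report concrete, here is a hypothetical sketch of one entry. The field names (`feature_id`, `auroc`, `description`, `atlas_url`, `sandbox_recipes`) are illustrative assumptions, not the finalized schema:

```json
{
  "features": [
    {
      "feature_id": 18342,
      "auroc": 0.81,
      "description": "tokens inside quoted negations",
      "atlas_url": "<link to the feature's Atlas entry>",
      "sandbox_recipes": ["<pre-generated Sandbox recipe>"]
    }
  ]
}
```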

Request early access

We prioritize researchers, educators, and safety teams who will use it publicly. Tell us what you want to build; we'll reach out when the beta opens.