Beyond steering
Sandbox asks: what happens if I change this feature? Counterfactual Studio asks: what happens if I change this token, and how does that change propagate through features? They are complementary tools for two different debugging questions.
'What if this token were X?' Surgical replay. Edit one token in the middle of a generation, see how downstream features shift, and compare the full trajectories. Feels like a Redux-style time-travel debugger, but for the residual stream.
Pick any token in an existing trace and edit its value. We re-run the forward pass from that point, reusing the cached prefix, and show you the new features, the new output tokens, and, critically, the feature-level diff against the original run. Because the prefix is cached, this is fast enough to explore interactively.
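To make the replay idea concrete, here is a toy Python sketch. It is not the Counterfactual Studio implementation: a tiny deterministic `step` function stands in for a real forward pass, three made-up numbers stand in for features, and every name (`Trace`, `generate`, `replay`, `feature_diff`, `VOCAB`) is hypothetical. It only illustrates the shape of the loop: keep the cached prefix, swap one token, recompute everything downstream, then diff features against the original.

```python
from dataclasses import dataclass

# Hypothetical toy setup: a 3-number "feature" state and a six-word vocabulary.
State = tuple[float, float, float]
INIT: State = (0.0, 0.0, 0.0)
VOCAB = ["the", "cat", "dog", "sat", "ran", "home"]


def step(state: State, token: str) -> State:
    """Stand-in for one forward step: fold a token into the running state."""
    h = sum(ord(c) for c in token)
    return (
        0.5 * state[0] + (h % 7),
        0.5 * state[1] + (h % 5),
        0.5 * state[2] + len(token),
    )


def next_token(state: State) -> str:
    """Stand-in for decoding: pick a vocabulary entry from the state."""
    return VOCAB[int(sum(state)) % len(VOCAB)]


@dataclass
class Trace:
    tokens: list[str]
    states: list[State]  # states[i] is the state after consuming tokens[i]
    prompt_len: int      # tokens before this index were given, not generated


def generate(prompt: list[str], n_new: int) -> Trace:
    """Consume the prompt, then generate n_new tokens, caching every state."""
    tokens: list[str] = []
    states: list[State] = []
    state = INIT
    for i in range(len(prompt) + n_new):
        tok = prompt[i] if i < len(prompt) else next_token(state)
        state = step(state, tok)
        tokens.append(tok)
        states.append(state)
    return Trace(tokens, states, len(prompt))


def replay(trace: Trace, position: int, new_token: str) -> Trace:
    """Replace the token at `position` and recompute only from there on."""
    tokens = trace.tokens[:position]  # cached prefix, reused as-is
    states = trace.states[:position]
    state = states[-1] if states else INIT
    for i in range(position, len(trace.tokens)):
        if i == position:
            tok = new_token            # the counterfactual edit
        elif i < trace.prompt_len:
            tok = trace.tokens[i]      # remaining prompt tokens stay fixed
        else:
            tok = next_token(state)    # generated tokens are regenerated
        state = step(state, tok)
        tokens.append(tok)
        states.append(state)
    return Trace(tokens, states, trace.prompt_len)


def feature_diff(original: Trace, edited: Trace) -> list[State]:
    """Per-position feature difference, edited minus original."""
    return [
        tuple(e - o for e, o in zip(edited_state, original_state))
        for edited_state, original_state in zip(edited.states, original.states)
    ]


if __name__ == "__main__":
    original = generate(["the", "cat"], n_new=4)
    edited = replay(original, position=3, new_token="home")
    print(original.tokens)
    print(edited.tokens)
    for i, delta in enumerate(feature_diff(original, edited)):
        print(i, delta)
```

In this sketch the diff is all zeros before the edited position, because those states are copied from the cache rather than recomputed. A real model would cache something like per-layer key/value tensors instead of a three-number state, but the control flow would plausibly look similar.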
Chain-of-thought robustness studies, prompt-injection susceptibility mapping, and "what is the minimum edit that flips the answer?" questions. Q3 2026.
We prioritize researchers, educators, and safety teams who will use it publicly. Tell us what you want to build; we'll reach out when the beta opens.