LIVE · Observatory · Q1 2026

Circuit Canvas

Figma-style attribution graphs for SAE features. Nodes = features. Edges = feature-to-feature attribution, scored by AtP* (Kramár et al. 2024). Pick a scenario below; each is a real circuit computed on our Qwen3.6-27B paper-grade SAEs.

PREVIEW · ATP · Qwen/Qwen3.6-27B · τ_node=0.05 · τ_edge=0.01 · 1 prompt

14 nodes · 3 edges

f64163 · L11 · Medical terminology and clinical findings · |IE| 0.26

f55304 · L11 · "the" in titles, idioms, and fixed phrases · |IE| 0.23

f49740 · L11 · polysemantic: leading digit of numbers / "the" after preposition · |IE| 0.23

f17728 · L11 · medical differential diagnosis / alternative causes · |IE| 0.22

f2541 · L11 · biological substances/structures implicated in diseases · |IE| 0.21

f12328 · L11 · sentence/paragraph boundary after topic-ending punctuation · |IE| 0.21

△ error · L11 · SAE reconstruction residual · |IE| 0.14

f37602 · L31 · aorta and aortic anatomy/pathology · |IE| 1.00

f31426 · L31 · "the" following preposition in formal/academic prose · |IE| 0.59

f46598 · L31 · heart and pericardium medical terminology · |IE| 0.55

f39620 · L31 · Names of serious medical conditions/diseases · |IE| 0.53

f34368 · L31 · unlabeled · |IE| 0.44

f15905 · L31 · Medical terminology subword tokens · |IE| 0.40

△ error · L31 · SAE reconstruction residual · |IE| 0.37

metric · logit['heart'] − logit['stomach']

prompt · “Patient with sudden chest pain. The most likely cause involves the”
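
As a rough illustration of the metric above, the sketch below computes a contrastive logit difference from a dictionary of next-token logits. The function name and the logit values are hypothetical; in a real run the logits would come from the model's final prompt position and the keys would be token IDs rather than strings.

```python
from typing import Mapping


def contrastive_logit_metric(
    logits: Mapping[str, float], target: str, distractor: str
) -> float:
    """Return logit[target] - logit[distractor] for one next-token position."""
    return logits[target] - logits[distractor]


# Hypothetical next-token logits, for illustration only (not model output).
toy_logits = {"heart": 7.5, "stomach": 4.0, "lungs": 5.0}

metric_value = contrastive_logit_metric(toy_logits, "heart", "stomach")
print(metric_value)
```

A positive value means the model prefers the target completion over the distractor; attributions are then measured against this scalar.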

What you're seeing

Attribution graphs on Qwen3.6-27B paper-grade SAEs. Upstream = L11, downstream = L31. Triangle nodes are SAE reconstruction-error terms (Marks et al. 2024). Edge thickness = |attribution|; orange = positive, cyan = negative. Each scenario uses a task-specific contrastive logit metric.
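
The edge encoding described above can be sketched as follows. This is a minimal illustration, not the viewer's actual code: the `Edge` class, the `>=` threshold comparison, the width scaling, and all edge values are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    src: str            # upstream node id (hypothetical naming)
    dst: str            # downstream node id (hypothetical naming)
    attribution: float  # signed feature-to-feature attribution


def prune_edges(edges, tau_edge):
    """Keep edges whose |attribution| reaches the edge threshold (assumed >=)."""
    return [e for e in edges if abs(e.attribution) >= tau_edge]


def edge_style(edge, max_width=6.0, max_abs=1.0):
    """Map |attribution| to stroke width and sign to color."""
    width = max_width * min(abs(edge.attribution) / max_abs, 1.0)
    color = "orange" if edge.attribution > 0 else "cyan"
    return {"width": width, "color": color}


# Made-up edges, for illustration only.
edges = [
    Edge("up_0", "down_0", 0.40),
    Edge("up_1", "down_0", -0.02),
    Edge("up_2", "down_1", 0.005),  # below tau_edge=0.01, so it is dropped
]

kept = prune_edges(edges, tau_edge=0.01)
styles = [edge_style(e) for e in kept]
```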

How to build your own

Run 15b_sfc_qwen36_27b_papergrade.ipynb for the four-scenario SFC pipeline, or 14_attribution_patching.ipynb for node-only AtP*. Both emit circuit.json in the schema this viewer consumes.

Notebooks that emit this schema

Four ways to compute attribution graphs, fastest → slowest. All output the same circuit.json format this viewer renders.

14 · Attribution Patching

AtP* with QK fix + GradDrop (Kramár et al. 2024). Node attribution only. Fastest: 2-3 forward passes + 1 backward pass per prompt.

~15 min · T4
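
To make the cost profile of notebook 14 concrete, here is a toy sketch of plain first-order attribution patching. It is not AtP* (no QK fix, no GradDrop), and the metric, weights, and activations are all hypothetical. The estimate for each node is (patched activation − clean activation) × gradient of the metric at the clean run, so one gradient evaluation covers every node.

```python
import math

W = [0.8, -0.5, 0.3]  # hypothetical readout weights of a toy metric


def metric(acts):
    """Toy nonlinear metric standing in for the model's contrastive logit."""
    return math.tanh(sum(w * a for w, a in zip(W, acts)))


def metric_grad(acts):
    """Analytic gradient of the toy metric with respect to each activation."""
    z = sum(w * a for w, a in zip(W, acts))
    return [(1.0 - math.tanh(z) ** 2) * w for w in W]


def atp_estimates(clean, patch):
    """First-order estimate of the effect of patching each node separately."""
    grad = metric_grad(clean)  # plays the role of the single backward pass
    return [(p - c) * g for c, p, g in zip(clean, patch, grad)]


def exact_effects(clean, patch):
    """Ground truth: patch one node at a time and re-evaluate the metric."""
    base = metric(clean)
    effects = []
    for i, p in enumerate(patch):
        patched = list(clean)
        patched[i] = p
        effects.append(metric(patched) - base)
    return effects


clean = [1.0, 0.2, 0.5]  # made-up clean activations
patch = [0.9, 0.3, 0.5]  # made-up patched activations

approx = atp_estimates(clean, patch)
exact = exact_effects(clean, patch)
for a, e in zip(approx, exact):
    print(f"AtP {a:+.4f}   exact {e:+.4f}")
```

The exact loop needs one extra metric evaluation per node, while the estimate reuses one gradient, which is where the speed advantage comes from.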

15 · Sparse Feature Circuits

Marks et al. 2024 replication. Node + edge attribution via AtP + IG-10 early-layer fallback. Emits full DAG.

~20 min · A100
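
Assuming "IG-10" means integrated gradients with 10 interpolation steps, the toy sketch below shows why such a fallback can help where the linear AtP estimate breaks down. The saturating metric, the weights, and the midpoint interpolation rule are assumptions chosen for illustration; the notebook's actual implementation may differ.

```python
import math

W = [2.0, -0.5]  # hypothetical weights of a toy saturating metric


def metric(acts):
    return math.tanh(sum(w * a for w, a in zip(W, acts)))


def grad_i(acts, i):
    """Partial derivative of the toy metric with respect to activation i."""
    z = sum(w * a for w, a in zip(W, acts))
    return (1.0 - math.tanh(z) ** 2) * W[i]


def atp_effect(clean, patch, i):
    """Linear estimate using the gradient at the clean point only."""
    return (patch[i] - clean[i]) * grad_i(clean, i)


def ig_effect(clean, patch, i, steps=10):
    """Average the gradient along the clean-to-patch path for node i."""
    delta = patch[i] - clean[i]
    total = 0.0
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint rule (an assumption)
        point = list(clean)
        point[i] = clean[i] + alpha * delta
        total += grad_i(point, i)
    return delta * total / steps


clean = [2.0, 0.0]  # made-up values; the metric is saturated here
patch = [0.0, 0.0]

exact = metric(patch) - metric(clean)
atp = atp_effect(clean, patch, 0)
ig = ig_effect(clean, patch, 0, steps=10)
print(f"exact {exact:+.4f}   AtP {atp:+.4f}   IG-10 {ig:+.4f}")
```

In this saturated regime the clean-point gradient is nearly zero, so the linear estimate badly underestimates the effect, while the path-averaged gradient recovers it at roughly ten times the gradient cost.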

15b · SFC · Qwen3.6-27B paper-grade

Specialized SFC pipeline for caiovicentino1/qwen36-27b-sae-papergrade. Produces the 4 scenarios above: medical · IOI · math · refusal. One JSON per scenario.

~15 min × 4 · RTX 6000 Pro

16 · ACDC (slow-mode)

Runs ACDC, the original NeurIPS 2023 algorithm, via the AutoCircuit library as an independent check on the AtP results. Slower than AtP but peer-reviewed.

~1-2 h · T4

17 · Crosscoder training

Lindsey et al. 2024. Train a shared-dictionary SAE across L11/L31/L55. Ties multi-layer features into one feature index.

~30 min-4 h