TRAIN · THE LADDER

Any model. Any scale.

Train your first sparse autoencoder in 30 minutes on a free Colab T4. Train a hybrid-architecture SAE in 4 hours on free Kaggle. Train paper-grade on cloud. One ladder, zero gatekeeping.

Start with Tier 1 · All notebooks on GitHub

beginner

TIER 1 · HOBBYIST

Your first SAE in 30 minutes

Platform: Google Colab · Free T4

Cost: $0

Model: Gemma-2-2B

Tokens: 50 M

Dictionary: 7× (n=16k)

Time: 30–40 min

Train a complete TopK SAE with AuxK dead-feature mitigation on Gemma-2-2B. Drive-based checkpoint recovery handles Colab's 90-minute idle disconnect. Ends with your own SAE uploaded to HuggingFace — citable, reusable, shareable.
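The checkpoint-recovery idea can be sketched in a few lines. This is a minimal illustration, not the notebook's code: `save_checkpoint`, `load_checkpoint`, `train`, and the JSON state are hypothetical stand-ins for real tensors saved under a mounted Drive folder. The point is to write to a temporary file and rename it over the target, so a disconnect mid-write does not leave a half-written checkpoint. The rename is atomic on a local POSIX filesystem; a mounted Drive may behave differently.

```python
import json
import os
import tempfile


def save_checkpoint(state: dict, path: str) -> None:
    """Write state to a temp file in the same folder, then rename it over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)  # readers see the old file or the new one, never a partial one
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path: str) -> dict:
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if not os.path.exists(path):
        return {"tokens_seen": 0}
    with open(path) as f:
        return json.load(f)


def train(path: str, total_tokens: int, tokens_per_step: int, save_every: int) -> dict:
    """Toy loop: resume from the last checkpoint and save every save_every tokens."""
    state = load_checkpoint(path)
    while state["tokens_seen"] < total_tokens:
        state["tokens_seen"] += tokens_per_step  # placeholder for one real training step
        if state["tokens_seen"] % save_every == 0:
            save_checkpoint(state, path)
    save_checkpoint(state, path)
    return state
```

Calling `train` again with the same path after an interruption picks up from the last saved `tokens_seen` instead of starting over.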

What you'll learn

Forward hooks + residual stream extraction
TopK activation + AuxK auxiliary loss (Gao et al. 2024)
Geometric-median b_dec initialization
HuggingFace safetensors + cfg.json format
Crash-safe checkpointing to Google Drive
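To make the TopK + AuxK item concrete, here is a dependency-free sketch on plain Python lists (a real notebook would use batched tensors). TopK keeps only the k largest encoder pre-activations per token. AuxK, as described by Gao et al. 2024, additionally asks latents that have not fired for a long time to reconstruct the leftover error, which gives dead latents a gradient signal. The function names, the zero clamp, and the `dead_after` threshold below are illustrative assumptions, not the notebook's code.

```python
from typing import List


def topk_activation(pre_acts: List[float], k: int) -> List[float]:
    """Keep the k largest pre-activations (clamped at zero) and zero out the rest."""
    top = sorted(range(len(pre_acts)), key=lambda i: pre_acts[i], reverse=True)[:k]
    keep = set(top)
    return [max(v, 0.0) if i in keep else 0.0 for i, v in enumerate(pre_acts)]


def update_steps_since_fired(steps_since_fired: List[int], latents: List[float]) -> List[int]:
    """Reset the counter of every latent that fired; increment the others."""
    return [0 if z > 0.0 else n + 1 for n, z in zip(steps_since_fired, latents)]


def auxk_candidates(pre_acts: List[float], steps_since_fired: List[int],
                    dead_after: int, k_aux: int) -> List[int]:
    """Indices of the k_aux strongest pre-activations among latents considered dead."""
    dead = [i for i, n in enumerate(steps_since_fired) if n >= dead_after]
    dead.sort(key=lambda i: pre_acts[i], reverse=True)
    return dead[:k_aux]


pre = [0.5, -1.0, 2.0, 0.1, 1.5]
latents = topk_activation(pre, k=2)                       # only indices 2 and 4 survive
counters = update_steps_since_fired([3, 3, 3, 3, 3], latents)
aux_ids = auxk_candidates(pre, counters, dead_after=4, k_aux=2)
```

In a full training step, the latents selected by `auxk_candidates` would be decoded and compared against the main reconstruction error, and that auxiliary loss would be added to the main loss with a small weight.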

Prerequisites

→ Google account (Colab Free access)
→ HuggingFace account + HF_TOKEN in Colab Secrets
→ Edit one line: HF_USERNAME

Open in Colab · View on GitHub

intermediate

TIER 2 · EXPLORER

Hybrid-architecture SAE — Qwen3.5-4B

Platform: Kaggle · 2× T4 (32 GB)

Cost: $0 · 30 h/wk

Model: Qwen3.5-4B

Tokens: 150 M

Dictionary: 16× (n=40k)

Time: 4–5 h

The first public, ready-to-run SAE recipe for hybrid GDN architectures. It installs transformers from source for qwen3_5 support, captures activations through the output_hidden_states path (Qwen3.5 has no .layers), and survives Kaggle kernel kills via HF-resumable checkpoints. The result is a publishable SAE that meets the Stage Gate 1 research bar.

What you'll learn

Hybrid GDN activation capture (output_hidden_states)
transformers-from-source install + restart dance
Dual-GPU model/SAE split (model on cuda:0, SAE on cuda:1)
HuggingFace streaming checkpoints for kernel-kill recovery
Held-out validation + val_report.json publishing
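For the held-out validation step, a report can be as simple as a handful of numbers computed on activations the SAE never trained on. The sketch below uses plain Python lists to stay dependency-free. The metric choice (fraction of variance unexplained and mean L0) and the JSON field names are assumptions for illustration; the source does not specify the actual val_report.json schema.

```python
import json
from typing import Dict, List


def fraction_variance_unexplained(inputs: List[List[float]],
                                  recons: List[List[float]]) -> float:
    """Squared reconstruction error divided by total input variance (0.0 is perfect)."""
    n, d = len(inputs), len(inputs[0])
    mean = [sum(row[j] for row in inputs) / n for j in range(d)]
    err = sum((x - r) ** 2 for xs, rs in zip(inputs, recons) for x, r in zip(xs, rs))
    var = sum((x - m) ** 2 for xs in inputs for x, m in zip(xs, mean))
    return err / var


def mean_l0(latents: List[List[float]]) -> float:
    """Average number of non-zero latents per token."""
    return sum(sum(1 for z in row if z != 0.0) for row in latents) / len(latents)


def build_val_report(inputs, recons, latents) -> Dict[str, float]:
    """Collect held-out metrics into a JSON-serialisable dict (field names are hypothetical)."""
    fvu = fraction_variance_unexplained(inputs, recons)
    return {
        "n_tokens": len(inputs),
        "fvu": fvu,
        "explained_variance": 1.0 - fvu,
        "mean_l0": mean_l0(latents),
    }


def save_val_report(report: Dict[str, float], path: str = "val_report.json") -> None:
    """Write the report next to the SAE weights so it can be published with them."""
    with open(path, "w") as f:
        json.dump(report, f, indent=2)


held_out = [[1.0, 0.0], [3.0, 4.0]]
reconstructed = [[1.0, 1.0], [3.0, 4.0]]
active = [[0.0, 1.0, 2.0], [0.0, 0.0, 3.0]]
report = build_val_report(held_out, reconstructed, active)
```

Publishing the report alongside the weights lets others compare SAEs on the same footing without rerunning the validation pass.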

Prerequisites

→ Completed Tier 1, or SAE experience
→ Kaggle account + HF_TOKEN in Kaggle Secrets
→ Basic understanding of Gated Delta Networks (links in notebook)

Run on Kaggle · View on GitHub

advanced

TIER 3 · PAPER-GRADE

Paper-grade SAE — Qwen3.6-27B

Platform: Vast.ai / Lambda · RTX 6000 Pro (96 GB)

Cost: ~$30–60 / run

Model: Qwen3.6-27B

Tokens: 200 M

Dictionary: 13× (n=65k)

Time: 20–24 h

The Gemma-Scope-27B-parity recipe. Three TopK SAEs are trained in parallel on L11/L31/L55 from a single shared forward pass, on a 70/20/10 FineWeb-Edu + OpenThoughts + OpenMath corpus mix, with HF streaming checkpoints every 10M tokens, so a crash loses at most 10M tokens of progress (roughly an hour at the stated 200M tokens in 20–24 h). This is the notebook behind qwen36-27b-sae-papergrade.
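One way to picture the 70/20/10 mix is a sampler that picks the source of each document by weight. This is a sketch under stated assumptions: the source does not say whether the ratio is measured in documents or tokens (documents are assumed here), and `mix_corpora` and the source keys are hypothetical names, not the notebook's API.

```python
import random
from typing import Dict, Iterator, Tuple


def mix_corpora(sources: Dict[str, Iterator[str]],
                weights: Dict[str, float],
                seed: int = 0) -> Iterator[Tuple[str, str]]:
    """Yield (source_name, document) pairs, choosing each document's source by weight."""
    rng = random.Random(seed)
    names = list(sources)
    probs = [weights[name] for name in names]
    while True:
        name = rng.choices(names, weights=probs, k=1)[0]
        try:
            yield name, next(sources[name])
        except StopIteration:
            return  # simplification: stop as soon as any source runs dry


def numbered_docs(tag: str) -> Iterator[str]:
    """Stand-in for a streaming dataset: an endless supply of fake documents."""
    i = 0
    while True:
        yield f"{tag}-{i}"
        i += 1


sources = {
    "fineweb_edu": numbered_docs("fw"),
    "openthoughts": numbered_docs("ot"),
    "openmath": numbered_docs("om"),
}
weights = {"fineweb_edu": 0.7, "openthoughts": 0.2, "openmath": 0.1}

counts = {name: 0 for name in sources}
for _, (name, doc) in zip(range(10_000), mix_corpora(sources, weights)):
    counts[name] += 1
```

Over many draws, `counts` settles near the requested proportions, and the fixed seed makes the interleaving reproducible across restarts.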

What you'll learn

Multi-layer simultaneous SAE training (one forward pass, 3 SAEs)
Corpus mixing for reasoning-model SAEs
Streaming activation buffer pattern (never OOM)
AuxK calibration for large n (d_model/2 heuristic)
sae_lens / Neuronpedia-ready export
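The streaming activation buffer in the list above can be sketched as a bounded shuffle buffer: activations flow in from the model, get mixed, and leave as training batches, so memory stays capped no matter how many tokens are processed. The version below works on plain Python sequences and is an illustration only; `stream_batches`, the half-drain policy, and the sizes are assumptions, not the notebook's implementation.

```python
import random
from typing import Iterable, Iterator, List, Sequence


def stream_batches(activations: Iterable[Sequence[float]],
                   buffer_size: int,
                   batch_size: int,
                   seed: int = 0) -> Iterator[List[Sequence[float]]]:
    """Yield shuffled batches while holding at most buffer_size activation vectors."""
    if not 0 < batch_size <= buffer_size:
        raise ValueError("need 0 < batch_size <= buffer_size")
    rng = random.Random(seed)
    buffer: List[Sequence[float]] = []
    for vec in activations:
        buffer.append(vec)
        if len(buffer) >= buffer_size:
            rng.shuffle(buffer)
            # Drain down to about half; the remainder stays to mix with newer activations.
            while len(buffer) > buffer_size // 2 and len(buffer) >= batch_size:
                yield [buffer.pop() for _ in range(batch_size)]
    rng.shuffle(buffer)
    while buffer:  # end of stream: flush what is left, last batch may be short
        yield [buffer.pop() for _ in range(min(batch_size, len(buffer)))]


fake_activations = ([float(i)] for i in range(1000))  # stand-in for model outputs
batches = list(stream_batches(fake_activations, buffer_size=100, batch_size=16))
```

Keeping half the buffer between refills means each batch mixes activations from different stretches of the token stream, which reduces correlation between consecutive training examples.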

Prerequisites

→ Completed Tier 1 + Tier 2, or production SAE experience
→ Cloud GPU account (Vast.ai / Lambda / RunPod) with ≥96 GB VRAM
→ HF_TOKEN env var on the cloud instance

View on GitHub

Beyond the ladder

39 more notebooks for every step of your SAE journey.

Post-train discovery, one-click steering, model coverage, research replication, safety monitoring — every notebook opens in Colab or Kaggle directly.

Closes the loop

You have an SAE. Now understand it, share it, edit it.

beginner

Discover your features

Auto-label your SAE with an LLM judge

You trained an SAE. Now what? This notebook streams activations, ranks features by interestingness, sends the top-activating examples to Claude or GPT-4, and returns a feature_catalog.json with one-sentence descriptions.

⏱ ~20 min · Colab T4

▸ Colab Free · ANTHROPIC_API_KEY or OPENAI_API_KEY
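The "top-activating examples" step of auto-labelling can be sketched without any model or API call. The code below keeps the strongest few texts per feature with a small heap and lays out catalog entries whose descriptions an LLM judge would fill in later. The function names and the entry fields are hypothetical; the source does not specify the feature_catalog.json schema or how interestingness is scored.

```python
import heapq
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def top_activating_examples(stream: Iterable[Tuple[str, Dict[int, float]]],
                            n_examples: int) -> Dict[int, List[Tuple[float, str]]]:
    """stream yields (text, {feature_id: activation}); keep the n strongest texts per feature."""
    heaps = defaultdict(list)  # feature_id -> min-heap of (activation, arrival_index, text)
    for index, (text, acts) in enumerate(stream):
        for feature_id, act in acts.items():
            if act <= 0.0:
                continue  # the feature did not fire on this text
            item = (act, index, text)
            heap = heaps[feature_id]
            if len(heap) < n_examples:
                heapq.heappush(heap, item)
            elif item > heap[0]:
                heapq.heapreplace(heap, item)  # evict the weakest kept example
    return {fid: [(a, t) for a, _, t in sorted(h, reverse=True)] for fid, h in heaps.items()}


def build_catalog(examples: Dict[int, List[Tuple[float, str]]]) -> List[dict]:
    """One entry per feature; 'description' is left empty for the LLM judge to fill in."""
    return [
        {"feature_id": fid, "top_examples": [t for _, t in ex], "description": None}
        for fid, ex in sorted(examples.items())
    ]


toy_stream = [
    ("cat sat", {0: 1.0, 1: 0.2}),
    ("dog ran", {0: 3.0}),
    ("the end", {0: 2.0, 1: 0.0}),
]
examples = top_activating_examples(toy_stream, n_examples=2)
catalog = build_catalog(examples)
```

Because each heap holds at most `n_examples` items, memory scales with the number of features rather than the number of texts streamed.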

	Hobbyist	Explorer	Paper-grade
Platform	Colab Free T4	Kaggle 2× T4	Cloud RTX 6000 Pro
Cost	$0	$0 · 30 h/wk quota	~$30–60 per run
VRAM	15 GB	32 GB	96 GB
Model	Gemma-2-2B (2.6 B)	Qwen3.5-4B (4.0 B)	Qwen3.6-27B (27 B)
Architecture	Dense	Hybrid GDN	Dense (reasoning)
Dictionary	n=16k (7×)	n=40k (16×)	n=65k (13×)
TopK	k=64	k=128	k=128 + AuxK
Tokens	50 M	150 M	200 M
Time	30–40 min	4–5 h	20–24 h
What you get	First SAE	Hybrid-arch SAE	Paper-grade SAE

Closes the loop

Discover your features

Auto-interp at scale — paper-grade SAE

Auto-interp targeted — circuit features

Build a shareable Trace

Steer your model

Reduce friction

Pick your tier

More models

Llama-3.1-8B SAE

Mistral-7B SAE

Phi-3-mini SAE

Research-grade

Stage Gate G1 — correlation pre-test

BatchTopK vs TopK

Circuits

Attribution Patching (AtP*)

Sparse Feature Circuits (Marks 2024)

ACDC slow-mode via AutoCircuit

Sparse Feature Circuits — paper-grade 27B

Train a Sparse Crosscoder

Cross-model crosscoder + Pearson_CE

RL-diffing crosscoder — base vs mechreward

Leaderboard

InterpScore v0.0.1 — rank your SAE

InterpScore on the paper-grade 27B SAE

Lenses

Logit Lens — per-layer predictions

Tuned Lens — calibrated predictions

Probing

Linear Probe — the SAE baseline

CCS — Contrast Consistent Search

RepE reading vector (LAT)

Hallucination — detection & steering

Entity-recognition v0.0.1 — the failed first try

Ferrando 2024 replication on Qwen3.6-27B

Single-feature steering — the null result

Multi-feature steering — top-K (no controls)

Multi-feature steering with full controls

Paper baselines — Ferrando 2024 on Qwen3.6-27B

Sensitivity — refusal-only vs Ferrando labelling

Guards — product reproducers

FabricationGuard PoC v1 — single SAE feature

FabricationGuard v2 — linear probe (production)

ReasonGuard v0.1 — probe during thinking

ReasonGuard v0.2 — multi-bench combined training

Multi-Probe DPO POC — Qwen3.6-27B + FG + RG

Anti-Goodhart fresh-probe validation

Safety + production

Watchtower preview — monitor input prompts

Pick the tier that matches your compute.

Every tier uses the same research-grade recipe.

Stream activations

TopK SAE + AuxK

Resume-safe checkpoint

Cosine LR + warmup

Held-out validation

SAELens-compatible export
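Of the recipe steps above, the learning-rate schedule is the easiest to show in isolation. The sketch below implements linear warmup followed by cosine decay as a pure function of the step index. The name `lr_at_step` and all numeric values are illustrative assumptions; the notebooks' actual hyperparameters are not stated in the source.

```python
import math


def lr_at_step(step: int, total_steps: int, warmup_steps: int,
               peak_lr: float, min_lr: float = 0.0) -> float:
    """Linear warmup to peak_lr over warmup_steps, then cosine decay to min_lr."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)  # hold at min_lr if training runs past total_steps
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Hypothetical settings: 1000 steps, 100 of them warmup, peak learning rate 1e-3.
schedule = [lr_at_step(s, total_steps=1000, warmup_steps=100, peak_lr=1e-3)
            for s in range(1001)]
```

Because the schedule depends only on the step index, a resumed run that restores its step counter from a checkpoint continues on the same curve.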

Your SAE is now an asset. Put it to work.

Stuck? Lost? Want your notebook added?