Feature packs

Catalog

Each pack is a validated set of helpful and harmful SAE (sparse autoencoder) features discovered via contrastive correctness analysis. To appear here, a pack must pass Stage Gate 1: a Spearman ρ of at least 0.30 on held-out data.
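The catalog names the discovery method only as contrastive correctness analysis, and each pack reports a Cohen's d range for its helpful and harmful features. The following is a minimal sketch of how such a selection could work. It assumes each SAE feature is scored by Cohen's d between its activations on correct and incorrect responses, and that the k most positive and k most negative features are kept. `select_pack`, its input format, and the toy data are hypothetical, not the pack authors' code.

```python
from statistics import mean, variance


def cohens_d(correct, incorrect):
    """Cohen's d with a pooled sample standard deviation.

    Positive d: the feature fires more on correct responses (helpful).
    Negative d: it fires more on incorrect responses (harmful).
    Each group needs at least two values for the sample variance.
    """
    n1, n2 = len(correct), len(incorrect)
    pooled_var = ((n1 - 1) * variance(correct) + (n2 - 1) * variance(incorrect)) / (n1 + n2 - 2)
    if pooled_var == 0:
        return 0.0  # d is undefined with zero spread in both groups; treat as uninformative
    return (mean(correct) - mean(incorrect)) / pooled_var ** 0.5


def select_pack(activations, is_correct, k=10):
    """Hypothetical selection step.

    activations: dict of feature id -> list of per-response activations
    is_correct:  list of bools, one per response, aligned with each list
    Returns (helpful, harmful, scores): top-k features by positive d,
    top-k by negative d, and the d value of every feature.
    """
    scores = {}
    for fid, acts in activations.items():
        correct = [a for a, ok in zip(acts, is_correct) if ok]
        incorrect = [a for a, ok in zip(acts, is_correct) if not ok]
        scores[fid] = cohens_d(correct, incorrect)
    ranked = sorted(scores, key=scores.get)  # ascending by d
    harmful = [f for f in ranked[:k] if scores[f] < 0]
    helpful = [f for f in reversed(ranked[-k:]) if scores[f] > 0]
    return helpful, harmful, scores


# Toy data: three correct responses followed by three incorrect ones.
acts = {
    "f1": [0.9, 0.8, 1.0, 0.1, 0.2, 0.0],  # fires on correct responses
    "f2": [0.1, 0.0, 0.2, 0.9, 1.0, 0.8],  # fires on incorrect responses
    "f3": [0.5, 0.4, 0.6, 0.5, 0.6, 0.4],  # no difference between groups
}
labels = [True, True, True, False, False, False]
helpful, harmful, d = select_pack(acts, labels, k=1)
print(helpful, harmful)  # ['f1'] ['f2']
```

With real packs, k would be 10, matching the "10 helpful + 10 harmful" listed on each card.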

qwen3.5-4b/reasoning_pack

Qwen/Qwen3.5-4B · GSM8K (math reasoning)

Status: Validated
Spearman ρ: 0.540
Pearson r: 0.726
n (held-out): 100
Features: 10 helpful + 10 harmful
Cohen's d range: helpful [+2.06, +2.16] / harmful [−2.47, −2.06]
Discovered on: 50 GSM8K responses (raw Q/A)

reasoning_pack.json · SAE on HuggingFace

qwen3.6-35b-a3b/reasoning_pack

Qwen/Qwen3.6-35B-A3B · SuperGPQA (science/engineering)

Status: Validated
Spearman ρ: 0.522
Pearson r: 0.537
n (held-out): 100
Features: 10 helpful + 10 harmful
Cohen's d range: helpful [+1.72, +1.84] / harmful [−1.73, −1.36]
Discovered on: 50 SuperGPQA responses (thinking mode)

reasoning_pack.json · SAE on HuggingFace

gemma-4-e4b/reasoning_pack

Google/Gemma-4-E4B · GSM8K (pending)

Status: Pending Stage Gate 1
Spearman ρ: —
Pearson r: —
n (held-out): —
Features: —
Cohen's d range: pending contrastive discovery
Discovered on: —

reasoning_pack.json · SAE on HuggingFace

Contribute a pack

If you train an SAE on a new model or architecture and run Stage Gate 1 on a labeled benchmark, open a PR against the catalogs/ directory. Packs that reach ρ ≥ 0.30 on an independent held-out set will be merged and listed here. See the pack template for the required fields.
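Before opening a PR, a contributor might sanity-check a pack against the gate locally. The following is a minimal sketch under stated assumptions: Stage Gate 1 is taken to correlate one pack score per held-out response with that response's correctness label, and independence is checked as "no response ID appears in both the discovery and held-out sets". `pack_score` (helpful minus harmful activations), `passes_stage_gate_1`, and the toy data are hypothetical illustrations, not the catalog's validation code or pack template.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks (handles ties)."""
    return pearson(average_ranks(x), average_ranks(y))


def pack_score(acts, helpful, harmful):
    """Hypothetical per-response score: summed helpful minus summed harmful activations."""
    return sum(acts.get(f, 0.0) for f in helpful) - sum(acts.get(f, 0.0) for f in harmful)


def passes_stage_gate_1(scores, labels, heldout_ids, discovery_ids, threshold=0.30):
    """Return (passed, rho); refuse to score if held-out items were used for discovery."""
    overlap = set(heldout_ids) & set(discovery_ids)
    if overlap:
        raise ValueError(f"{len(overlap)} held-out items were also used for discovery")
    rho = spearman(scores, labels)
    return rho >= threshold, rho


# Toy held-out set: SAE activations per response and a 0/1 correctness label.
responses = [
    {"f1": 0.9}, {"f1": 0.7}, {"f1": 0.8},
    {"f1": 0.5, "f2": 0.3}, {"f1": 0.3}, {"f1": 0.4, "f2": 0.3},
]
labels = [1, 1, 1, 0, 0, 0]
scores = [pack_score(r, helpful=["f1"], harmful=["f2"]) for r in responses]
passed, rho = passes_stage_gate_1(scores, labels, heldout_ids=range(100, 106), discovery_ids=range(0, 50))
print(passed, round(rho, 3))  # True 0.878
```

Average ranks matter here because a binary correctness label produces large ties; `scipy.stats.spearmanr` applies the same tie handling if SciPy is available.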