ProbeBench Models

Open-weights models we evaluate probes on.

Three models are registered. Probes are model-specific by design: a Qwen3.6 probe will not transfer to Llama-3 without retraining. Cross-model transfer numbers are reported as Pearson_CE on each probe's DNA page.

Models: 3
Apache-2.0 weights: 1
Probes ranked: 5
Architecture families: 3

Models × Categories coverage

3 models · 8 categories

| Model | Family | Specs | Hallucination | Reasoning | Deception | Sandbagging | Eval Awareness | Reward Hacking | Manipulation | Refusal |
|---|---|---|---|---|---|---|---|---|---|---|
| Gemma-3-27B | Gemma | 62L · 4608d · 27B · 0 probes | · | · | · | · | · | · | · | · |
| Llama-3.3-70B | Llama | 80L · 8192d · 70B · 1 probe | · | · | 1 | · | · | · | · | · |
| Qwen3.6-27B | Qwen | 64L · 5120d · 27B · 4 probes | 1 | 1 | · | · | 1 | 1 | · | · |

Each cell shows the number of registered probes for that model × category combination. Cells marked · are categories where no probe has been registered yet for that model; these are the highest-leverage targets for new submissions.
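The counting rule behind the matrix can be sketched in a few lines. This is a minimal illustration, not the site's implementation: the `ProbeRef` shape, `coverageMatrix`, and `emptyCells` are hypothetical names, and the probe placement is copied from the matrix on this page.

```typescript
// Hypothetical shapes for illustration; the real registry types live in
// lib/probebench-types.ts and may differ.
const CATEGORIES = [
  "Hallucination", "Reasoning", "Deception", "Sandbagging",
  "Eval Awareness", "Reward Hacking", "Manipulation", "Refusal",
] as const;
type Category = (typeof CATEGORIES)[number];

interface ProbeRef {
  model: string; // short model name, e.g. "Qwen3.6-27B"
  category: Category;
}

type Coverage = Record<string, Record<Category, number>>;

// Count registered probes for every model × category pair.
function coverageMatrix(models: string[], probes: ProbeRef[]): Coverage {
  const matrix: Coverage = {};
  for (const m of models) {
    const row = {} as Record<Category, number>;
    for (const c of CATEGORIES) row[c] = 0;
    matrix[m] = row;
  }
  for (const p of probes) {
    const row = matrix[p.model];
    if (row !== undefined) row[p.category] += 1; // skip unregistered models
  }
  return matrix;
}

// Collect the zero-count cells: the open targets for new submissions.
function emptyCells(matrix: Coverage): Array<[string, Category]> {
  const gaps: Array<[string, Category]> = [];
  for (const model of Object.keys(matrix)) {
    for (const c of CATEGORIES) {
      if (matrix[model][c] === 0) gaps.push([model, c]);
    }
  }
  return gaps;
}

// Probe placement copied from the matrix on this page.
const registeredModels = ["Gemma-3-27B", "Llama-3.3-70B", "Qwen3.6-27B"];
const registeredProbes: ProbeRef[] = [
  { model: "Llama-3.3-70B", category: "Deception" },
  { model: "Qwen3.6-27B", category: "Hallucination" },
  { model: "Qwen3.6-27B", category: "Reasoning" },
  { model: "Qwen3.6-27B", category: "Eval Awareness" },
  { model: "Qwen3.6-27B", category: "Reward Hacking" },
];
const coverage = coverageMatrix(registeredModels, registeredProbes);
const gaps = emptyCells(coverage);
console.log(`${gaps.length} of ${registeredModels.length * CATEGORIES.length} cells are empty`);
```

With the five probes currently listed, 19 of the 24 cells come out empty, and every Gemma-3-27B cell is among them.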

Per-model registry

License posture

Apache-2.0 weights: 1
  • Qwen3.6-27B

These are the models the open-weights community can fork, fine-tune, and redistribute. ProbeBench prioritizes coverage here.

Custom-license weights: 2
  • Llama-3.3-70B
  • Gemma-3-27B

Llama, Gemma, and others. These are subject to their original licenses: research use is generally permitted, while commercial terms vary.

Closed weights: 0
  • None registered in v0.0.1.

We accept probes for closed-weight models (e.g. GPT-4) but cap their license score at 0.5 × 0.05 = 0.025.
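The cap stated above can be written out as a small worked example. This sketch assumes one reading of the arithmetic that the page does not spell out: 0.5 is the ceiling on the raw license score for closed weights, and 0.05 is the weight of the license component in the overall score. The names `LICENSE_WEIGHT`, `CLOSED_WEIGHTS_CAP`, and `licenseContribution` are hypothetical.

```typescript
// Assumption: 0.05 is the weight of the license component in the overall
// score, and 0.5 is the cap on the raw license score for closed weights.
const LICENSE_WEIGHT = 0.05;
const CLOSED_WEIGHTS_CAP = 0.5;

// Weighted license contribution for a raw score in [0, 1].
function licenseContribution(rawScore: number, closedWeights: boolean): number {
  const clamped = Math.min(Math.max(rawScore, 0), 1);
  const capped = closedWeights ? Math.min(clamped, CLOSED_WEIGHTS_CAP) : clamped;
  return capped * LICENSE_WEIGHT;
}

const closedBest = licenseContribution(1.0, true);  // capped: 0.5 × 0.05
const openBest = licenseContribution(1.0, false);   // uncapped: 1.0 × 0.05
console.log(closedBest, openBest);
```

Under this reading, a closed-weight probe can contribute at most half of what a fully open one can on the license component.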

Architecture-aware notes

Hybrid architectures (Qwen3.6 GDN, Mamba SSMs, MoE) require model-specific probe-extraction code. The openinterp SDK auto-detects layer paths, using model.language_model.layers[N] for HF transformers and model.layers[N] for dense paths. Probes for hybrid models should declare the position field carefully: token_avg, end_question, and mid_think have very different semantics on reasoning models.
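The path auto-detection described above amounts to trying candidate attribute paths in order. The sketch below shows that idea on a plain object tree. It is not the openinterp SDK's API: `resolveLayers`, `CANDIDATE_PATHS`, and the `Tree` type are hypothetical, and the real SDK operates on model objects rather than plain objects.

```typescript
// Hypothetical helper, not the openinterp SDK's API. It only illustrates
// trying candidate attribute paths in order on a plain object tree.
type Tree = { [key: string]: unknown };

// Candidate paths relative to the model object, most specific first.
const CANDIDATE_PATHS: string[][] = [
  ["language_model", "layers"], // model.language_model.layers[N]
  ["layers"],                   // model.layers[N]
];

function isTree(value: unknown): value is Tree {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Return the first candidate path that resolves to an array of layers.
function resolveLayers(model: Tree): { path: string; layers: unknown[] } | null {
  for (const path of CANDIDATE_PATHS) {
    let node: unknown = model;
    for (const key of path) {
      node = isTree(node) ? node[key] : undefined;
    }
    if (Array.isArray(node)) {
      return { path: ["model"].concat(path).join("."), layers: node };
    }
  }
  return null; // no known layout matched; model-specific code is needed
}

const wrapped = resolveLayers({ language_model: { layers: [{}, {}] } });
const flat = resolveLayers({ layers: [{}] });
const unknownLayout = resolveLayers({ config: {} });
console.log(wrapped?.path, flat?.path, unknownLayout);
```

Returning `null` instead of guessing keeps the failure visible for layouts that need hand-written extraction code.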

Full extraction protocol → /probebench/about §5

Submit a model

Have a model that should be listed here? Open a PR adding a registry entry to lib/probebench-data.ts. Required fields follow the ModelEntry schema in lib/probebench-types.ts.

id: "Qwen/Qwen3.6-27B"
short_name: "Qwen3.6-27B"
family: "Qwen"
param_count: "27B"
architecture: "Hybrid GDN + Gated-Attn"
layers: 64
d_model: 5120
release: "2026-04"
weights_license: "Apache-2.0"
hf_url: "https://huggingface.co/Qwen/Qwen3.6-27B"
thinking_mode: true
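For orientation, the entry above implies a shape roughly like the one below. This is a hypothetical reconstruction from the example fields only; the authoritative `ModelEntry` type is the one in lib/probebench-types.ts, which may have more fields or different types. The checks in `validateModelEntry` are illustrative assumptions too (for instance, that `release` is `YYYY-MM` and that `hf_url` is derived from `id`), not documented registry rules.

```typescript
// Hypothetical reconstruction from the example entry; the authoritative
// ModelEntry type is defined in lib/probebench-types.ts.
interface ModelEntry {
  id: string;              // Hugging Face repo id, e.g. "Qwen/Qwen3.6-27B"
  short_name: string;
  family: string;
  param_count: string;     // human-readable size, e.g. "27B"
  architecture: string;
  layers: number;
  d_model: number;
  release: string;         // assumed "YYYY-MM"
  weights_license: string;
  hf_url: string;
  thinking_mode: boolean;
}

// Illustrative pre-PR sanity checks (assumed rules, not the registry's own).
function validateModelEntry(e: ModelEntry): string[] {
  const errors: string[] = [];
  if (!/^[^\/\s]+\/[^\/\s]+$/.test(e.id)) errors.push("id should look like org/name");
  if (e.layers <= 0 || e.layers % 1 !== 0) errors.push("layers must be a positive integer");
  if (e.d_model <= 0 || e.d_model % 1 !== 0) errors.push("d_model must be a positive integer");
  if (!/^\d{4}-(0[1-9]|1[0-2])$/.test(e.release)) errors.push("release should be YYYY-MM");
  if (e.hf_url !== "https://huggingface.co/" + e.id) errors.push("hf_url should match id");
  return errors;
}

// The example entry from this page, written as a typed object.
const qwenEntry: ModelEntry = {
  id: "Qwen/Qwen3.6-27B",
  short_name: "Qwen3.6-27B",
  family: "Qwen",
  param_count: "27B",
  architecture: "Hybrid GDN + Gated-Attn",
  layers: 64,
  d_model: 5120,
  release: "2026-04",
  weights_license: "Apache-2.0",
  hf_url: "https://huggingface.co/Qwen/Qwen3.6-27B",
  thinking_mode: true,
};
const qwenErrors = validateModelEntry(qwenEntry);
console.log(qwenErrors.length === 0 ? "entry looks well-formed" : qwenErrors.join("; "));
```

Typing the entry this way lets the compiler catch a missing or misspelled field before the PR is opened; the runtime checks only cover what types cannot express.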