ProbeBench Models

Open-weights models we evaluate probes on.

Three models are registered. Probes are model-specific by design: a Qwen3.6 probe will not transfer to Llama-3 without retraining. Cross-model transfer numbers are reported via Pearson_CE on each probe's DNA page.

Models × Categories coverage

3 models · 8 categories

model ↓ / category →                                 Hallucination  Reasoning  Deception  Sandbagging  Eval Awareness  Reward Hacking  Manipulation  Refusal
Gemma-3-27B (Gemma) · 62L · 4608d · 27B · 0 probes   -              -          -          -            -               -               -             -
Llama-3.3-70B (Llama) · 80L · 8192d · 70B · 1 probe  -              -          1          -            -               -               -             -
Qwen3.6-27B (Qwen) · 64L · 5120d · 27B · 4 probes    1              1          -          -            1               1               -             -

Cell value = number of registered probes for that model × category combination. Empty dashed cells indicate categories where no probe has been registered yet for that model — these are the highest-leverage targets for new submissions.
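
The cell counts described above can be reproduced from a flat list of probes. The following is a minimal sketch, not the site's implementation: the `ProbeRecord` shape and the helpers `coverageMatrix` and `emptyCells` are hypothetical names introduced for illustration, and the probe list is copied from the per-model registry on this page.

```typescript
// Hypothetical shapes for illustration; the real types live in lib/probebench-types.ts.
type Category =
  | "hallucination" | "reasoning" | "deception" | "sandbagging"
  | "eval_awareness" | "reward_hacking" | "manipulation" | "refusal";

interface ProbeRecord {
  name: string;
  model: string;
  category: Category;
}

const CATEGORIES: Category[] = [
  "hallucination", "reasoning", "deception", "sandbagging",
  "eval_awareness", "reward_hacking", "manipulation", "refusal",
];

const MODELS: string[] = ["Gemma-3-27B", "Llama-3.3-70B", "Qwen3.6-27B"];

// Probes as listed in the per-model registry on this page.
const PROBES: ProbeRecord[] = [
  { name: "DeceptionGuard", model: "Llama-3.3-70B", category: "deception" },
  { name: "EvalAwarenessGuard", model: "Qwen3.6-27B", category: "eval_awareness" },
  { name: "ReasonGuard", model: "Qwen3.6-27B", category: "reasoning" },
  { name: "FabricationGuard", model: "Qwen3.6-27B", category: "hallucination" },
  { name: "RewardHackGuard", model: "Qwen3.6-27B", category: "reward_hacking" },
];

type Coverage = Record<string, Record<Category, number>>;

// Count registered probes for every model × category cell, starting from 0.
function coverageMatrix(models: string[], probes: ProbeRecord[]): Coverage {
  const matrix: Coverage = {};
  for (const model of models) {
    const row = {} as Record<Category, number>;
    for (const category of CATEGORIES) row[category] = 0;
    matrix[model] = row;
  }
  for (const probe of probes) {
    const row = matrix[probe.model];
    // Probes that reference an unregistered model are ignored.
    if (row !== undefined) row[probe.category] += 1;
  }
  return matrix;
}

// List the cells with no probe yet: the open targets for new submissions.
function emptyCells(matrix: Coverage): Array<[string, Category]> {
  const cells: Array<[string, Category]> = [];
  for (const model of Object.keys(matrix)) {
    const row = matrix[model];
    if (row === undefined) continue;
    for (const category of CATEGORIES) {
      if (row[category] === 0) cells.push([model, category]);
    }
  }
  return cells;
}

const coverage = coverageMatrix(MODELS, PROBES);
console.log(emptyCells(coverage).length); // 19 of the 24 cells are empty
```

With the five probes currently listed, every Gemma-3-27B cell is empty, Llama-3.3-70B has one filled cell, and Qwen3.6-27B has four.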

Per-model registry

Gemma-3-27B

Family: Gemma
ID: google/gemma-3-27b
License: custom
Architecture: Dense transformer with sliding-window attention
Layers: 62
d_model: 4608
Params: 27B
Released: 2025-Q4

Probes for this model

No probes registered yet: open territory. 0 probes with transfer data.

Llama-3.3-70B

Family: Llama
ID: meta-llama/Llama-3.3-70B-Instruct
License: custom
Architecture: Dense transformer (instruct-tuned)
Layers: 80
d_model: 8192
Params: 70B
Released: 2024-12

Probes for this model

Probe           Category   Layer  AUROC
DeceptionGuard  deception  L40    0.978

1 probe with transfer data.

Qwen3.6-27B

Family: Qwen
ID: Qwen/Qwen3.6-27B
License: Apache-2.0
Architecture: Hybrid GDN + Gated-Attn (dense, reasoning)
Layers: 64
d_model: 5120
Params: 27B
Released: 2026-04
Notes: Reasoning model · supports <think> traces

Probes for this model

Probe               Category        Layer  AUROC
EvalAwarenessGuard  eval_awareness  L40    0.930
ReasonGuard         reasoning       L55    0.908
FabricationGuard    hallucination   L31    0.903
RewardHackGuard     reward_hacking  L31    0.650

4 probes with transfer data.

License posture

Apache-2.0 weights

Qwen3.6-27B

These are the models the open-weights community can fork, fine-tune, and redistribute. ProbeBench prioritizes coverage here.

Custom-license weights

Llama-3.3-70B
Gemma-3-27B

Llama, Gemma, and others. Each is subject to its original license: research use is generally OK; commercial use varies.

Closed weights

Currently 0 in v0.0.1.

We accept closed-weight probes (e.g. GPT-4) but cap their license score at 0.5 × 0.05 = 0.025.
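
One way to read the arithmetic above is as a capped score multiplied by a weight. The sketch below assumes, hypothetically, that 0.5 is the cap on a closed-weight probe's license score and 0.05 is the weight of the license term in some overall score. The page states only the product, so the constant names, their roles, and the `licenseContribution` helper are all assumptions.

```typescript
// Assumption: 0.5 caps the license score for closed weights and 0.05 is the
// weight of the license term. Neither role is confirmed by the source.
const CLOSED_WEIGHT_LICENSE_CAP = 0.5;
const LICENSE_TERM_WEIGHT = 0.05;

// Weighted license contribution, with the cap applied only to closed weights.
function licenseContribution(rawLicenseScore: number, closedWeights: boolean): number {
  const score = closedWeights
    ? Math.min(rawLicenseScore, CLOSED_WEIGHT_LICENSE_CAP)
    : rawLicenseScore;
  return score * LICENSE_TERM_WEIGHT;
}

console.log(licenseContribution(1.0, true)); // 0.025, the capped value quoted above
```

Under this reading, a closed-weight probe can never contribute more than 0.025 from the license term, however permissive its stated terms.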

Architecture-aware notes

Hybrid architectures (Qwen3.6 GDN, Mamba SSMs, MoE) require model-specific probe-extraction code. The openinterp SDK auto-detects layer paths: model.language_model.layers[N] for HF transformers and model.layers[N] for dense paths. Probes for hybrid models should declare the position field carefully, because token_avg, end_question, and mid_think have very different semantics on reasoning models.

Full extraction protocol → /probebench/about §5

Submit a model

Have a model that should be on here? PR a registry entry against lib/probebench-data.ts. Required fields follow the ModelEntry schema in lib/probebench-types.ts.

id: "Qwen/Qwen3.6-27B"
short_name: "Qwen3.6-27B"
family: "Qwen"
param_count: "27B"
architecture: "Hybrid GDN + Gated-Attn"
layers: 64
d_model: 5120
release: "2026-04"
weights_license: "Apache-2.0"
hf_url: "https://huggingface.co/Qwen/Qwen3.6-27B"
thinking_mode: true
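
For orientation, the fields in the example entry can be written as a TypeScript interface. This is a hypothetical reconstruction inferred only from the example; the authoritative ModelEntry definition is in lib/probebench-types.ts and may include more fields or different types. The `validateEntry` helper is likewise an illustration of pre-submission sanity checks, not part of ProbeBench, and its rule that hf_url equals the Hugging Face base URL plus id is an assumption drawn from the single example.

```typescript
// Hypothetical reconstruction of ModelEntry, inferred from the example entry only.
interface ModelEntry {
  id: string;
  short_name: string;
  family: string;
  param_count: string;
  architecture: string;
  layers: number;
  d_model: number;
  release: string;
  weights_license: string;
  hf_url: string;
  thinking_mode: boolean;
}

const qwenEntry: ModelEntry = {
  id: "Qwen/Qwen3.6-27B",
  short_name: "Qwen3.6-27B",
  family: "Qwen",
  param_count: "27B",
  architecture: "Hybrid GDN + Gated-Attn",
  layers: 64,
  d_model: 5120,
  release: "2026-04",
  weights_license: "Apache-2.0",
  hf_url: "https://huggingface.co/Qwen/Qwen3.6-27B",
  thinking_mode: true,
};

// True for positive whole numbers such as layer counts and hidden sizes.
function isPositiveInteger(value: number): boolean {
  return value > 0 && Math.floor(value) === value;
}

// Illustrative checks a submitter might run before opening a PR.
function validateEntry(entry: ModelEntry): string[] {
  const problems: string[] = [];
  if (entry.id.indexOf("/") === -1) problems.push("id should look like <org>/<name>");
  if (!isPositiveInteger(entry.layers)) problems.push("layers must be a positive integer");
  if (!isPositiveInteger(entry.d_model)) problems.push("d_model must be a positive integer");
  // Assumption: hf_url is the Hugging Face base URL followed by id.
  if (entry.hf_url !== "https://huggingface.co/" + entry.id) {
    problems.push("hf_url should point at the id on Hugging Face");
  }
  return problems;
}

console.log(validateEntry(qwenEntry)); // an empty list means no problems were found
```

Typing the entry as an interface lets the compiler catch missing or mistyped fields before the PR is opened, while the runtime checks cover constraints that types alone cannot express.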
