Finding · Qwen/Qwen3.6-27B-Instruct · 2026-05-11 · by caiovicentino

Probe-detected grokking in multi-probe DPO (Qwen3.6-27B nb37 v2)

A phase transition (ratio 2.596) appears in fresh-probe AUROC across the 11 nb37 v2 checkpoints. The original FG/RG probes show zero effect, i.e., DPO learning is orthogonal to the task-probe axes. The checkpoints follow a construct-then-compress pattern.

Numbers

phase_transition_ratio: 2.596
fresh_probe_auroc_pre: 0.472
fresh_probe_auroc_post: 0.528
original_probe_effect: 0
checkpoints: 11
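For reference, AUROC is the probability that a randomly chosen positive example gets a higher probe score than a randomly chosen negative one, so 0.5 is chance level. Below is a minimal sketch of how a per-checkpoint probe AUROC can be computed from probe scores, assuming binary labels and one scalar score per example. It is not the pipeline used for this finding, and the toy scores are hypothetical, not taken from any nb37 v2 checkpoint.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC via the Mann-Whitney U statistic.

    Over all (positive, negative) pairs, count how often the positive
    example scores higher; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy probe scores, not taken from any checkpoint.
    labels = [1, 1, 0, 0]
    print(auroc([0.9, 0.8, 0.2, 0.1], labels))  # perfectly separating probe → 1.0
    print(auroc([0.5, 0.5, 0.5, 0.5], labels))  # uninformative probe → 0.5
```

The pairwise loop is O(n_pos × n_neg), which is fine for a sketch; for large probe evaluation sets a rank-based or library implementation is the usual choice.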

Artifacts

nb37_v2_checkpoints
nb41_v2_grokking_extended

These files live in the linked HF dataset.

Cite

The sha256 below is computed over the manifest content only. To verify it, set manifest_sha256 to null in the JSON manifest, re-serialize it with sort_keys=True, and hash the result; you should get the same digest. Zenodo DOI pending.

manifest_sha256: 7019cff91255b679077964591a24794705ec2b20bb58374d2f265af010ca886c
Atlas URL: https://openinterp.org/atlas/7019cff912
Raw manifest: https://raw.githubusercontent.com/OpenInterpretability/registry/main/atlas/2026/7019cff912.json
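The verification procedure described above can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: manifest_sha256 is assumed to be a top-level key, and json.dumps is assumed to run with its defaults apart from sort_keys=True (default separators, ensure_ascii=True, no indentation) with the output encoded as UTF-8. The registry may serialize differently, so a mismatch points first at these assumptions. The helper names content_sha256 and verify_manifest are hypothetical.

```python
import hashlib
import json
import urllib.request
from typing import Any, Dict

RAW_MANIFEST_URL = (
    "https://raw.githubusercontent.com/OpenInterpretability/registry/"
    "main/atlas/2026/7019cff912.json"
)


def content_sha256(manifest: Dict[str, Any]) -> str:
    """Hash the manifest with its own digest field blanked out (hypothetical helper)."""
    # Shallow copy so the caller's manifest keeps its recorded digest.
    blank = dict(manifest)
    # ASSUMPTION: the digest lives in a top-level "manifest_sha256" key.
    blank["manifest_sha256"] = None
    # ASSUMPTION: json.dumps defaults except sort_keys, encoded as UTF-8.
    canonical = json.dumps(blank, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_manifest(url: str = RAW_MANIFEST_URL) -> bool:
    """Fetch the raw manifest and compare the recomputed digest to the recorded one."""
    with urllib.request.urlopen(url) as resp:
        manifest = json.load(resp)
    return content_sha256(manifest) == manifest.get("manifest_sha256")
```

verify_manifest() makes a network request, so run it only where raw.githubusercontent.com is reachable. Because the digest field is nulled before hashing, the result does not depend on whatever value was recorded in it.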

Reproduce this in your agent

In an agent session attached to your Colab via openinterp-mcp:

from openinterp_mcp.atlas import load_entry

entry = load_entry("7019cff912")
print(entry.methodology_check)

# Re-run the causality protocol against the linked HF artifact:
from openinterp_mcp.judge import reproduce
reproduce(entry, hf_repo_id="caiovicentino1/openinterp-37v2-multiprobe-dpo-extended")