Probe result · Qwen/Qwen3.6-27B-Instruct · 2026-05-11 · by caiovicentino

ReasonGuard v0.2 — L55 mid_think CoT faithfulness probe (Qwen3.6-27B)

Position-of-faithfulness probe read at the mid_think token of layer 55 (L55). AUROC is 0.888 within GSM8K and 0.605 cross-domain on StrategyQA. This is a narrow-scope finding: the probe is domain-bound, not universal.

Numbers

auroc_within_gsm8k: 0.888
auroc_cross_strategyqa: 0.605
layer: 55
position: mid_think
scope: domain-bound
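
For readers less familiar with the metric, here is a minimal sketch of what an AUROC value measures: the fraction of (positive, negative) example pairs that the probe score orders correctly, with ties counted as half. The labels and scores below are made-up toy values, not data from this entry, and the pairwise loop is a plain illustration rather than the evaluation code used for the reported numbers.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values): 1 = faithful CoT, 0 = unfaithful CoT.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]

# 7 of the 9 positive/negative pairs are ordered correctly.
print(auroc(toy_labels, toy_scores))
```

On this reading, 0.5 is chance level and 1.0 is perfect ranking, which is why 0.605 on StrategyQA counts as weak transfer next to 0.888 on GSM8K.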

Artifacts

probe.joblib, scaler.joblib, metadata.json

These files live in the linked HF dataset.
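
A hedged sketch of how the three artifacts might be used once downloaded. It assumes, without confirmation from this entry, that scaler.joblib exposes a scikit-learn-style `transform`, that probe.joblib exposes `predict_proba` with the positive class in column 1, and that the inputs are L55 activations taken at the mid_think token. The helper names `load_artifacts` and `score_activations` are hypothetical.

```python
import json
from pathlib import Path


def load_artifacts(artifact_dir):
    """Load scaler, probe, and metadata from a local copy of the artifact files."""
    import joblib  # third-party; imported lazily so scoring has no hard dependency

    root = Path(artifact_dir)
    scaler = joblib.load(root / "scaler.joblib")
    probe = joblib.load(root / "probe.joblib")
    metadata = json.loads((root / "metadata.json").read_text(encoding="utf-8"))
    return scaler, probe, metadata


def score_activations(scaler, probe, activations):
    """Return one positive-class probability per activation row.

    Assumption: scikit-learn-style objects (transform / predict_proba).
    """
    scaled = scaler.transform(activations)
    probabilities = probe.predict_proba(scaled)
    return [float(row[1]) for row in probabilities]
```

Only load joblib files from a source you trust, since unpickling can execute arbitrary code.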

Cite

The sha256 below covers the manifest content only. To verify it, set manifest_sha256 to null in the JSON manifest, serialize with sort_keys=True, and hash the result; the digest should match. Zenodo DOI pending.
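
The re-hash step can be sketched as follows. The field name manifest_sha256 and sort_keys=True come from the description above; everything else is an assumption, in particular that the registry serializes with the `json.dumps` defaults (separators, ensure_ascii) and encodes as UTF-8. If the digest does not match on the real manifest, those serialization details are the first thing to check. The `demo` manifest is made up for the round trip and is not the real atlas entry.

```python
import copy
import hashlib
import json


def content_sha256(manifest: dict) -> str:
    """Hash a manifest with its own digest field nulled out."""
    body = copy.deepcopy(manifest)
    body["manifest_sha256"] = None  # serialized as JSON null
    # Assumption: json.dumps defaults match the registry's serialization.
    canonical = json.dumps(body, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify(manifest: dict) -> bool:
    """True if the stored digest matches the recomputed content hash."""
    return content_sha256(manifest) == manifest.get("manifest_sha256")


# Illustrative round trip on a made-up manifest.
demo = {"layer": 55, "position": "mid_think", "manifest_sha256": None}
demo["manifest_sha256"] = content_sha256(demo)
print(verify(demo))
```

Nulling the digest field before hashing is what lets the manifest carry its own hash without the hash depending on itself.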

manifest_sha256: 49eba51edb65b6ee13cfa4363cc4a0939ca704df5f25111b9e066836c9b2b890
Atlas URL: https://openinterp.org/atlas/49eba51edb
Raw manifest: https://raw.githubusercontent.com/OpenInterpretability/registry/main/atlas/2026/49eba51edb.json

Reproduce this in your agent

In an agent session attached to your Colab via openinterp-mcp:

from openinterp_mcp.atlas import load_entry
from openinterp_mcp.judge import reproduce

# Load the atlas entry by its short ID (the first 10 hex characters of manifest_sha256).
entry = load_entry("49eba51edb")
print(entry.methodology_check)

# Re-run the causality protocol against the linked HF artifact.
reproduce(entry, hf_repo_id="openinterp/reasonguard-qwen36-27b-l55-mid_think")