Probe result · Qwen/Qwen3.6-27B-Instruct · 2026-05-11 · by caiovicentino

CoTGuard v1 — CoT faithfulness probe via Lanham-2023 truncation (Qwen3.6-27B)

A linear probe trained on the unfaithful-CoT signal induced by Lanham-2023 truncation. This is a detection-tier probe: the Phase 8 causality verdict is still pending (template-locked under steering).

Numbers

auroc: 0.910
n_samples: 240
layer: 55
position: mid_think
methodology: lanham_2023_truncation
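
As a reading aid for the auroc figure, here is a minimal sketch of how an AUROC is computed from probe scores: it is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The function name, the toy scores, and the choice of "unfaithful = 1" as the positive class are all assumptions made for illustration; none of them come from the probe or its evaluation set.

```python
def auroc(scores, labels):
    """Pairwise AUROC: P(score of a positive > score of a negative), ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    # Compare every positive against every negative (O(n_pos * n_neg), fine for small n).
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, NOT taken from the probe: 1 = unfaithful CoT, 0 = faithful.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
print(auroc(toy_scores, toy_labels))
```

On the toy data, eight of the nine positive/negative pairs are ranked correctly. A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation.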

Artifacts

probe.joblib · metadata.json

Cite

The sha256 below is computed over the manifest content only. To verify it, set manifest_sha256 to null in the JSON manifest, serialize with sort_keys=True, and re-hash; you should get the same digest. Zenodo DOI pending.

manifest_sha256: 7a4c7cf42ed9528432f3890e675ee0c4103db9234e7c1c219d212840b0144480
Atlas URL: https://openinterp.org/atlas/7a4c7cf42e
Raw manifest: https://raw.githubusercontent.com/OpenInterpretability/registry/main/atlas/2026/7a4c7cf42e.json
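
The verification recipe in this section can be sketched in a few lines of standard-library Python. The page specifies only two things: manifest_sha256 is set to null and keys are sorted. Everything else here is an assumption: the helper names, the use of json.dumps defaults for separators and ASCII escaping, and UTF-8 encoding. If the registry serializes differently, the digest of a real manifest will not match until those settings are aligned.

```python
import hashlib
import json


def content_digest(manifest: dict) -> str:
    """Hash a manifest with its own manifest_sha256 field nulled out."""
    candidate = dict(manifest)           # shallow copy; leave the caller's dict alone
    candidate["manifest_sha256"] = None  # serialized as JSON null
    # ASSUMPTION: json.dumps defaults (separators, ensure_ascii) and UTF-8 bytes.
    payload = json.dumps(candidate, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def verify(manifest: dict) -> bool:
    """True if the stored digest matches the recomputed content-only digest."""
    return content_digest(manifest) == manifest.get("manifest_sha256")


# Hypothetical toy manifest for illustration; not the real Atlas entry.
toy = {"auroc": 0.91, "layer": 55, "manifest_sha256": None}
toy["manifest_sha256"] = content_digest(toy)
print(verify(toy))                  # stored digest matches
print(verify({**toy, "layer": 54})) # any content change breaks the match
```

To check the real entry, load the raw manifest JSON into a dict (for example with json.load on a downloaded copy) and pass it to verify. Nulling the field before hashing is what lets the digest be stored inside the same document it describes.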

Reproduce this in your agent

In an agent session attached to your Colab via openinterp-mcp:

from openinterp_mcp.atlas import load_entry

entry = load_entry("7a4c7cf42e")
print(entry.methodology_check)

# Re-run the causality protocol against the linked HF artifact:
# (no HF artifact attached — replicate from methodology alone)