Probe result · Qwen/Qwen3.6-27B-Instruct · 2026-05-11 · by caiovicentino

FabricationGuard v2 — L31 cross-task hallucination probe (Qwen3.6-27B)

A linear probe on the layer-31 residual stream detects confident hallucinations across tasks. AUROC is 0.90 within-task on HaluEval and 0.88 cross-task on SimpleQA. Confident-wrong answers on SimpleQA are reduced by 88%. p95 inference latency with sklearn is about 1 ms.

Numbers

auroc_within_haluval: 0.900
auroc_cross_simpleqa: 0.880
conf_wrong_reduction_simpleqa: -0.880
sklearn_p95_inference_ms: 1
layer: 31
position: end_question
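
As a reading aid for the numbers above, here is a minimal sketch of how an AUROC and a relative reduction can be computed. It assumes that the AUROC fields are standard ranking AUROCs and that conf_wrong_reduction_simpleqa is a relative change in the number of confident-wrong answers; this page does not spell out either definition. The function names and the toy inputs are hypothetical and are not taken from the evaluation.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. labels are 0/1, scores are probe outputs.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def relative_change(before, after):
    """Signed relative change; negative values mean a reduction."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return (after - before) / before


# Toy data, invented for illustration only.
toy_auroc = auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])

# Hypothetical counts: 100 confident-wrong answers without the probe, 12 with it.
toy_reduction = relative_change(100, 12)
```

Under this reading, a value of -0.880 means that roughly 12 of every 100 baseline confident-wrong answers remain once the probe is applied.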

Artifacts

probe.joblib · scaler.joblib · metadata.json

These files live in the linked HF dataset.
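
To make the role of the two joblib artifacts concrete, the sketch below shows the arithmetic a standardizer followed by a logistic linear probe would perform on one activation vector. It assumes that scaler.joblib holds per-feature means and scales and that probe.joblib holds a weight vector and bias; this page does not confirm the contents of either file. The function name probe_score and all toy values are hypothetical.

```python
import math


def probe_score(activation, mean, scale, weights, bias):
    """Standardize one activation vector, then apply a logistic linear probe.

    Returns a value in (0, 1); higher means the probe flags the input.
    """
    if not (len(activation) == len(mean) == len(scale) == len(weights)):
        raise ValueError("all vectors must have the same length")
    # Standardization step (what a fitted scaler would apply).
    z = [(a - m) / s for a, m, s in zip(activation, mean, scale)]
    # Linear read-out followed by a numerically stable sigmoid.
    logit = sum(w * v for w, v in zip(weights, z)) + bias
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


# Three-dimensional toy values; a real residual-stream vector is far wider.
score = probe_score(
    [0.5, -1.0, 2.0],
    mean=[0.0, 0.0, 1.0],
    scale=[1.0, 2.0, 1.0],
    weights=[1.0, 0.0, -1.0],
    bias=0.5,
)
```

In practice the activation would be the layer-31 residual-stream vector read at the end_question position, and the fitted objects would be loaded from the artifacts rather than written by hand.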

Cite

The content-only SHA-256 is listed below. To verify it, re-hash the JSON manifest with manifest_sha256 set to null and keys sorted (sort_keys=True); the digest should match. Zenodo DOI pending.

manifest_sha256: 8d5df2d5d52521632e01d2e2617c6d6b328b85c756b68e4d2c70a19ba31be9fc
Atlas URL: https://openinterp.org/atlas/8d5df2d5d5
Raw manifest: https://raw.githubusercontent.com/OpenInterpretability/registry/main/atlas/2026/8d5df2d5d5.json
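
The re-hash check described in this section can be sketched as follows. Only the null manifest_sha256 field and sort_keys=True come from this page. The rest are assumptions: that manifest_sha256 is a top-level key, that the registry uses Python's default json.dumps separators and ASCII escaping, and that the string is encoded as UTF-8. If the registry serializes differently, the digest will not match. The helper names and the toy manifest are hypothetical.

```python
import hashlib
import json


def content_sha256(manifest: dict) -> str:
    """Hash the manifest with its own digest field nulled out."""
    payload = dict(manifest)           # shallow copy; do not mutate the input
    payload["manifest_sha256"] = None  # serialized as JSON null
    # Assumption: default separators and ensure_ascii, as json.dumps uses.
    canonical = json.dumps(payload, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_manifest(manifest: dict) -> bool:
    """True if the stored digest matches the recomputed one."""
    return content_sha256(manifest) == manifest.get("manifest_sha256")


# Toy manifest for illustration; not the real registry entry.
toy = {"layer": 31, "position": "end_question", "manifest_sha256": None}
toy["manifest_sha256"] = content_sha256(toy)
```

To check the real entry, load the JSON from the raw manifest URL into a dict and pass it to verify_manifest.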

Reproduce this in your agent

In an agent session attached to your Colab via openinterp-mcp:

from openinterp_mcp.atlas import load_entry

entry = load_entry("8d5df2d5d5")
print(entry.methodology_check)

# Re-run the causality protocol against the linked HF artifact:
from openinterp_mcp.judge import reproduce
reproduce(entry, hf_repo_id="openinterp/fabricationguard-qwen36-27b-l31-v2")