Probe result · epiphenomenal-softmax · Qwen/Qwen3.6-27B-Instruct · 2026-05-10 · by caiovicentino
agent-probe-guard v0.1 — L43 pre_tool detection probe (epiphenomenal-softmax under steering)
A detection-tier probe that predicts tool-call success in SWE-bench traces. It reaches AUROC 0.83 at N=42, a gap of +0.27 over the random-feature baseline. The causality protocol verdict is epiphenomenal-softmax: the probe DETECTS the signal but cannot LEVER it, with 0 behavioral flips at alpha 5 under steering (paper-6 Phase 7 finding).
Numbers
auroc: 0.830
n_samples: 42
gap_over_random_baseline_pca10: 0.269
control_token_normalized_delta_rel: -0.046
behavioral_flips_alpha_5: 0
sklearn_p95_inference_latency_ms: 0.190
verdict_class: epiphenomenal-softmax
Methodology check
Output of the causality_protocol primitive when it was run on this artifact. See paper-6 for the 3-baseline methodology and the 5-class verdict spec.
- verdict: epiphenomenal-softmax
- real_auroc: 0.83
- random_baseline_mean: 0.561
- auroc_gap: 0.269
- delta_rel_max: -0.046
- flip_rate_at_max_alpha: 0
- baselines_run: random_direction_random_acts
- caveat: v0 protocol, predates the three-baseline upgrade in openinterp-mcp v0.0.3
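The reported auroc_gap is consistent with a plain difference between real_auroc and random_baseline_mean. The sketch below illustrates that arithmetic alongside a rank-based AUROC. It is not the causality_protocol implementation, which is not shown here; the helper names `auroc` and `auroc_gap` are hypothetical and exist only for illustration.

```python
import math


def auroc(labels, scores):
    """Rank-based AUROC: share of (positive, negative) pairs where the
    positive sample scores higher; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auroc_gap(real_auroc, random_baseline_mean):
    """Assumption: the gap is a simple difference of the two AUROC values."""
    return real_auroc - random_baseline_mean


# Values copied from the methodology check in this entry.
gap = auroc_gap(0.83, 0.561)
print(round(gap, 3))

# Toy data (not from the probe) to exercise the AUROC helper.
toy_auroc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(toy_auroc)
```

A gap near zero would mean the probe does no better than a random direction; the entry reports a gap well above that, which supports the detection claim but says nothing about causal leverage.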
Artifacts
- probe.joblib
- manifest.json
- README.md
These files live in the linked HF dataset.
Cite
The content-only SHA-256 digest is given below. To verify it, re-hash the JSON manifest with manifest_sha256 set to null and sort_keys=True; the result should match the digest. Zenodo DOI pending.
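A minimal sketch of that verification step, using only the Python standard library. The entry specifies nulling manifest_sha256 and sort_keys=True; the default json.dumps separators and UTF-8 encoding used here are assumptions, so a mismatch against the real manifest may only mean the registry serializes differently. The function names `content_sha256` and `verify` are hypothetical.

```python
import hashlib
import json


def content_sha256(manifest: dict) -> str:
    """Hash the manifest with its own digest field nulled out."""
    body = dict(manifest)            # shallow copy; the caller's dict is untouched
    body["manifest_sha256"] = None   # serialized as JSON null
    # Assumption: default separators and UTF-8 bytes; only sort_keys=True
    # is stated in the entry.
    canonical = json.dumps(body, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify(manifest: dict) -> bool:
    """True when the stored digest matches the recomputed one."""
    return content_sha256(manifest) == manifest.get("manifest_sha256")


# Toy manifest with illustrative fields, not the real atlas entry.
toy = {"auroc": 0.83, "n_samples": 42, "manifest_sha256": None}
toy["manifest_sha256"] = content_sha256(toy)
print(verify(toy))

# Changing any content field invalidates the digest.
tampered = dict(toy, auroc=0.9)
print(verify(tampered))
```

Nulling the digest field before hashing is what makes the hash "content-only": the digest can be stored inside the same file it covers without changing its own value.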
manifest_sha256: 9f2e9c5b8e4fbb7c7eb4c9290d927e76341006c9d63277da05f4d21a0ab26c9b
Atlas URL: https://openinterp.org/atlas/9f2e9c5b8e
Raw manifest: https://raw.githubusercontent.com/OpenInterpretability/registry/main/atlas/2026/9f2e9c5b8e.json
Reproduce this in your agent
In an agent session attached to your Colab via openinterp-mcp:
from openinterp_mcp.atlas import load_entry
entry = load_entry("9f2e9c5b8e")
print(entry.methodology_check)
# Re-run the causality protocol against the linked HF artifact:
from openinterp_mcp.judge import reproduce
reproduce(entry, hf_repo_id="caiovicentino1/agent-probe-guard-qwen36-27b")