v0.2.0 live · Apache-2.0

Stop hallucinations
before they ship.

Activation-probe fabrication detection for open-weights LLMs: ~1 ms scoring latency, 0.88 cross-task AUROC, and an 88% drop in confident-wrong answers on factual QA. No LLM-judge tax. Open weights. Reproducible.

$ pip install --upgrade "openinterp[full]"

requires Python ≥ 3.10 · adds torch + transformers + scikit-learn for `[full]` extras

  • Cross-task AUROC: 0.88 (held-out SimpleQA)
  • Within-bench AUROC: 0.90 (HaluEval-QA)
  • Confident-wrong drop: −88% (SimpleQA mitigation)
  • Latency per score call: ~1 ms (single matrix multiply)
Registered on ProbeBench v0.0.1
FabricationGuard is the reference probe in the hallucination category — currently #3 globally with a ProbeScore of 0.662.
View the full DNA card
live · real Qwen3.6-27B

Type any prompt. Watch it score.

Real activation-probe inference on Qwen3.6-27B running on HF ZeroGPU. No mocks, no pre-computed answers. Every prompt is a fresh forward pass. Cold start is ~3–5 min if the Space was idle; subsequent requests run in seconds.

Open in HF

Powered by HF ZeroGPU · H200 partitioned · free for community use.

Detection that actually generalizes.

A linear probe on the residual stream at layer 31 of Qwen3.6-27B. Trained on three benchmark train splits. Held out the fourth. Generalizes strongly to fabrication-style hallucination, fails honestly on unrelated cognitive tasks.

Detection AUROC across 4 public benchmarks
probe layer 31 · train/test split 80/20

Benchmark    Task type            Single SAE feat   LR within-bench   LR cross-bench (held-out)
TruthfulQA   misconception        0.556             0.536             0.599
HaluEval     fabrication          0.500             0.903             0.619
SimpleQA     entity-fabrication   0.494             0.706             0.882
MMLU         knowledge MC         0.544             0.631             0.444
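
The cross-bench column is the deployment-relevant one: those scores come from a probe that never saw the held-out benchmark. A minimal sketch of that protocol, assuming layer-31 last-token activations have already been cached per benchmark (the `acts_<bench>.npz` files and array names below are illustrative, not part of the package):

# Illustrative only, not openinterp's training script. Assumes hypothetical
# acts_<bench>.npz caches with "X" (n, d_model) layer-31 last-token residuals
# and "y" (n,) labels where 1 = fabricated, 0 = grounded.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

benches  = ["truthfulqa", "halueval", "mmlu", "simpleqa"]
cached   = {b: np.load(f"acts_{b}.npz") for b in benches}
held_out = "simpleqa"                                # train on the other three

X_train = np.concatenate([cached[b]["X"] for b in benches if b != held_out])
y_train = np.concatenate([cached[b]["y"] for b in benches if b != held_out])

probe = make_pipeline(StandardScaler(),
                      LogisticRegression(penalty="l2", C=1.0, max_iter=1000))
probe.fit(X_train, y_train)

scores = probe.predict_proba(cached[held_out]["X"])[:, 1]
print("cross-bench AUROC:", roc_auc_score(cached[held_out]["y"], scores))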

Mitigation impact — abstain mode @ threshold 0.684

Benchmark    Wrong (no guard)   Still wrong (abstain mode)   Abstained
TruthfulQA   65%                32.5%                        50%
HaluEval     57.5%              27.5%                        52%
SimpleQA     85%                10%                          88%
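
Those numbers reduce to simple bookkeeping: at the 0.684 threshold the guard replaces flagged answers with an uncertainty message, and the wrong rate is re-measured over the full question set. A hedged sketch of that calculation, with the input arrays assumed to come from your own eval harness rather than from openinterp:

import numpy as np

THRESHOLD = 0.684                                  # cross-bench calibrated threshold

def mitigation_report(wrong: np.ndarray, scores: np.ndarray) -> dict:
    # wrong:  (n,) bool, True where the model's answer was factually wrong
    # scores: (n,) float in [0, 1], probe fabrication-risk score per answer
    abstained   = scores >= THRESHOLD              # answers replaced with "not sure"
    still_wrong = wrong & ~abstained               # wrong answers that still shipped
    return {
        "baseline wrong": float(wrong.mean()),
        "still wrong":    float(still_wrong.mean()),
        "abstained":      float(abstained.mean()),
    }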

We tell you what it can't do.

The probe linearly encodes a fabrication-vs-grounded signal. It does not encode “is this a popular misconception?” or “do I know which of 4 MC options is right?” — those are different cognitive tasks. We tested all four honestly and report the results.

Works for

  • Generation-fabrication in open QA (HaluEval-style)
  • Entity recall failures (SimpleQA-style obscure facts)
  • Customer-support fact lookups (company policy, refund rules)
  • Medical / legal / internal-docs Q&A grounding
  • Sales DB lookups (customer names, account facts)
  • Code-assistant API hallucination detection

Out of scope

  • Misconception resistance (TruthfulQA-style multiple choice)
  • Knowledge-gap MC selection (MMLU-style 4-way pickers)
  • Subjective / opinion questions
  • Multi-step reasoning failures (math, logic chains)
  • Toxic content / prompt injection (use Lakera, Bedrock Guardrails)
  • Closed-API models (GPT, Claude, Gemini)

Honest scoping is procurement-friendly. Compliance teams and EU AI Act risk registers accept “tested and excluded” far more readily than “works for everything.”

One forward pass. One scalar. One decision.

No second judge model. No retraining. No fine-tune. The guard rides on top of your existing inference pipeline — captures the residual at layer 31, multiplies by the probe, applies a calibrated threshold.

1. Prompt arrives: user query enters your model the normal way.
2. Forward pass: single forward through Qwen3.6-27B; a hook captures the residual at layer 31, last token.
3. Probe applied: StandardScaler + L2 LogisticRegression; a single matmul, ~1 ms on CPU.
4. Score in [0, 1]: higher = higher fabrication risk; threshold calibrated cross-bench (0.684).
5. Decision: detect / warn / abstain — ship the response or replace it with uncertainty.
5-line integration
from openinterp import FabricationGuard
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.6-27B", ...)
tok   = AutoTokenizer.from_pretrained("Qwen/Qwen3.6-27B")

guard = FabricationGuard.from_pretrained("Qwen/Qwen3.6-27B").attach(model, tok)
out   = guard.generate("Who is Bambale Osby?", mode="abstain")
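
To see what `attach` does under the hood, the same five steps can be reproduced with a plain forward hook. A hedged sketch, assuming a standard HF causal-LM layer layout and a separately exported scikit-learn probe (the probe file name and the decision mapping are illustrative, not openinterp internals):

# Hand-rolled steps 2-5: capture the layer-31 residual at the last token,
# score it with the probe, decide. Illustrative only.
import joblib
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.6-27B", device_map="auto")
tok   = AutoTokenizer.from_pretrained("Qwen/Qwen3.6-27B")
probe = joblib.load("fabrication_probe_layer31.joblib")   # hypothetical scaler + LR export

captured = {}
def grab_residual(module, inputs, output):
    hidden = output[0] if isinstance(output, tuple) else output   # (batch, seq, d_model)
    captured["resid"] = hidden[:, -1, :].detach().float().cpu()   # last-token residual

handle = model.model.layers[31].register_forward_hook(grab_residual)

prompt = tok("Who is Bambale Osby?", return_tensors="pt").to(model.device)
with torch.no_grad():
    model(**prompt)                                 # one ordinary forward pass
handle.remove()

risk = float(probe.predict_proba(captured["resid"].numpy())[0, 1])
print(f"fabrication risk {risk:.3f} ->", "abstain" if risk >= 0.684 else "ship")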

Versus the competition.

LLM-judge tools and proprietary platforms run an entire second model to score each output. We capture an activation that the model already computed and run a 1-ms matrix multiplication. Different cost structure entirely.

Methodology extends Anthropic's persona-vectors approach (Aug 2025, tested on 7-8B models) to Qwen3.6-27B (3-4× larger) with formal cross-task AUROC + mitigation-rate evaluation. Apache-2.0 production-grade implementation, not a proprietary platform.

Tool                              AUROC                      Latency   License
Patronus Lynx-70B                 0.87 (HaluBench)¹          ~100 ms   Apache-2.0
Vectara HHEM-2.1                  ~0.85                      600 ms²   Apache-2.0
Galileo Luna-2                    proprietary                152 ms    closed
Goodfire Ember                    proprietary                unknown   enterprise-only³
OpenInterp FabricationGuard (us)  0.88 cross / 0.90 within   ~1 ms     Apache-2.0

¹ Patronus Lynx benchmark on HaluBench.
² Vectara HHEM-2.1 measured on RTX 3090.
³ Goodfire Ember pivoted to enterprise-only in Feb 2026.

Don't trust us. Reproduce it.

Every number on this page comes from a single notebook run in a single Colab session. Click below and run it yourself: ~50 minutes, ~R$10 in credits. The probe artifact and the reproducer are both Apache-2.0.

Where we're going.

v0.2.0 · Apr 2026 · shipped
  • Qwen3.6-27B probe
  • PyPI ship
  • CLI
  • 3 modes
v0.3.0 · May 2026 · planned
  • Llama-3.3 probe
  • Gemma-2 probe
  • Pearson_CE cross-model transfer
  • Multi-model API
v0.4.0 · Jun 2026 · planned
  • vLLM plugin
  • SGLang plugin
  • LangChain middleware
  • OpenTelemetry GenAI
v0.5.0 · Q3 2026 · planned
  • Hosted Pro tier ($0.02/1M tok)
  • Slack/PagerDuty/Datadog
  • Audit reports
  • Custom probe training

Stop hallucinations before they ship.

Open source. Apache-2.0 with patent grant. No signup. No API key. Just pip install.

$ pip install --upgrade "openinterp[full]"