Fabrication detection for clinical LLM deployments
The medical AI category is moving fast in 2026: CZI Biohub committed $500M to AI biology on Apr 29, 2026, and EU AI Act Article 14 takes effect in August 2026. The bottleneck is not modeling; it is knowing when the model is making things up. FabricationGuard is one open standard for that signal: ~1 ms scoring latency, AUROC 0.88 cross-task, Apache-2.0.
Why this category is urgent in 2026
What FabricationGuard does (in 60 seconds)
A linear logistic regression on the residual stream at layer 31 of Qwen3.6-27B, scored at the end of the user question. Trained cross-bench on TruthfulQA + HaluEval + MMLU + SimpleQA. Captures the "model is fabricating" signal that lives in the residual stream before generation begins — abstention happens at zero token cost.
Mitigation analysis (notebook 31): in abstain mode at threshold 0.684, the confidently-wrong response rate drops by 88% on SimpleQA, 52% on HaluEval, and 50% on TruthfulQA. MMLU is "capability control" and out of scope by design (see honest scope below).
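The mechanism can be sketched in a few lines. Everything below is a synthetic stand-in: the weights, residual width, and activation are random placeholders (the real probe reads the layer-31 residual stream of Qwen3.6-27B), and only the 0.684 threshold comes from the mitigation analysis above.

```python
import numpy as np

RESID_DIM = 4096   # assumed residual-stream width, not the real model's
THRESHOLD = 0.684  # abstain-mode threshold from notebook 31

rng = np.random.default_rng(0)
w = rng.normal(size=RESID_DIM) / np.sqrt(RESID_DIM)  # placeholder probe weights
b = 0.0                                              # placeholder probe bias

def fabrication_probability(resid: np.ndarray) -> float:
    """Logistic regression on the residual vector captured at end-of-question."""
    return float(1.0 / (1.0 + np.exp(-(resid @ w + b))))

def should_abstain(resid: np.ndarray, threshold: float = THRESHOLD) -> bool:
    # The decision is made before any token is generated: zero token cost.
    return fabrication_probability(resid) > threshold

resid = rng.normal(size=RESID_DIM)  # stand-in for a captured activation
p = fabrication_probability(resid)
```

Because the score is a single dot product plus a sigmoid, the ~1 ms latency claim is plausible at any batch size; the expensive part is capturing the activation, which the inference stack already has in hand.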
Where FabricationGuard fits in clinical workflows
Clinical Q&A grounding
Patient-facing or clinician-facing chat: "What are the symptoms of long QT syndrome?" — model can fabricate a plausible-sounding but wrong list. FabricationGuard scores residual at end-of-question, abstains when fabrication probability > threshold.
Drug-interaction lookup
Model summarizes drug interactions for clinical reference. Rare interactions = long-tail factoids = fabrication-prone. Probe surfaces when the model is generating plausibly without grounding.
Evidence-grounded clinical summarization
Model summarizes a patient chart, study abstract, or guideline document. Fabrication happens when model adds claims absent from source. Probe captures this even before the model finishes generating.
Biological / pharmaceutical reasoning
AI suggesting protein-target interactions, drug repurposing candidates, or disease pathways. Model fabricates relationships not in training data — clinically critical to flag.
Honest scope — what is not validated yet
We do not currently have medical-domain test sets in the registry. The probe is trained on general open-domain factual QA. Performance on medical-specific benchmarks (PubMedQA, MedQA, MIMIC-IV summarization) is not yet measured. Anyone deploying this in a clinical setting must:
- Recalibrate threshold on a held-out medical test set with annotated halu/grounded labels (we recommend 200+ samples).
- Estimate domain-shift AUROC: out-of-distribution AUROC on medical queries is likely lower than the 0.88 cross-task headline.
- Pair with retrieval grounding when ground truth exists (RAG, drug databases, guideline DB). FabricationGuard handles closed-book; HaluGate-style NLI handles grounded. The two are complementary.
- Human-in-the-loop above any abstention threshold. Probe-based abstention is a filter, not a replacement for clinical review — especially in the EU AI Act Article 14 sense.
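The recalibration step above can be sketched as follows. The scores and labels are synthetic stand-ins for probe scores and annotated halu/grounded labels on a held-out medical set; the 200-sample size comes from the recommendation above, while the 95% recall target is an illustrative choice, not a prescribed value.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200                              # recommended minimum annotated sample count
labels = rng.integers(0, 2, size=n)  # 1 = hallucinated, 0 = grounded (synthetic)
scores = np.clip(labels * 0.3 + rng.normal(0.4, 0.2, size=n), 0.0, 1.0)

def recalibrate_threshold(scores, labels, target_recall=0.95):
    """Lowest threshold that still flags >= target_recall of hallucinations."""
    halu_scores = np.sort(scores[labels == 1])
    # Index of the score we can afford to miss while keeping the recall target.
    k = int(np.floor((1.0 - target_recall) * len(halu_scores)))
    return float(halu_scores[k])

t = recalibrate_threshold(scores, labels)
recall = float(np.mean(scores[labels == 1] >= t))
```

A general-domain threshold like 0.684 should never be carried over unchanged; the score distribution shifts with the query distribution, which is exactly why the held-out medical set is required.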
If your team has annotated medical hallucination data and wants to run the probe to produce domain-specific numbers, we will publish those evaluations on ProbeBench (with citation to your team) at no cost.
Regulatory framework (why probe-based abstention fits)
High-risk AI systems (including medical AI) require human oversight. Activation-probe abstention fits as the technical mechanism for "intervention by the operator" — model declines when uncertain, human reviews.
Software as Medical Device guidance increasingly references uncertainty quantification. Probe-based abstention provides per-query uncertainty signal more interpretable than ensemble or temperature-scaling baselines.
Transparency guidance across these frameworks also recommends disclosing model uncertainty. Activation probes provide an audit trail: every output has a confidence number that maps to a specific layer/position decision.
Pilot framework — for medical-domain partners
We are looking for 2-3 partners (clinical research orgs, medical AI startups, hospital systems with ML teams) to run a 30-day pilot. Outcome: domain-validated AUROC numbers on your test set, published on ProbeBench with citation.
What we provide
- ✓ FabricationGuard probe (Apache-2.0) integrated into your inference pipeline
- ✓ Recalibration on your held-out test set (200-500 examples)
- ✓ Threshold tuning by your asymmetric cost model (false-confidence vs over-abstention)
- ✓ ECE / FPR@99TPR / AUROC with bootstrap CIs
- ✓ Cross-model Pearson_CE if you use a non-Qwen model
- ✓ Public publication of results on ProbeBench (or private if NDA)
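The bootstrap-CI deliverable can be sketched like this, on synthetic scores and labels standing in for a partner's held-out set. A production run would use scikit-learn or similar rather than this hand-rolled pairwise AUROC.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
labels = rng.integers(0, 2, size=n)                   # synthetic halu labels
scores = labels * 0.5 + rng.normal(0.3, 0.25, size=n)  # synthetic probe scores

def auroc(scores, labels):
    """P(random positive outscores random negative), ties counted as 0.5."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return float(wins / (len(pos) * len(neg)))

def bootstrap_ci(scores, labels, n_boot=200, alpha=0.05):
    """Percentile bootstrap CI for AUROC: resample rows, recompute, take quantiles."""
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(scores), size=len(scores))
        if labels[idx].min() == labels[idx].max():
            continue  # resample drew only one class; AUROC undefined, skip
        stats.append(auroc(scores[idx], labels[idx]))
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(lo), float(hi)

lo, hi = bootstrap_ci(scores, labels)
```

On a 200-500 sample pilot set the CI will be wide; that width is itself part of the deliverable, since it tells the partner how much to trust the point estimate.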
What we need from you
- ✓ 200-500 medical Q-A pairs with annotated hallucination labels
- ✓ Description of clinical context (specialty, query distribution, deployment surface)
- ✓ Asymmetric-cost specification (e.g., "over-abstention 5×, false-confidence 50×")
- ✓ Engineering counterpart for ~2 weeks integration support
- ✓ Permission to publish anonymized aggregate results (or NDA + private deliverable)
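One way to turn the asymmetric-cost specification into an operating threshold is to minimize expected cost over a threshold grid. The sketch below is hypothetical end to end: synthetic scores and labels, with only the 5×/50× example costs taken from the list above.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
labels = rng.integers(0, 2, size=n)                   # 1 = hallucinated answer
scores = labels * 0.4 + rng.normal(0.35, 0.2, size=n)  # synthetic probe scores

COST_FALSE_CONFIDENCE = 50.0  # hallucination served to a clinician
COST_OVER_ABSTENTION = 5.0    # grounded answer needlessly withheld

def expected_cost(threshold):
    abstain = scores >= threshold
    false_conf = np.sum(~abstain & (labels == 1))  # served hallucinations
    over_abst = np.sum(abstain & (labels == 0))    # withheld good answers
    return (COST_FALSE_CONFIDENCE * false_conf + COST_OVER_ABSTENTION * over_abst) / n

grid = np.linspace(scores.min(), scores.max(), 201)
best = min(grid, key=expected_cost)
```

With a 50:5 cost ratio the optimizer naturally pushes the threshold down, abstaining more often; a team that weighted over-abstention more heavily would land at a higher threshold from the same data.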
Pilot outcomes — what gets published
At end of pilot we co-author a post on openinterp.org/blog with the domain-specific numbers. Your team is named. Your task gets registered as a new entry in /probebench/tasks with SHA-256-hashed test set. Probe variant tuned to your domain becomes a registered probe alongside FabricationGuard v2. Your customers see the validation.
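SHA-256 pinning of a registered test set can be as simple as hashing a canonical serialization. The field names below are illustrative, not the actual /probebench/tasks schema; the sample questions are taken from the clinical examples earlier in this post.

```python
import hashlib
import json

# Illustrative records; real entries would carry the partner's annotated pairs.
examples = [
    {"question": "What are the symptoms of long QT syndrome?", "label": "halu"},
    {"question": "What are the interactions of warfarin?", "label": "grounded"},
]

def pin_test_set(examples) -> str:
    """Hash a canonical JSON serialization so any reordering or edit is detectable."""
    canonical = json.dumps(examples, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

digest = pin_test_set(examples)
```

Pinning the digest in the registry means anyone can later verify that reported numbers were computed on exactly the registered set, byte for byte.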
Medical AI roadmap on ProbeBench
- v0.1 (today): FabricationGuard general-domain, applicable to medical with caveats. Linear probe trained on TruthfulQA/HaluEval/MMLU/SimpleQA. Cross-domain AUROC unmeasured for medical. Use with explicit recalibration.
- v0.2 (Q3 2026): Medical-domain test sets registered. PubMedQA, MedQA-USMLE, and a MIMIC-IV-derived summarization halu test set added to /probebench/tasks. SHA-256 pinned.
- v0.3 (Q4 2026): MedicalFabricationGuard, a domain-specialized probe. Re-trained on a medical Q-A corpus with halu labels from clinician annotations. Released as a separate registry entry alongside general-domain v2.
- v1.0 (2027): Drug-interaction probe + clinical-grounding NLI ensemble. Specialized probe for pharmaceutical fabrications plus post-hoc grounding against a pharmacovigilance DB. Combined ProbeScore reported.
Pilot inquiry
Email caio@openinterp.org with a 1-paragraph description of your medical AI deployment and test-set availability.
Self-serve integration
Already deployed Qwen3.6-27B and want to drop in FabricationGuard?
pip install openinterp

from openinterp import FabricationGuard

# Score the residual at end-of-question; abstain when P(fabrication) > threshold
guard = FabricationGuard.from_pretrained("Qwen/Qwen3.6-27B")
query = "What are the symptoms of long QT syndrome?"
output = guard.generate(query, mode="abstain", threshold=0.5)

ProbeBench v0.0.1 · Apache-2.0 · OpenInterp · 2026