Q4 2026

Safety Watchlist

A curated list of dangerous-capability features (deception, sycophancy, shutdown-resistance, jailbreak fingerprints, CBRN uplift signatures), monitored 24/7 across your production LLM traffic. When a feature triggers, alerts go out via Slack, PagerDuty, or webhook.

What is on the list

Every feature on the watchlist has (1) a public academic citation establishing its correlation with a dangerous capability, (2) causal validation on three or more models, and (3) a public Atlas entry with top-activating examples. No black-box "trust us" items.
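As a rough illustration, the three admission criteria above can be read as a simple check on a candidate feature. The sketch below is not Watchtower's actual schema or API: the `WatchlistCandidate` class, its field names, and the `missing_criteria` helper are all hypothetical, and the example values are placeholders.

```python
from dataclasses import dataclass

# Threshold taken from the text: causal validation on three or more models.
MIN_VALIDATED_MODELS = 3


@dataclass(frozen=True)
class WatchlistCandidate:
    """Hypothetical record for a feature proposed for the watchlist."""
    name: str
    citation: str            # public academic citation (placeholder string here)
    validated_models: tuple  # names of models with causal validation
    atlas_entry: str         # reference to the public Atlas entry


def missing_criteria(candidate: WatchlistCandidate) -> list:
    """Return the admission criteria the candidate does not yet meet."""
    problems = []
    if not candidate.citation.strip():
        problems.append("public academic citation")
    # Count distinct models so a repeated name is not counted twice.
    if len(set(candidate.validated_models)) < MIN_VALIDATED_MODELS:
        problems.append("causal validation on three or more models")
    if not candidate.atlas_entry.strip():
        problems.append("public Atlas entry")
    return problems


# Placeholder values only; no real citation or Atlas entry is implied.
candidate = WatchlistCandidate(
    name="example-feature",
    citation="<citation placeholder>",
    validated_models=("model-a", "model-b"),
    atlas_entry="<atlas entry placeholder>",
)
print(missing_criteria(candidate))
```

With only two validated models, this candidate would be reported as missing the causal-validation criterion and would stay off the list.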

Custom watchlists

Every tenant can extend the default list with their own features, e.g., "our product refuses to discuss competitor X" or "this feature correlates with internal PII leakage." Custom features can be trained in-tenant and never leave your environment.

Alert severity tiers

CRITICAL (immediate human review) → WARN (weekly digest) → INFORM (dashboard only). Each feature ships with a recommended severity based on public research, which you can tune per deployment.
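One way to picture the tiering is a lookup from feature to severity, with per-deployment overrides taking precedence over the recommended default. This is a minimal sketch under stated assumptions, not the product's configuration format: the `Severity` enum, the routing labels, the `RECOMMENDED` table, and the severities assigned to the example features are all hypothetical placeholders.

```python
from enum import Enum


class Severity(Enum):
    CRITICAL = "critical"
    WARN = "warn"
    INFORM = "inform"


# Tier-to-handling mapping as described in the text; the label strings are hypothetical.
ROUTES = {
    Severity.CRITICAL: "immediate_human_review",
    Severity.WARN: "weekly_digest",
    Severity.INFORM: "dashboard_only",
}

# Placeholder defaults for illustration; not the real recommended severities.
RECOMMENDED = {
    "deception": Severity.CRITICAL,
    "sycophancy": Severity.WARN,
}


def effective_severity(feature, overrides=None):
    """Per-deployment override wins; otherwise use the recommended default.

    Features with no recommendation fall back to INFORM (an assumption).
    """
    overrides = overrides or {}
    if feature in overrides:
        return overrides[feature]
    return RECOMMENDED.get(feature, Severity.INFORM)


def route(feature, overrides=None):
    """Return the handling label for a triggered feature."""
    return ROUTES[effective_severity(feature, overrides)]


# A deployment that escalates sycophancy beyond the placeholder default.
deployment_overrides = {"sycophancy": Severity.CRITICAL}
print(route("sycophancy"))
print(route("sycophancy", deployment_overrides))
```

Keeping overrides in a separate mapping means the recommended defaults stay untouched, so a deployment can drop an override and fall back to the shipped value.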

Request early access

We prioritize researchers, educators, and safety teams who will use the watchlist publicly. Tell us what you want to build; we'll reach out when the beta opens.