CONTRIBUTING

Build this with us.

OpenInterpretability is built in public by a growing group of students, researchers, and safety teams. Every notebook, every SAE, every line of site code can have your name on it.

Four ways in — match your level

From no experience to researcher

Zero-code

  • Read the manifesto
  • Open a Discussion with a question or idea
  • Share a Trace you found interesting on X, tagging @openinterp
  • Star the repos that helped you

Student / first-timer

  • Train your first SAE in 30 min (notebook 01)
  • Run InterpScore on your SAE (notebook 18)
  • Submit your SAE to /interpscore (one-line PR)
  • Pick an issue labeled "good first issue" on any repo

Researcher

  • Port a published method to a Colab we host
  • Submit a Trace scenario for your domain (legal, bio, code)
  • Publish negative results — honest failures are welcome
  • Author an Expedition (Q3 2026)

Applied / safety team

  • Run Watchtower preview (notebook 13) on your traffic
  • Write an issue about a gap that blocks your deployment
  • Propose a case study we can include with your approval
  • Request Watchtower early access (Q4)

Pick a repo

4 repos · Apache-2.0 (code) · CC-BY 4.0 (docs)

TypeScript · Tailwind

OpenInterpretability/web

Next.js site at openinterp.org — Trace Theater, Circuit Canvas, InterpScore leaderboard.

Scope
  • Trace Theater scenarios (one PR = one scenario)
  • Leaderboard entries (one PR = one SAE)
  • UI / accessibility / mobile polish
  • New pillar sub-routes (Q2+)

Jupyter · PyTorch

OpenInterpretability/notebooks

23+ training and interpretability notebooks, from 30-minute hobbyist Colabs to paper-grade cloud runs.

Scope
  • Port a notebook to a new model (Llama, Mistral, Mamba, Phi, etc.)
  • Replicate a 2024–2026 interpretability paper
  • Add a platform (TPU, ROCm, MPS, Lambda)
  • Docker / reproducibility improvements

Python ≥ 3.10

OpenInterpretability/cli

The openinterp Python package — SDK + CLI (pip install openinterp).

Scope
  • New commands wrapping notebooks (score, steer, circuit, publish)
  • Adapter integrations (SAELens, TransformerLens, nnsight)
  • Performance wins (bf16 paths, torch.compile)
  • Tests, type hints, docs

Python · PyTorch · TRL

OpenInterpretability/mechreward

SAE features as dense RL rewards. Qwen3.5-4B: 64% → 83% on GSM8K.

Scope
  • Port Stage Gate protocol to a new model
  • Submit a feature pack (helpful + harmful IDs)
  • Add a benchmark (SuperGPQA, MATH-500, BIG-Bench-Hard)
  • RepE / probing integration

Workflow

01

Open an issue

If the change is more than 20 lines, align with maintainers first. Drafts are welcome.

02

Fork + branch

Create a branch with git checkout -b your-feature. Never push to main.

03

Make it work

Local tests pass. Notebooks run start-to-finish.

04

PR with evidence

Link to the issue. Paste numbers, screenshots, or logs.

05

Review + merge

A maintainer responds within 72 hours. We prefer kind, specific feedback.
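
Two of the steps above can be checked locally before you open a PR. The following is a minimal sketch, not project tooling: the function names, the executed.ipynb output name, and the choice to count added plus deleted lines as the size of a change are all assumptions made for illustration. It relies only on git diff --numstat and jupyter nbconvert --execute, and it assumes a local branch named main.

```python
"""Pre-PR sanity checks: an illustrative sketch, not an official script."""
import subprocess
import sys

# Threshold taken from step 01: changes above 20 lines should be discussed first.
ISSUE_FIRST_THRESHOLD = 20


def changed_lines(numstat: str) -> int:
    """Sum added + deleted lines from `git diff --numstat` output.

    Each row is "<added>\t<deleted>\t<path>". Binary files report "-"
    for both counts and are skipped.
    """
    total = 0
    for row in numstat.splitlines():
        parts = row.split("\t")
        if len(parts) < 3:
            continue
        added, deleted = parts[0], parts[1]
        if added.isdigit() and deleted.isdigit():
            total += int(added) + int(deleted)
    return total


def needs_issue_first(numstat: str) -> bool:
    """True when the change is large enough to align on in an issue first."""
    return changed_lines(numstat) > ISSUE_FIRST_THRESHOLD


def notebook_run_command(path: str, out_name: str = "executed.ipynb") -> list[str]:
    """Build a command that executes a notebook from the first cell to the last.

    nbconvert exits with a non-zero status if any cell raises.
    `out_name` is a hypothetical output file name.
    """
    return [
        "jupyter", "nbconvert", "--to", "notebook",
        "--execute", "--output", out_name, path,
    ]


if __name__ == "__main__":
    # Compare the current branch against main (assumed to exist locally).
    diff = subprocess.run(
        ["git", "diff", "--numstat", "main...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    if needs_issue_first(diff):
        print(f"{changed_lines(diff)} lines changed: open an issue first.")

    # Execute every notebook passed on the command line.
    for notebook in sys.argv[1:]:
        subprocess.run(notebook_run_command(notebook), check=True)
```

Saved under a name of your choosing, the script can be run from a feature branch with the notebooks you touched as arguments. A non-zero exit means a notebook did not run start-to-finish.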

Community

Code of Conduct

We follow Contributor Covenant 2.1. Summary: be kind, be honest, assume good faith. Interpretability is a young field — many contributors are first-timers. Kindness is a feature, not a constraint. Report violations to hi@openinterp.org.