Architecture

Tripwire is two standalone apps over a shared engine: a FastAPI backend (the JSON API plus the execution engine) and a Vite + React dashboard. An MCP server and a CLI reuse the same engine by importing it directly.

Cursor / Claude Code ─► MCP server ─┐
CI / terminal ───────► CLI ─────────┤   (both import the engine directly)
                                     │
Browser ──HTTP──► Dashboard (Vite/React SPA)
                      │  fetch /api/v1
                      ▼
                 Backend (FastAPI)
   api/v1/routes ─► services ─► store (SQL repos)
        │               │
        │               └─ engine ── Claude (brain) ─► Playwright (hands) ─► REAL browser
        ▼                                │
   suites · runs · issues ·              ▼  network capture → trace_id
   settings · generate ·        failure → server-log correlation → Claude → root cause
   analytics · export · health

Claude is the brain, Playwright is the hands

The execution core (engine/pw_agent.py + engine/pw.py) runs plain-English steps in a real browser on any OS — Windows, macOS, or Linux — headless or headed. No AppleScript, no Quartz, no OS permission prompts:

Claude is the brain. For each step it reads the live DOM and turns the plain-English intent into concrete browser actions.
Playwright is the universal hands. It launches and drives Chromium (headless or headed), executes those actions, and captures network traffic so that a UI failure still yields a trace_id.

There is also a legacy macOS desktop path (structure.py + mac_control.py) that drives the local browser via native Quartz events. It is optional and is selected with TRIPWIRE_DRIVER=desktop. The default driver is playwright, which is cross-platform and is what the CLI, MCP server, and GitHub Action all use.

The step toolbox

On the Playwright path, each step is driven by a small, explicit toolset the model calls:

Tool	What it does
`read_dom`	Read the page: title, URL, visible text, and numbered interactive elements (`#ref`).
`goto`	Navigate to a path or URL.
`click_element`	Click an element by its `#ref`.
`set_field`	Set an input / textarea / select by `#ref` (option text for selects); optionally submit.
`press_key`	Press a key or chord (`Enter`, `Control+a`).
`create_file`	Create a file with given content — to produce a file to upload.
`upload_file`	Upload a file at a path into a file input by `#ref`.
`done`	The step is complete.

Because the model can both create and upload files, an import/upload flow needs no pre-made fixture file: a step such as "Create a small CSV and upload it as your import" works as written.
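The loop behind a single step can be sketched roughly as follows. This is a minimal illustration, not the code in engine/pw_agent.py: the `ToolCall` type, the `run_step` function, the `max_turns` limit, and the scripted stand-in for the model are all hypothetical. Only the tool names come from the table above.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator


@dataclass
class ToolCall:
    """One tool invocation requested by the model (hypothetical shape)."""
    name: str
    args: dict = field(default_factory=dict)


def run_step(next_call: Callable[[str], ToolCall],
             tools: dict[str, Callable[..., str]],
             max_turns: int = 10) -> list[str]:
    """Dispatch tool calls until the model calls `done` or turns run out."""
    observations: list[str] = []
    last = ""
    for _ in range(max_turns):
        call = next_call(last)  # the model sees the previous observation
        if call.name == "done":
            return observations
        handler = tools.get(call.name)
        if handler is None:
            last = f"unknown tool: {call.name}"
        else:
            last = handler(**call.args)
        observations.append(last)
    raise RuntimeError("step did not finish within max_turns")


# Stand-in tools that only describe what they were asked to do.
tools = {
    "read_dom": lambda: "title=Import; #1 <input type=file>",
    "create_file": lambda path, content: f"created {path}",
    "upload_file": lambda ref, path: f"uploaded {path} into #{ref}",
}

# A scripted "model" that replays a fixed plan instead of calling an LLM.
script: Iterator[ToolCall] = iter([
    ToolCall("read_dom"),
    ToolCall("create_file", {"path": "import.csv", "content": "a,b\n1,2\n"}),
    ToolCall("upload_file", {"ref": 1, "path": "import.csv"}),
    ToolCall("done"),
])

observations = run_step(lambda _last: next(script), tools)
```

Keeping the toolset as a plain name-to-handler mapping means an unknown tool name becomes an observation the model can react to rather than an exception, and the turn limit bounds a step that never reaches `done`.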

Backend (`backend/app`)

Layered so each concern is swappable and testable in isolation.

Layer	Module	Responsibility
core	`core/`	`config.py` (pydantic-settings + data paths), `errors.py`, `logging.py`.
schemas	`schemas/models.py`	Pydantic request models.
store	`store/db.py`, `store/models.py`, `store/repos.py`	SQLAlchemy-backed repositories — suites, runs, issues, plans, settings, users/API tokens — behind a clean interface. SQLite by default, Postgres in production.
services	`services/`	`runner` (a durable, parallel worker pool that claims queued runs and executes them, streaming progress + persisting), `auth` (users, sessions, API tokens), and `issues` (the native File-to-Tripwire tracker + dedup, and dispatch to GitHub / GitLab / Jira).
api/v1	`api/v1/routes/`	The HTTP surface — see below.
engine	`engine/`	The execution core.

The engine modules

Module	Role
`pw.py`	The Playwright `Driver` — launch, navigate, page state, network capture, root cause.
`pw_agent.py`	The cross-platform runner: per-step LLM loop + suite executor (the default path).
`e2e.py`	Suite loading, `${var}` substitution, fixtures, deterministic checks, model adjudication.
`serverlogs.py`	Server-log correlation + LLM synthesis: `trace_id` → backend error + cause + fix.
`generate.py`	URL → suite generation (crawl, read structure, propose a suite).
`export_playwright.py`	Compile a suite to a portable Playwright `.spec.ts`.
`record.py`	Record-from-clicks: a recorded action trace → a plain-English suite.
`analytics.py`	Flake / run analytics over persisted runs.
`reports.py`	`report.json` · `junit.xml` · `report.html` + provider-agnostic issues (fingerprinted).
`trackers.py`	Live issue posting to GitHub / GitLab / Jira, deduped by fingerprint.
`structure.py`, `mac_control.py`, `agent.py`, `cdp.py`, `logs.py`	The legacy macOS desktop path + shared DOM-read JS and network/log capture.
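Fingerprint-based deduplication, as mentioned for reports.py and trackers.py, might look something like the sketch below. The choice of fields (suite, step, error text), the digit normalization, and the function names are assumptions made for illustration; the source does not say how the real fingerprint is computed.

```python
import hashlib
import re


def fingerprint(suite: str, step: str, error: str) -> str:
    """Stable hash of a failure, ignoring volatile digits (ids, timestamps)."""
    normalized = re.sub(r"\d+", "N", error.strip().lower())
    payload = "\x1f".join([suite, step, normalized])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def dedupe(failures: list[dict]) -> dict[str, dict]:
    """Keep the first failure per fingerprint and count repeats."""
    issues: dict[str, dict] = {}
    for failure in failures:
        key = fingerprint(failure["suite"], failure["step"], failure["error"])
        issue = issues.setdefault(key, {**failure, "occurrences": 0})
        issue["occurrences"] += 1
    return issues


failures = [
    {"suite": "checkout", "step": "Pay", "error": "HTTP 500 for order 1841"},
    {"suite": "checkout", "step": "Pay", "error": "HTTP 500 for order 1907"},
    {"suite": "login", "step": "Submit", "error": "Button not found"},
]
issues = dedupe(failures)
```

Here the two checkout failures collapse into one issue with two occurrences, because only the order number differs between them.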

The HTTP surface (`/api/v1`)

health · suites (CRUD + {name}/export) · runs (queue, inspect, queue state) · issues (the native tracker: list, transition, comment) · settings (secret-masked) · generate (URL → suite) · analytics (flake/run trends). Full details in the API reference.

Storage

Structured data lives in a SQL database via SQLAlchemy — SQLite by default (a single file under data/), Postgres in production (TRIPWIRE_DATABASE_URL). The tables: suites, runs, issues, plans, settings (secret values encrypted at rest), and users / api_tokens. Tables are created on startup; on first boot any legacy file-based data is imported once.
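The SQLite-by-default, Postgres-by-override behaviour could be resolved along these lines. This is a sketch under assumptions: `resolve_database_url` is a hypothetical helper, and the exact URL forms the real core/config.py accepts are not stated in the source. Only the TRIPWIRE_DATABASE_URL variable and the data/tripwire.db default come from the text.

```python
from pathlib import Path


def resolve_database_url(env: dict[str, str], data_dir: Path) -> str:
    """Prefer TRIPWIRE_DATABASE_URL; otherwise use a SQLite file under data/."""
    url = env.get("TRIPWIRE_DATABASE_URL", "").strip()
    if url:
        return url
    # SQLAlchemy-style URL for a SQLite file at a relative path.
    return f"sqlite:///{(data_dir / 'tripwire.db').as_posix()}"


default_url = resolve_database_url({}, Path("data"))
prod_url = resolve_database_url(
    {"TRIPWIRE_DATABASE_URL": "postgresql://tripwire@db/tripwire"},
    Path("data"),
)
```

In a real process the first argument would typically be `os.environ`; passing a plain mapping keeps the function easy to test.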

Run artifacts stay on the filesystem under data/ (served at /api/v1/artifacts):

data/
  tripwire.db    the default SQLite database (suites, runs, issues, plans, settings, users)
  artifacts/     report.json / junit.xml / report.html / per-step screenshots / issues
  baselines/     visual-diff baselines (screenshot_region checks)

The repository interface in store/repos.py keeps the same method signatures whether the backing store is SQLite or Postgres, so the services and API above it are unchanged.
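One way to picture a repository interface of this kind is a typed protocol with interchangeable implementations. The names below (`Run`, `RunRepo`, `InMemoryRunRepo`, `queue_suites`) are hypothetical and are not taken from store/repos.py; the sketch only illustrates how service code can depend on the method signatures rather than on the backing store.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class Run:
    id: int
    suite: str
    status: str = "queued"


class RunRepo(Protocol):
    """Hypothetical interface; the real one lives in store/repos.py."""
    def add(self, suite: str) -> Run: ...
    def get(self, run_id: int) -> Optional[Run]: ...
    def list_by_status(self, status: str) -> list[Run]: ...


class InMemoryRunRepo:
    """Test double that satisfies RunRepo without a database."""
    def __init__(self) -> None:
        self._runs: dict[int, Run] = {}

    def add(self, suite: str) -> Run:
        run = Run(id=len(self._runs) + 1, suite=suite)
        self._runs[run.id] = run
        return run

    def get(self, run_id: int) -> Optional[Run]:
        return self._runs.get(run_id)

    def list_by_status(self, status: str) -> list[Run]:
        return [r for r in self._runs.values() if r.status == status]


def queue_suites(repo: RunRepo, suites: list[str]) -> list[int]:
    """Service-level code that depends only on the interface."""
    return [repo.add(suite).id for suite in suites]


repo = InMemoryRunRepo()
run_ids = queue_suites(repo, ["checkout", "login"])
```

A SQLAlchemy-backed class exposing the same three methods could be passed to `queue_suites` without changing it.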

Frontend (`frontend/src`)

A Vite + React + Tailwind SPA with TanStack Query as the data layer. It talks only to /api/v1, so the dashboard and any other client (scripts, CI, the MCP server) share exactly the same contract.

How a run flows

The dashboard (or an API client) POSTs to /runs with a suite (or several). Each run is persisted as a queued row — a durable queue that survives a restart.
The runner service runs a worker pool (TRIPWIRE_MAX_CONCURRENT_RUNS, default 2) that claims queued rows transactionally and executes them in parallel, streaming progress. A crash mid-run is reconciled on the next startup.
The engine opens one Playwright browser for the whole suite. Claude drives each step, and assertions are adjudicated deterministically where a check exists and by the model otherwise.
On a failure, the network buffer yields a trace_id; if a log backend is configured, serverlogs.py correlates it and Claude synthesizes the backend root cause.
reports.py writes the artifacts and builds a deduped issue; services/issues or trackers.py files it. Results persist to the store, and the dashboard renders them live.
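The "claims queued rows transactionally" step can be illustrated with a small SQLite sketch. The table layout, the `worker` column, and the `claim_next_run` function are assumptions for the example; the real runner goes through SQLAlchemy and may claim rows differently.

```python
import sqlite3
from typing import Optional


def claim_next_run(conn: sqlite3.Connection, worker: str) -> Optional[int]:
    """Atomically move the oldest queued run to 'running' for this worker."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT id FROM runs WHERE status = 'queued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            conn.execute("COMMIT")
            return None
        conn.execute(
            "UPDATE runs SET status = 'running', worker = ? WHERE id = ?",
            (worker, row[0]),
        )
        conn.execute("COMMIT")
        return row[0]
    except Exception:
        conn.execute("ROLLBACK")
        raise


# isolation_level=None disables implicit transactions, so BEGIN is explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE runs (id INTEGER PRIMARY KEY, suite TEXT, "
    "status TEXT DEFAULT 'queued', worker TEXT)"
)
conn.executemany(
    "INSERT INTO runs (suite) VALUES (?)", [("checkout",), ("login",)]
)

# Three claim attempts against two queued runs: the third finds nothing.
claimed = [claim_next_run(conn, f"w{i}") for i in range(3)]
```

Because the select and the update happen inside one write transaction, two workers cannot both claim the same row. On Postgres the same idea is commonly expressed with `SELECT ... FOR UPDATE SKIP LOCKED`.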

Continue to Runs & assertions.
