Runs & assertions
A run executes one suite against a real browser. This page covers the execution model: how cases are sequenced, how state flows between them, and how each assertion gets a verdict.
What a run is
When you start a run (from the dashboard, via POST /api/v1/runs, from the CLI, or through the MCP server), the engine opens one Playwright browser for the entire suite, so sequenced cases share the session and any captured values. Via the API, the runner service executes runs in a background worker pool: runs are persisted to a durable queue and executed in parallel, up to TRIPWIRE_MAX_CONCURRENT_RUNS at a time (default 2). Further submissions wait their turn, and the queue survives a restart. A run:
- Streams live progress as each case and step executes (case_start, action, shot, case_end, done).
- Persists its events and final report to the store.
- Is observable via GET /api/v1/runs/{id}.
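The concurrency limit behaves like a semaphore. The sketch below is a minimal in-memory model, not Tripwire's runner: it assumes each run is a plain coroutine, reads TRIPWIRE_MAX_CONCURRENT_RUNS with the documented default of 2, and leaves out the durable queue entirely. The names Gauge, execute_run, and drain are hypothetical.

```python
import asyncio
import os
from dataclasses import dataclass

# Documented default of 2 when the variable is unset.
MAX_CONCURRENT = int(os.environ.get("TRIPWIRE_MAX_CONCURRENT_RUNS", "2"))


@dataclass
class Gauge:
    """Tracks how many runs are executing now and the highest count seen."""
    active: int = 0
    peak: int = 0


async def execute_run(run_id: str, slots: asyncio.Semaphore, gauge: Gauge) -> str:
    # Runs beyond the limit block here until a slot frees up.
    async with slots:
        gauge.active += 1
        gauge.peak = max(gauge.peak, gauge.active)
        await asyncio.sleep(0.01)  # stand-in for driving the browser through a suite
        gauge.active -= 1
    return run_id


async def drain(run_ids: list[str], limit: int) -> tuple[list[str], int]:
    # Hypothetical helper: start every submitted run and wait for all of them.
    slots = asyncio.Semaphore(limit)
    gauge = Gauge()
    finished = await asyncio.gather(*(execute_run(r, slots, gauge) for r in run_ids))
    return list(finished), gauge.peak


if __name__ == "__main__":
    done, peak = asyncio.run(drain([f"run-{i}" for i in range(5)], MAX_CONCURRENT))
    print(done, "peak concurrency:", peak)
```

With five submissions and a limit of 2, at most two runs are active at once. In the real service the waiting runs sit in the persisted queue rather than in memory, which is why they survive a restart.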
Sequencing & shared state
Cases run strictly in sequence, top to bottom, and share state. This is what lets a suite read like a real user journey.
```yaml
fixtures:
  email: { gen: unique_email }   # produced ONCE per run
  password: { gen: password }
cases:
  - id: signup                   # signs up ${email}
    ...
  - id: login                    # logs in as the SAME ${email}
    depends_on: signup
    ...
```

- Fixtures are generated once per run and shared by every case via ${name}.
- Captures let a case bind a value it read from the page for later cases to use.
- depends_on skips a case when its prerequisite didn't pass, so dependent failures don't add noise.
See Writing tests for the full syntax.
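The sequencing rules can be modeled in a few lines. This is an illustrative sketch under stated assumptions, not the engine's code: the generator functions, the case dict shape (id, text, and an optional depends_on naming a single case), and the execute callback are all hypothetical. It uses Python's string.Template, whose ${name} syntax happens to match the spec's placeholders.

```python
import secrets
from string import Template
from typing import Callable

# Hypothetical stand-ins for the gen: unique_email / gen: password generators.
GENERATORS: dict[str, Callable[[], str]] = {
    "unique_email": lambda: f"user-{secrets.token_hex(4)}@example.test",
    "password": lambda: secrets.token_urlsafe(12),
}


def make_fixtures(spec: dict[str, dict[str, str]]) -> dict[str, str]:
    # Every fixture is generated exactly once for the whole run.
    return {name: GENERATORS[cfg["gen"]]() for name, cfg in spec.items()}


def run_suite(cases: list[dict], fixtures: dict[str, str], execute) -> dict[str, str]:
    """Run cases top to bottom; execute(case, text) returns (status, captures)."""
    values = dict(fixtures)  # fixtures plus anything captured so far
    statuses: dict[str, str] = {}
    for case in cases:
        dep = case.get("depends_on")
        if dep is not None and statuses.get(dep) != "passed":
            statuses[case["id"]] = "skipped"  # prerequisite didn't pass
            continue
        text = Template(case["text"]).safe_substitute(values)  # resolve ${name}
        status, captures = execute(case, text)
        values.update(captures)  # captured values are visible to later cases
        statuses[case["id"]] = status
    return statuses


if __name__ == "__main__":
    fixtures = make_fixtures({"email": {"gen": "unique_email"}, "password": {"gen": "password"}})
    cases = [
        {"id": "signup", "text": "sign up as ${email}"},
        {"id": "login", "text": "log in as ${email}", "depends_on": "signup"},
    ]
    print(run_suite(cases, fixtures, lambda case, text: ("passed", {})))
    # {'signup': 'passed', 'login': 'passed'}
```

Because the fixtures dict is built once and threaded through every case, signup and login see the same email; if signup fails, login is reported as skipped instead of failing for an unrelated reason.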
The per-step loop
Each step runs a small, bounded agent loop against the live page:
- Read the DOM. Claude is given the page's title, URL, visible text, and a numbered list of interactive elements.
- Act by reference. It calls tools (click an element by #ref, set a field, navigate, press a key, create or upload a file) until the step is done.
- Stop. When the step is complete, the model calls done.
Because the agent re-reads the live page at every step, harmless UI changes don't break it; that's self-healing. A step that genuinely can't be carried out (a selector that's truly gone, a failed navigation) marks the case broken.
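The per-step loop can be modeled as a bounded read, act, repeat cycle. In this sketch the model is any callable that receives the step, a fresh page snapshot, and the last tool error, and returns one tool call. ToolCall, StepBroken, run_step, and the max_turns budget of 10 are hypothetical, and treating an exhausted budget as "broken" is an assumption about where the bound applies.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ToolCall:
    name: str  # e.g. "click", "set_field", "done" (hypothetical tool names)
    args: dict = field(default_factory=dict)


class StepBroken(Exception):
    """The step could not be carried out; the case is reported as broken."""


def run_step(
    step: str,
    read_dom: Callable[[], str],
    next_call: Callable[[str, str, Optional[str]], ToolCall],
    tools: dict[str, Callable[[dict], None]],
    max_turns: int = 10,
) -> int:
    """Drive one step until the model calls done; return the turns used."""
    error: Optional[str] = None
    for turn in range(1, max_turns + 1):
        snapshot = read_dom()  # re-read the live page every turn
        call = next_call(step, snapshot, error)
        if call.name == "done":
            return turn
        tool = tools.get(call.name)
        if tool is None:
            error = f"unknown tool {call.name!r}"
            continue
        try:
            tool(call.args)
            error = None
        except Exception as exc:  # e.g. the #ref no longer exists
            error = str(exc)  # shown to the model on the next turn
    raise StepBroken(f"step {step!r} not done within {max_turns} turns")
```

Feeding a tool error back to the model, rather than failing on the first miss, gives it a chance to re-read the page and pick another element; only a step that never reaches done within the budget becomes broken.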
Assertions: two paths to a verdict
After a case's steps complete, each expect entry produces a forced pass / fail verdict. There are two ways it gets one:
Deterministic (a check)
When the assertion carries a check, it's verified directly against the page state — no model call, no cost, no ambiguity:
```yaml
expect:
  - { id: a1, assert: "the URL path is /account", check: { kind: url, path_is: "/account" } }
```

The four kinds are url, visible_text, absent_text, and element_visible. See the Checks reference.
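Because a check never calls the model, it reduces to a pure function of the page state. The sketch below assumes a PageState snapshot holding only the URL and the visible text, and covers three of the four kinds. Only kind and path_is come from the example above; the text key used for visible_text and absent_text is hypothetical, and element_visible is left out because it needs the live DOM.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class PageState:
    url: str
    visible_text: str


def evaluate_check(check: dict, page: PageState) -> bool:
    """Deterministically verify one check against a page snapshot."""
    kind = check["kind"]
    if kind == "url":
        # In this sketch path_is compares only the path, ignoring host and query.
        return urlparse(page.url).path == check["path_is"]
    if kind == "visible_text":
        return check["text"] in page.visible_text  # hypothetical "text" key
    if kind == "absent_text":
        return check["text"] not in page.visible_text  # hypothetical "text" key
    raise ValueError(f"kind not covered by this sketch: {kind!r}")
```

The same snapshot always yields the same verdict, which is what "no ambiguity" means in practice.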
Model-adjudicated
With no check, the model adjudicates the assertion against the page's visible text and a screenshot, and is forced to return a clear pass / fail with a reason and what it observed. A missing element or a wrong value is a fail.
```yaml
expect:
  - { id: a1, assert: "a friendly welcome message greets the user by name" }
```

Prefer a check where you can: it's faster, cheaper, and unambiguous.
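Forcing a clear verdict can be sketched as strict parsing of the model's reply, where anything other than an explicit pass or fail is scored as a fail. The JSON reply shape (verdict, reason, observed) and the names Verdict and parse_verdict are assumptions for illustration; the point is that an inconclusive answer counts as failed, consistent with how failed is defined under Case statuses.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    passed: bool
    reason: str
    observed: str


def parse_verdict(raw: str) -> Verdict:
    """Coerce a model reply into a strict pass/fail; anything unclear fails."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return Verdict(False, "inconclusive: reply was not valid JSON", "")
    if not isinstance(data, dict) or data.get("verdict") not in ("pass", "fail"):
        return Verdict(False, "inconclusive: no clear pass/fail", "")
    return Verdict(
        passed=data["verdict"] == "pass",
        reason=str(data.get("reason", "")),
        observed=str(data.get("observed", "")),
    )
```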
Case statuses
| Status | Meaning |
|---|---|
| passed | Every assertion passed. |
| failed | At least one assertion failed (or was inconclusive). |
| broken | A step couldn't be carried out, so the case never reached its assertions. |
| skipped | A depends_on prerequisite didn't pass. |
These map to the CLI/Action exit codes: 0 all passed · 1 a failure · 2 a broken case · 3 a spec/config error. See CI.
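The status rules and exit codes fit in two small functions. This is a hedged sketch with hypothetical names; in particular, the precedence when a run contains both failed and broken cases (here broken wins, giving 2) is an assumption that the table doesn't settle.

```python
from enum import Enum


class Status(str, Enum):
    PASSED = "passed"
    FAILED = "failed"
    BROKEN = "broken"
    SKIPPED = "skipped"


def case_status(step_error: bool, skipped: bool, assertion_results: list[bool]) -> Status:
    if skipped:
        return Status.SKIPPED  # a depends_on prerequisite didn't pass
    if step_error:
        return Status.BROKEN  # never reached the assertions
    return Status.PASSED if all(assertion_results) else Status.FAILED


def exit_code(statuses: list[Status], spec_error: bool = False) -> int:
    if spec_error:
        return 3
    if Status.BROKEN in statuses:
        return 2  # assumption: broken outranks failed
    if Status.FAILED in statuses:
        return 1
    return 0
```

Skipped cases don't raise the code on their own here, since a skip only happens when some other case already failed or broke.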
The report
When the run finishes, the report records what worked, what failed, and why, per case and per assertion. It is written as report.json, junit.xml, and report.html, and includes:
- Passed assertions, with their verdict and what was observed.
- Failed assertions, with the reason and, where available, a backend root cause (the failing request, its trace_id, the server error, the suspected cause, and a suggested fix).
- Skipped cases, with the dependency that didn't pass.
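For CI, the four statuses map naturally onto standard JUnit elements. The sketch below shows one plausible mapping using Python's xml.etree.ElementTree (failed as a failure element, broken as an error element, skipped as a skipped element). It does not describe the exact junit.xml Tripwire writes, and the input keys id, status, and message are hypothetical.

```python
import xml.etree.ElementTree as ET


def to_junit(suite_name: str, cases: list[dict]) -> str:
    """Render case results as a JUnit testsuite element."""
    suite = ET.Element("testsuite", name=suite_name, tests=str(len(cases)))
    counts = {"failures": 0, "errors": 0, "skipped": 0}
    # Which child element each non-passing status becomes, and its counter.
    mapping = {"failed": ("failure", "failures"), "broken": ("error", "errors"), "skipped": ("skipped", "skipped")}
    for case in cases:
        tc = ET.SubElement(suite, "testcase", name=case["id"], classname=suite_name)
        if case["status"] in mapping:
            tag, counter = mapping[case["status"]]
            ET.SubElement(tc, tag, message=case.get("message", ""))
            counts[counter] += 1
    for key, value in counts.items():
        suite.set(key, str(value))
    return ET.tostring(suite, encoding="unicode")
```

Mapping broken to an error rather than a failure keeps "the app misbehaved" separate from "the test couldn't run" in most CI dashboards.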
A verified failure flows straight into a root-caused, deduped issue filed to your configured destinations.
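Deduplication needs a stable key per distinct failure. One common approach, sketched here as an assumption rather than Tripwire's actual scheme, is to hash the case id, the assertion id, and a normalized root cause with volatile tokens (trace ids, numbers) masked. The names fingerprint and should_file are hypothetical.

```python
import hashlib
import re

# Mask long hex ids (e.g. trace ids) and bare numbers so repeats collide.
_VOLATILE = re.compile(r"\b(?:0x)?[0-9a-f]{8,}\b|\d+")


def fingerprint(case_id: str, assertion_id: str, root_cause: str) -> str:
    normalized = _VOLATILE.sub("#", root_cause.lower())
    key = f"{case_id}|{assertion_id}|{normalized}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


def should_file(filed: set[str], case_id: str, assertion_id: str, root_cause: str) -> bool:
    """Return True only the first time a given fingerprint is seen."""
    fp = fingerprint(case_id, assertion_id, root_cause)
    if fp in filed:
        return False
    filed.add(fp)
    return True
```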
Continue to Self-heal & root-cause.