Runs & assertions

A run executes one suite against a real browser. This page covers the execution model: how cases are sequenced, how state flows between them, and how each assertion gets a verdict.

What a run is

You can start a run from the dashboard, `POST /api/v1/runs`, the CLI, or the MCP server. The engine opens one Playwright browser for the entire suite, so sequenced cases share the session (and any captured values). When a run is started via the API, the runner service executes it in a background worker pool: runs are persisted to a durable queue and executed in parallel up to `TRIPWIRE_MAX_CONCURRENT_RUNS` (default 2). Runs submitted beyond that limit wait their turn, and the queue survives a restart. A run:

- Streams live progress as each case and step executes (`case_start`, `action`, `shot`, `case_end`, `done`).
- Persists its events and final report to the store.
- Is observable via `GET /api/v1/runs/{id}`.
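
The concurrency cap described above can be illustrated with a small simulation. This is a minimal sketch, not the runner's actual code: `execute_run`, `drain`, and `simulate` are hypothetical names, the sleep stands in for real browser work, and the durable queue (surviving a restart) is not modeled at all. It only shows how a semaphore keeps at most two runs in flight while the rest wait their turn.

```python
import asyncio

# Stands in for TRIPWIRE_MAX_CONCURRENT_RUNS (default 2).
MAX_CONCURRENT_RUNS = 2


async def execute_run(run_id, gate, active, peak):
    """Hypothetical stand-in for executing one run; it only tracks concurrency."""
    async with gate:                  # waits here while the cap is reached
        active.append(run_id)
        peak.append(len(active))      # record how many runs are in flight right now
        await asyncio.sleep(0.01)     # placeholder for the real browser work
        active.remove(run_id)
    return run_id


async def drain(run_ids, limit):
    """Execute every queued run id, never more than `limit` at once."""
    gate = asyncio.Semaphore(limit)
    active, peak = [], []
    finished = await asyncio.gather(
        *(execute_run(run_id, gate, active, peak) for run_id in run_ids)
    )
    return list(finished), max(peak)


def simulate(run_ids, limit=MAX_CONCURRENT_RUNS):
    """Return (finished run ids in submission order, peak concurrency observed)."""
    return asyncio.run(drain(run_ids, limit))
```

Calling `simulate(["r1", "r2", "r3", "r4", "r5"])` runs all five ids but never has more than two executing at the same moment.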

Sequencing & shared state

Cases run strictly in sequence, top to bottom, and share state. This is what lets a suite read like a real user journey.

```yaml
fixtures:
  email:    { gen: unique_email }     # produced ONCE per run
  password: { gen: password }

cases:
  - id: signup        # signs up ${email}
    ...
  - id: login         # logs in as the SAME ${email}
    depends_on: signup
    ...
```

- Fixtures are generated once per run and shared by every case via `${name}`.
- Captures let a case bind a value it read from the page for later cases to use.
- `depends_on` skips a case when its prerequisite didn't pass, so dependent failures don't add noise.
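
The three rules above can be sketched in a few lines of Python. This is an illustration under assumptions, not the engine's implementation: `render`, `run_suite`, the `steps` list of strings, and the `execute` callback returning `(status, captured_values)` are all hypothetical shapes chosen for the example.

```python
import re


def render(template, values):
    """Replace each ${name} with its value; unknown names are left untouched."""
    return re.sub(
        r"\$\{(\w+)\}",
        lambda m: str(values.get(m.group(1), m.group(0))),
        template,
    )


def run_suite(fixture_gens, cases, execute):
    """Run cases top to bottom, sharing fixture and captured values between them."""
    # Each fixture generator is called exactly once per run.
    values = {name: gen() for name, gen in fixture_gens.items()}
    statuses = {}
    for case in cases:
        prerequisite = case.get("depends_on")
        # Skip when the prerequisite did not pass (failed, broken, or itself skipped).
        if prerequisite is not None and statuses.get(prerequisite) != "passed":
            statuses[case["id"]] = "skipped"
            continue
        steps = [render(step, values) for step in case["steps"]]
        status, captured = execute(case["id"], steps)  # hypothetical executor callback
        values.update(captured)  # captured values become visible to later cases
        statuses[case["id"]] = status
    return statuses
```

With a `signup` case that fails and a `login` case that declares `depends_on: signup`, `run_suite` reports `login` as `skipped` without ever executing it.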

See Writing tests for the full syntax.

The per-step loop

Each step runs a small, bounded agent loop against the live page:

1. Read the DOM. Claude is given the page's title, URL, visible text, and a numbered list of interactive elements.
2. Act by reference. It calls tools (click an element by #ref, set a field, navigate, press a key, create or upload a file) until the step is done.
3. Stop. When the step is complete the model calls `done`.

Because the agent re-reads the live page each step, harmless UI changes don't break it — that's self-healing. A step that genuinely can't be carried out (a selector that's truly gone, a failed navigation) marks the case broken.
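
The shape of that loop can be sketched as follows. This is a simplified illustration, not the engine's code: `run_step` and its three callbacks are hypothetical, the page is re-read on every turn as a simplification, and treating an exhausted turn budget as `broken` is an assumption.

```python
def run_step(instruction, read_page, choose_tool, apply_tool, max_turns=8):
    """Drive one step: observe, act, repeat until `done` or the budget runs out.

    read_page()                      -> snapshot of the live page (hypothetical shape)
    choose_tool(instruction, snap)   -> dict such as {"tool": "click", "ref": 3}
    apply_tool(call)                 -> True if the action could be carried out
    """
    for _ in range(max_turns):
        snapshot = read_page()                     # fresh read of the live page
        call = choose_tool(instruction, snapshot)  # stands in for the model's tool call
        if call["tool"] == "done":
            return "done"
        if not apply_tool(call):                   # the action could not be carried out
            return "broken"
    return "broken"  # assumption: an exhausted turn budget counts as broken
```

Because the element reference is looked up in a fresh snapshot rather than hard-coded, a renumbered or relocated button is still found as long as it can be identified on the page.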

Assertions: two paths to a verdict

After a case's steps complete, each expect entry produces a forced pass / fail verdict. There are two ways it gets one:

Deterministic (a `check`)

When the assertion carries a check, it's verified directly against the page state — no model call, no cost, no ambiguity:

```yaml
expect:
  - { id: a1, assert: "the URL path is /account", check: { kind: url, path_is: "/account" } }
```

The four kinds are url, visible_text, absent_text, and element_visible. See the Checks reference.
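
To make "verified directly against the page state" concrete, here is a minimal sketch of a check evaluator. Only `kind: url` with `path_is` comes from the example above. The `text` field used for `visible_text` and `absent_text`, the page snapshot dict, and the function name `evaluate_check` are assumptions for illustration; `element_visible` is left out because it needs a live DOM. The Checks reference is the authority on the real fields.

```python
from urllib.parse import urlsplit


def evaluate_check(check, page):
    """Return (passed, observed) for one check against a page snapshot.

    `page` is a hypothetical snapshot: {"url": str, "visible_text": str}.
    """
    kind = check["kind"]
    if kind == "url":
        path = urlsplit(page["url"]).path          # ignores query string and fragment
        return path == check["path_is"], f"path is {path}"
    if kind in ("visible_text", "absent_text"):
        found = check["text"] in page["visible_text"]  # `text` is an assumed field name
        observed = "text found" if found else "text not found"
        passed = found if kind == "visible_text" else not found
        return passed, observed
    raise ValueError(f"check kind not covered by this sketch: {kind}")
```

No model is involved anywhere in this path, which is why a check costs nothing and always gives the same answer for the same page state.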

Model-adjudicated

With no check, the model adjudicates the assertion against the page's visible text and a screenshot, and is forced to return a clear pass / fail with a reason and what it observed. A missing element or a wrong value is a fail.

```yaml
expect:
  - { id: a1, assert: "a friendly welcome message greets the user by name" }
```

Prefer a check where you can — it's faster, cheaper, and unambiguous.

Case statuses

| Status | Meaning |
| --- | --- |
| `passed` | Every assertion passed. |
| `failed` | At least one assertion failed (or was inconclusive). |
| `broken` | A step couldn't be carried out, so the run never reached the assertions. |
| `skipped` | A `depends_on` prerequisite didn't pass. |

These map to the CLI/Action exit codes: 0 all passed · 1 a failure · 2 a broken case · 3 a spec/config error. See CI.
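
The status rules and exit codes can be sketched as two small functions. This is an illustration, not the CLI's code: `case_status` and `exit_code` are hypothetical names, verdicts are modeled as plain strings, and the precedence of `2` over `1` when a run has both broken and failed cases is an assumption the page does not state.

```python
def case_status(steps_ok, verdicts):
    """Derive a case status from its step outcome and assertion verdicts."""
    if not steps_ok:
        return "broken"  # a step couldn't be carried out; assertions were never reached
    # Anything other than a clear "pass" (including "inconclusive") fails the case.
    return "passed" if all(v == "pass" for v in verdicts) else "failed"


def exit_code(statuses, spec_error=False):
    """Map a run's case statuses to an exit code."""
    if spec_error:
        return 3  # spec/config error
    if "broken" in statuses:
        return 2  # assumption: a broken case outranks a plain failure
    if "failed" in statuses:
        return 1
    # Skipped cases need no branch of their own: a case is only skipped
    # when a prerequisite already failed, broke, or was skipped.
    return 0
```

For example, `exit_code(["passed", "failed", "skipped"])` gives `1`, and a run whose cases all passed gives `0`.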

The report

When the run finishes, the report records what worked, what failed, and why, per case and per assertion. It is written as `report.json`, `junit.xml`, and `report.html`, and includes:

- Passed assertions, with their verdict and what was observed.
- Failed assertions, with the reason and, where available, a backend root cause (the failing request, its `trace_id`, the server error, the suspected cause, and a suggested fix).
- Skipped cases, with the dependency that didn't pass.
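
As an illustration of how case results could land in a JUnit-style file, here is a minimal sketch using Python's standard library. It is not the actual `junit.xml` writer: `to_junit` is a hypothetical name, the input tuples are an assumed shape, and the mapping of `failed` to `<failure>`, `broken` to `<error>`, and `skipped` to `<skipped>` is an assumption based on common JUnit conventions.

```python
import xml.etree.ElementTree as ET

# Assumed mapping: case status -> (child element, counter attribute on <testsuite>).
JUNIT_MAPPING = {
    "failed": ("failure", "failures"),
    "broken": ("error", "errors"),
    "skipped": ("skipped", "skipped"),
}


def to_junit(suite_name, cases):
    """Build a JUnit-style XML string from (case_id, status, message) tuples."""
    suite = ET.Element("testsuite", name=suite_name, tests=str(len(cases)))
    counts = {"failures": 0, "errors": 0, "skipped": 0}
    for case_id, status, message in cases:
        case_el = ET.SubElement(suite, "testcase", name=case_id)
        if status in JUNIT_MAPPING:               # passed cases get no child element
            tag, counter = JUNIT_MAPPING[status]
            ET.SubElement(case_el, tag, message=message)
            counts[counter] += 1
    for attribute, value in counts.items():
        suite.set(attribute, str(value))
    return ET.tostring(suite, encoding="unicode")
```

Most CI systems read the `failures`, `errors`, and `skipped` counters from the `<testsuite>` element, so keeping the three statuses distinct preserves the difference between a failed assertion and a step that could not run.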

A verified failure flows straight into a root-caused, deduped issue filed to your configured destinations.

Continue to Self-heal & root-cause.
