Runs & assertions

A run executes one suite against a real browser. This page covers the execution model: how cases are sequenced, how state flows between them, and how each assertion gets a verdict.

What a run is

When you start a run (from the dashboard, POST /api/v1/runs, the CLI, or the MCP server), the engine opens one Playwright browser for the entire suite, so sequenced cases share the session and any captured values. Runs started via the API are handled by the runner service in a background worker pool: each run is persisted to a durable queue, and runs execute in parallel up to TRIPWIRE_MAX_CONCURRENT_RUNS (default 2). Submit more and the rest wait their turn; the queue survives a restart. A run:

  • Streams live progress as each case and step executes (case_start, action, shot, case_end, done).
  • Persists its events and final report to the store.
  • Is observable via GET /api/v1/runs/{id}.
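
The queueing behavior described above can be pictured with a small Python sketch. It only illustrates bounded concurrency and is not the runner service's implementation: MAX_CONCURRENT_RUNS, execute_run, and submit_all are hypothetical names, and the in-memory pool stands in for a queue that is durable in the real system.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_RUNS = 2  # mirrors the documented default of TRIPWIRE_MAX_CONCURRENT_RUNS

_lock = threading.Lock()
_active = 0
peak_active = 0  # highest number of runs observed executing at the same time


def execute_run(run_id: str) -> str:
    """Hypothetical stand-in for executing one suite; it only sleeps."""
    global _active, peak_active
    with _lock:
        _active += 1
        peak_active = max(peak_active, _active)
    time.sleep(0.05)  # pretend to drive a browser
    with _lock:
        _active -= 1
    return f"{run_id}: done"


def submit_all(run_ids: list[str]) -> list[str]:
    # The pool executes at most MAX_CONCURRENT_RUNS at once; the rest wait in its queue.
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_RUNS) as pool:
        return list(pool.map(execute_run, run_ids))


results = submit_all(["run-1", "run-2", "run-3", "run-4", "run-5"])
```

All five runs finish, but peak_active never exceeds the configured limit.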

Sequencing & shared state

Cases run strictly in sequence, top to bottom, and share state. This is what lets a suite read like a real user journey.

yaml
fixtures:
  email:    { gen: unique_email }     # produced ONCE per run
  password: { gen: password }

cases:
  - id: signup        # signs up ${email}
    ...
  - id: login         # logs in as the SAME ${email}
    depends_on: signup
    ...

  • Fixtures are generated once per run and shared by every case via ${name}.
  • Captures let a case bind a value it read from the page for later cases to use.
  • depends_on skips a case when its prerequisite didn't pass, so dependent failures don't add noise.
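
The three rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the engine's code: run_suite, interpolate, fake_execute, and the "step" and "outcome" keys are hypothetical, and the fixture generator is a stand-in for unique_email.

```python
import re
import uuid
from typing import Callable


def interpolate(text: str, values: dict[str, str]) -> str:
    """Replace each ${name} with its value from the shared fixtures."""
    return re.sub(r"\$\{(\w+)\}", lambda m: values[m.group(1)], text)


def run_suite(cases: list[dict],
              execute: Callable[[dict, dict[str, str]], str]) -> dict[str, str]:
    # Fixtures are generated once, before any case runs (stand-in generator).
    values = {"email": f"user-{uuid.uuid4().hex[:8]}@example.test"}
    statuses: dict[str, str] = {}
    for case in cases:  # strictly top to bottom
        dep = case.get("depends_on")
        if dep is not None and statuses.get(dep) != "passed":
            statuses[case["id"]] = "skipped"  # prerequisite didn't pass
            continue
        statuses[case["id"]] = execute(case, values)
    return statuses


seen: list[str] = []


def fake_execute(case: dict, values: dict[str, str]) -> str:
    # Hypothetical executor: records the interpolated step and returns a canned outcome.
    seen.append(interpolate(case["step"], values))
    return case.get("outcome", "passed")


cases = [
    {"id": "signup", "step": "sign up ${email}"},
    {"id": "login", "step": "log in as ${email}", "depends_on": "signup"},
]
ok = run_suite(cases, fake_execute)

# Force signup to fail: login is skipped instead of producing a second failure.
skipped = run_suite([dict(cases[0], outcome="failed"), cases[1]], fake_execute)
```

In the first run both cases see the same generated email; in the second, login never executes.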

See Writing tests for the full syntax.

The per-step loop

Each step runs a small, bounded agent loop against the live page:

  1. Read the DOM. Claude is given the page's title, URL, visible text, and a numbered list of interactive elements.
  2. Act by reference. It calls tools — click an element by #ref, set a field, navigate, press a key, create or upload a file — until the step is done.
  3. Stop. When the step is complete the model calls done.

Because the agent re-reads the live page each step, harmless UI changes don't break it — that's self-healing. A step that genuinely can't be carried out (a selector that's truly gone, a failed navigation) marks the case broken.
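
As a rough picture of the loop above, here is a Python sketch with the model and the browser replaced by plain callables. Everything in it is hypothetical: run_step, MAX_TURNS, the tool-call dictionaries, and the choice to treat an exhausted turn budget as broken are assumptions made for illustration, since the actual bound and tool schema are not described here.

```python
from typing import Callable

MAX_TURNS = 8  # hypothetical bound; the real limit is not documented here


def run_step(step: str,
             read_page: Callable[[], dict],
             choose_tool: Callable[[str, dict], dict],
             apply_tool: Callable[[dict], None]) -> str:
    """Return "done" if the model finishes the step, "broken" otherwise."""
    for _ in range(MAX_TURNS):
        snapshot = read_page()              # 1. re-read the live page
        call = choose_tool(step, snapshot)  # 2. the model picks a tool, by #ref
        if call["tool"] == "done":          # 3. the model declares the step complete
            return "done"
        try:
            apply_tool(call)
        except RuntimeError:
            return "broken"                 # the action could not be carried out
    return "broken"                         # assumption: running out of turns is also broken


page = {"title": "Sign in", "url": "https://example.test/login",
        "elements": {1: "Email", 2: "Submit"}}
script = iter([
    {"tool": "set", "ref": 1, "value": "a@example.test"},
    {"tool": "click", "ref": 2},
    {"tool": "done"},
])
actions: list[dict] = []
result = run_step("log in", lambda: page, lambda step, snap: next(script), actions.append)
```

Because the snapshot is taken fresh on every turn, the scripted "model" here would see any change the previous action made to the page.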

Assertions: two paths to a verdict

After a case's steps complete, each expect entry produces a forced pass / fail verdict. There are two ways it gets one:

Deterministic (a check)

When the assertion carries a check, it's verified directly against the page state — no model call, no cost, no ambiguity:

yaml
expect:
  - { id: a1, assert: "the URL path is /account", check: { kind: url, path_is: "/account" } }

The four kinds are url, visible_text, absent_text, and element_visible. See the Checks reference.
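
To show why a check needs no model call, here is a minimal Python sketch of evaluating one against page state. Only the url kind with path_is comes from the example above; the "text" field used for visible_text and absent_text is a hypothetical name, run_check is not a real API, and element_visible is left out because it needs a live page.

```python
from urllib.parse import urlparse


def run_check(check: dict, page_url: str, visible_text: str) -> bool:
    """Evaluate a check against page state; True means the assertion passes."""
    kind = check["kind"]
    if kind == "url":
        # Compare only the path, ignoring host, query string, and fragment.
        return urlparse(page_url).path == check["path_is"]
    if kind == "visible_text":
        return check["text"] in visible_text      # "text" is a hypothetical field name
    if kind == "absent_text":
        return check["text"] not in visible_text  # "text" is a hypothetical field name
    raise ValueError(f"unsupported check kind in this sketch: {kind}")


verdict = run_check({"kind": "url", "path_is": "/account"},
                    "https://example.test/account?tab=profile",
                    "Welcome back")
```

The result depends only on the page state passed in, so the same inputs always give the same verdict.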

Model-adjudicated

With no check, the model adjudicates the assertion against the page's visible text and a screenshot, and is forced to return a clear pass / fail with a reason and what it observed. A missing element or a wrong value is a fail.

yaml
expect:
  - { id: a1, assert: "a friendly welcome message greets the user by name" }

Prefer a check where you can — it's faster, cheaper, and unambiguous.

Case statuses

Status    Meaning
passed    Every assertion passed.
failed    At least one assertion failed (or was inconclusive).
broken    A step couldn't be carried out, so the run couldn't even reach the assertions.
skipped   A depends_on prerequisite didn't pass.

These map to the CLI/Action exit codes: 0 all passed · 1 a failure · 2 a broken case · 3 a spec/config error. See CI.
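
The status table and exit codes can be read as two small functions. The Python below is a sketch, not the CLI's logic: case_status and exit_code are hypothetical names, and the precedence of 2 over 1 when a run contains both a broken and a failed case is an assumption. Exit code 3 is not modeled because a spec/config error occurs before any case runs.

```python
def case_status(verdicts: list[str],
                steps_ok: bool = True,
                dependency_ok: bool = True) -> str:
    """Derive one case's status from its assertion verdicts."""
    if not dependency_ok:
        return "skipped"  # a depends_on prerequisite didn't pass
    if not steps_ok:
        return "broken"   # the assertions were never reached
    # Anything other than a clear pass (including "inconclusive") counts as a failure.
    return "passed" if all(v == "pass" for v in verdicts) else "failed"


def exit_code(statuses: list[str]) -> int:
    """Map the statuses of all cases in a run to a process exit code."""
    if "broken" in statuses:
        return 2  # assumption: a broken case outranks a failure
    if "failed" in statuses:
        return 1
    return 0


statuses = [
    case_status(["pass", "pass"]),
    case_status(["pass", "inconclusive"]),
    case_status([], dependency_ok=False),
]
code = exit_code(statuses)
```

Here the second case fails on its inconclusive verdict and the third is skipped, so the run exits with 1.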

The report

When the run finishes, the report records, per case and per assertion, what worked, what failed, and why. It is written as report.json, junit.xml, and report.html, and includes:

  • Passed assertions, with their verdict and what was observed.
  • Failed assertions, with the reason and — where available — a backend root cause (the failing request, its trace_id, the server error, the suspected cause, and a suggested fix).
  • Skipped cases, with the dependency that didn't pass.

A verified failure flows straight into a root-caused, deduped issue filed to your configured destinations.

Continue to Self-heal & root-cause.

Tripwire: AI-native, self-healing E2E testing.