Writing tests
Suites are plain-English YAML (*.tripwire.yaml). You describe what a user does; the LLM figures out how to do it against the live page. Cases run in sequence and share state, so a suite reads like a story: create a user, then log in as that user, then do something while logged in.
New here? Walk through authoring and running a suite end-to-end in Your first test. For every check kind in one place, see the Checks reference.
A complete suite
    suite: "Signup & Login"
    id: signup-login
    base_url: http://localhost:8500

    fixtures:                       # generated ONCE per run, shared across cases
      email:    { gen: unique_email }
      password: { gen: password }

    cases:
      - id: signup
        title: "A new user can sign up"
        tags: [auth, smoke, p0]
        steps:
          - { id: s1, do: "Open the sign-up page" }
          - { id: s2, do: "Enter ${email} and ${password}", secret: true }
          - { id: s3, do: "Click 'Create account'" }
        expect:
          - { id: a1, assert: "a confirmation 'Account created' is shown",
              check: { kind: visible_text, contains: "Account created" } }
        capture:
          - { name: account_id, from: "the new account id, if shown" }

      - id: login
        title: "That same user can log in"
        tags: [auth, p0]
        depends_on: signup            # skipped if signup didn't pass
        steps:
          - { id: s1, do: "Open the login page" }
          - { id: s2, do: "Enter ${email} and ${password}" }   # the SAME user, no new account
          - { id: s3, do: "Click 'Sign in'" }
        expect:
          - { id: a1, assert: "a welcome message is shown",
              check: { kind: visible_text, contains: "Welcome back" } }

Suite-level fields
| Field | Purpose |
|---|---|
| suite | Human-readable name. |
| id | Stable identifier (used in issue fingerprints and the API). |
| base_url | Where the run starts; relative navigation (Open /checkout) resolves against it. |
| fixtures | Generated values, produced once per run and shared by every case. |
| env (optional) | Extra ${VAR} values; ${...} references in them resolve from the OS environment. |
| cases | The ordered list of cases. |
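
To make the base_url and env rows concrete, here is a minimal sketch of a suite header with one relative navigation step. The shape of env is an assumption (a name-to-value mapping), and QA_COUPON, the coupon name, and the case itself are hypothetical; only the field names and the "Open /checkout" wording come from the table above.

```yaml
suite: "Checkout"
id: checkout
base_url: http://localhost:8500

env:
  # ASSUMPTION: env is a name-to-value mapping.
  # QA_COUPON is a hypothetical OS environment variable.
  coupon: "${QA_COUPON}"

cases:
  - id: open-checkout
    title: "The checkout page opens"
    steps:
      # Relative navigation: resolves against base_url,
      # i.e. http://localhost:8500/checkout
      - { id: s1, do: "Open /checkout" }
    expect:
      - { id: a1, assert: "we land on the checkout page",
          check: { kind: url, path_is: "/checkout" } }
```

Keeping navigation relative means the same suite can point at another environment by changing base_url alone.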
Cases
Each case has an id, a title, optional tags, a list of steps, and an expect block (plus optional capture and depends_on). Cases run top to bottom in one shared browser session and carry state forward.
    - id: pay-de
      title: "A German customer can pay"
      tags: [checkout, p0]
      depends_on: login
      steps: [ ... ]
      expect: [ ... ]
      capture: [ ... ]

tags flow into the filed issue's labels and drive severity: a case tagged p0 or smoke is treated as higher severity when it fails.
Steps
A step is an intent in plain English:
    steps:
      - { id: s1, do: "Open the sign-up page" }
      - { id: s2, do: "Enter ${email} and ${password}", secret: true }
      - { id: s3, do: "Click 'Create account'" }

- do is what the user does, described naturally. No selectors.
- secret: true marks a step that handles sensitive input so its values aren't echoed.
- ${name} interpolates a fixture or a captured value (see below).
The model has a real toolbox for each step: read the DOM, navigate, click an element, set a form field or select option, press a key, and — importantly for upload flows — create a file with given content and upload it. So a step like "Create a small CSV and upload it as your import file" works without you producing a fixture file. See Architecture.
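
The upload flow described above might look like the following case. This is a sketch, not a documented recipe: the case id, the page names, the button label, and the "Import complete" text are hypothetical, and it assumes a login case exists earlier in the suite. Only the s2 wording is taken from the paragraph above.

```yaml
- id: import-contacts
  title: "A user can import contacts from a CSV"
  depends_on: login
  steps:
    - { id: s1, do: "Open the import page" }
    # The model creates the file and uploads it; no fixture file on disk.
    - { id: s2, do: "Create a small CSV and upload it as your import file" }
    - { id: s3, do: "Click 'Import'" }
  expect:
    - { id: a1, assert: "the import is reported as finished",
        check: { kind: visible_text, contains: "Import complete" } }
```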
Assertions (expect)
Each expect entry is an assertion. It always gets a forced pass / fail verdict:
    expect:
      - { id: a1, assert: "a confirmation 'Account created' is shown",
          check: { kind: visible_text, contains: "Account created" } }
      - { id: a2, assert: "no error banner is shown",
          check: { kind: absent_text, any_of: ["error", "failed"] } }

- assert is the condition in plain English.
- check (optional) is a deterministic check (see the Checks reference). When present, the assertion is verified directly against the page with no model cost; without it, the model adjudicates against the page text and a screenshot.
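
As an illustration of the two verification paths, the sketch below pairs a subjective assertion that has no check with an objective one that does. The assertion wording and the "Welcome back" text are illustrative; the fields are the ones shown above.

```yaml
expect:
  # No check: the model adjudicates from the page text and a screenshot.
  - { id: a1, assert: "the dashboard layout looks complete, with no broken images" }
  # With a check: verified directly against the page, no model cost.
  - { id: a2, assert: "a welcome message is shown",
      check: { kind: visible_text, contains: "Welcome back" } }
```

Both entries still receive a pass or fail verdict; the difference is only in who decides it.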
Core concepts
Fixtures — a unique user, generated once
Generators produce a value once per run and share it across every case via ${name}:
    fixtures:
      email:    { gen: unique_email }
      password: { gen: password }

Available generators:
| Generator | Produces |
|---|---|
| unique_email | A run-unique email (ud+<tag>@tripwire.test). |
| unique_username | A run-unique username (ud_user_<tag>). |
| password | A valid password (Tripwire!<digits>). |
| uuid | A 12-char hex id. |
| timestamp | A run tag (epoch + random). |
Because they're generated once, login reuses exactly the ${email} / ${password} that signup registered with — the same user, shared across sequenced cases. A literal value ({ team: "Acme" }) is passed through unchanged.
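
A fixtures block that mixes several generators with a literal could be sketched as follows. The fixture names (username, request_id, team) and the step wording are hypothetical; the generator names come from the table above.

```yaml
fixtures:
  email:      { gen: unique_email }
  username:   { gen: unique_username }
  password:   { gen: password }
  request_id: { gen: uuid }
  team:       "Acme"          # literal: passed through unchanged

cases:
  - id: signup
    title: "A new user can sign up to a team"
    steps:
      - { id: s1, do: "Open the sign-up page" }
      - { id: s2, do: "Enter ${username}, ${email} and ${password}", secret: true }
      - { id: s3, do: "Choose the team ${team} and click 'Create account'" }
    expect:
      - { id: a1, assert: "a confirmation 'Account created' is shown",
          check: { kind: visible_text, contains: "Account created" } }
```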
Capture — share a produced value forward
capture reads a value a case produced and binds it so later cases can reuse it via ${name}:
    capture:
      - { name: account_id, from: "the new account id, if shown" }   # model reads it
      - { name: order_total, check: { kind: visible_text } }         # deterministic read

from: lets the model extract the value from the page; check: reads it deterministically. Captured values join the shared state for every subsequent case.
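
To show a captured value travelling forward, here is a hedged two-case sketch: the first case captures an order number and the second uses it in a step. The case ids, the order_number name, and the page wording are hypothetical.

```yaml
cases:
  - id: place-order
    title: "A user can place an order"
    steps:
      - { id: s1, do: "Add any item to the cart and check out" }
    expect:
      - { id: a1, assert: "an order confirmation is shown" }
    capture:
      - { name: order_number, from: "the order number on the confirmation page" }

  - id: find-order
    title: "The order appears in order history"
    depends_on: place-order
    steps:
      - { id: s1, do: "Open the order history page" }
      - { id: s2, do: "Search for order ${order_number}" }   # captured above
    expect:
      - { id: a1, assert: "the order that was just placed is listed" }
```

Pairing the capture with depends_on keeps the second case from running against a value that was never produced.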
depends_on — skip when a prerequisite failed
    - id: login
      depends_on: signup    # skipped (not failed) if signup didn't pass

If signup doesn't pass, login is skipped rather than failed: there's no user to log in as, so a failure there would be noise. depends_on references another case's id.
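
Dependencies can also be chained, as in this sketch of three sequenced cases. The update-profile case and its wording are hypothetical. Reading the rule literally, a skipped login has not passed, so update-profile would presumably be skipped too; that cascade is an inference, not something stated above.

```yaml
cases:
  - id: signup
    title: "A new user can sign up"
    steps:
      - { id: s1, do: "Open the sign-up page" }
      - { id: s2, do: "Enter ${email} and ${password}", secret: true }
      - { id: s3, do: "Click 'Create account'" }
    expect:
      - { id: a1, assert: "a confirmation 'Account created' is shown" }

  - id: login
    title: "That same user can log in"
    depends_on: signup          # skipped if signup didn't pass
    steps:
      - { id: s1, do: "Open the login page" }
      - { id: s2, do: "Enter ${email} and ${password}" }
      - { id: s3, do: "Click 'Sign in'" }
    expect:
      - { id: a1, assert: "a welcome message is shown" }

  - id: update-profile
    title: "A logged-in user can change their display name"
    depends_on: login           # skipped if login didn't pass
    steps:
      - { id: s1, do: "Open the profile page" }
      - { id: s2, do: "Change the display name and save" }
    expect:
      - { id: a1, assert: "the new display name is shown" }
```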
Checks — deterministic where it counts
A check makes an assertion deterministic — verified directly against the captured page state, with zero model cost and no ambiguity. The kinds are url, visible_text, absent_text, and element_visible. Prefer a check where you can; omit it to let the model adjudicate a subjective expectation. Full details, fields, and examples: Checks reference.
    expect:
      - { assert: "we land on the account page",    check: { kind: url, path_is: "/account" } }
      - { assert: "the order summary shows EUR",    check: { kind: visible_text, contains: "EUR" } }
      - { assert: "no USD is shown",                check: { kind: absent_text, any_of: ["USD"] } }

How it runs
A suite runs in a background worker that streams progress and persists the run. One browser is opened for the whole suite so sequenced cases share the session; fixtures and captures flow forward; each assertion gets a verdict; and the final report says what worked, failed, and why — with a backend root cause attached where a trace_id and a log backend are available. Read on:
- Checks reference — every check kind, field by field.
- Runs & assertions — the execution model in detail.
- Self-heal & root-cause — when the UI changes or something genuinely fails.
- Architecture — the engine that drives the browser.