
Flakiness analytics

Tripwire computes flake and run analytics over your persisted run history: pass/fail trends, per-case flake rates, and the slowest cases. That lets you find the unreliable tests and the slow ones instead of guessing. Maintenance is one of the most commonly cited pain points in E2E testing, and analytics is how you keep a suite honest as it grows.

Get the analytics

bash
curl http://127.0.0.1:8400/api/v1/analytics

It's computed on demand from the full run history — no separate job to schedule.

What it returns

json
{
  "total_runs": 42,
  "overall_pass_rate": 0.93,
  "flaky_cases": [ ... ],
  "cases": [ ... ],
  "trend": [ ... ],
  "slowest_cases": [ ... ]
}
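If you would rather read the endpoint from a script than from curl, a minimal sketch using only the Python standard library could look like the following. The URL and the top-level field names come from the examples above; the function names `fetch_analytics` and `summarize` are illustrative and not part of Tripwire.

```python
import json
import urllib.request

# Base URL taken from the curl example; adjust it for your deployment.
DEFAULT_BASE_URL = "http://127.0.0.1:8400"


def fetch_analytics(base_url: str = DEFAULT_BASE_URL, timeout: float = 10.0) -> dict:
    """GET /api/v1/analytics and return the decoded JSON body."""
    url = base_url.rstrip("/") + "/api/v1/analytics"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def summarize(analytics: dict) -> str:
    """Build a one-line summary from the documented top-level fields."""
    return (
        f"{analytics['total_runs']} runs, "
        f"{analytics['overall_pass_rate']:.0%} overall pass rate, "
        f"{len(analytics['flaky_cases'])} flaky cases"
    )
```

Calling `summarize(fetch_analytics())` against a running instance prints a compact health line that is easy to drop into a CI log.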

Per-case history & flake rate

For each (suite_id, case_id), Tripwire tracks the outcome history (most recent first), the pass rate, and a flake rate:

json
{
  "suite_id": "checkout",
  "case_id": "pay-de",
  "title": "A German customer can pay",
  "runs": 20,
  "passed": 16,
  "failed": 4,
  "pass_rate": 0.8,
  "flaky": true,
  "flake_rate": 0.3,
  "history": ["passed", "failed", "passed", "passed", "failed", "..."],
  "avg_duration_ms": 12900
}

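A case is flaky when its recent history (the last 10 outcomes) is mixed, meaning it contains both passing and non-passing outcomes, or when a run explicitly marked it flaky. The flake_rate is the share of non-passing outcomes in that recent window, whereas pass_rate covers every recorded run. In the example above, a flake_rate of 0.3 means 3 of the last 10 outcomes did not pass, while the pass_rate of 0.8 reflects 16 passes across all 20 runs. Cases are sorted flakiest-first, so the tests eroding your trust float to the top.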
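The rule above can be sketched in a few lines of Python. This is a reading of the description, not Tripwire's actual implementation: the `marked_flaky` parameter is a hypothetical stand-in for "a run explicitly marked it flaky", and any outcome other than "passed" is treated as non-passing.

```python
from __future__ import annotations

RECENT_WINDOW = 10  # "last 10 outcomes", per the description above


def flake_stats(history: list[str], marked_flaky: bool = False) -> tuple[bool, float]:
    """Return (flaky, flake_rate) for a history ordered most recent first.

    `marked_flaky` is a hypothetical flag; how Tripwire records an
    explicit flaky marker is not shown in this page.
    """
    recent = history[:RECENT_WINDOW]
    if not recent:
        return marked_flaky, 0.0
    non_passing = sum(1 for outcome in recent if outcome != "passed")
    mixed = 0 < non_passing < len(recent)  # both passing and non-passing seen
    return mixed or marked_flaky, non_passing / len(recent)


# Illustrative history; only the first 10 entries fall inside the window.
history = ["passed", "failed", "passed", "passed", "failed",
           "passed", "passed", "failed", "passed", "passed",
           "passed", "failed"]
print(flake_stats(history))  # (True, 0.3)
```

Note that under this reading a case that fails every time in the window has a flake_rate of 1.0 but is not flagged as flaky, because its history is not mixed.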

Pass-rate trend

trend is one point per run (oldest → newest): the run id, timestamp, suite, case count, and pass rate — ready to plot as a line so you can see whether the suite is getting healthier or sicker over time.
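Beyond plotting, the trend lends itself to a simple automated check. The sketch below compares the average pass rate of the latest runs against the runs just before them. The key name "pass_rate" is an assumption (this page names the quantity but not the JSON key), and the window and threshold defaults are arbitrary starting points.

```python
from __future__ import annotations

from statistics import mean


def pass_rate_dip(trend: list[dict], window: int = 5, threshold: float = 0.05) -> bool:
    """True when the latest `window` runs average at least `threshold`
    lower than the `window` runs before them.

    `trend` is ordered oldest to newest. The "pass_rate" key is an
    assumed field name; check it against the actual response.
    """
    if len(trend) < 2 * window:
        return False  # not enough history to compare two full windows
    rates = [point["pass_rate"] for point in trend]
    previous = mean(rates[-2 * window:-window])
    latest = mean(rates[-window:])
    return previous - latest >= threshold
```

Comparing two windows rather than two single runs keeps one unlucky run from triggering the check.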

Slowest cases

slowest_cases ranks cases by average duration (top 10). Use it to find the cases worth tightening or splitting.
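The same ranking can be reproduced locally from the per-case records, for example to use a cutoff other than 10. This sketch relies only on the `avg_duration_ms` field shown in the per-case example; how Tripwire breaks ties or handles cases without a duration is not covered here.

```python
from __future__ import annotations


def slowest(cases: list[dict], top: int = 10) -> list[dict]:
    """Return up to `top` case records, highest avg_duration_ms first."""
    return sorted(cases, key=lambda case: case["avg_duration_ms"], reverse=True)[:top]


# Illustrative records; only "pay-de" comes from the example above.
cases = [
    {"suite_id": "checkout", "case_id": "pay-de", "avg_duration_ms": 12900},
    {"suite_id": "checkout", "case_id": "cart-add", "avg_duration_ms": 4200},
    {"suite_id": "login", "case_id": "sso", "avg_duration_ms": 18700},
]
for case in slowest(cases, top=2):
    print(f"{case['suite_id']}/{case['case_id']}: {case['avg_duration_ms']} ms")
```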

In the dashboard

The dashboard surfaces these as the analytics view: overall pass rate, a trend chart, a flaky-cases list, and the slowest cases — the at-a-glance health of your suite.

How to act on it

  • Quarantine the flakiest cases at the top of the list; fix the underlying nondeterminism (timing, shared state, a genuinely flaky backend) rather than muting the symptom.
  • Watch the trend after a refactor — a dipping pass rate is an early regression signal.
  • Trim the slowest cases to keep CI fast.
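To turn the first item above into a concrete work list, one option is to filter the case records by flake rate. This is a sketch under stated assumptions: the 0.2 cutoff and the "suite_id/case_id" label format are hypothetical choices, and the records are assumed to carry the `flaky` and `flake_rate` fields shown in the per-case example.

```python
from __future__ import annotations


def quarantine_candidates(cases: list[dict], min_flake_rate: float = 0.2) -> list[str]:
    """List flaky cases at or above `min_flake_rate` as "suite_id/case_id".

    The threshold and label format are illustrative, not Tripwire
    conventions. Input order is preserved.
    """
    return [
        f"{case['suite_id']}/{case['case_id']}"
        for case in cases
        if case["flaky"] and case["flake_rate"] >= min_flake_rate
    ]


# Illustrative records; only "pay-de" comes from the example above.
records = [
    {"suite_id": "checkout", "case_id": "pay-de", "flaky": True, "flake_rate": 0.3},
    {"suite_id": "checkout", "case_id": "pay-fr", "flaky": True, "flake_rate": 0.1},
    {"suite_id": "login", "case_id": "sso", "flaky": False, "flake_rate": 0.0},
]
print(quarantine_candidates(records))  # ['checkout/pay-de']
```

Treat the output as a list of cases to investigate and fix, not as a permanent mute list.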

Related: Runs & assertions · CI
