TraceFlux

CONTROL PLANE • REPLAY EXECUTIONS

Re-run incidents with pinned inputs — and prove parity before production.

Replay Executions let teams reproduce incident conditions on a captured telemetry window using a specific policy version. You get a parity report, diffs, and an auditable run record — so automation is validated before it impacts production.

PINNED WINDOW
Reproducible inputs
Telemetry window + context
PARITY REPORTS
Verifiable outputs
Diff + confidence + scope
AUDITABLE RUNS
Governed execution
Approvals + immutable log
REPLAY EXECUTION — RUN VIEW
Policy: v2.14.3
Execution pipeline: Running
Captured window: 2026-02-26 10:12 → 10:16 UTC
Signals: Flow • BGP • DNS • Metrics
Mode: Dry-run (no production actions)
Scope: edge-gw / us-east-1 / POP-12

STAGES
Ingest & normalize: Completed
Deterministic correlation: Completed
Incident aggregation: Completed
Parity report generation: In progress

PARITY & DIFF SUMMARY (auto-refresh)
Incidents formed: 1
Alerts collapsed: 146 → 1
Confidence: High
Action gates: Required

DIFF HIGHLIGHTS
Fingerprint tightened (suggested): Reduced duplicate fan-out by aligning BGP flap + DNS NXDOMAIN into one episode.
Suppression rule candidate: Drop low-confidence jitter spikes below baseline threshold for POP-12.
Action gate required: Any remediation must pass approval + blast-radius bounds.

Replay executions create an auditable record of what was evaluated, what changed, and what would happen before production actions run.

WHAT IS A REPLAY EXECUTION?

A tracked run that reproduces incident formation.

A replay execution re-processes a captured telemetry window through TraceFlux using a pinned correlation policy version. It produces artifacts (parity report, diffs, evidence bundle) and records governance (who triggered it, approvals, outcomes).

Pinned window

Replay the same 2–10 minute incident window with consistent inputs.

Policy versioning

Compare results across policy changes without guessing.

Governed outcomes

Approval gates + audit logs for every execution and artifact.

EXECUTION LIFECYCLE

A repeatable pipeline from capture → parity → promotion.

Run replays to validate incident formation, tune policies, and prove automation safety before enforcing changes.

Capture window

Select an incident window and scope (service, region, POP).

Plan

Choose mode: Dry-run vs Enforced. Pin policy version and controls.

Execute

Re-process telemetry through correlation and incident aggregation.

Review & promote

Inspect parity report + diffs. Approve policy changes or actions.
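The four lifecycle steps compose into a linear pipeline. A minimal sketch, with placeholder stage bodies and hypothetical field names:

```python
# Illustrative pipeline for capture -> plan -> execute -> review.
# Stage names come from the lifecycle above; the bodies are placeholders.

def capture_window(scope):
    return {"scope": scope, "window": "2026-02-26T10:12/10:16Z"}

def plan(run, mode="dry-run", policy="v2.14.3"):
    # Dry-run is the default; enforcement is a separately promoted state.
    return {**run, "mode": mode, "policy": policy}

def execute(run):
    # Re-process telemetry through correlation and incident aggregation.
    return {**run, "stages": ["ingest", "correlate", "aggregate"],
            "status": "succeeded"}

def review(run):
    # Attach the parity report; promotion happens only after approval.
    return {**run, "parity": {"incidents_formed": 1,
                              "alerts_collapsed": "146 -> 1"}}

run = review(execute(plan(capture_window("edge-gw/us-east-1/POP-12"))))
```

Because each stage returns a new record rather than mutating shared state, every intermediate result can be logged as part of the audit trail.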

RECENT EXECUTIONS

Run history that reads like a real control plane.

Enterprise operators expect an auditable record of job runs and outcomes — replay executions are first-class events in TraceFlux.

Run ID    Incident                            Mode      Status     Duration  Triggered By  Policy
rx-7f3c1  edge-gw POP-12 routing instability  Dry-run   Succeeded  18s       NOC-Reviewer  v2.14.3
rx-7f3bd  dns surge + latency spike           Dry-run   Running    -         SRE-Oncall    v2.14.3
rx-7f39a  bgp flap episode collapse           Enforced  Succeeded  26s       NetEng-Lead   v2.14.2
rx-7f318  packet-loss threshold tuning        Dry-run   Failed     9s        PlatformOps   v2.14.1
Dry-run as default

Validate changes without triggering production actions. Treat enforcement as a promoted state.

Blast-radius bounds

Scope executions to a region/POP/service and require approvals before widening impact.

Evidence-first reviews

Parity reports and diffs attach directly to the incident narrative and audit record.
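A blast-radius bound can be modeled as a simple scope check: a run may execute at or below its approved scope, and widening requires a fresh approval. The "service/region/pop" scope format is an assumption taken from the run view; the gate logic is a sketch, not TraceFlux's actual policy engine:

```python
# Hedged sketch of a blast-radius gate. Scope strings follow the
# assumed "service/region/pop" hierarchy from the run-view example.

def within_bounds(requested: str, approved: str) -> bool:
    """True if the requested scope equals or narrows the approved scope."""
    req_parts = requested.split("/")
    app_parts = approved.split("/")
    # A narrower scope has at least as many segments, all prefix-matching.
    return (len(req_parts) >= len(app_parts)
            and req_parts[:len(app_parts)] == app_parts)

def gate(requested: str, approved: str, has_approval: bool) -> str:
    if within_bounds(requested, approved):
        return "allow"
    return "allow" if has_approval else "blocked: approval required to widen scope"

print(gate("edge-gw/us-east-1/POP-12", "edge-gw/us-east-1", has_approval=False))  # allow
print(gate("edge-gw", "edge-gw/us-east-1", has_approval=False))  # blocked
```

Narrowing is always safe under this rule; only a request that escapes the approved prefix (a wider blast radius) is forced through the approval gate.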

ARTIFACTS

Outputs you can share, audit, and defend.

Replay executions are only valuable if they produce durable, reviewable artifacts — not just “it worked on my machine.”

Parity report

What changed vs baseline and why. Confidence + scope included.

Diff summary

Fingerprint, suppression, and correlation diffs across policy versions.

Evidence bundle

Signals + timeline + metadata packaged for review and tickets.

Decision log

Who triggered, who approved, what gates passed, and what ran.
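One way to make a decision log tamper-evident (the "immutable log" property) is hash chaining, where each entry commits to the hash of the previous one. The entry fields mirror the artifact list above; the chaining scheme itself is an illustrative assumption:

```python
import hashlib
import json

# Sketch of an append-only, hash-chained decision log. Editing any
# earlier entry invalidates every later hash, making tampering visible.

def append_entry(log, entry):
    prev_hash = log[-1]["hash"] if log else "0" * 64
    payload = json.dumps({**entry, "prev": prev_hash}, sort_keys=True)
    log.append({**entry, "prev": prev_hash,
                "hash": hashlib.sha256(payload.encode()).hexdigest()})
    return log

log = []
append_entry(log, {"run": "rx-7f3c1", "triggered_by": "NOC-Reviewer",
                   "approved_by": "NetEng-Lead",
                   "gates": ["blast-radius", "approval"]})
append_entry(log, {"run": "rx-7f3c1", "event": "promoted", "mode": "enforced"})
```

Verification is just replaying the chain: recompute each entry's hash from its payload and confirm it matches the `prev` field stored by its successor.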

NEXT STEP

See replay executions on your telemetry.

We’ll walk through capture → correlation → incident formation → replay validation and show how parity reports and governance gates prevent risky automation.

Dry-run by default, enforce by promotion
Approval gates + blast-radius bounds
Auditable run history + artifacts