TraceFlux

CONTROL PLANE • REPLAY EXECUTIONS

Re-run incidents with pinned inputs — and prove parity before production.

Replay Executions let teams reproduce incident conditions on a captured telemetry window using a specific policy version. You get a parity report, diffs, and an auditable run record — so automation is validated before it impacts production.

PINNED WINDOW
Reproducible inputs
Telemetry window + context
PARITY REPORTS
Verifiable outputs
Diff + confidence + scope
AUDITABLE RUNS
Governed execution
Approvals + immutable log
REPLAY EXECUTION — RUN VIEW
Policy: v2.14.3
Execution pipeline: Running
Captured window: 2026-02-26 10:12 → 10:16 UTC
Signals: Flow • BGP • DNS • Metrics
Mode: Dry-run (no production actions)
Scope: edge-gw / us-east-1 / POP-12

STAGES
Ingest & normalize: Completed
Deterministic correlation: Completed
Incident aggregation: Completed
Parity report generation: In progress

PARITY & DIFF SUMMARY (auto-refresh)
Incidents formed: 1
Alerts collapsed: 146 → 1
Confidence: High
Action gates: Required

DIFF HIGHLIGHTS
Fingerprint tightened (suggested): Reduced duplicate fan-out by aligning BGP flap + DNS NXDOMAIN into one episode.
Suppression rule candidate: Drop low-confidence jitter spikes below baseline threshold for POP-12.
Action gate required: Any remediation must pass approval + blast-radius bounds.

Replay executions create an auditable record of what was evaluated, what changed, and what would happen before production actions run.

WHAT IS A REPLAY EXECUTION?

A tracked run that reproduces incident formation.

A replay execution re-processes a captured telemetry window through TraceFlux using a pinned correlation policy version. It produces artifacts (parity report, diffs, evidence bundle) and records governance (who triggered it, approvals, outcomes).

Pinned window

Replay the same 2–10 minute incident window with consistent inputs.

Policy versioning

Compare results across policy changes without guessing.

Governed outcomes

Approval gates + audit logs for every execution and artifact.

EXECUTION LIFECYCLE

A repeatable pipeline from capture → parity → promotion.

Run replays to validate incident formation, tune policies, and prove automation safety before enforcing changes.

Capture window

Select an incident window and scope (service, region, POP).

Plan

Choose mode: Dry-run vs Enforced. Pin policy version and controls.

Execute

Re-process telemetry through correlation and incident aggregation.

Review & promote

Inspect parity report + diffs. Approve policy changes or actions.
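The four lifecycle steps compose into a linear pipeline. A minimal sketch, with placeholder stage bodies and hypothetical field names:

```python
# Illustrative pipeline for capture -> plan -> execute -> review.
# Stage names come from the lifecycle above; the bodies are placeholders.

def capture_window(scope):
    return {"scope": scope, "window": "2026-02-26T10:12/10:16Z"}

def plan(run, mode="dry-run", policy="v2.14.3"):
    # Dry-run is the default; enforcement is a separately promoted state.
    return {**run, "mode": mode, "policy": policy}

def execute(run):
    # Re-process telemetry through correlation and incident aggregation.
    return {**run, "stages": ["ingest", "correlate", "aggregate"],
            "status": "succeeded"}

def review(run):
    # Attach the parity report; promotion happens only after approval.
    return {**run, "parity": {"incidents_formed": 1,
                              "alerts_collapsed": "146 -> 1"}}

run = review(execute(plan(capture_window("edge-gw/us-east-1/POP-12"))))
```

Because each stage returns a new record rather than mutating shared state, every intermediate result can be logged as part of the audit trail.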

RECENT EXECUTIONS

Run history that reads like a real control plane.

Enterprise operators expect an auditable record of job runs and outcomes — replay executions are first-class events in TraceFlux.

Run ID    Incident                            Mode      Status     Duration  Triggered By  Policy
rx-7f3c1  edge-gw POP-12 routing instability  Dry-run   Succeeded  18s       NOC-Reviewer  v2.14.3
rx-7f3bd  dns surge + latency spike           Dry-run   Running    -         SRE-Oncall    v2.14.3
rx-7f39a  bgp flap episode collapse           Enforced  Succeeded  26s       NetEng-Lead   v2.14.2
rx-7f318  packet-loss threshold tuning        Dry-run   Failed     9s        PlatformOps   v2.14.1
Dry-run as default

Validate changes without triggering production actions. Treat enforcement as a promoted state.

Blast-radius bounds

Scope executions to a region/POP/service and require approvals before widening impact.

Evidence-first reviews

Parity reports and diffs attach directly to the incident narrative and audit record.
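A blast-radius bound can be modeled as a simple scope check: a run may execute at or below its approved scope, and widening requires a fresh approval. The "service/region/pop" scope format is an assumption taken from the run view; the gate logic is a sketch, not TraceFlux's actual policy engine:

```python
# Hedged sketch of a blast-radius gate. Scope strings follow the
# assumed "service/region/pop" hierarchy from the run-view example.

def within_bounds(requested: str, approved: str) -> bool:
    """True if the requested scope equals or narrows the approved scope."""
    req_parts = requested.split("/")
    app_parts = approved.split("/")
    # A narrower scope has at least as many segments, all prefix-matching.
    return (len(req_parts) >= len(app_parts)
            and req_parts[:len(app_parts)] == app_parts)

def gate(requested: str, approved: str, has_approval: bool) -> str:
    if within_bounds(requested, approved):
        return "allow"
    return "allow" if has_approval else "blocked: approval required to widen scope"

print(gate("edge-gw/us-east-1/POP-12", "edge-gw/us-east-1", has_approval=False))  # allow
print(gate("edge-gw", "edge-gw/us-east-1", has_approval=False))  # blocked
```

Narrowing is always safe under this rule; only a request that escapes the approved prefix (a wider blast radius) is forced through the approval gate.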

ARTIFACTS

Outputs you can share, audit, and defend.

Replay executions are only valuable if they produce durable, reviewable artifacts — not just “it worked on my machine.”

Parity report

What changed vs baseline and why. Confidence + scope included.

Diff summary

Fingerprint, suppression, and correlation diffs across policy versions.

Evidence bundle

Signals + timeline + metadata packaged for review and tickets.

Decision log

Who triggered, who approved, what gates passed, and what ran.
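One way to make a decision log tamper-evident (the "immutable log" property) is hash chaining, where each entry commits to the hash of the previous one. The entry fields mirror the artifact list above; the chaining scheme itself is an illustrative assumption:

```python
import hashlib
import json

# Sketch of an append-only, hash-chained decision log. Editing any
# earlier entry invalidates every later hash, making tampering visible.

def append_entry(log, entry):
    prev_hash = log[-1]["hash"] if log else "0" * 64
    payload = json.dumps({**entry, "prev": prev_hash}, sort_keys=True)
    log.append({**entry, "prev": prev_hash,
                "hash": hashlib.sha256(payload.encode()).hexdigest()})
    return log

log = []
append_entry(log, {"run": "rx-7f3c1", "triggered_by": "NOC-Reviewer",
                   "approved_by": "NetEng-Lead",
                   "gates": ["blast-radius", "approval"]})
append_entry(log, {"run": "rx-7f3c1", "event": "promoted", "mode": "enforced"})
```

Verification is just replaying the chain: recompute each entry's hash from its payload and confirm it matches the `prev` field stored by its successor.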

NEXT STEP

See replay executions on your telemetry.

We’ll walk through capture → correlation → incident formation → replay validation and show how parity reports and governance gates prevent risky automation.

Dry-run by default, enforce by promotion
Approval gates + blast-radius bounds
Auditable run history + artifacts