← Back to skills
Debugging & Incident TriageMedium ReliabilityHigh-Risk Workflow

Debug Incident Triage

Structures incident debugging by grouping recurring log signatures and producing prioritized next-step diagnostics for on-call teams.

Version

0.1.0

Last Updated

Apr 22, 2026

Verification Type

reconciliation checks, manual review required

Downloads

0

Required inputs

  • incident_logs (text)

    Error and warning events collected during incident window.

  • service_context (markdown)

    Deployment topology and dependency context.

Expected outputs

  • triage_report (markdown)

    Ranked hypotheses with evidence and confidence.

  • next_checks (markdown)

    Ordered diagnostic checks with command-level specificity.

Included checks and assets

  • scripts/cluster_log_signatures.py (script)

    Groups recurring error signatures by normalized log message.

  • references/triage-template.md (reference)

    Template for consistent incident triage reports.

  • references/hypothesis-ranking.md (reference)

    Confidence ranking guidance for triage hypotheses.

  • references/containment-playbook.md (reference)

    Safe immediate containment action patterns.

Failure modes

  • Insufficient telemetry leads to incorrect hypotheses.
  • Clustered messages may merge distinct root causes.
  • Human triage quality varies under incident pressure.

Ideal use cases

  • On-call incident triage
  • CI failure clustering
  • Runtime outage diagnostics

Example runs

Checkout API timeout incident

Validated sample run

Clusters timeout and DB connection errors into prioritized hypotheses.

Input preview

incident logs + recent deploy list

Output preview

Top hypothesis: DB pool exhaustion after rollout

Changelog summary

  • 0.1.0 · Apr 22, 2026

    Initial release for incident triage workflow.

Links

Inspect the source, read authored documentation, or download the published skill bundle.