Play, get feedback, save local progress, and optionally submit a leaderboard score.
Concept explanation
Incident response is a race against uncertainty. This game gives you a pager, a symptom, and a handful of possible signals so you can practice finding the truth without drowning in noise.
Your local progress
0 XP · 0 games played · 0 completed
Progress, review history, and best scores are stored in this browser with localStorage.
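As a rough illustration of how browser-local progress like this could be kept, here is a minimal sketch. The key name `triage-game-progress`, the `Progress` fields, and the rule that XP grows by the score are all assumptions made for the example, not the game's actual storage format. The functions take a small `KeyValueStore` interface so they can be exercised outside a browser; `window.localStorage` satisfies it.

```typescript
// Hypothetical shape; the game's real keys and fields are not documented here.
interface Progress {
  xp: number;
  gamesPlayed: number;
  completed: number;
  bestScore: number;
}

// Minimal subset of the Web Storage API. window.localStorage fits this shape.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const PROGRESS_KEY = "triage-game-progress"; // assumed key name

function emptyProgress(): Progress {
  return { xp: 0, gamesPlayed: 0, completed: 0, bestScore: 0 };
}

// Read saved progress, falling back to zeros when nothing is stored
// or the stored value is not valid JSON.
function loadProgress(store: KeyValueStore): Progress {
  const raw = store.getItem(PROGRESS_KEY);
  if (raw === null) return emptyProgress();
  try {
    const parsed = JSON.parse(raw) as Partial<Progress>;
    return { ...emptyProgress(), ...parsed };
  } catch {
    return emptyProgress();
  }
}

// Record one finished or abandoned game and persist the updated totals.
// Assumption for this sketch: XP simply accumulates the score.
function recordGame(store: KeyValueStore, score: number, finished: boolean): Progress {
  const prev = loadProgress(store);
  const next: Progress = {
    xp: prev.xp + score,
    gamesPlayed: prev.gamesPlayed + 1,
    completed: prev.completed + (finished ? 1 : 0),
    bestScore: Math.max(prev.bestScore, score),
  };
  store.setItem(PROGRESS_KEY, JSON.stringify(next));
  return next;
}
```

In a browser, `recordGame(window.localStorage, 40, true)` would update the stored totals. Because the data lives only in that browser's storage, clearing site data or switching browsers resets it.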
Use the controls below. Feedback appears immediately, and final scores are stored locally.
Leaderboard
Top 10 submitted scores. No account required.
Finish the game to load your latest local score.
Learning objectives
Choose high-signal telemetry for common backend incidents.
Use metrics, logs, traces, deploy markers, and request IDs together.
Distinguish actionable alerts from noisy operational trivia.
How to play
Read the production symptom and available context.
Choose the telemetry move that would reduce uncertainty fastest.
Use explanations to build an incident response mental model.
Scoring
High-signal triage choices add points and streak bonuses.
Low-signal detours come with an explanation of why they waste time or hide user impact.
Completing the game saves your local progress and best triage score.
Backend concept notes
Observability is the ability to ask useful questions about a running system. During incidents, the best signals connect user symptoms, recent changes, failing dependencies, and concrete request paths.
Metrics show shape and impact; traces show where time went; logs provide event detail; request IDs tie user reports to specific requests; and alerts should be tied to actionable, user-facing risk.
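To make the request ID idea concrete, here is a minimal sketch of following one reported request across services. It assumes structured logs that carry a `requestId` field; the `LogEntry` shape and both function names are hypothetical and not tied to any particular logging library.

```typescript
// Hypothetical structured log record; field names are assumptions.
interface LogEntry {
  timestampMs: number;
  service: string;
  requestId: string;
  level: "info" | "warn" | "error";
  message: string;
}

// Collect every entry for one request, across all services, in time order.
function traceRequest(logs: LogEntry[], requestId: string): LogEntry[] {
  return logs
    .filter((entry) => entry.requestId === requestId)
    .sort((a, b) => a.timestampMs - b.timestampMs);
}

// Return the earliest error on the request path, or null if there is none.
// The earliest error is often closer to the cause than later ones,
// which may only be downstream fallout.
function firstError(logs: LogEntry[], requestId: string): LogEntry | null {
  const path = traceRequest(logs, requestId);
  for (const entry of path) {
    if (entry.level === "error") return entry;
  }
  return null;
}
```

Starting from the request ID a user reports narrows the search to a single path through the system, instead of scanning every log line from the incident window.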
Common mistakes
Relying only on average latency while the users at p95 or p99 suffer.
Diving into random logs before scoping by service, route, deploy, or request ID.
Alerting on noisy resource blips instead of sustained symptoms or SLO burn.
Watching queue depth without message age, retry rate, or worker errors.
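The first mistake in this list is easy to see with a small calculation. The sketch below uses made-up latency samples and a simple nearest-rank percentile, which is only one of several common percentile definitions; the helper names are illustrative.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

function mean(samples: number[]): number {
  let sum = 0;
  for (const value of samples) sum += value;
  return sum / samples.length;
}

// Made-up data: 94 fast requests at 100 ms and 6 slow ones at 3000 ms.
const latenciesMs: number[] = [];
for (let i = 0; i < 94; i++) latenciesMs.push(100);
for (let i = 0; i < 6; i++) latenciesMs.push(3000);

console.log(mean(latenciesMs));           // 274
console.log(percentile(latenciesMs, 50)); // 100
console.log(percentile(latenciesMs, 95)); // 3000
```

An average of 274 ms looks tolerable, and the median shows most requests are fast, yet six in every hundred requests take three seconds. Only the tail percentiles expose that.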
Related Backend Study Lab articles
Use the main site for deeper reading after playing.