DORA Framework
1) Definition
The DORA Framework (DevOps Research & Assessment) is a research-backed model that links engineering practices to business outcomes using four delivery metrics — Lead Time, Deployment Frequency, Change Failure Rate, and MTTR — plus the enabling capabilities that improve them.
2) Why it matters
DORA gives CTOs a small, objective scoreboard for delivery health. It replaces opinions with signals you can measure weekly, compare across services, and tie to product results (activation, retention, revenue). Teams with strong DORA performance consistently ship faster, safer, and more predictably.
3) Core components
- Four key metrics: Lead Time, Deployment Frequency, Change Failure Rate (CFR), Mean Time to Restore (MTTR).
- Capabilities: Trunk-based dev, CI/CD automation, test strategy (unit/contract first), observability, progressive delivery (flags, canaries), incident response, and a learning culture (post-incident reviews).
- Service orientation: Measure per service/application; roll up later.
- Continuous improvement: Use metrics to choose small, high-leverage experiments each sprint.
4) How to apply (step by step)
- Define terms: Publish what counts as a deploy, a failure, and “restored.” Keep it to one page.
- Instrument events: Emit timestamps for PR merge, build, deploy start/finish, flag toggles, incident start/resolve (a minimal event format is sketched after this list).
- Create dashboards: Per service, show Lead Time (p50/p95), Deploys/day, CFR (rolling), MTTR, plus annotations (deploys, ring steps); the metric math is sketched after this list.
- Set guardrails: Publish example targets, e.g., Lead Time p50 ≤ 24h, daily deploys, CFR < 15–20%, MTTR < 1–2h (context-dependent); a simple check is sketched after this list.
- Review weekly: Owners explain shifts; pick one experiment per metric (e.g., PR size cap, rollback drill).
- Link to outcomes: Correlate DORA changes with KRs (a rough correlation sketch follows this list). If flow improves but KRs don’t, fix what you’re building.
- Standardize wins: Template successful practices into your paved road (repo scaffolds, CI/CD blueprints, runbooks).
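A minimal sketch of the “Instrument events” step, assuming a hypothetical emit_event helper that appends newline-delimited JSON to a file a dashboard job can read later; the event kinds and field names are illustrative, not a standard schema.

```python
import json
import time
from pathlib import Path

# Hypothetical sink: in practice this is usually a queue, log pipeline, or warehouse table.
EVENT_LOG = Path("dora_events.ndjson")

def emit_event(service: str, kind: str, **fields) -> None:
    """Append one timestamped delivery event as newline-delimited JSON."""
    record = {"service": service, "kind": kind, "ts": time.time(), **fields}
    with EVENT_LOG.open("a") as f:
        f.write(json.dumps(record) + "\n")

# Illustrative calls from CI/CD and incident hooks:
emit_event("payments-api", "pr_merged", pr="1234")
emit_event("payments-api", "deploy_finished", deploy_id="d-567", pr="1234", success=True)
emit_event("payments-api", "incident_resolved", incident_id="inc-89",
           started_ts=time.time() - 2100, caused_by_deploy="d-567")
```

The point is only that every event carries a service name and a timestamp; everything downstream is arithmetic.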
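For “Create dashboards”, the four metrics reduce to simple arithmetic over those events once they are grouped per service and time window. A sketch under assumed inputs: deploys as dicts with merge_ts, deploy_ts (epoch seconds), and a failed flag; incidents with start_ts and resolve_ts. All names are illustrative.

```python
from statistics import mean, median, quantiles

def lead_time_hours_p50_p95(deploys):
    """Merge-to-production lead time; report p50 and p95, not the mean."""
    hours = sorted((d["deploy_ts"] - d["merge_ts"]) / 3600 for d in deploys)
    p95 = quantiles(hours, n=20)[-1] if len(hours) >= 2 else hours[0]
    return median(hours), p95  # assumes at least one deploy in the window

def deploys_per_day(deploys, window_days):
    return len(deploys) / window_days

def change_failure_rate(deploys):
    """'Failed' should include rollbacks, hotfixes, and flag-offs that users felt."""
    return sum(1 for d in deploys if d["failed"]) / len(deploys)

def mttr_minutes(incidents):
    return mean((i["resolve_ts"] - i["start_ts"]) / 60 for i in incidents)

# Illustrative weekly rollup for one service:
deploys = [{"merge_ts": 0, "deploy_ts": 4 * 3600, "failed": False},
           {"merge_ts": 0, "deploy_ts": 30 * 3600, "failed": True}]
incidents = [{"start_ts": 0, "resolve_ts": 35 * 60}]
print(lead_time_hours_p50_p95(deploys), deploys_per_day(deploys, 7),
      change_failure_rate(deploys), mttr_minutes(incidents))
```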
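For “Set guardrails”, publishing the targets as data keeps the weekly review honest: the same config that renders on the dashboard can flag breaches automatically. A sketch whose names and thresholds are illustrative (they mirror the example targets above) and should be tuned per context.

```python
# Published guardrails (context-dependent; numbers mirror the example targets above).
GUARDRAILS = {
    "max_lead_time_p50_hours": 24,
    "min_deploys_per_day": 1,
    "max_change_failure_rate": 0.15,
    "max_mttr_minutes": 120,
}

def check_guardrails(metrics: dict) -> list[str]:
    """Return human-readable breaches for one service's weekly snapshot."""
    breaches = []
    if metrics["lead_time_p50_hours"] > GUARDRAILS["max_lead_time_p50_hours"]:
        breaches.append("Lead Time p50 above 24h")
    if metrics["deploys_per_day"] < GUARDRAILS["min_deploys_per_day"]:
        breaches.append("Fewer than one deploy per day")
    if metrics["change_failure_rate"] > GUARDRAILS["max_change_failure_rate"]:
        breaches.append("CFR above 15%")
    if metrics["mttr_minutes"] > GUARDRAILS["max_mttr_minutes"]:
        breaches.append("MTTR above 2h")
    return breaches

# Illustrative snapshot: deploys and recovery look healthy, but lead time is slipping.
print(check_guardrails({"lead_time_p50_hours": 30, "deploys_per_day": 2,
                        "change_failure_rate": 0.09, "mttr_minutes": 35}))
# -> ['Lead Time p50 above 24h']
```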
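For “Link to outcomes”, even a rough correlation between a weekly DORA series and a product KR is enough to start the conversation; it is a signal for the review, not proof of causation. A sketch with illustrative weekly numbers.

```python
from statistics import correlation  # Python 3.10+

# Illustrative weekly series for one team: lead time p50 (hours) and activation rate (%).
lead_time_p50 = [30, 26, 22, 18, 14, 12]
activation_rate = [3.1, 3.2, 3.4, 3.6, 3.9, 4.0]

# A negative Pearson coefficient means faster flow is moving with the KR; flat or
# positive while flow improves is the "fix what you're building" signal.
print(round(correlation(lead_time_p50, activation_rate), 2))
```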
5) Examples & analogies
- Example (Payments API): Lead Time 3d → 0.8d after trunk-based dev, test sharding, and “deploy on merge.” CFR 22% → 9% with flags + contract tests. MTTR falls to 35m after a 5-minute rollback drill.
- Example (Mobile app): Release trains → weekly ringed releases + server-controlled flags; Deploys/week triple while CFR stays flat via crash-rate SLOs.
- Analogy (restaurant kitchen): Short Lead Time = fast prep, Deploy Frequency = steady plates leaving, CFR = dishes sent back, MTTR = time to fix a bad order. Tight stations and checklists keep the flow.
- Analogy (F1 pit stop): Automation, roles, and rehearsals reduce both failures and restore time when something goes wrong.
6) Common mistakes to avoid
- Averages only: Means hide batchy work; track p50 and p95 (worked example after this list).
- Company-wide mush: Always measure per service, not a single global score.
- Counting non-prod/staging deploys: Only production deploys count toward Deployment Frequency.
- Ignoring flag rollbacks in CFR: Users felt pain even if the code didn’t roll back.
- Dashboards without actions: No weekly review → no improvement.
- Optimizing metrics, not outcomes: Keep product KRs in view.
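A worked example of the “averages only” trap, with illustrative numbers: nine small changes plus one big batch look fine on the mean, while p50 and p95 show what is actually happening.

```python
from statistics import mean, median, quantiles

# Lead times in hours: nine small PRs plus one big-batch PR (illustrative numbers).
lead_times = [2, 3, 3, 4, 4, 5, 5, 6, 6, 120]

print(round(mean(lead_times), 1))              # ~16h: looks like "under a day"
print(median(lead_times))                      # 4.5h: the typical change is fast
print(round(quantiles(lead_times, n=20)[-1]))  # ~57h: the tail is where the pain lives
```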
7) Quick checklist (F.A.S.T.E.R.)
- Frame definitions (deploy/failure/restore, sources of truth).
- Acquire events (merge/build/deploy/flag/incident).
- Ship dashboards (per-service, with p50/p95 + annotations).
- Target guardrails (contextual, published).
- Experiment weekly (one change per metric).
- Review & roll-in (make wins the default paved road).
8) Actionable takeaways
- Deploy on merge with progressive delivery; make deploys boring.
- Cap PR size and parallelize CI to cut Lead Time quickly.
- Count flag-offs as failures; rehearse <5-minute rollbacks to shrink MTTR.
- Measure per service and review weekly with owners; pick one experiment per metric.
- Tie DORA to product KRs — flow is the means, outcomes are the end.
Keep it small, published, and relentlessly iterative. When DORA is healthy and connected to product impact, your strategy turns into shipped, reliable outcomes — on repeat.