0. Engineering Roles

Purpose: Define the core roles across Engineering (and critical adjacent partners), each with a crisp mission, decision rights, operating loops, KPIs, and a “definition of done.” The section ends with a RACI matrix covering the most common responsibilities.

Default assumptions: single paved road; ringed deploys; SLOs before GA; one owner per service; WIP ≤ 2 bets/squad.


1) CTO — Chief Technology Officer

North Star: Turn strategy into shipped, reliable outcomes.

Owns (A): Org topology; quality bars policy (flags/telemetry/rollback/SLOs); architecture principles; reliability posture; talent system; board/executive tech narrative.

Weekly loops: Outcome review; reliability huddle; platform council; hiring loop; 1:1s with leads.

KPIs: Lead time p50 < 24h · CFR (change failure rate) < 15% · MTTR (mean time to restore) < 1h · SLO attainment ≥ 99% · Platform adoption ≥ 80%.
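As an illustration of how the delivery KPIs above could be computed, here is a minimal sketch. The `Deploy` record shape and its field names are hypothetical, and measuring restore time from the moment of deploy (rather than from incident detection) is a simplifying assumption; only the thresholds come from the KPI line.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deploy:
    """One production deploy (hypothetical record shape)."""
    committed_at: datetime                   # first commit of the change
    deployed_at: datetime                    # change live in production
    failed: bool = False                     # needed a rollback or hotfix
    restored_at: Optional[datetime] = None   # service restored after a failure


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def dora_summary(deploys: list[Deploy]) -> dict[str, float]:
    """Median lead time, change failure rate, and mean time to restore."""
    if not deploys:
        raise ValueError("need at least one deploy")
    failures = [d for d in deploys if d.failed]
    # Assumption: restore time is measured from the failing deploy.
    restores = [hours(d.restored_at - d.deployed_at) for d in failures if d.restored_at]
    return {
        "lead_time_p50_h": median(hours(d.deployed_at - d.committed_at) for d in deploys),
        "cfr": len(failures) / len(deploys),
        "mttr_h": sum(restores) / len(restores) if restores else 0.0,
    }


def meets_targets(summary: dict[str, float]) -> bool:
    """Thresholds from the KPI line: p50 < 24h, CFR < 15%, MTTR < 1h."""
    return (summary["lead_time_p50_h"] < 24
            and summary["cfr"] < 0.15
            and summary["mttr_h"] < 1)


# Ten sample deploys, each with a 2h lead time; one fails and is restored in 30 minutes.
start = datetime(2024, 1, 1, 9, 0)
sample = [Deploy(start, start + timedelta(hours=2)) for _ in range(9)]
sample.append(Deploy(start, start + timedelta(hours=2), failed=True,
                     restored_at=start + timedelta(hours=2, minutes=30)))
summary = dora_summary(sample)
print(summary, meets_targets(summary))
```

The same summary could feed a weekly scoreboard; SLO attainment and platform adoption would need separate data sources and are left out of this sketch.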

Definition of Done: 100% services owned with SLOs & runbooks; policy gates enforced in CI; two months of green DORA+SLO; cadence running (weekly/monthly/quarterly).


2) HoE — Head of Engineering / VP Eng

North Star: Predictable, high‑quality delivery.

Owns (A): Delivery ops (WIP caps, change calendar), gate enforcement, service catalog hygiene, incident program, hiring/onboarding execution, dependency management.

Weekly loops: WIP & deps sweep · team outcome reviews · reliability huddle · platform check · hiring/people.

KPIs: Lead time < 24h · deploy ≥ daily · CFR < 15% · MTTR < 1h · wait time ≤ 2d · adoption ≥ 80%.

Definition of Done: Ops scoreboard live; gates enforced in CI; pages ≤ 2/team/wk; adoption trending to 80%.


3) HoP — Head of Platform

North Star: A paved road that makes the right way the easy way.

Owns (A): Templates/scaffolds; CI/CD; preview envs; observability defaults; platform SLOs; deprecations; supply chain (SBOM/scanning); internal customer success (SLA/docs/office hours).

Weekly loops: SLO dashboard review · backlog triage · office hours · migrations check.

KPIs: Build p50 < 10m (p95 < 20m) · Preview p95 < 5m · Uptime ≥ 99.9% · Flake < 2% · Adoption ≥ 80% · Dev NPS ≥ +40.
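The build-time and flake targets above can be checked with a few lines of code. The sketch below uses the nearest-rank percentile method, which is one of several common definitions and is an assumption here. It also assumes that a "flaky" run is one that failed and then passed on retry without a code change. The sample numbers are made up for illustration.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def flake_rate(total_runs: int, flaky_runs: int) -> float:
    """Share of CI runs classified as flaky (classification is an assumption)."""
    return flaky_runs / total_runs


# Illustrative build durations in minutes.
build_minutes = [6, 7, 7, 8, 8, 9, 9, 11, 14, 19]
p50 = percentile(build_minutes, 50)
p95 = percentile(build_minutes, 95)

build_ok = p50 < 10 and p95 < 20                                  # Build p50 < 10m, p95 < 20m
flake_ok = flake_rate(total_runs=1000, flaky_runs=12) < 0.02      # Flake < 2%
print(p50, p95, build_ok, flake_ok)
```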

Definition of Done: Charter+SLOs published; SLOs green; ≥60% adoption and rising; exception memos tracked; deprecation wave completed with guides.


4) Stream TL — Tech Lead (value‑stream squad)

North Star: Ship outcomes quickly and safely for one stream.

Owns (A): End‑to‑end of stream services (SLOs, runbooks, on‑call); testing strategy; ringed deploys; within‑boundary architecture; security/privacy for changes.

Weekly loops: Outcome review (with PM) · WIP≤2 & dependency pass · reliability huddle · post‑deploy verification · tech‑debt/perf slice.

KPIs: Lead time < 24h · deploy ≥ daily · CFR < 15% · MTTR < 1h · SLOs green · wait time ≤ 2d.

Definition of Done: One owner per service; contract tests for external interfaces; rollback < 5m; preview p95 < 5m; pager quiet (≤2/wk).


5) PM — Product Manager (stream‑aligned)

North Star: Move the North Star Metric (NSM) and key results (KRs) with the fewest, safest changes.

Owns (A): Problem briefs & success metrics; within‑stream portfolio (WIP ≤ 2); experiment plans (MDE/power/guardrails); readiness gates; release notes & enablement; outcome reviews.

Weekly loops: Outcome review · backlog/WIP check · customer time · evidence & decision notes · GTM sync.

KPIs: KR delta/week on target · activation/adoption thresholds · idea→decision ≤ 2w · CFR ≤ +2pp vs baseline post‑release.

Definition of Done: Problem briefs, experiment plans, decision notes, KR tree, and outcome dashboard in place; one Pilot→Beta→GA cycle completed with guardrails intact.
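An experiment plan that states an MDE and a power target implies a minimum sample size. As a rough sketch, the standard normal-approximation formula for a two-sided test comparing two conversion rates is shown below. The function name, the defaults (alpha = 0.05, power = 0.80), and the choice of an absolute rather than relative MDE are assumptions, not part of this guide.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline: float, mde_abs: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Users needed per arm to detect an absolute lift of mde_abs over baseline."""
    p1, p2 = baseline, baseline + mde_abs
    if mde_abs <= 0 or not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("rates must lie in (0, 1) and mde_abs must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance
    z_power = NormalDist().inv_cdf(power)           # desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / mde_abs ** 2)


# Baseline conversion 10%, smallest lift worth detecting: +2 percentage points.
n = sample_size_per_arm(baseline=0.10, mde_abs=0.02)
print(n)
```

Halving the MDE roughly quadruples the required sample, which is why the plan should fix the MDE before the experiment starts rather than after looking at the data.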


6) SRE Lead / SRE

North Star: Keep systems available, fast, and safe—predictably.

Owns (A): SLOs/SLIs/alerts; error‑budget policy; incident response (on‑call, RCAs, drills); observability; resilience controls; DR/backups.

Weekly loops: SLO & alert review · incident huddle · change‑safety sync · on‑call health.

KPIs: SLO periods green ≥ 99% · MTTR < 1h · pages ≤ 2/team/wk · alert precision > 90% · backup restore success 100%.

Definition of Done: SLOs live for tier‑1/2; burn alerts wired; policies gating rings; DR drill evidence; RCAs closed with actions merged.
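To make "burn alerts wired" and "policies gating rings" concrete, the sketch below shows the usual error-budget arithmetic. The function names, the sample numbers, and the `max_burn` threshold of 2.0 are hypothetical; the actual error-budget policy would set its own values.

```python
def error_budget(slo_target: float, total_events: int) -> float:
    """Number of failed events the SLO tolerates over the window."""
    return (1 - slo_target) * total_events


def burn_rate(slo_target: float, failed: int, total: int) -> float:
    """Observed error rate divided by the allowed error rate.

    1.0 means the budget would be used up exactly at the end of the window.
    """
    return (failed / total) / (1 - slo_target)


def may_promote(budget_remaining: float, rate: float, max_burn: float = 2.0) -> bool:
    """Hypothetical ring gate: promote only with budget left and a calm burn rate."""
    return budget_remaining > 0 and rate <= max_burn


slo = 0.999
budget = error_budget(slo, total_events=1_000_000)      # about 1000 allowed failures
remaining = budget - 400                                 # 400 failures so far in the window
recent = burn_rate(slo, failed=50, total=10_000)         # last hour: 0.5% errors, about 5x
print(round(budget), round(remaining), round(recent, 2), may_promote(remaining, recent))
```

In this example the window still has budget left, but the recent burn rate is about five times the sustainable pace, so the gate holds the next ring.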


7) Data Lead / Analytics Engineer

North Star: Trusted, self‑serve data and decision‑quality experiments.

Owns (A): Metric definitions & semantic layer; event taxonomy & data contracts; data reliability (freshness SLAs, tests, lineage); self‑serve BI & governance; experiment standards.

Weekly loops: Freshness & tests review · schema/contract triage · experiment clinic · office hours.

KPIs: Metric spec coverage 100% · freshness ≥ 99% · tests ≥ 95% pass · data MTTR < 1h · semantic‑layer usage ≥ 80%.
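The guide does not define how "freshness ≥ 99%" is measured. One plausible reading, assumed in the sketch below, is the share of periodic checks at which a dataset's latest load was within its freshness SLA. The function name, the one-hour SLA, and the sample data are illustrative only.

```python
from datetime import datetime, timedelta


def freshness_attainment(checks: list[tuple[datetime, datetime]], sla: timedelta) -> float:
    """Share of checks where (checked_at - last_loaded_at) is within the SLA."""
    if not checks:
        raise ValueError("no checks recorded")
    fresh = sum(1 for checked_at, loaded_at in checks if checked_at - loaded_at <= sla)
    return fresh / len(checks)


# 200 hourly checks: 199 see data loaded 20 minutes ago, one sees a 3-hour lag.
base = datetime(2024, 1, 1)
checks = [(base + timedelta(hours=i), base + timedelta(hours=i, minutes=-20))
          for i in range(199)]
checks.append((base + timedelta(hours=199), base + timedelta(hours=196)))

attainment = freshness_attainment(checks, sla=timedelta(hours=1))
print(attainment, attainment >= 0.99)
```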

Definition of Done: NSM/KR specs live; SLAs green; certified dashboards; experiment policy enforced (MDE/power/guardrails).


8) Staff / Principal Engineer (senior IC)

North Star: Raise the technical bar and unlock multiple teams.

Owns (A): System design in a domain; one‑way door choices (ADR/RFC); reference implementations & golden paths; performance budgets; deprecations & migrations; design review program.

Weekly loops: Design reviews · pairing/implementation · reliability & perf triage · mentoring · short docs/ADRs.

KPIs: Wait time ≤ 2d · CFR < 15% · MTTR < 1h · hot‑path p95 on budget · golden‑path adoption ≥ 80%.

Definition of Done: Two cross‑team wins shipped; contracts in CI; perf budgets gated; deprecation plan executed; review SLA ≤ 48h.


9) Software Engineers — Stream & Platform

North Star: Ship small changes that move KRs, safely.

Core responsibilities (R):

Weekly habits: 5–10 PRs merged; post‑deploy smoke; fix one paper‑cut; pair once; update one runbook/checklist.

Quality bars: No secrets in code; rollback < 5m; preview p95 < 5m; zero new flaky tests; changes observable by default.


RACI — Core Responsibilities by Role

Roles: CTO · HoE · HoP · TL · PM · SRE · Data · Staff/Princ · Eng

| Responsibility | CTO | HoE | HoP | TL | PM | SRE | Data | Staff/Princ | Eng |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1) Org topology & team design | A | R | I | I | I | I | I | C | I |
| 2) Quality bars policy (flags/telemetry/rollback/SLOs) | C | A | R | R | C | R | I | C | R |
| 3) Service ownership (catalog, on‑call, SLO dashboards) | I | A | I | R | I | C | I | C | R |
| 4) Platform (templates, CI/CD, previews, obs) | I | C | A | C | I | C | I | C | R |
| 5) SLOs & incident management | I | C | C | R | I | A | I | C | R |
| 6) Stream portfolio & WIP ≤ 2 | C | C | I | R | A | I | I | I | R |
| 7) Release gates & ringed deploys | I | C | C | A | C | R | I | C | R |
| 8) Error‑budget governance | C | C | I | R | I | A | I | C | R |
| 9) Metrics & experimentation standards | C | I | I | R | R | I | A | C | R |
| 10) Security & privacy posture | A | R | R | R | I | R | C | C | R |
| 11) Change calendar & freeze rules | I | A | C | R | C | C | I | I | R |
| 12) Deprecations & migrations (platform/libs) | I | C | A | R | I | C | I | R | R |
| 13) Contracts & interfaces (provider/consumer tests) | C | C | I | R | I | C | I | A | R |
| 14) Cost & FinOps (build mins, env hours, queries) | A | R | R | I | C | I | C | C | I |
| 15) Hiring & onboarding (engineering) | C | A | C | R | I | I | I | C | R |
| 16) Service catalog hygiene | I | A | I | R | I | C | I | I | R |
| 17) Paved‑road adoption | I | A | R | R | I | I | I | C | R |

RACI key: A = Accountable (single owner); R = Responsible (executes); C = Consulted (two‑way input); I = Informed (kept in the loop).
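Because the key requires a single Accountable owner per responsibility, the matrix lends itself to a mechanical check. The sketch below is one way to do that. The helper names are hypothetical; the role order follows the "Roles" line, and the sample rows are copied from rows 1 and 5 of the matrix.

```python
ROLES = ["CTO", "HoE", "HoP", "TL", "PM", "SRE", "Data", "Staff/Princ", "Eng"]
CODES = {"R", "A", "C", "I"}


def parse_row(codes: str) -> dict[str, str]:
    """Map each role to its RACI letter; codes is a space-separated row."""
    letters = codes.split()
    if len(letters) != len(ROLES):
        raise ValueError(f"expected {len(ROLES)} codes, got {len(letters)}")
    if not set(letters) <= CODES:
        raise ValueError(f"unknown codes: {set(letters) - CODES}")
    return dict(zip(ROLES, letters))


def accountable_roles(codes: str) -> list[str]:
    """Roles marked A in a row, in column order."""
    return [role for role, letter in parse_row(codes).items() if letter == "A"]


def has_single_owner(codes: str) -> bool:
    """True when exactly one role is Accountable, as the RACI key requires."""
    return len(accountable_roles(codes)) == 1


rows = {
    "Org topology & team design": "A R I I I I I C I",
    "SLOs & incident management": "I C C R I A I C R",
}
for name, codes in rows.items():
    print(name, accountable_roles(codes), has_single_owner(codes))
```

A check like this could run whenever the matrix changes, so a row with zero or two Accountable roles is caught before it is published.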

How to use this

Glossary

Alphabetical; concise definitions tuned to the context of the guide.



Notes