Engineering North Star Metrics

In a world where every metric can be fetched and tracked, we end up measuring too many things or, worse, too few. It is impractical to make smart decisions based on all available data, impossible to make any decision without data, and virtually impossible to treat every metric as a priority worthy of improvement. The first challenge is deciding what to measure. This article proposes the following metrics as the de jure metrics to be tracked and constantly improved going forward within the tech teams that I have led so far.

The 4 Layers of a Team

| Objective | Key Results |
| --- | --- |
| Tech & Infrastructure | Improve and maintain System Availability [1] and Reliability (MTTR [2]) |
| People & Organization | Improve Employee Engagement [3] & Reduce Churn Rate [4] |
| Observability & Security | Increase observability on monitoring (Dashboard [5]), alerting (Business and Engineering Metrics [6]), and protect customers from security vulnerabilities (Security Tickets [7]) |
| Productivity | Improve and maintain predictability on the sprint (Sprint Velocity [8]) and product quality (number of Bugs [9]) |

1 System Availability is measured by pinging the health check endpoint; the secondary metric is Apdex (Application Performance Index). My favorite tools would be either Elastic APM or Datadog.

2 MTTR (mean time to recover) is tracked through technical support tickets and post-mortem chronologies.

3 Employee Engagement is tracked by the People Operations / HR team.

4 Churn Rate is tracked by the People Operations / HR team.

5 Dashboards are tracked in a single monitoring tool; my favorite would be either the ELK stack or Datadog.

6 Engineering metrics are pushed to Datadog from AWS CloudWatch, the agent on the application instances, and various integrations; Business Metrics are also pushed as custom metrics. Logs are pushed to a centralized logging tool. Alerts need to be actionable, and all critical alerts need to be pushed to PagerDuty.

7 Security Tickets are created from the following sources: Black Box (Bug Bounty, Vulnerability Scanner Tools, and Penetration Testing) and White Box (Static Code Analysis Tools).

8 Sprint Velocity is tracked by the Technical Program Manager along with other productivity metrics.

9 Number of Bugs is the number of bugs produced during a sprint. It comes with more granular metrics, including Number of Bugs back to Dev, and is tracked by the Technical Program Manager / Scrum Master.
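
To make the Tech & Infrastructure key results concrete, here is a minimal Python sketch of how availability, Apdex, and MTTR could be computed from raw data. The function names, the sample values, and the 0.5-second Apdex threshold are assumptions made up for illustration; this is not how Elastic APM or Datadog compute these numbers internally. The Apdex formula used is the standard one: satisfied requests plus half of the tolerating requests, divided by all requests.

```python
from datetime import datetime, timedelta


def availability(check_results):
    """Fraction of health checks that succeeded (True means the endpoint was healthy)."""
    if not check_results:
        raise ValueError("no health checks recorded")
    return sum(1 for ok in check_results if ok) / len(check_results)


def apdex(response_times, threshold):
    """Apdex = (satisfied + tolerating / 2) / total.

    satisfied:  response time <= threshold (T)
    tolerating: T < response time <= 4T
    frustrated: everything slower than 4T (counts as zero)
    """
    if not response_times:
        raise ValueError("no response times recorded")
    satisfied = sum(1 for t in response_times if t <= threshold)
    tolerating = sum(1 for t in response_times if threshold < t <= 4 * threshold)
    return (satisfied + tolerating / 2) / len(response_times)


def mttr(incidents):
    """Mean time to recover over (started_at, recovered_at) datetime pairs."""
    if not incidents:
        raise ValueError("no incidents recorded")
    total = sum((end - start for start, end in incidents), timedelta())
    return total / len(incidents)


# Hypothetical sample data, for illustration only.
checks = [True] * 99 + [False]                 # 100 health check pings, 1 failure
latencies = [0.1, 0.3, 0.6, 1.0, 2.5]          # response times in seconds
incidents = [
    (datetime(2024, 1, 5, 10, 0), datetime(2024, 1, 5, 10, 30)),   # 30 minutes
    (datetime(2024, 1, 20, 14, 0), datetime(2024, 1, 20, 15, 30)), # 90 minutes
]

print(f"Availability: {availability(checks):.2%}")
print(f"Apdex (T=0.5s): {apdex(latencies, 0.5):.2f}")
print(f"MTTR: {mttr(incidents)}")
```

In practice the inputs would come from the monitoring tool and from the support ticket and post-mortem timestamps mentioned in the notes above; the arithmetic stays the same.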
