Engineering North Star Metrics

In a world where every metric can be fetched and tracked, we end up measuring too many things or, worse, too few. It is impractical to make smart decisions based on all available data, impossible to make any decision without data, and virtually impossible to treat every metric as a priority worthy of improvement. The first challenge is deciding what to measure. This article proposes the following as the de jure metrics to be tracked and continuously improved going forward within the tech team that I have led so far.

Objective	Key Results
Tech & Infrastructure	Improve and maintain System Availability¹ and Reliability (MTTR²)
People & Organization	Improve Employee Engagement³ & Reduce Churn Rate⁴
Observability & Security	Increase observability through monitoring (Dashboard)⁵ and alerting (Business and Engineering Metrics⁶), and protect customers from security vulnerabilities (Security Tickets⁷)
Productivity	Improve and maintain sprint predictability (Sprint Velocity⁸) and product quality (Number of Bugs⁹)

¹ System Availability is measured by pinging the health check endpoint; the secondary metric is Apdex (Application Performance Index). My preferred tools are Elastic APM or Datadog.
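As a rough illustration of how these two numbers can be derived, here is a minimal sketch. It assumes health check results are already collected as a list of booleans and response times as a list of seconds; the function names and the 0.5 second threshold are hypothetical choices, not part of any tool mentioned above. The Apdex calculation follows the standard definition: requests at or under the threshold T are "satisfied", those at or under 4T are "tolerating" and count half, and the rest are "frustrated".

```python
from typing import Sequence


def availability(ping_results: Sequence[bool]) -> float:
    """Fraction of health check pings that succeeded (0.0 to 1.0)."""
    if not ping_results:
        raise ValueError("at least one ping result is required")
    return sum(1 for ok in ping_results if ok) / len(ping_results)


def apdex(response_times_s: Sequence[float], threshold_s: float = 0.5) -> float:
    """Apdex score for a set of response times.

    threshold_s is the target response time T (an assumed default here).
    """
    if not response_times_s:
        raise ValueError("at least one response time is required")
    satisfied = sum(1 for t in response_times_s if t <= threshold_s)
    tolerating = sum(
        1 for t in response_times_s if threshold_s < t <= 4 * threshold_s
    )
    # Frustrated requests (slower than 4T) contribute nothing to the score.
    return (satisfied + tolerating / 2) / len(response_times_s)


if __name__ == "__main__":
    print(availability([True, True, True, False]))
    print(apdex([0.1, 0.3, 1.0, 3.0], threshold_s=0.5))
```

In practice an APM tool computes both values for you; the sketch is only meant to make the definitions concrete.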

² MTTR (mean time to recover) is tracked through technical support tickets and post-mortem chronologies.
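To make the calculation concrete, the following sketch averages the recovery time over a set of incidents. It assumes each incident has been reduced to a pair of timestamps (when it was detected and when service was recovered), for example taken from a post-mortem chronology; the function name and data shape are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Sequence, Tuple

Incident = Tuple[datetime, datetime]  # (detected_at, recovered_at)


def mean_time_to_recover(incidents: Sequence[Incident]) -> timedelta:
    """Average duration between detection and recovery."""
    if not incidents:
        raise ValueError("at least one incident is required")
    total = timedelta()
    for detected_at, recovered_at in incidents:
        if recovered_at < detected_at:
            raise ValueError("recovery cannot precede detection")
        total += recovered_at - detected_at
    return total / len(incidents)


if __name__ == "__main__":
    incidents = [
        (datetime(2023, 1, 5, 10, 0), datetime(2023, 1, 5, 10, 30)),
        (datetime(2023, 2, 9, 22, 0), datetime(2023, 2, 9, 23, 30)),
    ]
    print(mean_time_to_recover(incidents))
```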

³ Employee Engagement is tracked by the People Operations / HR team.

⁴ Churn Rate is tracked by the People Operations / HR team.

⁵ Dashboards are tracked in a single monitoring tool; my preferred choices are the ELK stack or Datadog.

⁶ Engineering metrics are pushed to Datadog from AWS CloudWatch, the agent on the application instances, and various integrations; business metrics are also pushed as custom metrics. Logs are pushed to a centralized logging tool. Alerts need to be actionable, and all critical alerts need to be pushed to PagerDuty.

⁷ Security Tickets are created from the following sources: Black Box (Bug Bounty, Vulnerability Scanner Tools, and Penetration Testing) and White Box (Static Code Analysis Tools).

⁸ Sprint Velocity is tracked by the Technical Program Manager along with other productivity metrics.
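One possible way to turn velocity into a predictability signal is sketched below. It assumes velocity is recorded as completed story points per sprint, and it uses the coefficient of variation (standard deviation divided by the mean) as a stand-in for predictability: the lower the value, the steadier the team's output. Both the function name and the choice of this statistic are assumptions for illustration, not a method prescribed by this article.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple


def velocity_summary(completed_points: Sequence[float]) -> Tuple[float, float]:
    """Return (average velocity, coefficient of variation) across sprints."""
    if not completed_points:
        raise ValueError("at least one sprint is required")
    average = mean(completed_points)
    if average == 0:
        # No completed work at all: variation is undefined, report zero.
        return 0.0, 0.0
    variation = pstdev(completed_points) / average
    return float(average), variation


if __name__ == "__main__":
    average, variation = velocity_summary([21, 18, 24, 20])
    print(f"average velocity: {average:.1f}, variation: {variation:.2f}")
```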

⁹ Number of Bugs counts the bugs produced during a sprint and comes with more granular metrics, including Number of Bugs Back to Dev; it is tracked by the Technical Program Manager / Scrum Master.