In a world where every metric can be fetched and tracked, we end up measuring too many things or, worse, too few. It is impractical to make smart decisions from all the available data, impossible to make any decision without data, and virtually impossible to treat every metric as a priority worth improving. The first challenge is deciding what to measure. This article proposes the following metrics as the de jure metrics to be tracked and continuously improved going forward within the tech teams I have led so far.
|Area|Goal|
|---|---|
|Tech & Infrastructure|Improve and maintain System Availability [1] and Reliability (MTTR [2])|
|People & Organization|Improve Employee Engagement [3] and reduce Churn Rate [4]|
|Observability & Security|Increase observability through monitoring (Dashboard [5]) and alerting (Business and Engineering Metrics [6]), and protect customers from security vulnerabilities (Security Tickets [7])|
|Productivity|Improve and maintain sprint predictability (Sprint Velocity [8]) and product quality (Number of Bugs [9])|
[1] System Availability is measured by pinging the health check endpoint. The secondary metric is Apdex (Application Performance Index). My favorite tools are Elastic APM or Datadog.
[2] MTTR (mean time to recover) is tracked through technical support tickets and post-mortem chronologies.
[3] Employee Engagement is tracked by the People Operations / HR team.
[4] Churn Rate is tracked by the People Operations / HR team.
[5] Dashboards are kept in a single monitoring tool. My favorite is either the ELK stack or Datadog.
[6] Engineering metrics are pushed to Datadog from AWS CloudWatch, from the agent on the application instances, and from various integrations. Business metrics are also pushed as custom metrics. Logs are pushed to a centralized logging tool. Alerts need to be actionable, and all critical alerts need to be pushed to PagerDuty.
[7] Security Tickets are created from the following sources: Black Box (Bug Bounty, Vulnerability Scanner Tools, and Penetration Testing) and White Box (Static Code Analysis Tools).
[8] Sprint Velocity is tracked by the Technical Program Manager along with other productivity metrics.
[9] Number of Bugs counts the bugs produced during a sprint. It comes with more granular metrics, including Number of Bugs Back to Dev, and is tracked by the Technical Program Manager / Scrum Master.
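
To make the first two notes a little more concrete, here is a minimal Python sketch of how System Availability, Apdex, and MTTR could be computed by hand from raw data. The function names (`availability`, `apdex`, `mttr`), the 0.5-second Apdex threshold, and the sample values are all assumptions made for illustration. In practice a tool such as Elastic APM or Datadog would report these figures directly. The Apdex calculation follows the standard definition: samples at or below the threshold T are "satisfied", samples between T and 4T are "tolerating" and count half, and the rest are "frustrated".

```python
from datetime import datetime, timedelta


def availability(health_checks: list[bool]) -> float:
    """Fraction of health-check pings that succeeded (True = healthy)."""
    if not health_checks:
        raise ValueError("no health checks recorded")
    return sum(health_checks) / len(health_checks)


def apdex(response_times: list[float], threshold: float) -> float:
    """Standard Apdex score: (satisfied + tolerating / 2) / total samples.

    satisfied:  response time <= threshold
    tolerating: threshold < response time <= 4 * threshold
    """
    if not response_times:
        raise ValueError("no samples recorded")
    satisfied = sum(1 for t in response_times if t <= threshold)
    tolerating = sum(1 for t in response_times if threshold < t <= 4 * threshold)
    return (satisfied + tolerating / 2) / len(response_times)


def mttr(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Mean time to recover: average of (recovered_at - started_at)."""
    if not incidents:
        raise ValueError("no incidents recorded")
    total = sum((end - start for start, end in incidents), timedelta())
    return total / len(incidents)


if __name__ == "__main__":
    # Hypothetical sample data, not taken from any real system.
    checks = [True] * 998 + [False] * 2
    print(f"Availability: {availability(checks):.3%}")

    # Response times in seconds; the 0.5 s threshold is an assumed target.
    print(f"Apdex: {apdex([0.2, 0.4, 0.9, 1.5, 3.0], threshold=0.5):.2f}")

    # (started_at, recovered_at) pairs as they might appear in post-mortems.
    incidents = [
        (datetime(2021, 1, 5, 10, 0), datetime(2021, 1, 5, 10, 30)),
        (datetime(2021, 2, 9, 14, 0), datetime(2021, 2, 9, 15, 30)),
    ]
    print(f"MTTR: {mttr(incidents)}")
```

Keeping the three calculations as small pure functions makes it easy to feed them from whatever source is at hand, for example a health-check log for availability or the post-mortem chronologies for MTTR.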