Managing technical debt is never an easy game. Even when we have ideas, we need to weigh their impact against the product initiatives and features we need to build, so that we stay strong on the product side and boost the business. So what to do? Lo and behold: Diverge-Convergence comes in as one of the strategies.
Post Category → Operational
On Observability
Running an application without proper monitoring is akin to driving without a dashboard. You don’t really know if you still have enough gas, if you are within the speed limit, or how far you are from your next oil change. There are many uncertainties involved in running an application. Monitoring is instrumental in getting first-hand awareness of a possible incident, or in predicting that an incident is about to happen so we can prevent it.
This post outlines some observables that we can monitor and set up alerts for, along with some recommended practices.
Software Fragmentation – The Golden Path
There is a direct correlation between teams that give their engineers autonomy to own their technical decisions and the team’s ability to hire and retain A-class or Senior talent. There is a tradeoff, but an acceptable level of chaos in exchange for a stronger sense of individual/team ownership is usually the right one and leads to higher performing teams in the long run. At least, this is what I’ve been seeing in a couple of companies in Indonesia.
So, how do we make sure these “chaotic” things are manageable and actually benefit the team?
Incident & Post Mortem Process
Recurring incidents are the enemy of scalability. Recurring incidents steal time away from our teams – time that could be used to create new functionality and greater value. Our past performance is the best indicator we have of our future performance and our past performance is best described by the incidents we have experienced and the underlying problems that caused those incidents.
Failing to recognize and resolve our past problems means failing to learn from our past mistakes in architecture, engineering, process, operations, and communication.