At most tech companies, we work with plenty of legacy systems and monoliths. As engineers, our first instinct is often to break these monolithic applications into microservices so that the code is cleaner and the system is easier to maintain. While this is a worthy goal, we sometimes focus too heavily on the technical side (architecture, scalability, implementation) and lose sight of the bigger picture. Hopefully, this document can offer guidance on the other aspects we should think about.
Factors to consider
When writing an engineering spec/RFC or designing a new system or service, please take the following factors into account. Hopefully, thinking through them lets us treat the project as an opportunity to address more customer pain points and business problems.
A. Customer Impact
At the end of the day, everything we do should deliver value to our customers. I use the term “customers” loosely here, since it doesn’t always mean the end user of the product. Our main goal should be to deliver as much value to end users as possible, but we can extend the definition of “customers” to include:
- Customer support team
  - Will it increase or decrease the number of customer contacts?
- Risk / fraud / security team
  - How will the new system increase or decrease our risk profile and security loopholes?
- Data team
  - How does our system affect integrations with the data team?
  - Will it address, or potentially introduce, data discrepancy issues?
- Other developers
  - How can this speed up development velocity?
  - Will this shorten the learning curve for new engineers?
  - How will this affect integration for other teams using our service?
How do we get started and make sure we are delivering the right customer value? One way to approach this is to start with a list of known problems.
B. Known Problems
What known problems are we trying to solve? List as many as we can. Once we have the list, use the five whys technique to get to the root cause of each problem. We don’t want to fix the symptoms; we want to fix the underlying issues causing these common problems. Note that we don’t have to fix every problem on the list. We can pick the problems relevant to our domain and explicitly note which problems outside our domain we are not solving.
C. Success Criteria / Metrics
What does success look like for this project? Are there metrics we can use to measure the success of the new system? Technical metrics (monitoring dashboards, requests per second, latency, etc.) are a good starting point. Business metrics, which show how the project contributes to business objectives, are even better.
Tips
- Gather thoughts and perspectives from as many people as possible
  - Talk to others who might be facing, or interested in, similar problems: other engineering teams, the product team, the data team, etc.
- Hold a brainstorming session early
  - Early is better, since it can shape how we approach the problem
  - Getting feedback early is better than getting it after we have spent weeks on the design and finished the RFC
- Set clear boundaries
  - What problems this service is trying to solve
  - What problems this service is not trying to solve
  - Can we model the problem and find a general solution, instead of building one that only works for a specific use case?
- Think about extensibility and compatibility
  - How can we share the new things we add in the new service?
  - Will the new service be compatible with existing integrations?
- There will often be multiple approaches or alternatives to solving the problem
  - Use the PrOACT decision-making framework (Problem, Objectives, Alternatives, Consequences, Tradeoffs) to compare your options and make sure you choose the right solution.
Common pitfalls
- New service, same problems
  - We only “move” the existing problems to the new service
  - The new system doesn’t actually solve any problems
- No clear domain boundaries
  - Trying to fix too many problems: the service may become a new problematic monolith
  - Trying to fix too few problems: the service may become overhead, since we end up with too many services
  - Can we build a solution for a more generic use case?
- Solving only for “my team’s” use case
  - We can design the best solution for our team’s problem, but it might not be the best solution for the entire ecosystem
  - Talk to other stakeholders and interested parties to mitigate this
- Building everything from scratch instead of reusing existing solutions