My question is about the high availability of the solutions.
We are told to design microservices to fail fast so that they can be restarted easily, which makes the overall solution more robust.
Most of my experience is with Java EE platforms, where the application server deals with high availability and restarts failing processes/threads.
Hence my question: do you discuss how to implement a robust, highly available, enterprise-grade solution?
I was a J2EE developer for years and have a lot of thoughts on this space.
1. Microservices by themselves can make an application more resilient. J2EE applications often became extremely bloated and were deployed as one single artifact to a cluster of servers. In the worst cases, I would see multiple applications deployed within the same container. A microservice is meant to be extremely small, and the container is often packaged right within the microservice. In my experience, launching a microservice takes seconds versus minutes in some of the more "enterprise" containers.
2. Service discovery makes microservices extremely resilient. Most service discovery engines include health check monitoring that detects when a service instance becomes unresponsive and starts routing traffic to other service instances. If you combine this with container deployment technologies like Docker Swarm, Kubernetes, or even Amazon's Auto Scaling groups, a new service instance can be started automatically when a service fails its health check. In my organization, we rely heavily on AWS Auto Scaling groups, so services are restarted automatically if they go down. On several occasions, after other parts of our application had gone down, we also scaled up service instances automatically to "catch up" on the load once we were back up.
3. While microservice applications are complex because of all the distributed components in the application, they are naturally resilient because the failure of one service is not likely to cause problems throughout the application. We actually use Netflix's Chaos Monkey to randomly kill our service instances to help identify and drive out dependencies that can cause entire outages.
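To make the health check idea from point 2 a bit more concrete, here is a minimal in-memory sketch in Java. It is not how any particular service discovery engine or Auto Scaling group is implemented; the class name HealthAwareRegistry, its methods, and the boolean probes are all hypothetical and only illustrate the pattern: probe each instance, route traffic only to the ones that pass, and report the ones that failed so something else can replace them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

/**
 * Toy registry (hypothetical, not a real discovery engine's API).
 * It routes requests only to instances whose last health check passed.
 * Not thread-safe; a real registry would need synchronization.
 */
class HealthAwareRegistry {
    // instance id -> health probe; a real probe would typically call a health endpoint
    private final Map<String, BooleanSupplier> probes = new LinkedHashMap<>();
    // instances that passed the most recent round of health checks
    private final List<String> healthy = new ArrayList<>();
    private int next = 0;

    /** Registers an instance together with the probe used to check it. */
    void register(String instanceId, BooleanSupplier probe) {
        probes.put(instanceId, probe);
    }

    /** Runs every probe once, rebuilds the healthy list, and returns the ids that failed. */
    List<String> runHealthChecks() {
        List<String> failed = new ArrayList<>();
        healthy.clear();
        for (Map.Entry<String, BooleanSupplier> entry : probes.entrySet()) {
            if (entry.getValue().getAsBoolean()) {
                healthy.add(entry.getKey());
            } else {
                failed.add(entry.getKey());
            }
        }
        return failed;
    }

    /** Round-robin over healthy instances; fails fast when none are available. */
    String nextInstance() {
        if (healthy.isEmpty()) {
            throw new IllegalStateException("no healthy instances");
        }
        String id = healthy.get(next % healthy.size());
        next++;
        return id;
    }
}
```

In this sketch, the list returned by runHealthChecks() stands in for the signal that an orchestrator would act on by starting replacement instances, while nextInstance() stands in for the routing side, which simply stops handing out instances that failed their last check.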
I hope that answers your questions.
John Carnell - Senior Engineer, Genesys PureCloud Division
Author of Spring Microservices in Action