Maintaining data consistency
Services must be loosely coupled so that they can be developed, deployed, and scaled independently. They do, of course, need to communicate, but they remain independent of each other: they have well-defined interfaces and encapsulate implementation details. But what about data? In the real world, and in non-trivial applications (and microservice applications will probably be non-trivial), business transactions must often span multiple services. If, for example, you create a banking application, before you execute a customer's money transfer order, you need to ensure that it will not exceed the account balance. The single database that comes with a monolithic application gives us a lot of convenience: atomic transactions, a single place to look for data, and so on.
On the other hand, in the microservices world, different services need to be independent. This also means that they can have different data storage requirements. For some services, a relational database will be the right fit; others might need a document database such as MongoDB, which is good at storing complex, unstructured data.
So, when building microservices and thus splitting up our database into multiple smaller databases, how do we manage these challenges? We have also said that services should own their data; that is, every microservice should depend only on its own database. The service's database is effectively part of the implementation of that service. This leads to quite an interesting challenge when designing a microservices architecture. As Martin Fowler says in his Microservice Trade-Offs article: "Maintaining strong consistency is extremely difficult for a distributed system, which means everyone has to manage eventual consistency." How do we deal with this? Well, it's all about boundaries.
Microservices should have clearly defined responsibilities and boundaries.
Microservices need to be grouped according to their business domain. Also, in practice, you will need to design your microservices in such a way that they cannot directly connect to a database owned by another service. The loose coupling means microservices should expose clear API interfaces that model the data and the access patterns related to that data. They must stick to those interfaces; when changes are necessary, you will probably introduce a versioning mechanism and create another version of the microservice. You could use a publish/subscribe pattern to dispatch events from one microservice to be processed by others, as you can see in the following diagram:
The publish/subscribe mechanism you would want to use should provide retry and rollback features for the event processing. In a publish/subscribe scenario, the service that modifies or generates the data allows other services to subscribe to events. The subscribed services receive an event saying that the data has been modified, and it's often the case that the event contains the data that has been modified. Of course, the event publish/subscribe pattern can be used not only in relation to data changes; it can also serve as a generic communication mechanism between services. This is a simple and effective approach, but it has a downside: there is a possibility of losing an event.
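To make the pattern more concrete, here is a minimal sketch in plain Java. The AccountEvent class, the EventBus, and its publish and subscribe methods are hypothetical names used only to illustrate the roles involved; a real system would back them with a message broker that provides the retry and rollback guarantees mentioned above, rather than an in-memory list of subscribers:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Event carrying the data that has been modified, so subscribers
// do not need to call back into the owning service to read it.
class AccountEvent {
    final String accountId;
    final long newBalanceInCents;

    AccountEvent(String accountId, long newBalanceInCents) {
        this.accountId = accountId;
        this.newBalanceInCents = newBalanceInCents;
    }
}

// Hypothetical in-memory event bus; a production system would use a
// durable message broker with retry semantics instead.
class EventBus {
    private final List<Consumer<AccountEvent>> subscribers = new CopyOnWriteArrayList<>();

    void subscribe(Consumer<AccountEvent> subscriber) {
        subscribers.add(subscriber);
    }

    void publish(AccountEvent event) {
        // Each subscriber is notified that the data has changed.
        subscribers.forEach(subscriber -> subscriber.accept(event));
    }
}

class PubSubExample {
    public static void main(String[] args) {
        EventBus bus = new EventBus();

        // The reporting service subscribes to account change events.
        bus.subscribe(event ->
            System.out.println("Reporting service saw account " + event.accountId
                + " with new balance " + event.newBalanceInCents));

        // The account service publishes an event after updating its own database.
        bus.publish(new AccountEvent("ACC-42", 15000));
    }
}
```

The important design point is that the publisher never reaches into another service's database; it only announces what changed, and each subscriber updates its own data store in response.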
When creating distributed applications, you should accept that data will be inconsistent for some period of time. When an application changes a data item on one machine, that change needs to be propagated to the other replicas. Since the propagation is not instant, there is a time interval during which some of the copies will have the most recent change, but others will not. Eventually, however, the change will be propagated to all the copies; that's why this is called eventual consistency. Your services need to assume that the data will be in an inconsistent state for a while and deal with the situation by using the data as is, postponing the operation, or even ignoring certain pieces of data.
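The following sketch illustrates one of those strategies, postponing the operation until the change has propagated. The EventuallyConsistentReader class, its accountBalances map, and the retry limits are hypothetical and chosen only for illustration; the point is that the consumer tolerates a window in which its local copy of the data is stale:

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

class EventuallyConsistentReader {

    // Hypothetical local read model, populated asynchronously from incoming events.
    private final Map<String, Long> accountBalances = new ConcurrentHashMap<>();

    // Called by the event subscriber when a change is propagated to this service.
    void onAccountEvent(String accountId, long newBalance) {
        accountBalances.put(accountId, newBalance);
    }

    // Returns the balance if the change has already propagated; otherwise it
    // postpones the operation by retrying a few times before giving up.
    Optional<Long> findBalance(String accountId) throws InterruptedException {
        for (int attempt = 0; attempt < 3; attempt++) {
            Long balance = accountBalances.get(accountId);
            if (balance != null) {
                return Optional.of(balance);   // use the data as is
            }
            Thread.sleep(200);                 // postpone: wait for propagation
        }
        return Optional.empty();               // or ignore / report the data as unavailable
    }
}
```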
As you can see, there are a lot of challenges, but also a lot of advantages, behind using a microservices architecture. You should be warned, though: there are more challenges we need to address. As services are independent of each other, they can be implemented in different programming languages. This means the deployment process of each may vary: it will be totally different for a Java web application and for a Node.js application. This can make deployment to a server complex. This is precisely the point where Docker comes to the rescue.