Outbox Pattern: Ensuring data consistency and reliable messaging in distributed architectures

In distributed systems, and especially in microservice architectures, two problems come up again and again: keeping data consistent and delivering messages between services reliably.

The Outbox pattern addresses both. It ensures that the creation, update, or deletion of aggregates in a database and the sending of the corresponding messages to other services either happen together or not at all, preserving the integrity and consistency of data across services through eventual consistency.

Key Concepts

Distributed Transactions

Distributed transactions group multiple operations that must be atomic but that run across several nodes or services. Atomicity is hard to guarantee in such an environment, because any participant or network link can fail independently, so implementing distributed transactions brings its own challenges. The Outbox pattern sidesteps them by handling each transaction locally within a service and then propagating its effects to other services asynchronously.

ACID vs. BASE

The ACID principles (Atomicity, Consistency, Isolation, and Durability) are fundamental in the context of traditional databases. However, enforcing them across several independent services, each with its own database, is costly and often impractical. Instead, distributed systems commonly adopt a more relaxed approach, which in turn pushes more complexity into the application: the BASE principles.

The BASE principles (Basically Available, Soft state, and Eventual consistency) relax consistency guarantees in favor of availability and fault tolerance. The Outbox pattern aligns with the BASE approach, promoting eventual consistency and facilitating more resilient and scalable systems.

Eventual Consistency

Eventual consistency accepts that a system does not need to be consistent at every instant. Instead, the guarantee is that it will converge to a consistent state at some later point. This approach is particularly relevant to the Outbox pattern, as it allows database updates and the messages sent to other services to be handled asynchronously, reducing the need for blocking operations and improving the scalability of the system.

So, what is the Outbox pattern?

The Outbox pattern is based on an auxiliary database table (commonly called Outbox) dedicated solely to temporarily storing the messages (usually events) that must be sent to other services. This lets a service leverage the ACID properties of its own database: the business data changes and the messages that notify other services of those changes are written in the same database transaction. The auxiliary table acts as an intermediary that guarantees a modification cannot be committed without its message, or vice versa.

The typical flow of the Outbox pattern begins with a business operation that requires both a database update and a notification of the change to other services. Instead of sending the message directly to the recipient services, the operation records the event in the auxiliary table, within the same transaction as the data update. This makes the business operation and the event creation atomic.
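As a minimal sketch of that write path, the following uses Python's built-in sqlite3 module as a stand-in for the service's database. The orders table, the outbox columns, and the place_order function are hypothetical names chosen for illustration; they are not prescribed by the pattern.

```python
import json
import sqlite3
import uuid

# In-memory SQLite stands in for the service's own database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE outbox (
        seq        INTEGER PRIMARY KEY AUTOINCREMENT,  -- insertion order
        id         TEXT NOT NULL UNIQUE,               -- message ID
        event_type TEXT NOT NULL,
        payload    TEXT NOT NULL
    );
""")


def place_order(conn, order_id, fail_midway=False):
    """Write the order and its event in one local transaction."""
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, status) VALUES (?, 'placed')", (order_id,)
        )
        if fail_midway:
            raise RuntimeError("simulated crash before the event is recorded")
        conn.execute(
            "INSERT INTO outbox (id, event_type, payload) VALUES (?, ?, ?)",
            (str(uuid.uuid4()), "OrderPlaced", json.dumps({"order_id": order_id})),
        )


def count(conn, table):
    # Table names come from this script only, never from user input.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


place_order(conn, "order-1")
try:
    place_order(conn, "order-2", fail_midway=True)
except RuntimeError:
    pass  # the order insert was rolled back together with the missing event

orders_count = count(conn, "orders")
outbox_count = count(conn, "outbox")
```

After both calls, the two tables hold one row each: the failed operation leaves neither an order without an event nor an event without an order.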

In the background, there is a process that monitors the auxiliary table for new messages to issue. When there are messages in the table, it extracts and sends them to the relevant services, either by making calls to the APIs or through queue systems like RabbitMQ or Kafka. Once the process confirms that the message has been delivered, it can mark the message as processed or directly delete it from the auxiliary table.
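One way to picture the background process is a function that a scheduler calls periodically. The sketch below is an assumption-laden illustration: relay_once and flaky_publish are hypothetical names, the publish callable stands in for an API call or a RabbitMQ/Kafka producer, and rows are deleted (rather than marked) once delivery is confirmed.

```python
import json
import sqlite3


def relay_once(conn, publish, limit=100):
    """Publish pending outbox rows, oldest first; delete each one once delivered."""
    rows = conn.execute(
        "SELECT seq, payload FROM outbox ORDER BY seq LIMIT ?", (limit,)
    ).fetchall()
    sent = 0
    for seq, payload in rows:
        try:
            publish(json.loads(payload))
        except Exception:
            break  # stop here to keep ordering; the row stays for the next poll
        with conn:
            conn.execute("DELETE FROM outbox WHERE seq = ?", (seq,))
        sent += 1
    return sent


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE outbox (seq INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
)
with conn:
    conn.executemany(
        "INSERT INTO outbox (payload) VALUES (?)",
        [(json.dumps({"n": n}),) for n in (1, 2, 3)],
    )

delivered = []
broker = {"up": False}


def flaky_publish(event):
    # Simulates a broker that rejects event 2 until it comes back up.
    if event["n"] == 2 and not broker["up"]:
        raise ConnectionError("broker unavailable")
    delivered.append(event["n"])


first = relay_once(conn, flaky_publish)   # delivers event 1, stops at event 2
broker["up"] = True
second = relay_once(conn, flaky_publish)  # delivers events 2 and 3
remaining = conn.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
```

If the process crashes after publish succeeds but before the delete commits, the same message is published again on the next poll, so this sketch gives at-least-once delivery and recipients should tolerate duplicates.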

[Image: schema.webp]

Benefits of the Outbox Pattern

Reliable Message Delivery

Ensures that messages are not lost, even if any of the components involved (database, background process, messaging system, or other services) fails.

Data Consistency

Ensures that modifications to the database are committed together with the messages that notify other services of those changes.

Decoupling

Related to the first point, it decouples the components that produce, issue, and consume messages, making the system more fault-tolerant without affecting consistency.

Challenges and Issues of the Outbox Pattern

Complexity

The implementation of the Outbox pattern introduces additional complexity to the system, requiring careful management of the auxiliary table and event processing.

Performance

The auxiliary table can become a bottleneck in systems with a high volume of messages. In addition, if a single use case produces a large number of updates and messages, the transaction takes longer, which can cascade into database saturation.

Implementation

The Outbox pattern can be implemented using a Dispatcher over an EventBus. Dispatchers can help to add more functionality to the EventBus without having to modify its implementation.

[Image: implementation.png]

In the image above, the EventBus is given two dispatchers: one persists the event in the EventStore, and the other applies the Outbox pattern by persisting the event in the auxiliary table. This implementation is very flexible and, if necessary, lets us replace the Outbox dispatcher with one that publishes directly to RabbitMQ.
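The dispatcher arrangement could be sketched roughly as follows. All class names here (Event, EventStoreDispatcher, OutboxDispatcher, DirectBrokerDispatcher) are hypothetical, and plain in-memory lists stand in for the EventStore, the auxiliary table, and RabbitMQ; a real implementation would write to the database inside the surrounding transaction.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass(frozen=True)
class Event:
    id: str
    name: str


class Dispatcher(Protocol):
    def dispatch(self, event: Event) -> None: ...


@dataclass
class EventStoreDispatcher:
    events: list = field(default_factory=list)  # stand-in for the EventStore

    def dispatch(self, event: Event) -> None:
        self.events.append(event)


@dataclass
class OutboxDispatcher:
    pending: list = field(default_factory=list)  # stand-in for the auxiliary table

    def dispatch(self, event: Event) -> None:
        self.pending.append(event)


@dataclass
class DirectBrokerDispatcher:
    sent: list = field(default_factory=list)  # stand-in for a RabbitMQ publisher

    def dispatch(self, event: Event) -> None:
        self.sent.append(event)


class EventBus:
    """Forwards each event to every injected dispatcher, in order."""

    def __init__(self, dispatchers):
        self._dispatchers = list(dispatchers)

    def publish(self, event: Event) -> None:
        for dispatcher in self._dispatchers:
            dispatcher.dispatch(event)


event_store = EventStoreDispatcher()
outbox = OutboxDispatcher()
bus = EventBus([event_store, outbox])
event = Event(id="evt-1", name="OrderPlaced")
bus.publish(event)

# Swapping the outbox dispatcher for a direct one leaves EventBus untouched.
broker = DirectBrokerDispatcher()
direct_bus = EventBus([event_store, broker])
direct_bus.publish(Event(id="evt-2", name="OrderShipped"))
```

The EventBus only depends on the dispatch method, which is what makes the outbox dispatcher replaceable without modifying the bus itself.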

[Image: implementation-rabbitmq.png]

Scalability Improvement

If we work on a system with a high volume of data, it will be necessary to implement or take into account some strategies so that the auxiliary table does not become a bottleneck. In this section, we develop some strategies, although creativity can be very useful to adapt the pattern to our specific needs.

Delete Consumed Messages

The auxiliary table stores events that need to be issued; however, once issued, the messages will no longer be used for anything else. In such a case, we can delete the messages we have already consumed and optimize space. This will increase the database load by performing deletion operations, but in the long run, it will prevent problems due to having a table that is too large. The extra cost is the difference between updating the records (to mark them as consumed) and deleting them.

Horizontally Scale Consumers

Adding more background processes to consume the pending messages is another option to improve scalability. However, several things must be considered:

  • Messages might be published out of order.
  • A message might be published multiple times.

Both problems can occur without horizontally scaling the consumers, but the probability increases when adding them.
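One common way to reduce the chance that two background processes pick up the same rows is to let each one claim rows before publishing them. The sketch below assumes a hypothetical claimed_by column and a claim_batch helper, with SQLite standing in for the real database; on PostgreSQL or MySQL 8, SELECT ... FOR UPDATE SKIP LOCKED is often used for the same purpose.

```python
import sqlite3


def claim_batch(conn, worker, limit):
    """Mark up to `limit` unclaimed rows as owned by `worker`, in one transaction."""
    with conn:
        conn.execute(
            """
            UPDATE outbox
               SET claimed_by = ?
             WHERE claimed_by IS NULL
               AND seq IN (SELECT seq FROM outbox
                            WHERE claimed_by IS NULL
                            ORDER BY seq
                            LIMIT ?)
            """,
            (worker, limit),
        )
    # Return every row this worker currently owns, oldest first.
    rows = conn.execute(
        "SELECT seq FROM outbox WHERE claimed_by = ? ORDER BY seq", (worker,)
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE outbox (
        seq        INTEGER PRIMARY KEY AUTOINCREMENT,
        payload    TEXT NOT NULL,
        claimed_by TEXT            -- NULL means nobody is working on the row
    )
    """
)
with conn:
    conn.executemany(
        "INSERT INTO outbox (payload) VALUES (?)", [(f"event-{n}",) for n in range(5)]
    )

batch_a = claim_batch(conn, "worker-a", 3)
batch_b = claim_batch(conn, "worker-b", 3)
```

Here worker-a ends up with rows 1 to 3 and worker-b with rows 4 and 5. Claiming does not remove the two problems listed above: a worker that crashes after publishing still causes a duplicate on retry, and a production version would also need to release claims held by dead workers.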

Batch Processing

Consuming messages one by one from the auxiliary table quickly turns it into a bottleneck. Retrieving and deleting messages in batches is far more efficient. However, the larger the batch, the more likely it is that something goes wrong partway through and that we cannot confirm whether every message was delivered correctly.

Additionally, the larger the batch, the longer it takes to process, and the greater the likelihood that another consumer retrieves the same batch of messages from the database.

On the other hand, it’s worth considering that a percentage of the messages in the batch may not be successfully delivered. In such cases, we should delete the messages that we have been able to deliver but not those that have failed.
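Handling a partially delivered batch might look like the sketch below. process_batch and publish are hypothetical names, and SQLite again stands in for the real database. Unlike a relay that stops at the first failure, this version keeps going, so it trades ordering for throughput.

```python
import sqlite3


def process_batch(conn, publish, batch_size):
    """Publish one batch and delete only the rows that were actually delivered."""
    rows = conn.execute(
        "SELECT seq, payload FROM outbox ORDER BY seq LIMIT ?", (batch_size,)
    ).fetchall()
    delivered = []
    for seq, payload in rows:
        try:
            publish(payload)
        except Exception:
            continue  # leave the failed row in the table for a later attempt
        delivered.append(seq)
    if delivered:
        placeholders = ",".join("?" * len(delivered))
        with conn:  # one DELETE for the whole batch
            conn.execute(
                f"DELETE FROM outbox WHERE seq IN ({placeholders})", delivered
            )
    return delivered


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE outbox (seq INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
)
with conn:
    conn.executemany(
        "INSERT INTO outbox (payload) VALUES (?)",
        [("a",), ("b",), ("bad",), ("c",)],
    )

published = []


def publish(payload):
    # Simulates a recipient that rejects one specific message.
    if payload == "bad":
        raise ValueError("recipient rejected the message")
    published.append(payload)


delivered_seqs = process_batch(conn, publish, batch_size=10)
left_over = [row[0] for row in conn.execute("SELECT payload FROM outbox")]
```

Only the rejected message stays in the table, ready to be retried in the next batch.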

Query Optimization

In principle, only two queries run against the auxiliary table:

  • Searching for the X oldest messages in the table
  • Deleting by IDs

Optimizing these queries is very important. For example, MySQL (to mention one DBMS) can play tricks on us even with queries that seem trivial. I strongly recommend spending half an hour on these two queries, checking their execution plans to make sure that the table configuration, its indexes, and the queries themselves are as efficient as possible.
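As an illustration of the two queries and of how to inspect them, the sketch below uses SQLite and its EXPLAIN QUERY PLAN statement. The created_at column, the index name, and the helper functions are assumptions made for the example; on MySQL the equivalent check would be done with EXPLAIN, and the plan output differs between engines and versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE outbox (
        id         TEXT PRIMARY KEY,
        created_at INTEGER NOT NULL,   -- for example, epoch milliseconds
        payload    TEXT NOT NULL
    );
    -- Intended to let "oldest first" be read from an index instead of a sort.
    CREATE INDEX idx_outbox_created_at ON outbox (created_at);
""")
with conn:
    conn.executemany(
        "INSERT INTO outbox (id, created_at, payload) VALUES (?, ?, ?)",
        [("m-3", 300, "{}"), ("m-1", 100, "{}"), ("m-2", 200, "{}")],
    )

OLDEST = "SELECT id FROM outbox ORDER BY created_at LIMIT ?"


def oldest_ids(conn, limit):
    """Query 1: the X oldest messages."""
    return [row[0] for row in conn.execute(OLDEST, (limit,))]


def delete_by_ids(conn, ids):
    """Query 2: delete by IDs, in a single statement."""
    placeholders = ",".join("?" * len(ids))
    with conn:
        conn.execute(f"DELETE FROM outbox WHERE id IN ({placeholders})", ids)


def query_plan(conn, sql, params=()):
    # The last column of each plan row is the human-readable description.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


batch = oldest_ids(conn, 2)
plan = query_plan(conn, OLDEST, (2,))  # inspect this to see whether an index is used
delete_by_ids(conn, batch)
remaining = oldest_ids(conn, 10)
```

Reading the plan before and after adding an index is usually enough to catch a full table scan or an unexpected sort on the "oldest messages" query.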

Table Partitioning

If our system produces many messages, there will come a point when it is inevitable to partition the auxiliary table. This allows us to distribute messages among several instances, which can in turn be consumed by various consumers applying all the mentioned optimizations.

For this to work, events must be distributed evenly among all instances; otherwise, a single overloaded instance becomes the new bottleneck.
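One possible way to spread events evenly is to derive the partition from a stable hash of a key. The sketch below assumes the key is the aggregate ID, which also keeps all events of one aggregate in the same partition; partition_for and the choice of SHA-256 are illustrative, not part of the pattern.

```python
import hashlib
from collections import Counter


def partition_for(aggregate_id: str, partitions: int) -> int:
    """Map an aggregate ID to a partition index in [0, partitions)."""
    # Python's built-in hash() is salted per process, so it would send the
    # same ID to different partitions after a restart; use a stable digest.
    digest = hashlib.sha256(aggregate_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


PARTITIONS = 4
distribution = Counter(
    partition_for(f"order-{n}", PARTITIONS) for n in range(10_000)
)
```

With 10,000 synthetic IDs and four partitions, each partition should receive roughly a quarter of the events. Changing the number of partitions remaps most keys, so that number is best decided up front or handled with a migration plan.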

Mixed Approach

The more data we retrieve from the database, the longer the queries take to execute. A mixed approach therefore publishes the full message to an auxiliary queue system and stores only the message ID in the auxiliary table. The message is held back and not delivered to other services until its ID appears in the auxiliary table, so the database stores and returns far less data while the table still decides which messages may be delivered.
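The following is one possible reading of the mixed approach, sketched with a dictionary standing in for the auxiliary queue system. The staging dictionary, the outbox_ids table, and the place_order and release functions are all hypothetical names introduced for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id TEXT PRIMARY KEY);
    CREATE TABLE outbox_ids (message_id TEXT PRIMARY KEY);  -- IDs only, no payload
""")

staging = {}    # stand-in for the auxiliary queue: message_id -> full payload
delivered = []  # stand-in for what other services receive


def place_order(order_id, message_id, fail_midway=False):
    # 1. Park the full payload outside the database.
    staging[message_id] = {"order_id": order_id}
    # 2. Commit the business change and the message ID atomically.
    with conn:
        conn.execute("INSERT INTO orders (id) VALUES (?)", (order_id,))
        if fail_midway:
            raise RuntimeError("simulated failure, transaction rolls back")
        conn.execute("INSERT INTO outbox_ids (message_id) VALUES (?)", (message_id,))


def release():
    """Deliver staged messages whose ID has been committed to the table."""
    for message_id in list(staging):
        committed = conn.execute(
            "SELECT 1 FROM outbox_ids WHERE message_id = ?", (message_id,)
        ).fetchone()
        if committed is None:
            continue  # no committed ID yet: keep holding the message back
        delivered.append(staging.pop(message_id))
        with conn:
            conn.execute("DELETE FROM outbox_ids WHERE message_id = ?", (message_id,))


place_order("order-1", "m-1")
try:
    place_order("order-2", "m-2", fail_midway=True)
except RuntimeError:
    pass
release()
```

Only the message for order-1 is delivered. The staged message for the rolled-back order is never released, so a real implementation would also need to expire staged messages that never receive a committed ID.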

Conclusion

The Outbox pattern is a robust solution for reliable message delivery in distributed systems. By combining eventual consistency with an auxiliary table, the pattern uses the ACID properties of each service's own database to prevent inconsistencies between database changes and the messages that notify other services of them.

In its implementation, it is necessary to pay special attention to performance, as bottlenecks may arise when working with high volumes of data. In such cases, applying scalability strategies such as mixed approaches or partitioning the table can help improve performance.