Datastores for Asynchronous Communication
Discover the datastores we can use for asynchronous communication.
In the previous lesson we learned that to achieve asynchronous communication, we need to store requests in datastores, and these datastores belong to two main categories: message queues and event logs.
Message Queues
Some commonly used message queues are ActiveMQ, RabbitMQ, and Amazon SQS.
A message queue usually operates with first-in, first-out (FIFO) semantics. This means that messages, in general, are added to the tail of the queue and removed from the head.
Note: Multiple producers can send messages to the queue, and multiple consumers can receive messages and process them concurrently.
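These FIFO semantics with concurrent producers and consumers can be sketched in a few lines of Python. Here, the thread-safe `queue.Queue` stands in for a real message queue; the producer and message names are illustrative.

```python
import queue
import threading

# In-process stand-in for a message queue: queue.Queue is FIFO and
# thread-safe, so multiple producers and consumers can use it concurrently.
q = queue.Queue()

def producer(name, count):
    for i in range(count):
        q.put(f"{name}-msg-{i}")   # messages are added at the tail

def consumer(results):
    while True:
        msg = q.get()              # messages are removed from the head
        if msg is None:            # sentinel: stop consuming
            break
        results.append(msg)

results = []
producers = [threading.Thread(target=producer, args=(f"p{i}", 3)) for i in range(2)]
consumers = [threading.Thread(target=consumer, args=(results,)) for _ in range(2)]
for t in producers + consumers:
    t.start()
for t in producers:
    t.join()
for _ in consumers:
    q.put(None)                    # one sentinel per consumer
for t in consumers:
    t.join()
```

After all threads finish, `results` contains all six messages, consumed concurrently by the two consumers.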
Depending on how a message queue is configured, messages can be deleted in one of the following two cases:
- As soon as they are delivered to a consumer.
- Only after a consumer has explicitly acknowledged that it has successfully processed the message.
The former essentially provides at-most-once guarantees since a consumer might fail after receiving a message but before acknowledging it. The latter can provide at-least-once semantics since at least a single consumer must have processed a message before it is removed from the queue.
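The difference between the two deletion policies can be sketched with a minimal in-memory queue. This is not any real broker's API; the class and method names are illustrative assumptions.

```python
from collections import deque

class SimpleQueue:
    """Minimal sketch of a queue supporting both deletion policies."""
    def __init__(self, delete_on_delivery):
        self.messages = deque()
        self.unacked = {}                 # message id -> message body
        self.delete_on_delivery = delete_on_delivery
        self.next_id = 0

    def send(self, body):
        self.messages.append((self.next_id, body))
        self.next_id += 1

    def receive(self):
        msg_id, body = self.messages.popleft()
        if not self.delete_on_delivery:
            self.unacked[msg_id] = body   # kept until explicitly acknowledged
        return msg_id, body

    def ack(self, msg_id):
        self.unacked.pop(msg_id, None)    # deleted only after the ack

    def requeue_unacked(self):
        # If a consumer fails, unacknowledged messages return to the queue.
        for msg_id, body in self.unacked.items():
            self.messages.append((msg_id, body))
        self.unacked.clear()

q = SimpleQueue(delete_on_delivery=False)
q.send("charge-order-42")
msg_id, body = q.receive()   # consumer receives but crashes before acking...
q.requeue_unacked()          # ...so the message is redelivered
msg_id, body = q.receive()
q.ack(msg_id)                # at-least-once: processed, then removed
```

With `delete_on_delivery=True`, the crash after the first `receive` would have lost the message, which is exactly the at-most-once behavior described above.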
Coping with failed consumers
Most message queues implement a timeout on unacknowledged messages to cope with failed consumers and ensure liveness.
Note: For example, Amazon SQS achieves this with a per-message visibility timeout, while ActiveMQ and RabbitMQ rely on a connection timeout to redeliver all the unacknowledged messages of that connection.
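A per-message visibility timeout can be sketched as follows: a received message is hidden rather than deleted, and it becomes visible for redelivery if no acknowledgment arrives before its deadline. This is a simplified model, not the actual SQS implementation.

```python
import time
from collections import deque

class VisibilityQueue:
    """Sketch of a queue with a per-message visibility timeout."""
    def __init__(self, visibility_timeout):
        self.visible = deque()
        self.in_flight = {}   # msg -> deadline when it becomes visible again
        self.timeout = visibility_timeout

    def send(self, msg):
        self.visible.append(msg)

    def receive(self):
        self._restore_expired()
        msg = self.visible.popleft()
        # The message is hidden (not deleted) while the consumer works on it.
        self.in_flight[msg] = time.monotonic() + self.timeout
        return msg

    def ack(self, msg):
        self.in_flight.pop(msg, None)     # explicit delete after processing

    def _restore_expired(self):
        now = time.monotonic()
        for msg, deadline in list(self.in_flight.items()):
            if now >= deadline:           # consumer presumed failed
                del self.in_flight[msg]
                self.visible.append(msg)  # message becomes visible again

q = VisibilityQueue(visibility_timeout=0.05)
q.send("task-1")
q.receive()            # consumer crashes without acknowledging
time.sleep(0.06)       # visibility timeout elapses
msg = q.receive()      # the message is redelivered to a new consumer
q.ack(msg)
```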
This means that unacknowledged messages are put back into the queue and redelivered to a new consumer. Consequently, a message might be delivered more than once, possibly to multiple consumers. The application is responsible for converting the at-least-once delivery semantics into exactly-once processing semantics. As we have already learned, a typical way to achieve this is to associate every operation with a unique identifier that is used to deduplicate operations originating from the same message.
Note: The side-effects from the operation and the storage of the unique identifier must be done atomically to guarantee exactly-once processing. A simple way to do this is to store both in the same datastore using a transaction.
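The pattern of committing the side effect and the unique identifier atomically can be sketched with SQLite, where the `PRIMARY KEY` constraint on the identifier detects redeliveries. The table and column names here are illustrative assumptions.

```python
import sqlite3

# Sketch of exactly-once processing: the side effect (crediting an account)
# and the message's unique identifier are written in a single transaction,
# so a redelivered message is detected and skipped.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("CREATE TABLE processed_messages (msg_id TEXT PRIMARY KEY)")
conn.execute("INSERT INTO accounts VALUES ('alice', 0)")
conn.commit()

def process(msg_id, account, amount):
    try:
        with conn:  # one transaction: both writes commit, or neither does
            conn.execute("INSERT INTO processed_messages VALUES (?)", (msg_id,))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                         (amount, account))
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate msg_id: already processed, skip it

process("m-1", "alice", 100)   # first delivery: applied
process("m-1", "alice", 100)   # redelivery: deduplicated, no double credit

balance = conn.execute(
    "SELECT balance FROM accounts WHERE id = 'alice'").fetchone()[0]
```

Because both writes share one transaction, a crash between them leaves neither behind, so the redelivered message is safely reprocessed from scratch.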
The following illustration shows a practical example of exactly-once processing through deduplication:
Event log
An event log provides a slightly different abstraction than a message queue. Messages are still inserted by producers at the tail of the log and stored in an ordered fashion. However, the consumers are free to select the point of the log they want to consume messages from, which is not necessarily the head.
Messages are typically associated with an index that consumers can use to declare where they want to consume from.
Another difference from message queues is that the log is typically immutable, so messages are not removed after they are processed. Instead, a garbage collection process runs periodically and removes old messages from the head of the log. Consumers are responsible for keeping track of an offset indicating the part of the log they have already consumed, in order to avoid processing the same message twice and thus achieve exactly-once processing semantics. This offset plays the role of the unique identifier for each message, as described previously.
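The event-log model can be sketched as an append-only list plus a per-consumer offset. The names below are illustrative; real systems like Kafka persist both the log and the consumer offsets.

```python
# Sketch of an event log: an append-only list plus per-consumer offsets.
log = []

def append(event):
    log.append(event)           # producers append at the tail
    return len(log) - 1         # index of the new event

class Consumer:
    def __init__(self):
        self.offset = 0         # each consumer tracks its own position

    def poll(self):
        events = log[self.offset:]
        self.offset = len(log)  # advance past everything consumed
        return events

append("created")
append("updated")

fast, slow = Consumer(), Consumer()
first_batch = fast.poll()       # reads both events so far
append("deleted")
second_batch = fast.poll()      # only the new event; no reprocessing
late_batch = slow.poll()        # a new consumer can start from the head
```

Note how two consumers read the same log independently: `fast` never sees an event twice because of its offset, while `slow` picks up the entire history.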
Some examples of event logs are Apache Kafka, Amazon Kinesis, and Azure Event Hubs.