Challenge
Enterprise integrations often demand distributed message processing, with responsibilities spread among systems that handle messages at different processing stages. Integration often involves message queueing technologies that provide quality of service and decouple systems. As with all distributed technologies, the eight fallacies of distributed computing apply and need to be taken care of. A failure along the way can produce duplicate messages, or a requestor could by mistake send two messages with the same information.
Take the example of an order processing system that accepts orders from different channel applications in XML format. The order request is made up of orderItems, customer information and a coupon discount reference. The system needs to ensure that orders do not use the same coupon discount reference more than once. There are various reasons why duplicate orders could be sent to the system: network failure, system faults, a customer entering the order twice, and so on. What strategies do we have to handle this challenge?
Solution option
There are different strategies depending on the way messages are processed: whether we have an integration platform that supports handling duplicate messages through configuration, or simple application servers running in a cluster and handling messages in parallel.
The IT landscape we are in also makes a difference: do we have control over the producers of messages, and can we impose restrictions on how they produce messages? It's good practice not to design with a specific client in mind as the consumer of a service or capability. Idempotent Consumer is an integration pattern that is defined and used in many scenarios. The solution has three parts:
- Identifying unique messages
- Maintaining a history of message uniqueness
- Transactional integrity of the consumer
There needs to be a way to recognize the identity of a message from a business semantic point of view. The business context to which the message belongs needs a clear identifier that marks the message as unique, either per client or across clients. It could be a single field/element or a combination of fields that defines the uniqueness.
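As a minimal sketch of such a business identifier for the order example, the function below combines a client id with the coupon discount reference and hashes the result. The element names `customer/clientId` and `couponDiscountReference` and the function name `message_key` are assumptions made for illustration, since the actual order schema is not given here.

```python
import hashlib
import xml.etree.ElementTree as ET


def message_key(order_xml: str) -> str:
    """Derive a stable identity for an order from its business fields.

    The element paths below are hypothetical; adjust them to the real schema.
    """
    root = ET.fromstring(order_xml)
    client_id = (root.findtext("customer/clientId") or "").strip()
    coupon_ref = (root.findtext("couponDiscountReference") or "").strip()
    if not client_id or not coupon_ref:
        raise ValueError("order is missing an identity field")
    # A separator that cannot occur in XML text keeps ("ab", "c") and
    # ("a", "bc") from producing the same canonical string.
    canonical = f"{client_id}\x1f{coupon_ref}"
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two orders with the same client and coupon reference get the same key even if their items differ, which matches the rule that a coupon must not be used twice. Dropping `client_id` from the canonical string would make the coupon unique across clients instead of per client.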
The next step is to think about how to compare the current message with past messages to detect duplication. The history can be stored in different ways. One could keep the uniqueness identifier in an in-memory cache for a limited time to improve performance. Another way is to store the identifier in a database table with a uniqueness constraint on those fields, so that inserting a duplicate naturally throws an exception. There could also be a separate table with a unique column that stores a hashed value, where the hash is derived from the combination of fields that make up the business semantic message identifier.
With parallel consumers running in a cluster, duplicate messages may be processed at the same time. If the transactions are long and include steps such as web service calls that do not participate in the global transaction, the second transaction may be rolled back, but the damage done by its web service call remains.
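The database option can be sketched as follows, using SQLite from the Python standard library as a stand-in for the application database. The table names `processed_message` and `orders` and the function names are hypothetical. The sketch assumes that the identifier insert and the business insert share one local transaction, so a duplicate rolls back both.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create the (hypothetical) history and order tables if they are missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_message ("
        " message_key TEXT PRIMARY KEY,"  # uniqueness constraint on the identifier
        " received_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (message_key TEXT, payload TEXT)"
    )
    conn.commit()
    return conn


def process_once(conn: sqlite3.Connection, key: str, payload: str) -> bool:
    """Store the order unless its key was seen before. Returns True if stored."""
    try:
        # The connection context manager commits on success and rolls back
        # on an exception, so both inserts succeed or fail together.
        with conn:
            conn.execute(
                "INSERT INTO processed_message (message_key) VALUES (?)", (key,)
            )
            conn.execute(
                "INSERT INTO orders (message_key, payload) VALUES (?, ?)",
                (key, payload),
            )
        return True
    except sqlite3.IntegrityError:
        # The key already exists: the message is a duplicate.
        return False
```

Inserting the identifier first means a duplicate is rejected before any business work starts. This only protects work done inside the same transaction; a web service call made in between would still need its own safeguard.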
Hence, it makes sense to use an independent endpoint in the processing flow that filters out duplicate messages and forwards only unique ones.
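One way to picture such a filtering endpoint is the sketch below. The class name `DuplicateFilter` and its `forward` callback are hypothetical, and a lock-guarded in-process set stands in for the shared store that a real cluster would need. The point is only that the key is claimed atomically before the message is forwarded to any step with side effects.

```python
import threading
from typing import Callable


class DuplicateFilter:
    """Forwards a message only the first time its key is seen."""

    def __init__(self, forward: Callable[[str, str], None]) -> None:
        self._forward = forward
        self._seen = set()  # stand-in for a shared store in a cluster
        self._lock = threading.Lock()

    def on_message(self, key: str, payload: str) -> bool:
        # Check and claim the key in one atomic step, so two parallel
        # consumers cannot both pass the same message on.
        with self._lock:
            if key in self._seen:
                return False
            self._seen.add(key)
        # Only the consumer that claimed the key reaches the downstream step.
        self._forward(key, payload)
        return True
```

In this sketch a key stays claimed even if `forward` fails, so a retry of the same message would be dropped. Whether to release the key on failure is a design decision that depends on how safe the downstream step is to repeat.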
Can the broker infrastructure take this responsibility? It could maintain a real-time cache of the messages currently being processed and compare each incoming message against the cached ones to identify duplicates. Alternatively, the application can simply check its own database for duplicates with a normal query. Some modern messaging systems, such as HornetQ, do provide support for duplicate detection.
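A time-limited cache of recently seen identifiers could look roughly like the sketch below. The class name `RecentKeys` and its parameters are hypothetical, and this does not show how any particular broker implements duplicate detection. The clock is injectable only so that the expiry behaviour can be exercised without waiting.

```python
import time
from collections import OrderedDict


class RecentKeys:
    """Remembers message keys for a fixed time window."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> expiry time; insertion order equals expiry order because
        # every entry gets the same time-to-live.
        self._expiry = OrderedDict()

    def first_time(self, key: str) -> bool:
        """Return True if the key was not seen within the window."""
        now = self._clock()
        # Drop entries whose window has passed, oldest first.
        while self._expiry:
            oldest_key = next(iter(self._expiry))
            if self._expiry[oldest_key] > now:
                break
            del self._expiry[oldest_key]
        if key in self._expiry:
            return False
        self._expiry[key] = now + self._ttl
        return True
```

A duplicate that arrives after the window has passed is not caught by such a cache, so it works best as a fast first check in front of a durable history such as a database table.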
Depending on the architecture style, there are different ways to handle duplicates. An ESB style helps to isolate this concern outside the application, within the ESB infrastructure. In a clustered application server environment such as WebSphere, one could use MDBs (message-driven beans) as endpoints, together with queues, to write an idempotent message consumer that filters duplicate messages.