Today I helped a customer with a problem in their use of ActiveMQ: messages were backing up in a particular queue. The backlog seemed to start whenever a downstream process was unavailable, yet that process was only down for a short period in the middle of the night for routine maintenance. Even so, this brief outage seems to have left the customer with an ongoing problem.
We looked at the JVM memory, the broker memory and the queue memory configurations, each of which was a bit low, so I recommended increasing them (though this turned out not to be the problem). I also recommended setting org.apache.activemq.UseDedicatedTaskRunner=false to reduce the number of dispatcher threads the broker uses internally, and considering the NIO transport instead of the TCP transport to reduce the number of threads used for connections coming into the broker.
As we dug deeper, I looked at the consumers on the queue and found 256 Apache Camel consumers in total, none of which was marked as slow. The consumers were using the ActiveMQ PooledConnectionFactory with its default settings (1 connection and 500 sessions). The consumer prefetch limit was already set to 1, so I knew that was not the problem. However, while looking at the connection settings, I noticed that a Camel RedeliveryPolicy was set as well. Setting a redelivery policy is not a problem in itself, but its values can cause trouble if nobody verifies them.
The settings being used for the Camel RedeliveryPolicy included the following:
- maximumRedeliveries=-1: This means that redelivery attempts will continue forever (i.e., there is no limit on the number of retries).
- useExponentialBackOff=true: This means that upon each successive redelivery attempt, the delay will double (because the backOffMultiplier property is set to 2 by default).
- maximumRedeliveryDelay=36000000: This means that the delay between redelivery attempts for a given message can grow to as much as 36000000 milliseconds. Yikes, that's high! It wasn't until I actually did the math that I saw that 36000000 milliseconds = 10 hours!!!
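To see how quickly the delay climbs to that cap, here is a small Java model of capped exponential backoff. It is a sketch, not Camel's own code: it assumes the first redelivery waits 1000 ms and the multiplier is 2 (which I believe are Camel's defaults for redeliveryDelay and backOffMultiplier), with collision avoidance turned off. The class name BackoffMath is made up for illustration.

```java
// Minimal model of capped exponential backoff for the RedeliveryPolicy settings above.
// ASSUMPTIONS (not from the customer config): the first redelivery waits 1000 ms and
// the multiplier is 2.0, believed to be Camel's defaults; collision avoidance is off.
final class BackoffMath {
    static final long INITIAL_DELAY_MS = 1_000L;    // assumed redeliveryDelay
    static final double MULTIPLIER = 2.0;           // assumed backOffMultiplier
    static final long MAX_DELAY_MS = 36_000_000L;   // maximumRedeliveryDelay from the customer config

    /** Delay before the given (1-based) redelivery attempt: doubles each time, then stays at the cap. */
    static long delayForAttempt(int attempt) {
        long delay = INITIAL_DELAY_MS;
        for (int i = 1; i < attempt; i++) {
            delay = Math.round(delay * MULTIPLIER);
            if (delay >= MAX_DELAY_MS) {
                return MAX_DELAY_MS; // every later attempt waits the full cap
            }
        }
        return Math.min(delay, MAX_DELAY_MS);
    }

    /** The first attempt whose delay is the full cap. */
    static int firstCappedAttempt() {
        int attempt = 1;
        while (delayForAttempt(attempt) < MAX_DELAY_MS) {
            attempt++;
        }
        return attempt;
    }

    /** Total time spent waiting over attempts 1..attempts. */
    static long totalWaitMs(int attempts) {
        long total = 0L;
        for (int n = 1; n <= attempts; n++) {
            total += delayForAttempt(n);
        }
        return total;
    }

    public static void main(String[] args) {
        final double msPerHour = 3_600_000.0;
        int capped = firstCappedAttempt();
        System.out.printf("cap: %d ms = %.1f hours%n", MAX_DELAY_MS, MAX_DELAY_MS / msPerHour);
        System.out.printf("attempt %d is the first to wait the full cap%n", capped);
        System.out.printf("time already spent waiting before it: %.1f hours%n",
                totalWaitMs(capped - 1) / msPerHour);
    }
}
```

Under these assumptions, the 17th redelivery is the first to wait the full 10 hours, after roughly 18 hours of cumulative back-off. Combined with maximumRedeliveries=-1, a message that keeps failing is then retried once every 10 hours indefinitely.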
I then offered the following advice on dealing with the problem.
- Using JMX, move the messages from the current queue to a different queue temporarily (let's refer to this different queue as the sandbox1 queue). Make sure that there are no consumers on the sandbox1 queue so you can work on the messages without worrying that they will be consumed right out from under you.
- Using Camel, consume the messages from the sandbox1 queue.
- Using Camel, copy the body and any headers of each message to a new message, but skip the JMSRedelivered header. You may also need to skip the JMSMessageID header, since I think it is used to track a message for redelivery. Because the JMSRedelivered header is never copied, it is stripped from each new message.
- Using Camel, send each new message to a second queue, sandbox2.
- Manually check each message on sandbox2 to make sure that the JMSRedelivered header is actually empty or false.
- Using JMX, move the messages from sandbox2 back to the original queue so they can be consumed and processed normally.
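The JMX moves and the header filtering in the steps above can be sketched in plain Java. This is a minimal sketch under several assumptions: the broker exposes JMX at service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi, the broker is named localhost, queue MBeans follow the ActiveMQ 5.8+ naming scheme, and an empty selector passed to the queue MBean's moveMatchingMessagesTo operation matches every message. SandboxSteps and its methods are hypothetical helpers, not part of ActiveMQ or Camel; copyWithoutRedeliveryHeaders models only the header-skipping logic, on a plain Map like the one a Camel Processor sees for message headers.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helpers for the sandbox steps; not part of ActiveMQ or Camel.
final class SandboxSteps {
    // JMSMessageID is included because it "may also need" to be skipped; remove it if unnecessary.
    static final Set<String> SKIPPED_HEADERS = Set.of("JMSRedelivered", "JMSMessageID");

    /** Returns a copy of the headers without the skipped ones; the input map is left untouched. */
    static Map<String, Object> copyWithoutRedeliveryHeaders(Map<String, Object> headers) {
        Map<String, Object> copy = new HashMap<>();
        for (Map.Entry<String, Object> entry : headers.entrySet()) {
            if (!SKIPPED_HEADERS.contains(entry.getKey())) {
                copy.put(entry.getKey(), entry.getValue());
            }
        }
        return copy;
    }

    /** Queue MBean name, assuming the ActiveMQ 5.8+ naming scheme and names that need no quoting. */
    static ObjectName queueMBean(String brokerName, String queueName) throws MalformedObjectNameException {
        return new ObjectName("org.apache.activemq:type=Broker,brokerName=" + brokerName
                + ",destinationType=Queue,destinationName=" + queueName);
    }

    /** Moves every message from one queue to another via the queue MBean's moveMatchingMessagesTo. */
    static int moveAll(MBeanServerConnection mbs, String brokerName, String fromQueue, String toQueue)
            throws Exception {
        Object moved = mbs.invoke(
                queueMBean(brokerName, fromQueue),
                "moveMatchingMessagesTo",
                new Object[] {"", toQueue}, // empty selector: assumed to match every message
                new String[] {String.class.getName(), String.class.getName()});
        return (Integer) moved;
    }

    // Usage: java SandboxSteps.java <fromQueue> <toQueue>
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: SandboxSteps <fromQueue> <toQueue>");
            return;
        }
        // Assumed JMX URL and broker name; adjust for your installation.
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            int moved = moveAll(connector.getMBeanServerConnection(), "localhost", args[0], args[1]);
            System.out.println("moved " + moved + " message(s) from " + args[0] + " to " + args[1]);
        }
    }
}
```

Run it once with the original queue and sandbox1 for the first step, and once with sandbox2 and the original queue for the last. Inside a Camel route, the header step could probably be expressed with the DSL's removeHeader("JMSRedelivered") (plus removeHeader("JMSMessageID") if needed) before sending to sandbox2, instead of a hand-written copy.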