26 October 2010

New Features in ActiveMQ 5.4: Automatic Cluster Update and Rebalance



Apache ActiveMQ 5.4.0 was released in August, followed quickly in September by a 5.4.1 release. Not only were there tons of fixes in these releases, but there were also some really great new features including message scheduling, support for web sockets, new Unix control scripts, full message priority, producer message caching, cluster client updates, and cluster client rebalancing, just to name a few. In this blog post, I'm going to discuss the new cluster client update and cluster client rebalancing features so that you get a taste of how they are used.

Problem Introduction


When using a network of brokers with ActiveMQ, the configuration of the brokers that form the network has always been rather static. The first step toward a more dynamic network of brokers was a feature I presented in a previous blog post titled How to Use Automatic Failover In an ActiveMQ Network of Brokers. In the event of a broker connection failure, the use of the failover transport for the network connections between brokers allows those connections to be automatically reestablished. This is a wonderful feature for sure, but it only got us part of the way toward a truly dynamic cluster of brokers.

Two new features in ActiveMQ 5.4 introduce the concept of making the cluster of brokers even more dynamic. These two items are the ability to:
  • Update the clients in the cluster
  • Rebalance the brokers in the cluster
Both of these features are quite interesting so I will explain how each one works.

Update Cluster Clients


In the past, when clients connected to brokers in the cluster, it was recommended to keep a comma-separated list of broker URIs in the failover transport configuration. Below is an example of this style of configuration:

failover:(tcp://machineA:61616,tcp://machineB:61616,tcp://machineC:61616)?randomize=false

The failover configuration example above lives on the client side and contains a static list of the URIs for each broker in the cluster. In the event that a broker in the cluster fails, the failover transport is what allows a client to automatically reconnect to another broker in that list of broker URIs. Unfortunately this style of configuration can be difficult to maintain because it is static. If you want to add another broker to the cluster, every client's failover transport configuration must be updated manually. Depending on the number of clients in your cluster, this could really be a maintenance headache. This is where the first new feature comes to the rescue.

ActiveMQ 5.4 provides the ability to automatically update the clients in the cluster. That is, if a new broker joins or leaves the existing network of brokers, the clients' failover transport configurations no longer need to be manipulated manually. Using configuration options on the broker, you can tell the broker to update each client's failover transport configuration automatically. Below is an example of this new feature:


<broker brokerName="brokerA" ...>
...
 <transportConnectors>
   <transportConnector name="tcp-connector" uri="tcp://192.168.0.23:61616" updateClusterClients="true" />
 </transportConnectors>
...
</broker>


The configuration above is on the broker side. Notice the new updateClusterClients="true" attribute in the <transportConnector> element. This attribute is used in conjunction with the failover transport on the client side and it tells the broker to automatically update each client's failover transport configuration when the network topology changes. In addition to the updateClusterClients property, there are also a few others including:
  • updateClusterClientsOnRemove - Updates a client when brokers are removed from the cluster.
  • updateClusterFilter - A comma-separated list of regexes to match broker names that are part of the cluster. This allows flexibility for the inclusion/exclusion of brokers.
  • updateURIsURL - Used to provide the path to a file containing a comma-separated list of broker URIs.
These new features are extremely powerful because they allow for a much more dynamic network-of-brokers configuration. Anyone who has had to deal with the static nature of the failover transport configuration should understand the power in these new features and do some experimentation to see how they operate.
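Putting the broker-side options together, a transport connector might look like the following sketch. The filter regex is an illustrative assumption, not a value from any particular deployment (and note that updateURIsURL is an option I would expect to configure on the client's failover transport URI rather than on the connector):

```xml
<broker brokerName="brokerA" ...>
 <transportConnectors>
   <transportConnector name="tcp-connector"
       uri="tcp://192.168.0.23:61616"
       updateClusterClients="true"
       updateClusterClientsOnRemove="true"
       updateClusterFilter="broker[AB].*"/>
 </transportConnectors>
</broker>
```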

Rebalance Cluster Clients


The second new feature also builds upon the failover transport configuration, but for a slightly different purpose. When a new broker is added to or removed from the cluster, clients cannot automatically take advantage of the change. Even with the new ability to update the clients so that they have knowledge of a broker being added or removed, there was previously no way for them to actually use that broker unless a failure occurred. That is exactly the gap this second feature fills.

ActiveMQ 5.4 allows clients to be automatically disconnected from their current broker and reconnected to a different broker. Here's an example to illustrate this feature. Let's say you have a cluster of three brokers: brokerA, brokerB and brokerC, each of which has some clients connected. When a new broker is added to the cluster, if the updateClusterClients property is set to true, then the clients will be notified about the new broker, but no action will be taken unless the rebalanceClusterClients property is also set to true. When the rebalanceClusterClients property is set to true, the clients will automatically be disconnected from their current broker in order to reconnect to another broker in the cluster. Below is an example configuration for the new rebalance property:


<broker brokerName="brokerA" ...>
...
 <transportConnectors>
   <transportConnector name="tcp-connector" uri="tcp://192.168.0.23:61616" updateClusterClients="true" rebalanceClusterClients="true" />
 </transportConnectors>
...
</broker>


Notice the new rebalanceClusterClients attribute in the <transportConnector> element. This property enables the clients to immediately take advantage of the new broker in the cluster. Instead of waiting for the next connection failure and a reconnect from the failover transport, the clients are told to reconnect immediately to another broker in their list.

Testing The New Features


Testing these two new features is pretty easy actually. Below are the steps I have used on a few occasions:

  1. Make sure that your clients are logging the broker URI to which they are connected for sending or receiving messages
  2. Configure each client to only have one broker URI in its failover transport configuration
  3. Configure the transport connector on the broker-side to set the updateClusterClients property to true and the rebalanceClusterClients property to true
  4. Start up the brokers in your cluster
  5. Start up the clients that connect to a broker in the cluster
  6. Add a new broker to the cluster and observe the following behavior:
Due to the two new properties that have been set on the broker-side, each client will be notified of the new broker that was added to the cluster AND each client will automatically reconnect. That is, the functionality of the failover transport will be engaged so that each client is disconnected from the current broker and reconnected to another broker in the list (i.e., the list of broker URIs in the failover transport configuration).
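As a concrete example of step 2 above, each client's failover configuration would start with just a single broker URI (the host name and port here are illustrative):

```
failover:(tcp://machineA:61616)?randomize=false
```

With the broker-side properties enabled, the client's internal list of broker URIs grows as brokers join the cluster, which is what makes the reconnection in step 6 observable.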

The fact that each client reconnects to a new broker tells you that:
  1. The updateClusterClients property is working correctly because you should see the logging change from one broker URI to another. Remember that each client was started with only one broker URI in its failover transport config. The fact that the clients are reconnecting tells you that they are receiving notifications of changes to the cluster.
  2. The rebalanceClusterClients property is working properly because the clients reconnected.
Verify this using the logging from each client. You will see that each client was sending or receiving messages to/from one broker URI and suddenly the logging changes to another broker URI. This tells you that the clients are being updated and rebalanced.

Conclusion


These new features are quite powerful additions to the ActiveMQ network of brokers. They really advance ActiveMQ beyond the static configurations upon which we have all relied for many years now. Most likely the sys admins and dev ops folks will enjoy these features the most because they will no longer need to manually manage a static list of broker URIs for clients.

As I said earlier, many other great features were also introduced in ActiveMQ 5.4 and 5.4.1. So try them out yourself to see if they help to improve your application development.

Update: If you only set the updateClusterClients="true" and the rebalanceClusterClients="true" options, you will notice that when a broker in the network fails and is brought back up, the client connections to other brokers in the network are not automatically rebalanced. This is due to the lack of the updateClusterClientsOnRemove="true" option. After adding this option to the config, network broker clients are notified of broker failures which basically completes the circle and allows the automatic rebalancing to work as it should.

44 comments:

  1. Bruce, great post.

    It got me thinking about a potential solution to two limitations with our existing Master / Slave implementation, kahadb stability and infinite / perpetual JMS client failover.

    While the clients successfully failover to the promoted slave, they will not fail back to the original master once restored (services cannot failback to the original master when the server is restored).

    This is a problem for production applications which require high availability, as server patching or other server failure will create this scenario, requiring the restart of both servers and clients.

    Would a network of brokers with use of rebalancing solve this?

    An example to illustrate the problem. Consider two AMQ JMS servers, broker01 and broker02, and 10 JMS clients, client01 - client10.

    1) Patching for odd nodes occurs on the first of the month. broker01 is shutdown and is removed from the URI list (updateClusterClientsOnRemove).

    2) Use of the rebalanceClusterClients option should force re-balancing of the JMS clients to broker02.

    3) After server patching is complete, broker01 is restarted and re-registered in the URI list (with updateClusterClients option).

    4) The rebalanceClusterClients should distribute some clients back to broker01.

    5) On the 3rd week of the month the same patching process occurs for even numbered servers and the process repeats for broker 02.

    Three questions:

    a) Will this essentially allow perpetual failover of JMS clients?

    b) is the use of rebalanceClusterClients required for failover? From a logging perspective, and to reduce chances of duplicate message processing, it is preferable in our case if the clients are connected to one server at a time (the master/slave configuration is not yet fully mature, resulting in reliability issues unless reverting to a DB datastore).

    c) i see a number of options for message store replication between non master/slave brokers in a network of brokers, how does one avoid message processing duplication?

    Many thanks in advance.

    ReplyDelete
  2. @ives, Below are answers to your questions:

    > a) Will this essentially allow perpetual failover of JMS clients?

    Yes, I suppose it would allow this because each broker would be removed and re-added and each client's list of broker URIs updated.

    > b) is the use of rebalanceClusterClients required for failover? From a
    > logging perspective, and to reduce chances of duplicate message
    > processing, it is preferable in our case if the clients are connected to
    > one server at a time (the master/slave configuration is not yet fully
    > mature, resulting in reliability issues unless reverting to a DB
    > datastore).

    Well it's not required for failover. However, as you noted, the limitation of failback can be mitigated to a certain degree.

    > c) i see a number of options for message store replication between
    > non master/slave brokers in a network of brokers, how does one
    > avoid message processing duplication?

    The only way to avoid duplicate messages is by manually performing a duplicate check as part of message processing. I.e., before processing each message, check for the message id in a database table; if the id does not exist, add the id to the table and process the message; if the id already exists, skip the message.
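The check described above can be sketched as follows. This is a minimal illustration using an in-memory set in place of the database table; the class and method names (DuplicateCheck, handleMessage, processedIds) are hypothetical:

```java
import java.util.HashSet;
import java.util.Set;

public class DuplicateCheck {
    // Stands in for the database table of already-processed message ids
    private final Set<String> processedIds = new HashSet<>();

    /** Returns true if the message was processed, false if it was a duplicate. */
    public boolean handleMessage(String messageId, String body) {
        // Set.add() returns false when the id is already present, i.e. a duplicate
        if (!processedIds.add(messageId)) {
            return false; // duplicate -- skip the message
        }
        // ... process the message body here ...
        return true;
    }

    public static void main(String[] args) {
        DuplicateCheck consumer = new DuplicateCheck();
        System.out.println(consumer.handleMessage("ID:broker-1:1", "hello")); // true
        System.out.println(consumer.handleMessage("ID:broker-1:1", "hello")); // false
    }
}
```

In a real deployment the set would be replaced by an insert into a uniquely-indexed database table so that the check survives restarts and works across multiple consumers.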

    ReplyDelete
  3. Hi Bruce

    We are using the Master/Slave approach for HA (Not the Network of Brokers)

    Our Web client (which is running on Tomcat) is using failover url to connect to our 3 Brokers.
    failover:(tcp://host1:61616,tcp://host2:61616,tcp://host3:61616)?randomize=false&timeout=30000

    Also we are not adding/deleting any brokers dynamically. We are using those 3 defined in the failover url.

    What we noticed was when a Master broker dies at some point, One of the slaves Comes up as a Master with out any problems.

    But the web client blocks sending the messages. It is not recognizing the switchover.

    Seems failover transport is not working.
    Are these new features could help us in any ways?

    Thanks
    Krishna

    ReplyDelete
  4. @Krishna, The new features I blogged about here to update the cluster clients won't have any impact on the problems you are seeing.

    It sounds to me like the failover transport needs to be tuned a bit. The failover transport doc outlines the available options. Two options that you may want to adjust include:

    * initialReconnectDelay - How long to wait in milliseconds before the transport makes the first attempt to reconnect. The default is 10 milliseconds but you may need to increase it.

    * maxReconnectAttempts - The max number of reconnect attempts before sending an error to the client. The default is 0 so you may need to increase this to a more reasonable number.

    You may need to tune the failover transport beyond just these two options. So take a look at the other options and do some experimentation to figure it out.
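    To illustrate, both options are appended to the client's failover URI. The values below are illustrative starting points rather than recommendations:

```
failover:(tcp://host1:61616,tcp://host2:61616,tcp://host3:61616)?randomize=false&initialReconnectDelay=100&maxReconnectAttempts=10
```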

    ReplyDelete
  5. Bruce,

    I have some problems with fail-over too and I am hoping to find a solution/ areas that I can troubleshoot.

    In one of the applications, we use just one ActiveMQ server and a JBoss server as the JMS client. ActiveMQ is a 5.3 server. I am using the failover option in the URI and failover works OK, but has some problems. When an application screen sends out a message, I want it to time out after 5 sec if the broker is not available. That does not work no matter how many different combinations of parameters I have used. My application seems to wait there forever until I bring up ActiveMQ again, at which time it connects to ActiveMQ and sends out that waiting message. This is a problem in my application and I want it to time out immediately.


    Application is using jencks jca connector and it's factory classes .. I am copying some configuration here.
























    Even though I am using the timeout option, is the application not timing out because of old JCA classes, or is it timing out but I do not see that? I have run this whole scenario but I did not see any log showing that it timed out.


    The other problem is I cannot shut down the JBoss server cleanly if failover is used. It just hangs there.

    ReplyDelete
    @Sam, I cannot see your configuration because Blogger strips out code. Please edit your XML config to substitute the < and > characters with HTML character entities and repost it. For example, the < symbol should be substituted with an ampersand (&), followed by 'lt', followed by a semicolon (;), and the > symbol should be substituted with an ampersand (&), followed by 'gt', followed by a semicolon (;). A quick search-and-replace in a text editor makes this easy work. Also, make sure to preview your comment before posting so that you can see if it is displayed correctly.

    Are you embedding ActiveMQ inside of JBoss? If so, please post the ra.xml config file as well.

    Bruce

    ReplyDelete
  7. Bruce,

    Sorry about that. I have replaced the XML with character entities and am pasting it here again; hopefully it will show up correctly this time.

    The ActiveMQ broker is not in-VM but a standalone broker that runs on a Windows box.

    Also, to provide you more details on this, I had replaced my ActiveMQ jars on the client side with the latest 5.4 (activemq_all-5.4.2.jar and jencks-2.0.jar). So for ActiveMQ I have the latest version on both the client and server side. JBoss is the client. Spring is 2.0.

    Could the timeout parameter to failover not working be in any way influenced by which JCA container we are using? I use a combination of JCA classes and Spring Default Messenger classes that utilize these factory classes.

    Could it be that the timeout is working but I am not using any associated listener like transportListener, which is why I am not catching this? At least in debug I did not see it throw any exceptions (not sure if I have turned on all the debug options needed to see it throw a timeout exception -- assuming it throws some exception that I have to catch).


    <bean id="jmsResourceAdapter" class="org.apache.activemq.ra.ActiveMQResourceAdapter">
    <property name="serverUrl" value="failover:(tcp://myIPAddress:61618)?connection.sendTimeout=1000&randomize=false&timeout=3000&maxReconnectAttempts=10&maxReconnectDelay=300000&connnection.closeTimeout=10000&startupMaxReconnectAttempts=10"/>
    </bean>

    <bean id="jencksTransactionManager" class="org.jencks.factory.TransactionManagerFactoryBean"/>

    <bean id="connectionManager" class="org.jencks.factory.ConnectionManagerFactoryBean">
    <property name="transactionManager" ref="jencksTransactionManager"/>
    </bean>

    <bean id="jmsManagedConnectionFactory" class="org.apache.activemq.ra.ActiveMQManagedConnectionFactory">
    <property name="resourceAdapter" ref="jmsResourceAdapter"/>
    </bean>

    <bean id="connectionFactory" class="org.jencks.factory.ConnectionFactoryFactoryBean">
    <property name="managedConnectionFactory" ref="jmsManagedConnectionFactory"/>
    <property name="connectionManager" ref="connectionManager"/>
    </bean>

    ReplyDelete
  8. Bruce, just an update ,

    I tried to play with the URL options to see if that works; the corresponding failover class is, after all, taking the options I am passing along with the URL.

    For example, maxReconnectAttempts (fails to reconnect later), maxReconnectInterval, etc.

    So it could be that the timeout is working too (or maybe not) -- what I am hoping for is that if the timeout is working on send, then it throws some error that I can catch and use in my code.

    ReplyDelete
    @Sam, if the application deployed in JBoss is acting simply as a client to an external, standalone ActiveMQ instance, there is no need to use the Jencks JCA resource adapter to connect to ActiveMQ -- especially if you are using Spring in your application. Since you already have Spring in your application, you can use Spring JMS to easily turn your application into a standard JMS consumer.

    If you're interested in this much simpler approach, take a look at my blog entry titled Using Spring to Receive JMS Messages.

    ReplyDelete
  10. Hi Bruce, great article. I'm enjoying activeMQ 5.3 for our production systems.

    One of the things that came up was the clean failover of brokers.

    In the Master-Slave configuration we have now, we use a DB. Each broker in our cluster accesses a single DB. The mechanism ActiveMQ seems to use is contention for a single table/row.

    What I found is that if the DB instance dies, and the master DB fails over to the slave instance, we basically have to restart all the brokers. (Even when using a virtualized IP)

    Does Version 5.4 solve this issue? There was a lot of mention about discovery of brokers, and broker failover, how about broker "persistence discovery"?

    ReplyDelete
  11. @anothermarkus, I believe what you describe is handled in ActiveMQ 5.3 and greater. See AMQ-2387 for more info.

    ReplyDelete
  12. Hi Bruce,

    Many many thanks for this great post. It is really helping me to upgrade to AMQ 5.4.2. Previously we used AMQ 5.3.0.

    I am facing a problem here. We use four brokers in a network. And when I am starting up the brokers and connecting to those brokers, the messages are published to and consumed from the queues fine. But when I am restarting a broker, the messages are published to that broker but not consumed any more. I created a test client to figure out the issue. The activemq.xml file looks like below.

    <broker
    xmlns="http://activemq.apache.org/schema/core"
    brokerName="broker-${hostname}-31380"

    ....
    ....

    <networkConnectors>
    <networkConnector
    name="${hostname}-mule-nc"
    dynamicOnly="true"
    duplex="true"
    networkTTL="4"
    uri="static://(tcp://server1:31380,tcp://server2:31380,tcp://server3:31380,tcp://server4:31380)"/>
    </networkConnectors>

    ....
    ....

    <transportConnectors>
    <transportConnector name="openwire" uri="tcp://0.0.0.0:31380?transport.closeAsync=false"
    updateClusterClients="true" rebalanceClusterClients="true" updateClusterClientsOnRemove="true"/>
    </transportConnectors>

    ...
    ...

    </broker>

    The ${hostname} values are server1, server2, server3 or server4 depending upon which server is starting up.

    The consumer client connects to the broker using URI "failover://tcp://server1:31380". When I start server1, the client log says "Successfully connected to tcp://server1:31380". But when I startup server2, server3 or server4, I don't see any reconnect attempt to those servers. So, is "updateClusterClients" not working? Is there anything that I should do? Please let me know if you need to know any further.

    Thanks,
    Bodhayan.

    ReplyDelete
  13. @Bodhayan, It's been a while since I configured this test so I can't remember the broker config for sure. Did you enable debug level logging so that you can see if there is an attempt by a broker's network transport to reconnect? That's where I would start. Also, are both your producers and your consumers using the failover transport to connect to the brokers?

    ReplyDelete
  14. Hi Bruce,

    Thanks for your reply. I tried enabling debug level logging but could not see any reconnect attempt.

    I saw similar issue in the Active MQ forum. The link is here.

    I am not seeing any solution to that issue.

    Yes, both my producer and consumer are using failover transport.

    One thing I just wanted to ask you here. What is the process you follow to add/remove a broker from a cluster?

    Thanks,
    Bodhayan.

    ReplyDelete
  15. @Bodhayan, For the purposes of automatic reconnection between brokers in a network, you might want to consider using the failover transport. Check out the blog post I wrote up about this very feature titled How to Use Automatic Failover In an ActiveMQ Network of Brokers. Use of the failover transport in broker networks makes the type of situation you're describing much easier to handle. In the blog post that I linked above, I outlined the steps I took for that scenario. Perhaps this will help you with your situation.

    ReplyDelete
  16. Hi Bruce,

    Sorry for the late reply. I was busy with some other thing.

    I tested the failover transport among the network of brokers, but it did not behave as I expected. I have four brokers but only one is operating at a time. If that one goes down, the producers and consumers are connected to another broker. So it's a big performance loss.

    At this point, I am just concerned about one thing. When I start all the brokers fresh and start the producers and consumers, they are perfectly load balanced; i.e., messages are enqueued to all the brokers and also consumed from all the brokers. But when I restart a broker, the problem arises. From the ActiveMQ admin page, I can see that messages are being enqueued to the restarted broker but not consumed anymore. When I click the "Active Consumers" link for any queue, I see that other brokers are connected to that broker like below:

    SERVER001-mule-nc_broker-SERVER001-31380_inbound_broker-SERVER003-31380
    ID:SERVER003-33547-1302884918863-7:2

    SERVER003-mule-nc_broker-SERVER001-31380_inbound_broker-SERVER003-31380
    ID:SERVER003-33547-1302884918863-2:2

    SERVER003-mule-nc_broker-SERVER002-31380_inbound_broker-SERVER003-31380
    ID:SERVER003-33547-1302884918863-3:2

    ...
    ...

    SERVER003-mule-nc_broker-SERVER004-31380_inbound_broker-SERVER003-31380
    ID:SERVER003-33547-1302884918863-5:2

    When I restart any server, suppose SERVER003, I see this in the SERVER003 log.

    INFO | jvm 1 | 2011/04/15 16:29:01 | INFO | Started responder end of duplex bridge SERVER002-mule-nc@ID:SERVER002-59386-1302884047266-0:1
    INFO | jvm 1 | 2011/04/15 16:29:01 | INFO | Started responder end of duplex bridge SERVER001-mule-nc@ID:SERVER001-37480-1302884037047-0:1
    INFO | jvm 1 | 2011/04/15 16:29:01 | INFO | Network connection between vm://broker-SERVER003-31380#8 and tcp:///xx.x.xxx.xx:33542(broker-SERVER002-31380) has been established.
    INFO | jvm 1 | 2011/04/15 16:29:01 | INFO | Network connection between vm://broker-SERVER003-31380#10 and tcp:///xx.x.xxx.xx:53568(broker-SERVER001-31380) has been established.
    INFO | jvm 1 | 2011/04/15 16:29:01 | INFO | Started responder end of duplex bridge SERVER004-mule-nc@ID:SERVER004-57831-1302884800028-0:1
    INFO | jvm 1 | 2011/04/15 16:29:01 | INFO | Network connection between vm://broker-SERVER003-31380#12 and tcp:///xx.x.xxx.xx:36419(broker-SERVER004-31380) has been established.

    Also I see this in SERVER001 log

    INFO | jvm 1 | 2011/04/15 16:29:01 | INFO | Establishing network connection from vm://broker-SERVER001-31380?async=false&network=true to tcp://SERVER003:31380
    INFO | jvm 1 | 2011/04/15 16:29:01 | INFO | Network connection between vm://broker-SERVER001-31380#74 and tcp://SERVER003/xx.xx.xx.xx:31380(broker-SERVER003-31380) has been established.

    I see the same in other server logs also.

    So, that means when I restart one broker, it is being connected to the network of other brokers. But somehow, the messages are being enqueued in that broker and not consumed anymore. Since there are consumers connected to other brokers, messages should flow through the network of brokers to those consumers, right? Is it a bug of Active MQ? Just not sure how to make it work. Please let me know what I should do.

    Thanks for your help.
    Bodhayan.

    ReplyDelete
    @Bodhayan, The problem that you are describing sounds like either the networkTTL option in the broker config or a hard-coded rule inside the broker might be your problem. When you previously posted your broker config, I saw networkTTL="4". This tells the broker to limit the number of broker hops a message can make. Once a given message makes 4 total hops across brokers, it will be stuck where it lands and cannot be moved to another broker.

    Furthermore, there is a rule deep inside the broker's networking code that will not allow messages to be sent to a broker where they have already been. (This rule exists in the DemandForwardingBridgeSupport.suppressMessageDispatch() method where it creates a NetworkBridgeFilter for this purpose.) The way this works is that if a message goes to brokerA and is forwarded to brokerB, that message cannot ever go back to brokerA. The reason this rule was implemented years back was to prevent message ping-pong so that messages are not bouncing back and forth between brokers in a non-stop manner.

    It sounds to me like one of these two rules is getting in your way, but I'm not sure which one without a bit of testing. To test this, just change the networkTTL to 50 and see if the behavior changes. If it does, then you know that the networkTTL is too low. If the behavior does not change, then it's the other rule, and that cannot be changed without hacking the broker's code -- and changing this rule can yield some very dicey message forwarding behavior.
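    Based on the networkConnector config you posted earlier in this thread, the test change would look something like the following sketch (only the networkTTL value changes):

```xml
<networkConnector
    name="${hostname}-mule-nc"
    dynamicOnly="true"
    duplex="true"
    networkTTL="50"
    uri="static://(tcp://server1:31380,tcp://server2:31380,tcp://server3:31380,tcp://server4:31380)"/>
```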

    Hope that helps.

    Bruce

    ReplyDelete
  18. Hi Bruce,

    No luck :(. I have changed the networkTTL to "50". But still after restarting a broker, messages are being enqueued to that broker but not being dequeued.

    I saw the same issue in the Active MQ forum. I replied on that too. But no response till now. Let me do some more testing on that.

    Thanks for your help.

    Bodhayan.

    ReplyDelete
  19. @Bodhayan:

    If you're using 5.4, it almost sounds like your consumers aren't being updated with changes to the active server list, nor rebalancing clients when a server joins the network.

    Take a look at the following to see if they apply to your configuration...

    conduitSubscriptions
    suppressDuplicateQueueSubscriptions
    advisorySupport
    updateClusterClients
    rebalanceClusterClients
    updateClusterClientsOnRemove
    updateClusterFilter

    more info...

    http://activemq.apache.org/networks-of-brokers.html

    http://activemq.apache.org/configuring-transports.html

    ReplyDelete
  20. @ives:

    I tried these options earlier. I tried today also. These are not working :(.

    Thanks,
    Bodhayan.

    ReplyDelete
  21. @Bodhayan, Please provide the full XML configuration files for all the brokers that you are using so that I can see how they are configured/networked. Instead of posting them here, please email them to me (bruce [DOT] snyder [AT] gmail [DOT] com). Setting up ActiveMQ network broker topologies can be difficult so sometimes it helps to have a second pair of eyes.

    Bruce

    ReplyDelete
  22. Hi Bruce

    Thanks for the post. I would like to have a dynamic cluster that the client (or brokers in another cluster) connect to. The approach recommended in the article seems to require that when a new client is introduced that it know the static connection information for at least one broker. But if brokers are coming and going at the same time as clients, it is not guaranteed that single broker will be up, in which case the client is stranded. If I (re)introduce a static list of failover clients, I end up back where I started. Is this a correct assessment, or is there another way of configuring it that decouples the client from specific broker instances in the same way that discovery does?

    ReplyDelete
  23. @Edward Ost...

    I've been wondering about this too. An interesting implementation option for Active MQ would be to support DNS service / resource record lookups.

    In effect this would work similarly to DNS MX records. A list of servers are returned to the client, any one of which are valid servers to initially connect to (and isn't static like the URI connection string). If the client fails to connect to one server, it would attempt to connect to the next in the list.

    Until that sort of option exists, I wonder if using front-end load balancing would work? (pointing the clients / brokers at a VIP)

    ReplyDelete
  24. @Edward, Your assessment is correct. A client needs to know the address of at least one broker in order to connect to the cluster. Once connected and the cluster rebalances, the client's list of broker URIs will be updated. Even if the broker to which a client is trying to connect is not currently available, with the correct configuration, the failover transport will automatically retry the connection until it is successful or until it times out. Then the dependency is on the ability to automatically restart brokers when they go down using some sort of watchdog process. For this I typically recommend the use of daemontools because it is a hidden gem and very reliable.

    At one time, I had plans surrounding this very problem to create an application that would have knowledge of which brokers were available, accept client connections on behalf of brokers and connect clients to an available broker. Unfortunately I could never make the time to complete the application, so to this day it remains nothing more than an idea.

  25. @ives, Another solution I have seen to this problem is to use a load balancer in front of the brokers and have all clients connect to a virtual IP (VIP) address. When a client connects to the VIP address, the load balancer already has knowledge of which brokers are available and it forwards the request along to one of them. Load balancers also handle sticky sessions so that clients are not re-routed to different brokers unless absolutely necessary. I have seen this work very successfully in some very large environments.
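    To make the VIP approach concrete: each client points its failover transport at the load balancer's virtual address instead of at individual brokers. A sketch (the host name is a placeholder):

```
failover:(tcp://broker-vip.example.com:61616)?maxReconnectAttempts=-1
```

    The load balancer then forwards each connection to an available broker behind the VIP, so the client never needs to know the brokers' real addresses.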

  26. Great post.

    Is the transportConnector on 192.168.0.23 open to other brokers?
    Or is it open to clients?
    Or is it open to both?
    --
    linzuxiong1988@gmail.com
    I need the answer. I am new to ActiveMQ.
    Thanks!

  27. <transportConnector uri="192.168.0.23" />

  28. @linzuxiong, brokerA defines a transport on 192.168.0.23, so for that broker the TCP transport is available for connection from either a client or another message broker.
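    Note that a transportConnector URI normally includes a scheme and a port in addition to the interface address. A sketch of what such a connector could look like with the cluster-update attributes discussed in this post (the port number is just an example):

```xml
<transportConnector name="openwire" uri="tcp://192.168.0.23:61616"
    updateClusterClients="true" rebalanceClusterClients="true" updateClusterClientsOnRemove="true" />
```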

  29. @Bruce Snyder
    (ActiveMQ 5.5)

    I knew from activemq.apache.org that the TCP transport is available to either clients or brokers.

    Now I have confirmed that the transportConnector uri="192.168.0.23" in this post accepts both clients and brokers.

    So a transportConnector in this automatic-cluster setup is for both clients and brokers.

    --
    Thanks!

  30. Hi Bruce, this article enlightened me as I'm using ActiveMQ 5.5 to achieve load balancing. Thanks.

    There's something I don't understand. Here's the scenario:
    I opened port 61615 for brokers and port 61616 for clients in the transportConnectors (the XML did not survive pasting here; it is re-pasted in comment 31 below).

    Both broker A and broker B are configured that way, and the networkConnector is set to be duplex.

    I have several clients connected to broker A using only A's URI in their failover string, and some connected to broker B using only B's URI.

    When I start the two brokers, I can see that some clients that should have connected to A were connected to B. That's fine; they found the new broker and reconnected to it.
    Then I stopped broker A and all the clients connected to B. That is good, too.

    But when I checked the connections of broker B in the web console, I saw some clients connected to B through 61616 and some through 61615.
    I didn't really get it. You say clients will disconnect from their current broker and reconnect to a different broker; should the reconnect use the clients' port, 61616? I did see clients connect to broker B through port 61615 when broker A was down. I'm confused.

    Then I restarted broker A, but the clients did not reconnect to it; they stayed on broker B.

    Then I stopped broker B and found that only the clients with A's URI in their failover string reconnected to A; the others that had only B's URI were stuck. Things didn't go as they did the first time.

    How did that happen? I'd appreciate it if you could explain.

    And one more strange thing:
    I'm using JAAS, which works fine, and the <systemUsage> element is commented out by default. When I uncomment <systemUsage> and restart ActiveMQ, it doesn't work! The log file doesn't record anything.
    It seems like JAAS and <systemUsage> can't work together. That's strange.

    Thank you!

  31. Sorry, the transportConnectors didn't paste successfully:

    <transportConnectors>
      <transportConnector name="openwire" uri="tcp://0.0.0.0:61616?closeAsync=true"
          updateClusterClients="true" rebalanceClusterClients="true" updateClusterClientsOnRemove="true" />
      <transportConnector name="brokerOpenwire" uri="tcp://0.0.0.0:61615?closeAsync=true"
          updateClusterClients="true" rebalanceClusterClients="true" updateClusterClientsOnRemove="true" />
    </transportConnectors>

  32. @dingzixuan, answers to your inquiries are below:

    > should the reconnect use clients' port 61616 or...?

    The reason that the clients are connecting on either port is that you have configured each message broker to expose transports on both 61615 and 61616, and configured each transport with updateClusterClients=true. In other words, you told the broker to allow this, so it does.
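    If the intent is for 61615 to serve only broker-to-broker traffic, one option (a sketch based on the configuration pasted above, not something from the original post) is to leave the client-update attributes off the broker-facing transport so that its address is not pushed to clients:

```xml
<transportConnectors>
  <!-- client-facing transport: participates in cluster client updates/rebalancing -->
  <transportConnector name="openwire" uri="tcp://0.0.0.0:61616?closeAsync=true"
      updateClusterClients="true" rebalanceClusterClients="true" updateClusterClientsOnRemove="true" />
  <!-- broker-facing transport: no client update/rebalance attributes -->
  <transportConnector name="brokerOpenwire" uri="tcp://0.0.0.0:61615?closeAsync=true" />
</transportConnectors>
```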

    > Then I stop broker B, I found only the clients that has A's uri in there
    > failover were reconnected to A, others that has B's uri only were stuck.
    > things didn't go as the first time.
    >
    > how did that happen? I'd appreciate if you could explain it.

    This behavior does not sound correct. The clients with only brokerB in their failover URI should have been updated with brokerA by the time brokerB is shut down. I recommend shutting down both brokers, deleting the data directory for each and running your test again. If you see the same behavior, ask about it on the ActiveMQ user mailing list to find out whether there are any known issues around failover and rebalancing (although I see no discussion or JIRA issues for any such problem).

  33. @Bruce Snyder, I see...thank you! I'll do some other tests.

    And another strange thing: it seems the JAAS plugin and <systemUsage> can't work together. If I comment out either one of them, ActiveMQ works normally. No error in the log file. I don't understand why.

  34. @dingzixuan, I just tested this by downloading ActiveMQ 5.5 and editing the conf/activemq.xml file to uncomment the <systemUsage> element and add the following to the config:

    <plugins>
      <jaasAuthenticationPlugin configuration="activemq-domain" />
    </plugins>

    I suggest that you double-check the ActiveMQ XML config file.

  35. @Bruce Snyder, I found the reason: it's the order of systemUsage and plugins. When I put plugins before systemUsage, ActiveMQ works. I tried it many times; if systemUsage comes before plugins, it just won't start (with no errors).
    So is the XML configuration strict about the order of elements?

  36. What you are experiencing is due to XML validation being enabled in ActiveMQ 5.4.x:

    Schema Validation - Alphabetically Ordered XML Elements (New in 5.4)

    This enforces the order of XML elements as required by the XSD.
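    In practice that means the child elements of <broker> must appear in alphabetical order, which is why <plugins> has to come before <systemUsage>. A minimal sketch of a valid ordering (the limits and URIs are placeholders):

```xml
<broker xmlns="http://activemq.apache.org/schema/core" brokerName="brokerA">
  <networkConnectors>
    <networkConnector uri="static:(tcp://remotehost:61616)" />
  </networkConnectors>
  <plugins>
    <jaasAuthenticationPlugin configuration="activemq-domain" />
  </plugins>
  <systemUsage>
    <systemUsage>
      <memoryUsage><memoryUsage limit="64 mb" /></memoryUsage>
    </systemUsage>
  </systemUsage>
  <transportConnectors>
    <transportConnector name="openwire" uri="tcp://0.0.0.0:61616" />
  </transportConnectors>
</broker>
```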

  37. Hi,

    Could someone explain what I'm doing wrong? I have 2 broker instances (ActiveMQ 5.5.1) on localhost and 2 consumers. My goal is to have exactly one consumer connected to instance1 and the other consumer connected to instance2.

    amq instance1:
    <networkConnector uri="static:(tcp://localhost:62626)" conduitSubscriptions="false" />

    <transportConnector name="openwire" uri="tcp://localhost:61616" updateClusterClients="true" rebalanceClusterClients="true" updateClusterClientsOnRemove="true" />

    amq instance2:
    <networkConnector uri="static:(tcp://localhost:61616)" conduitSubscriptions="false" />

    <transportConnector name="openwire" uri="tcp://localhost:62626" updateClusterClients="true" rebalanceClusterClients="true" updateClusterClientsOnRemove="true" />

    Consumers use failover:

    Consumer1: failover:(tcp://localhost:61616,tcp://localhost:62626)?randomize=false&maxReconnectAttempts=-1

    Consumer2: failover:(tcp://localhost:62626,tcp://localhost:61616)?randomize=false&maxReconnectAttempts=-1

    When the consumers start they behave as I expected -> Consumer1 connects to broker instance1, Consumer2 connects to broker instance2 (each broker has exactly one consumer).

    When I stop broker instance1, Consumer1 connects to broker instance2. However, when broker instance1 is started again, both consumers stay connected to broker instance2.

    I thought they would rebalance, so that in the end every broker would have one consumer. I can't figure out what is wrong with my configuration. Could you please show me my mistake?

    Regards,
    Tomo

  38. @tomo, It appears that someone else is encountering a similar problem to this and has filed a JIRA issue for it:

    https://issues.apache.org/jira/browse/AMQ-3544

    I suggest you ask about this on the ActiveMQ user mailing list. Info on that list is available here:

    http://activemq.apache.org/mailing-lists.html

  39. @Bruce - thanks for your answer, I will try to find a solution on the ActiveMQ mailing list.

  40. Hi,

    I'm wondering if a similar mechanism exists for a shared-filesystem master/slave setup rather than a network of brokers?

    What can we do to let clients know about the new master if the existing one goes away?

    Thanks

    1. @Fatih, To my knowledge, the rebalance feature is only available in a network of brokers. I suggest you ask your question on the ActiveMQ mailing lists.
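      That said, for a shared-filesystem master/slave pair clients usually don't need rebalancing at all: only the broker holding the lock accepts connections, so listing both brokers in the failover URI is enough for clients to find the active master. A sketch with placeholder host names:

```
failover:(tcp://masterhost:61616,tcp://slavehost:61616)?randomize=false
```

      When the master goes away and the slave acquires the lock, the failover transport simply reconnects to whichever of the two is accepting connections.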

  41. Thanks for the answer. I think we should seriously consider using networked brokers for HA and also for load balancing purposes. I'll post a question to the mailing list.

    Thanks again.

  42. We had this working on 5.4 - clients would rebalance across the 2 brokers we have - but after an upgrade to 5.8 it is no longer working. We configured the brokers as per the examples above and the clients with only broker 1 in the failover string.
    Is this supposed to still work in 5.8, or have there been some changes?

    Thanks

    1. That's a good question. Unfortunately I do not have the answer because I have not used 5.8 much. To get more eyes on your question, I suggest you ask this question on the ActiveMQ user mailing list. Information on subscribing can be found here:

      http://activemq.apache.org/mailing-lists.html
