The first version of my benchmark simply consisted in a single client that started as many thread as there was node in the cluster. It is kind of trivial to see that this approach cannot scale to a large number of nodes, the client would be overloaded very fast.

There was two way to use the benchmark, with systems that does not include a load balancer like Cassandra or Riak on the left and for systems with a load balancer like HBase on the right :

The current architecture

The solution to this problem is quite simple, I must multiply the number of clients. Of course I would need some kind of controller to send the commands to all the clients and then get back the results from all the clients. I have implemented this idea using sockets to implements simple client/server communications between the clients and the benchmark controller. With this new architecture, the two previous ways of using the benchmark become :

The new scalable architecture

This new architecture coupled to a way bigger data set (all the 10 millions article from Wikipedia in English) should be enough to bring my benchmark to a more realistic level. I plan to to test it on the Amazon EC2 infrastructure, on hundreds of machines.

The code of this new architecture is already available on github in the “scalable” branch.