The first results I published were not in favor of HBase: performance for both read/update and MapReduce decreased as the cluster grew, and it was very unstable. I spent a lot of time trying to figure out the problem, and I think I have finally found what could be causing the poor performance of HBase 0.20.6.

My update function only appends the string “1” to the end of the article, so I did not expect my data set to grow much. But with HBase 0.20.6, the number of regions and the space used on HDFS kept growing quite fast throughout my tests. In fact, HBase was constantly splitting regions and using more and more space on HDFS for no apparent reason. This almost constant region splitting killed performance.
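To make the expected growth concrete, here is a minimal Python sketch of the update operation under simplifying assumptions. A plain dict stands in for the HBase table, and the names `update_article` and `total_size` are hypothetical; this is not the actual benchmark client code. Under this model each update adds exactly one character to one article, so the article payload grows by one byte per update (ignoring HBase's own storage overhead), which is why continuous region splitting was unexpected.

```python
# Hypothetical stand-in for the HBase table: row key -> article text.
# Assumption: the benchmark reads a row, appends "1" and writes it back.
def update_article(store: dict[str, str], key: str) -> None:
    """Read the article, append the character "1", and write it back."""
    article = store[key]
    store[key] = article + "1"


def total_size(store: dict[str, str]) -> int:
    """Total size of all articles, in characters."""
    return sum(len(text) for text in store.values())


if __name__ == "__main__":
    # 100 illustrative articles of 1000 characters each (not real data).
    store = {f"article-{i}": "x" * 1000 for i in range(100)}
    before = total_size(store)
    for i in range(500):
        update_article(store, f"article-{i % 100}")
    after = total_size(store)
    print(after - before)  # 500: one character per update
```
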

Eventually I decided to run the same tests (exactly the same client-side code, starting from the same configuration) with the more recent 0.90.0 version. This time HBase behaved very well: the number of regions stayed constant throughout my tests and performance was pretty nice.

For the sake of readability, the HBase 0.20.6 results are not shown on this graph:

Read/Update performance with HBase 0.90

As you can see, HBase 0.90.0 is very fast. On average, the time needed to complete all the operations with HBase 0.90 on 3 servers (13.66 seconds) is almost the same as with Cassandra 0.6.10 on 8 servers (13.55 seconds). It is also worth noting that the standard deviation across the three measurements for each cluster size is almost always very small. The occasional higher values occur because HBase sometimes takes a little longer to distribute the regions across the region servers: if HBase starts distributing regions during the test, performance is less stable. Finally, on the scalability side, HBase 0.90.0 shows an overall performance increase of 158% going from 3 to 8 servers, which is very close to the average scalability I measured for the other databases.
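The statistics in the paragraph above can be computed with a few lines of arithmetic. The Python sketch below is one plausible way to do it, not necessarily the exact method used for these graphs: it takes the mean and the sample standard deviation of the repeated runs (whether sample or population standard deviation was used is an assumption), and it defines the scalability gain as the increase in throughput for a fixed workload, i.e. (t_small / t_large - 1) × 100. The function names are hypothetical.

```python
import statistics


def summarize(runs_seconds: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation of repeated runs, in seconds."""
    return statistics.mean(runs_seconds), statistics.stdev(runs_seconds)


def scalability_gain(time_small: float, time_large: float) -> float:
    """Percentage throughput increase from the small to the large cluster.

    Assumes both clusters run the same fixed workload, so throughput is
    inversely proportional to completion time.
    """
    return (time_small / time_large - 1.0) * 100.0


def linear_gain(nodes_small: int, nodes_large: int) -> float:
    """Ideal gain if throughput grew linearly with the number of servers."""
    return (nodes_large / nodes_small - 1.0) * 100.0


print(round(linear_gain(3, 8), 1))  # 166.7
```

Under that definition, a 158% gain means the 8-server cluster completed the same workload about 2.58 times faster than the 3-server cluster, while perfectly linear scaling from 3 to 8 servers would give about 166.7%.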

On the MapReduce side, the results are not as good, but still much better than with HBase 0.20.6.

MapReduce performance

These results for HBase 0.90.0 are quite good, but it is very strange that performance does not increase with the size of the cluster. For the moment, I have no idea why the MapReduce computation does not seem to take advantage of the additional computing power.

I will soon write another post describing in detail some of the strange things I have observed with HBase 0.90.0 regarding MapReduce. That post will also include the slightly modified configuration I used for these tests.