The version concept comes from Google's BigTable paper, which was the basis for implementing HBase.
Google's search crawler revisits websites repeatedly. Since a website may change between visits, BigTable stores multiple
versions of its contents, and perhaps of the relationships between sites. That makes it easy to run a query like "get latest contents of <url>" or "get latest 2 versions of <url> and diff them".
It's like version control for data.
If a cell value can change and you need the history of changes later on, perhaps for auditing or diffing, use versions.
Whether it's useful to your application depends on what your application does.
For example, an editable wiki can store multiple versions of a wiki article in the same row and column. If you were using an RDBMS, it would require
multiple rows with different entries in the timestamp column.
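To make the wiki example concrete, here is a minimal sketch of one versioned cell in plain Java. It is only an in-memory model, not the HBase client API: the class name VersionedCell, its methods, and the eager trimming of old versions are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Hypothetical in-memory model of a single versioned cell (not the HBase API).
class VersionedCell {
    // Keys are timestamps, sorted newest first.
    private final TreeMap<Long, String> versions = new TreeMap<>(Comparator.reverseOrder());
    private final int maxVersions;

    VersionedCell(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    // Store a value under a timestamp; drop the oldest versions beyond the limit.
    // Trimming eagerly is a simplification chosen for this sketch.
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // last entry is the oldest in descending order
        }
    }

    // "get latest contents"
    String latest() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    // "get latest n versions", newest first
    List<String> latest(int n) {
        List<String> out = new ArrayList<>();
        for (String value : versions.values()) {
            if (out.size() == n) break;
            out.add(value);
        }
        return out;
    }

    public static void main(String[] args) {
        VersionedCell article = new VersionedCell(3);
        article.put(1L, "first draft");
        article.put(2L, "fixed typos");
        article.put(3L, "added references");
        System.out.println(article.latest());   // added references
        System.out.println(article.latest(2));  // [added references, fixed typos]
    }
}
```

All three edits live under one row and column key, whereas the RDBMS layout would need one row per edit.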
You are right, HBase does support versions at the column family level, although I am not sure how many versions it supports.
In my opinion, version support is one of the key benefits of HBase.
In an RDBMS, you can maintain a backup of the database for cases like failure or rollback. That consumes a lot of space, and you have to load the whole backup in order to check a single change in a column value.
With HBase, you can do this with a single call on the Get. For example:
- to return more than one version, see Get.setMaxVersions()
You can also check the values at a given time:
- to return versions other than the latest, see Get.setTimeRange()
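The effect of those two options can be sketched in plain Java without a cluster. This is a rough model, not HBase client code: the class VersionQuery and its read method are hypothetical, and the half-open range (minStamp inclusive, maxStamp exclusive) is an assumption taken from how Get.setTimeRange() is documented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of reading versions of one cell (not the HBase API).
class VersionQuery {
    // Return up to maxVersions values with minStamp <= timestamp < maxStamp, newest first.
    static List<String> read(NavigableMap<Long, String> cell,
                             long minStamp, long maxStamp, int maxVersions) {
        List<String> out = new ArrayList<>();
        // subMap selects the time range; descendingMap puts the newest version first.
        for (String value : cell.subMap(minStamp, true, maxStamp, false)
                                .descendingMap().values()) {
            if (out.size() == maxVersions) break;
            out.add(value);
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> price = new TreeMap<>();
        price.put(100L, "9.99");
        price.put(200L, "10.49");
        price.put(300L, "11.00");

        // Like asking for at most 2 versions over all time.
        System.out.println(read(price, 0L, Long.MAX_VALUE, 2)); // [11.00, 10.49]
        // Like restricting the range to [100, 300): the value at 300 is excluded.
        System.out.println(read(price, 100L, 300L, 5));         // [10.49, 9.99]
    }
}
```

With the real client, the same two knobs are set on the Get object before it is passed to the table.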
HBase is typically used for analytics nowadays. Tracking how the value of a single field changes over time is an important part of analytics, and HBase makes that easy.
If you search on Google, you will find many more scenarios that use version support.