Apache Accumulo and Hadoop

Mohamed El-Refaey
Ranch Hand

Joined: Dec 08, 2009
Posts: 119
Where does Apache Accumulo fit in the context of Hadoop, and are there any real-world cases describing their use together?

Regards,
Mohamed


Best Regards, Mohamed El-Refaey
www.egyptcloudforum.com
Garry Turkington
author
Greenhorn

Joined: Apr 23, 2013
Posts: 15
Hadoop using MapReduce is a batch processing framework. Typically you churn through a lot of data in queries that take seconds, minutes or longer.

HBase and Accumulo offer something more like a database, modelled on the Google BigTable paper. These can service low-latency, end-user-facing queries. Accumulo has a number of extensions over HBase, notably much finer-grained security labelling and the ability to efficiently run server-side functions.
Garry
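
To make the fine-grained security labelling idea concrete, here is a minimal sketch in plain Java. It is a toy model, not the Accumulo client API: the class LabelledTable, its methods, and the simplified visibility syntax (labels joined by '&' only) are all assumptions made for illustration. It shows the general idea that each cell carries its own label and a scan only returns cells the reader is authorized to see.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model only: this is NOT the Accumulo client API.
final class LabelledTable {
    // One stored cell: a value plus the labels a reader must hold to see it.
    private static final class Cell {
        final String value;
        final Set<String> requiredLabels;

        Cell(String value, Set<String> requiredLabels) {
            this.value = value;
            this.requiredLabels = requiredLabels;
        }
    }

    // Keys are kept sorted, as in BigTable-style stores.
    private final TreeMap<String, Cell> cells = new TreeMap<>();

    // Simplified visibility: labels joined by '&', or "" for a public cell.
    void put(String key, String visibility, String value) {
        Set<String> labels = new HashSet<>();
        for (String label : visibility.split("&")) {
            String trimmed = label.trim();
            if (!trimmed.isEmpty()) {
                labels.add(trimmed);
            }
        }
        cells.put(key, new Cell(value, labels));
    }

    // Returns "key=value" for every cell whose labels are all held by the reader.
    List<String> scan(String... authorizations) {
        Set<String> held = new HashSet<>(Arrays.asList(authorizations));
        List<String> visible = new ArrayList<>();
        for (Map.Entry<String, Cell> entry : cells.entrySet()) {
            if (held.containsAll(entry.getValue().requiredLabels)) {
                visible.add(entry.getKey() + "=" + entry.getValue().value);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        LabelledTable table = new LabelledTable();
        table.put("patient1:name", "", "Alice");
        table.put("patient1:diagnosis", "medical&admin", "flu");
        System.out.println(table.scan("medical"));          // name cell only
        System.out.println(table.scan("medical", "admin")); // both cells
    }
}
```

The point of the sketch is that filtering happens per cell rather than per table or per column, which is the granularity described above. A real deployment would use the store's own visibility expressions and authorizations rather than this hand-rolled check.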
Mohamed El-Refaey
Ranch Hand

Joined: Dec 08, 2009
Posts: 119
So that means Accumulo can be used in real-time cases? Or is it similar to an in-memory database?
Brian Femiano
author
Greenhorn

Joined: Apr 24, 2013
Posts: 2
Accumulo can satisfy queries that demand fast response times, but the internal operations are not strictly in-memory. All underlying data structures are persisted to Hadoop HDFS.

Its primary purpose is to enable low-latency fetches over persistent columnar data stored in HDFS.
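
As a rough illustration of how a store can answer quickly without being purely in-memory, here is a toy sketch in plain Java. Everything in it is an assumption for illustration: the class ToyPersistentStore, the tab-separated file format, and the use of a local temp directory as a stand-in for HDFS. It is not how Accumulo is implemented; it only shows the general pattern of buffering recent writes in a sorted in-memory map and flushing them to sorted files on durable storage.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: a local directory stands in for HDFS.
final class ToyPersistentStore {
    private final Path dir;
    // Recent writes, kept sorted by key.
    private final TreeMap<String, String> memory = new TreeMap<>();
    // Files already flushed to disk, oldest first.
    private final List<Path> flushedFiles = new ArrayList<>();

    ToyPersistentStore(Path dir) {
        this.dir = dir;
    }

    // Convenience factory that uses a fresh temporary directory.
    static ToyPersistentStore inTempDir() {
        try {
            return new ToyPersistentStore(Files.createTempDirectory("toy-store"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Assumes keys contain no tabs and values contain no newlines.
    void put(String key, String value) {
        memory.put(key, value);
    }

    // Writes the in-memory entries to a new sorted file, then clears the buffer.
    void flush() {
        if (memory.isEmpty()) {
            return;
        }
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> entry : memory.entrySet()) {
            lines.add(entry.getKey() + "\t" + entry.getValue());
        }
        Path file = dir.resolve("flush-" + flushedFiles.size() + ".tsv");
        try {
            Files.write(file, lines, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        flushedFiles.add(file);
        memory.clear();
    }

    // Checks memory first, then flushed files from newest to oldest.
    String get(String key) {
        String recent = memory.get(key);
        if (recent != null) {
            return recent;
        }
        try {
            for (int i = flushedFiles.size() - 1; i >= 0; i--) {
                for (String line : Files.readAllLines(flushedFiles.get(i), StandardCharsets.UTF_8)) {
                    int tab = line.indexOf('\t');
                    if (tab >= 0 && line.substring(0, tab).equals(key)) {
                        return line.substring(tab + 1);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return null;
    }
}
```

The sketch reads flushed files line by line, which would be far too slow at scale; a production store would rely on indexed, sorted file formats so that a lookup touches only a small part of each file.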
Mohamed El-Refaey
Ranch Hand

Joined: Dec 08, 2009
Posts: 119
I see. Thanks Brian.

Regards,
Mohamed
Garry Turkington
author
Greenhorn

Joined: Apr 23, 2013
Posts: 15
Low latency is really what it's all about: in particular, latency low enough that you could use it to directly back applications serving end users.

But if your interest here, and in the other question about realtime, is not just in low latency but in true 'hard' realtime systems with all their consequent requirements, then that's likely not a good fit for Hadoop or any of the related projects. Indeed, when you take into account the basic mechanics of a distributed system, adding hard realtime requirements would put you into a very specialised niche that most Hadoop use cases don't have to worry about.

Garry
Mohamed El-Refaey
Ranch Hand

Joined: Dec 08, 2009
Posts: 119
Thanks Garry for the clarifications. It seems from all the responses I got that Hadoop may not be the best option for hard real-time processing, but at least it is capable of processing a large volume of data at an adequate speed.
Thanks again and have a nice day!

Regards,
Mohamed
 