
Apache Accumulo and Hadoop

 
Mohamed El-Refaey
Ranch Hand
Posts: 119
Where does Apache Accumulo fit in the context of Hadoop, and are there any real-world cases describing their use together?

Regards,
Mohamed
 
Garry Turkington
author
Greenhorn
Posts: 15
Hadoop using MapReduce is a batch processing framework. Typically you churn through a lot of data in queries that take seconds, minutes or longer.

HBase and Accumulo offer something more like a database, modelled on the Google Bigtable paper. These can service low-latency, end-user-facing queries. Accumulo has a number of extensions over HBase, most notably much finer-grained security labelling and the ability to efficiently run server-side functions.
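
To make the idea of a sorted table with per-cell security labels more concrete, here is a minimal Python sketch. It is a toy model only and is not the Accumulo or HBase API: the `ToyTable` class, its `put` and `scan` methods, and the single-string label are all hypothetical simplifications (real visibility labels can be richer expressions). It shows cells kept in sorted order so a range scan is cheap, with the label check applied inside the scan rather than by the caller.

```python
from bisect import bisect_left, insort


class ToyTable:
    """Toy sorted key/value table with per-cell labels (illustration only)."""

    def __init__(self):
        # Cells are kept sorted by (row, column) so range scans are cheap.
        self._cells = []  # each entry: ((row, column), label, value)

    def put(self, row, column, label, value):
        # An empty label means the cell is visible to everyone.
        insort(self._cells, ((row, column), label, value))

    def scan(self, start_row, end_row, authorizations):
        """Yield visible cells with start_row <= row < end_row."""
        i = bisect_left(self._cells, ((start_row, ""),))
        while i < len(self._cells):
            (row, column), label, value = self._cells[i]
            if row >= end_row:
                break
            # The label check runs inside the scan, before data is returned.
            if label == "" or label in authorizations:
                yield row, column, value
            i += 1


table = ToyTable()
table.put("user1", "email", "", "a@example.com")
table.put("user1", "ssn", "pii", "123-45-6789")
table.put("user2", "email", "", "b@example.com")

public_view = list(table.scan("user1", "user2", authorizations=set()))
pii_view = list(table.scan("user1", "user2", authorizations={"pii"}))
print(public_view)
print(pii_view)
```

The same scan returns different cells depending on the authorizations passed in, which is the general idea behind cell-level labelling.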
Garry
 
Mohamed El-Refaey
Ranch Hand
Posts: 119
So does that mean Accumulo can be used in real-time cases? Or is it similar to an in-memory database?
 
Brian Femiano
author
Greenhorn
Posts: 2
Accumulo can satisfy queries that demand fast response times, but the internal operations are not strictly in-memory. All underlying data structures are persisted to Hadoop HDFS.

Its primary purpose is to enable low-latency fetches over persistent columnar data stored in HDFS.
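
As a rough illustration of "fast, but not strictly in-memory", the following Python sketch buffers recent writes in memory and flushes them to sorted files on disk, with reads checking memory first and then the files. This is a simplified toy under stated assumptions, not Accumulo's implementation: `ToyStore`, `flush_threshold`, and the JSON file layout are hypothetical, a local temporary directory stands in for HDFS, and durability mechanisms such as a write-ahead log are left out.

```python
import json
import os
import tempfile


class ToyStore:
    """Toy store: in-memory write buffer plus sorted files on disk."""

    def __init__(self, directory, flush_threshold=2):
        self.directory = directory              # stand-in for HDFS
        self.flush_threshold = flush_threshold  # hypothetical tuning knob
        self.memory = {}                        # recent writes, not yet flushed
        self.files = []                         # flushed files, oldest first

    def put(self, key, value):
        self.memory[key] = value
        if len(self.memory) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Persist the buffer as one sorted file, then start a fresh buffer.
        path = os.path.join(self.directory, f"sorted-{len(self.files)}.json")
        with open(path, "w") as f:
            json.dump(sorted(self.memory.items()), f)
        self.files.append(path)
        self.memory = {}

    def get(self, key):
        # Newest data wins: check memory first, then files newest to oldest.
        if key in self.memory:
            return self.memory[key]
        for path in reversed(self.files):
            with open(path) as f:
                for k, v in json.load(f):
                    if k == key:
                        return v
        return None


with tempfile.TemporaryDirectory() as d:
    store = ToyStore(d)
    store.put("a", "1")
    store.put("b", "2")  # second write reaches the threshold and flushes
    store.put("a", "3")  # newer value for "a" stays in memory for now
    result_a = store.get("a")
    result_b = store.get("b")
    flushed_files = len(store.files)

print(result_a, result_b, flushed_files)
```

Here `"b"` is served from a file on disk while the latest `"a"` is served from memory, so reads stay quick without the data set having to live in RAM.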
 
Mohamed El-Refaey
Ranch Hand
Posts: 119
I see. Thanks Brian.

Regards,
Mohamed
 
Garry Turkington
author
Greenhorn
Posts: 15
Low latency is really what it's all about. In particular, the latency is low enough that you could potentially use it to directly back applications serving end users.

But if your interest here, and in the other question about realtime, is not just in low latency but in true 'hard' realtime systems with all their consequent requirements, then Hadoop and its related projects are likely not a good fit. Indeed, once you take into account the basic mechanics of a distributed system, adding hard realtime requirements would put you into a very specialised niche that most Hadoop use cases don't have to worry about.

Garry
 
Mohamed El-Refaey
Ranch Hand
Posts: 119
Thanks, Garry, for the clarifications. It seems from all the responses I got that Hadoop may not be the best option for hard real-time processing, but at least it is capable of processing large volumes of data at an adequate speed.
Thanks again and have a nice day!

Regards,
Mohamed
 