Storm compared to Hadoop and Spark

 
Ranch Hand
Posts: 544
Hello Authors,
As far as I understand, Apache has released at least three cluster computing frameworks: Hadoop, Spark, and Storm.
Could you please help me understand which use cases are a better fit for Storm compared to Hadoop and Spark?
There is also Giraph, but as I understand it, it is best suited to graph processing (I have never used it, though).

Thanks,
Amit
 
Author
Posts: 14
Hadoop is oriented towards working with batches of data.

Spark can work with batches of data like Hadoop, or with "micro-batches": smaller batches of data that start to approximate a streaming solution.

Storm is oriented towards working on a never-ending stream of data: there is no start or end, and the computation runs constantly. Whenever data arrives, it is processed. Via Trident, Storm can also do micro-batching.
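To make the three styles concrete, here is a minimal sketch in plain Java. It deliberately uses no Hadoop, Spark, or Storm APIs; the class and method names (`ProcessingStyles`, `batchCount`, `microBatchCount`, `onEvent`) are my own invention, just to show how batch, micro-batch, and per-event processing differ in shape:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch (not the Hadoop/Spark/Storm APIs) contrasting
// the three processing styles the post describes.
public class ProcessingStyles {

    // Batch (Hadoop-style): the whole dataset exists up front,
    // and we make one pass over all of it.
    static int batchCount(List<String> completeDataset, String keyword) {
        int count = 0;
        for (String record : completeDataset) {
            if (record.contains(keyword)) count++;
        }
        return count;
    }

    // Micro-batch (Spark-style): events are buffered into small fixed-size
    // groups, and each small group is processed with ordinary batch logic.
    static int microBatchCount(List<String> incomingEvents, int batchSize, String keyword) {
        int count = 0;
        List<String> buffer = new ArrayList<>();
        for (String event : incomingEvents) {
            buffer.add(event);
            if (buffer.size() == batchSize) {
                count += batchCount(buffer, keyword); // reuse batch logic per micro-batch
                buffer.clear();
            }
        }
        count += batchCount(buffer, keyword); // flush the final partial batch
        return count;
    }

    // Streaming (Storm-style): no batches at all; each event is handled
    // the moment it arrives, and the computation never "finishes".
    static void onEvent(String event, String keyword, Consumer<String> alert) {
        if (event.contains(keyword)) alert.accept(event);
    }

    public static void main(String[] args) {
        List<String> log = List.of("GET /", "LOGIN FAIL", "GET /a", "LOGIN FAIL");
        System.out.println(batchCount(log, "FAIL"));         // 2
        System.out.println(microBatchCount(log, 3, "FAIL")); // 2
        for (String event : log) {
            onEvent(event, "FAIL", e -> System.out.println("alert: " + e));
        }
    }
}
```

All three arrive at the same answer here; the difference is *when* you get it: the batch version only after the full dataset exists, the micro-batch version at the end of each small group, and the streaming version on every single event.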

Think of a batch processing system when you are crunching a large amount of data and don't need an answer right now. For example, if you process your website's log files every day to look for trends and extract value from them, a batch framework like Hadoop is perfect. However, if you are analyzing those logs to detect intrusion attempts against your system, you want to know as soon as possible. For that, you would want a system like Storm, where each event in your system is shipped to Storm as a stream as soon as it happens, so you can analyze it immediately.
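The intrusion-detection case boils down to keeping running state and reacting per event. Here is a toy sketch in plain Java (again, not Storm's actual API; in real Storm this logic would live inside a bolt, and `FailedLoginDetector` and its threshold are made-up names for illustration):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stream-style detector: state is updated on every event as it
// arrives, so the alert fires immediately, not at the end of a daily batch run.
public class FailedLoginDetector {
    private final int threshold;
    private final Map<String, Integer> failuresByIp = new HashMap<>();

    FailedLoginDetector(int threshold) {
        this.threshold = threshold;
    }

    // Called once per failed-login event. Returns true exactly when this
    // single event pushes the IP over the threshold (the alert moment).
    boolean onFailedLogin(String ip) {
        int failures = failuresByIp.merge(ip, 1, Integer::sum);
        return failures == threshold;
    }

    public static void main(String[] args) {
        FailedLoginDetector detector = new FailedLoginDetector(3);
        System.out.println(detector.onFailedLogin("10.0.0.5")); // false
        System.out.println(detector.onFailedLogin("10.0.0.5")); // false
        System.out.println(detector.onFailedLogin("10.0.0.5")); // true -> alert now
    }
}
```

A daily Hadoop job over the same log would find the same attacker, but only hours later; the streaming shape flags the third failure the instant it happens.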