Storm compared to Hadoop and Spark

 
Ranch Hand
Posts: 544
Hello Authors,
As far as I understand, Apache has released at least three cluster computing frameworks: Hadoop, Spark, and Storm.
Could you please help me understand which use cases fit Storm better than Hadoop or Spark?
There is also Giraph, but as I understand it, it is best suited to graph processing (I have never used it, though).

Thanks,
Amit
 
Author
Posts: 14
Hadoop is oriented towards working with batches of data.

Spark is oriented towards working either with batches of data, like Hadoop, or with "micro-batches": smaller batches of data that start to approximate what a streaming solution is like.

Storm is oriented towards working on a never-ending stream of data, where you are constantly calculating and there is no start or end. Whenever data arrives, it is processed. Storm can also do micro-batching via Trident.
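To make the three models concrete, here is a minimal sketch in plain Python. It does not use the Hadoop, Spark, or Storm APIs; the function names are made up for illustration, and the micro-batches are cut by event count purely to keep the example short (real systems typically cut them by time interval).

```python
from itertools import islice
from typing import Iterable, Iterator, List


def process_batch(events: List[int]) -> int:
    """Batch style: the whole data set exists before work starts; one answer at the end."""
    return sum(events)


def process_micro_batches(events: Iterable[int], batch_size: int) -> Iterator[int]:
    """Micro-batch style: group arriving events into small batches, one result per batch.

    Assumption: batches are cut every `batch_size` events, not by time.
    """
    it = iter(events)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield sum(chunk)


def process_stream(events: Iterable[int]) -> Iterator[int]:
    """Stream style: update the running result and emit it for every single event."""
    total = 0
    for event in events:
        total += event
        yield total
```

The batch function needs the full list up front, while the other two accept any iterable, including one that never ends, and produce results as they go.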

Think of a batch processing system when you are crunching a large amount of data and don't need an answer right now. For example, if you process your website's log files every day to look for trends and extract value from them, then a batch framework like Hadoop is perfect. However, if you are analyzing those logs in order to detect intrusion attempts against your system, then you want to know as soon as possible. For this, you would want a system like Storm, where each event within your system is shipped to Storm as part of a stream as soon as it happens, so you can analyze it immediately.
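The two log-analysis cases could be sketched roughly like this, again in plain Python rather than actual Hadoop or Storm code. The event shape (IP address, path, HTTP status), the function names, and the "three failed logins" rule are all assumptions made up for the example.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical event shape: (ip_address, path, http_status)
Event = Tuple[str, str, int]


def daily_trends(log: List[Event]) -> Counter:
    """Batch job: runs once over a full day's log; the answer can wait."""
    return Counter(path for _ip, path, _status in log)


def intrusion_alerts(events: Iterable[Event], threshold: int = 3) -> Iterator[str]:
    """Streaming check: looks at each event on arrival and yields the offending
    IP the moment it reaches `threshold` failed logins (an assumed rule)."""
    failures: Dict[str, int] = defaultdict(int)
    for ip, path, status in events:
        if path == "/login" and status == 401:
            failures[ip] += 1
            if failures[ip] == threshold:
                yield ip
```

The trend report only makes sense once the day's log is complete, whereas the alert generator can be fed events one at a time and raises its alert on the third failure without waiting for the stream to end.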
 