Storm compared to Hadoop and Spark

 
Ranch Hand
Posts: 544
Hello Authors,
As far as I understand, Apache has released at least three cluster computing frameworks: Hadoop, Spark and Storm.
Could you please help me understand which use cases fit Storm better than Hadoop or Spark?
There is another one, Giraph, but as I understand it, it is best suited to graph processing (I have never used it, though).

Thanks,
Amit
 
Author
Posts: 14
Hadoop is oriented towards working with batches of data.

Spark is oriented towards working either with batches of data, like Hadoop, or with "micro-batching": smaller batches of data that start to approximate what a streaming solution is like.

Storm is oriented towards working on a never-ending stream of data, where you are constantly calculating and there is no start or end. Whenever data arrives, it is processed. Storm can also do micro-batching via Trident.
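
To make the three models concrete, here is a minimal sketch in plain Java. It deliberately uses no Hadoop, Spark or Storm APIs; the class and method names (ProcessingModels, batchSum, microBatchSums, streamRunningSums) are made up for illustration, and a simple sum stands in for whatever calculation you actually run. The only point is when results become available in each model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: hypothetical names, no framework APIs involved.
public class ProcessingModels {

    // Batch style: the whole data set is available up front,
    // and a single result comes out at the end.
    static int batchSum(List<Integer> data) {
        int total = 0;
        for (int value : data) {
            total += value;
        }
        return total;
    }

    // Micro-batch style: the input is cut into small chunks,
    // and one result is emitted per chunk.
    static List<Integer> microBatchSums(List<Integer> data, int batchSize) {
        List<Integer> results = new ArrayList<>();
        for (int start = 0; start < data.size(); start += batchSize) {
            int end = Math.min(start + batchSize, data.size());
            results.add(batchSum(data.subList(start, end)));
        }
        return results;
    }

    // Stream style: every event updates the result as soon as it arrives,
    // so an up-to-date value is emitted after each event.
    static List<Integer> streamRunningSums(Iterable<Integer> events) {
        List<Integer> emitted = new ArrayList<>();
        int running = 0;
        for (int event : events) {
            running += event;
            emitted.add(running);
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(batchSum(data));          // 15
        System.out.println(microBatchSums(data, 2)); // [3, 7, 5]
        System.out.println(streamRunningSums(data)); // [1, 3, 6, 10, 15]
    }
}
```

In a real stream the input never ends, so the last method would never return; a finite list is used here only so the sketch can be run.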

Think of a batch processing system when you are crunching a large amount of data and don't need an answer right now. For example, if you process your website's log files every day to look for trends and extract value from them, then a batch framework like Hadoop is perfect. However, if you are analyzing those logs to detect intrusion attempts against your system, then you want to know as soon as possible. For this, you would want a system like Storm, where each event is shipped to Storm as part of a stream as soon as it happens, so you can analyze it immediately.
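
The log example can be sketched in plain Java as well. Everything here is an assumption made for illustration: the log line format ("<ip> <status> <path>"), the threshold of three failed logins, and the names LogAnalysisSketch, hitsPerPath and IntrusionDetector. It is not Storm or Hadoop code; it only shows the difference between running once over a finished log file and reacting to each event as it arrives.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only: hypothetical log format "<ip> <status> <path>".
public class LogAnalysisSketch {

    // Assumed threshold: flag an IP after this many failed logins.
    static final int FAILED_LOGIN_THRESHOLD = 3;

    // Batch style: run once over a complete log file to look for trends,
    // here simply the number of hits per path.
    static Map<String, Integer> hitsPerPath(List<String> logLines) {
        Map<String, Integer> hits = new HashMap<>();
        for (String line : logLines) {
            String path = line.split(" ")[2];
            hits.merge(path, 1, Integer::sum);
        }
        return hits;
    }

    // Stream style: state is kept between events, and each event is
    // examined the moment it arrives.
    static class IntrusionDetector {
        private final Map<String, Integer> failuresByIp = new HashMap<>();

        // Returns true as soon as an IP reaches the failure threshold.
        boolean onEvent(String line) {
            String[] fields = line.split(" ");
            String ip = fields[0];
            String status = fields[1];
            if (!status.equals("401")) {
                return false;
            }
            int failures = failuresByIp.merge(ip, 1, Integer::sum);
            return failures >= FAILED_LOGIN_THRESHOLD;
        }
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "10.0.0.1 401 /login",
                "10.0.0.2 200 /home",
                "10.0.0.1 401 /login",
                "10.0.0.1 401 /login");

        // Batch: one answer after the whole file has been read.
        System.out.println(hitsPerPath(lines));

        // Stream: an alert fires on the event that crosses the threshold.
        IntrusionDetector detector = new IntrusionDetector();
        for (String line : lines) {
            if (detector.onEvent(line)) {
                System.out.println("Possible intrusion: " + line);
            }
        }
    }
}
```

The batch method only gives you an answer after the day's file is complete, while the detector raises its alert on the third failed login, which is the behaviour you want for intrusion detection.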
 