Is MapReduce no longer used for new code because of the faster option, Apache Spark?

 
Monica Shiralkar
Ranch Hand
Posts: 2951
Apache Spark can be used for fast processing of huge volumes of data, and it is many times faster than MapReduce. If it is new code, why would someone write it in MapReduce instead of Spark? I think for every use case Spark is preferred over MapReduce. So does that mean MapReduce should never be used for any new project, unless it is legacy code that has to be maintained? Is MapReduce no longer used for new implementations?

Thanks.
 
Winston Gutkowski
Bartender
Posts: 10780

Monica Shiralkar wrote: I think for every use case Spark is preferred over MapReduce. So does that mean MapReduce should never be used for any new project, unless it is legacy code that has to be maintained? Is MapReduce no longer used for new implementations?


Well, this article would suggest that it is a later development, but whether it's fully backward-compatible is another question. Since it wasn't developed by Apache themselves, I'd suspect not unless it's explicitly stated in the Apache docs.

But not knowing anything much about either, I couldn't say much more.

Winston
 
chris webster
Bartender
Posts: 2407
Apache Spark is a completely separate system from Hadoop, although it can run on a Hadoop cluster via the YARN manager. It can also run on its own cluster, or share cluster resources intelligently with an Apache Cassandra database. Spark is an in-memory distributed processing engine that can read/write data on many different storage platforms. Hadoop has its own distributed storage (HDFS), which also provides the basis for Hadoop's Hive SQL layer and its HBase column-family datastore. So Hadoop provides distributed storage and processing, but the processing is based on MapReduce. Hadoop programs have usually been written in Java or Pig, which get converted into MapReduce tasks.
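To make that concrete, here's a rough sketch of that flexibility in Scala; the master URLs, host names, and file paths below are just placeholders I've made up, not real cluster settings:

import org.apache.spark.{SparkConf, SparkContext}

object StorageAgnosticJob {
  def main(args: Array[String]): Unit = {
    // The same job can run under YARN inside a Hadoop cluster or on Spark's own
    // standalone cluster; only the master URL changes (placeholder values here).
    val conf = new SparkConf()
      .setAppName("StorageAgnosticJob")
      .setMaster("yarn-client")               // or "spark://sparkmaster:7077"
    val sc = new SparkContext(conf)

    // The same RDD API reads from different storage platforms:
    val fromHdfs  = sc.textFile("hdfs://namenode:8020/data/events.log")
    val fromLocal = sc.textFile("file:///tmp/events.log")

    println(s"HDFS lines: ${fromHdfs.count()}, local lines: ${fromLocal.count()}")
    sc.stop()
  }
}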

Spark is written in Scala and provides excellent APIs in Scala and Python. There is also a Java API but it is kind of clunky as Java is not so good at supporting the functional programming techniques that Spark is built on. If you want to adopt Spark instead of Hadoop MapReduce, then you'll probably want/need to learn Scala in order to make best use of it. All the major commercial Hadoop suppliers are backing Spark and integrating it into their bundled Hadoop platforms, and it has gained a lot of interest over the last couple of years. I've been prototyping with Spark over the last year and I cannot imagine why anybody would choose MapReduce for a new project when Spark is so much more expressive, powerful and flexible. I also feel the same way about Scala - I never want to go back to Java!
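For a taste of what that expressiveness looks like, here's a minimal word count against Spark's RDD API; the input and output paths are placeholders. The equivalent classic Hadoop MapReduce job needs a Mapper class, a Reducer class, and a driver, so several times this much code:

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("WordCount"))

    sc.textFile("hdfs:///data/input.txt")           // placeholder input path
      .flatMap(_.split("\\s+"))                     // split lines into words
      .map(word => (word, 1))                       // pair each word with 1
      .reduceByKey(_ + _)                           // sum the counts per word
      .saveAsTextFile("hdfs:///data/wordcounts")    // placeholder output path

    sc.stop()
  }
}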

On the other hand, Spark is much less mature than Hadoop, and there is a real shortage of tools, skills and documentation for Spark, especially around admin and management. If you're using one of the integrated Hadoop packages like Cloudera, then you might be happy to turn to Cloudera for support with these issues. But if you are running your own Hadoop cluster without support, you might be concerned about the extra challenges and potential risks involved in using Spark as well. It will depend on your resources - skills, external support, finance, extra hardware/VM capacity etc.

Of course there are a lot of Java MapReduce applications out there (Hadoop MapReduce has been around for 10 years), so there will still be a need for people who can work with these, but I think you're right that these will increasingly be seen as legacy code, while new Big Data applications will probably be more likely to use Spark, on top of or alongside Hadoop, or in combination with other distributed data storage.

Finally, it's worth remembering that there are other alternatives to MapReduce or Spark, e.g. using streaming and in-memory processing. I haven't used these myself but you might want to Google some of these.

(EDIT: 17/01/2016)
  • Cascading is a mature and well-documented Java library that provides a more flexible high-level abstraction layer on top of MapReduce, so you can model your data processing as a pipeline (similarly to Spark), but I think it still gets turned into MapReduce tasks underneath.
• Apache Flink is a "streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams", so I guess it provides some of the same features as Spark, but I don't know anything else about this.
chris webster
Bartender
Posts: 2407

Winston Gutkowski wrote: ...But not knowing anything much about either, I couldn't say much more.

In case you're curious, Scala guru Dean Wampler gives a good overview of Why Spark Is the Next Top (Compute) Model for Big Data.
     
Monica Shiralkar
Ranch Hand
Posts: 2951

chris webster wrote: If you want to adopt Spark instead of Hadoop MapReduce, then you'll probably want/need to learn Scala in order to make best use of it.

Thanks. Why should one prefer Scala when Spark coding can be done in Java, Scala, and Python, and Java is already familiar? Why not Java?
     
chris webster
Bartender
Posts: 2407

Monica Shiralkar wrote: Thanks. Why should one prefer Scala when Spark coding can be done in Java, Scala, and Python, and Java is already familiar? Why not Java?

Because Scala makes it much easier to write Spark code, and you can still interoperate with Java if you have to. Compare the Scala and Java code in the Spark samples to get a feel for the difference.
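If you'd rather not leave this page, here's a rough flavour of the difference (assuming the Spark shell, which gives you sc; the input path is a placeholder). Each one-line lambda below would be an anonymous inner class implementing an interface such as Function or Function2 in the Java API of the time:

// Top ten most frequent words, as a single functional pipeline:
val topWords = sc.textFile("hdfs:///data/input.txt")  // placeholder path
  .flatMap(_.split("\\s+"))                           // lines -> words
  .filter(_.nonEmpty)                                 // drop empty tokens
  .map(word => (word, 1))                             // word -> (word, 1)
  .reduceByKey(_ + _)                                 // sum counts per word
  .sortBy(_._2, ascending = false)                    // most frequent first
  .take(10)

topWords.foreach(println)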

This guy from Cloudera explains why he chose to use Scala rather than Java or Python for working with Spark.
     
Sheriff
Posts: 28328
Good question; it brought out some good answers. Have a cow for the good question!
     
Monica Shiralkar
Ranch Hand
Posts: 2951
Thanks.
     
Ranch Hand
Posts: 45
Agreed. Spark is the new technology; the MapReduce craze is over, for now!
     
Monica Shiralkar
Ranch Hand
Posts: 2951
Is MapReduce the only member of the Hadoop ecosystem that has been affected by this and is therefore being used less for new projects? Or are other members like Hive, HBase, and HDFS still being used as much as before?
     