
Spark in Action: Pros and Cons of each language for Spark

 
Master Rancher
Hi Jean-Georges Perrin,

As the book says, it covers three programming languages: Java, Scala, and Python.

From your experience, what should be considered when choosing a language for Spark? Does each language have its own pros and cons?

Thanks.
 
Ranch Hand
As the author hasn't replied, here's my personal take as an occasional user of Spark since around 2014.

Scala:

* AFAIK Spark is still written in Scala, which means new features appear in the Scala APIs first.
* This means the Scala API is usually a bit ahead of the others, and it will never be behind them.
* Personally, I find Scala is a very natural language for this kind of processing (which is why Spark is based on it), so I am most comfortable with the Scala API.  YMMV of course.

Python:

* Python is very widely used with Spark, as it is a much more popular language than Scala generally, and it is often used by data scientists.
* Python is also a popular choice for people who use interactive notebook interfaces, like Jupyter, with Spark (although you can also use Scala with notebooks these days).
* But the Python API is usually a little behind the Scala API, and some features are slower/harder to implement in Python than in Scala.
* So Python is a good choice for data scientists or if you are not concerned about having the very latest API features.

Java:

* There is no good reason to use Java with Spark.  
* Although Java now offers Lambdas etc, it is still really clunky to write good functional code with Java compared to Scala.
* And Python is a much nicer language for data science and notebooks etc.
* If you're using Spark, pick a language API that works well with Spark and does the things that Spark does well.
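
To make the "Lambdas etc" point above a bit more concrete, here is a minimal sketch of a word count in functional-style Java. Note that this is plain java.util.stream code, not the Spark API, and the class name WordCountSketch and the method countWords are made up for this illustration. It only gives a feel for how much ceremony (imports, a class wrapper, explicit collectors) Java needs for a small map/filter/group pipeline.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical example class: plain Java streams, NOT the Spark API.
public class WordCountSketch {

    // Splits each line on whitespace, lower-cases the words,
    // and counts how often each word occurs (keys sorted by TreeMap).
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "spark with scala", "spark with python", "spark with java");
        // prints {java=1, python=1, scala=1, spark=3, with=3}
        System.out.println(countWords(lines));
    }
}
```

Whether this counts as "really clunky" is a matter of taste, but it shows the style of code the comparison is about.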




 
Ranch Hand

Christopher Webster wrote:
There is no good reason to use Java with Spark.



While Python is the preferred choice for Spark ML, I think Java can still make sense in other cases. Suppose the team has developers who are good at Java rather than Scala. If we go with Java, we still have the option of moving to Scala later, once the team has that skill set. However, if we go with Python today, we are on a different route altogether, and a later move to Scala becomes less likely. The reason is that Scala and Java, as JVM languages, have more in common with each other than Scala and Python do.
 