As the author hasn't replied, here's my personal take as an occasional user of Spark since around 2014.
Scala:
* AFAIK Spark is still written in Scala, which means new features appear in the Scala APIs first.
* This means the Scala API is usually a bit ahead of the others, and it will never be behind them.
* Personally, I find Scala a very natural language for this kind of processing (which is why Spark is written in it), so I am most comfortable with the Scala API. YMMV, of course.
Python:
* Python is very widely used with Spark, as it is a much more popular language than Scala generally, and it is often used by data scientists.
* Python is also a popular choice for people who use interactive notebook interfaces, like Jupyter, with Spark (although you can also use Scala with notebooks these days).
* But the Python API usually lags a little behind the Scala API, and some features are slower or harder to expose in Python than in Scala.
* So Python is a good choice for data scientists or if you are not concerned about having the very latest API features.
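To make the "kind of processing" concrete: Spark's RDD API in every language binding is built around the map/filter/reduce style. Here is a minimal plain-Python sketch of that style using only builtins (no PySpark import; `total_word_length` is an illustrative name, not a Spark API). A real PySpark job would call the same-named methods on an RDD instead.

```python
from functools import reduce

# Plain-Python sketch of the map/filter/reduce style that Spark's RDD
# API is built around. In PySpark you would write roughly
# rdd.flatMap(str.split).filter(bool).map(len).reduce(add) instead.
def total_word_length(lines):
    """Split lines into words, drop empties, and sum the word lengths."""
    words = (w for line in lines for w in line.split())  # like flatMap
    lengths = map(len, filter(None, words))              # filter + map
    return reduce(lambda a, b: a + b, lengths, 0)        # like reduce

lines = ["spark makes this easy", "python too"]
print(total_word_length(lines))  # prints 27
```

The point of the comparison: this pipeline reads almost identically in Scala, which is why people who already think in this functional style find either API comfortable.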
Java:
* There is no good reason to use Java with Spark.
* Although Java now has lambdas etc., it is still really clunky to write good functional code in Java compared to Scala.
* And Python is a much nicer language for data science and notebooks etc.
* If you're using Spark, pick a language API that works well with Spark and suits the kind of work Spark does well; in practice that means Scala or Python.