Absolutely none. Spark's ability to run on top of Hadoop was provided mainly for compatibility, since Hadoop was the 'big thing' in distributed processing at the time. In 'standalone mode', Spark can actually run over any distributed file system... it doesn't have to be Hadoop.
Theo van Kraay wrote: Spark can actually run over any distributed file system
Why does it always require a file system, though? I can understand that one is needed for reading data from files, but in the case of streaming, why would a file system necessarily be required?