My 2 cents on this is that there is small-data ML and big-data ML. Tools like pandas (in Python) are really for "small data", because as a rule of thumb they need 5-10 times as much RAM as the size of your dataset:
http://wesmckinney.com/blog/apache-arrow-pandas-internals/. So if you need to work with datasets of, say, 5 GB or more, there is a strong chance you will need another tool like Spark ML. In the big-data ML space, some popular options are cloud systems: EMR (which runs Spark), AWS SageMaker, Google BigQuery (which now has ML built in), etc.
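
If you want a feel for the blow-up on your own data, here is a rough sketch that compares a CSV's on-disk size with its in-memory footprint after loading into pandas (the "events.csv" path is just a placeholder, not from anything above):

```python
import os
import pandas as pd

# Placeholder path -- substitute your own dataset.
path = "events.csv"

disk_bytes = os.path.getsize(path)
df = pd.read_csv(path)

# deep=True also counts the Python objects backing string columns,
# which is usually where most of the extra memory goes.
mem_bytes = df.memory_usage(deep=True).sum()

print(f"on disk:   {disk_bytes / 1e6:.1f} MB")
print(f"in memory: {mem_bytes / 1e6:.1f} MB")
print(f"blow-up:   {mem_bytes / disk_bytes:.1f}x")
```

And keep in mind that's just the loaded frame; joins, copies, and model training on top of it need their own headroom, which is where the 5-10x rule of thumb comes from.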
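
For contrast, a minimal Spark ML sketch of the same kind of workflow in PySpark -- the file path, column names ("f1", "f2", "label"), and app name are made up for illustration:

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("big-data-ml").getOrCreate()

# Spark reads and processes the data in partitions across executors,
# so the dataset doesn't have to fit in a single machine's RAM.
df = spark.read.csv("events.csv", header=True, inferSchema=True)

# Spark ML models expect a single vector column of features.
assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
train = assembler.transform(df)

lr = LogisticRegression(featuresCol="features", labelCol="label")
fitted = lr.fit(train)
print(fitted.coefficients)
```

On EMR this same code runs across the cluster; locally it still works, just without the scale-out benefit.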