Use of frameworks like Apache Spark, Kafka in AI

 
M Mohile
Ranch Hand
Posts: 37
What role do frameworks like Apache Spark and Kafka play in AI? Can machine learning be learned without knowing Apache Spark and Kafka?
What advantages does someone gain from knowing these frameworks in addition to libraries like NumPy and pandas?
 
Greenhorn
Posts: 22
My two cents: there is small-data ML and big-data ML. Tools like pandas (a Python library) are really meant for "small data" because they can require 5-10 times the size of the data set in RAM:  http://wesmckinney.com/blog/apache-arrow-pandas-internals/.  So if you need to work with data sets of, say, 5 GB or more, there is a strong chance you will need another tool such as Spark ML. In the big-data ML space, popular options include cloud services: EMR (which includes Spark), AWS SageMaker, Google BigQuery (which now has ML built in), etc.
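To make the rule of thumb above concrete, here is a rough back-of-the-envelope sketch in plain Python. The function names and the 16 GiB machine are made up for illustration, and the 5-10x overhead factor is only the estimate quoted in this post, not a measured value for any particular data set.

```python
# Rough sizing sketch: does a data set fit in RAM under the quoted
# "5-10 times the RAM" rule of thumb? All names here are hypothetical.

GIB = 1024 ** 3  # bytes in one gibibyte


def estimated_peak_ram(dataset_bytes, overhead_factor):
    """Estimate peak RAM as data set size times an assumed overhead factor."""
    return dataset_bytes * overhead_factor


def fits_in_memory(dataset_bytes, available_ram_bytes, overhead_factor=10):
    """Return True if the estimated peak RAM fits in the available RAM.

    overhead_factor defaults to 10, the pessimistic end of the quoted range.
    """
    return estimated_peak_ram(dataset_bytes, overhead_factor) <= available_ram_bytes


if __name__ == "__main__":
    dataset = 5 * GIB   # the 5 GB example from the post
    machine = 16 * GIB  # assumed laptop-class machine
    for factor in (5, 10):
        need_gib = estimated_peak_ram(dataset, factor) / GIB
        verdict = "fits" if fits_in_memory(dataset, machine, factor) else "does not fit"
        print(f"factor {factor}: about {need_gib:.0f} GiB needed, {verdict} in 16 GiB")
```

Under these assumptions a 5 GB data set would want roughly 25 to 50 GiB of RAM, which is where a distributed tool such as Spark starts to look attractive.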
 
M Mohile
Ranch Hand
Posts: 37

Tools like Pandas (used in Python), are really used on "small data" because they require 5-10 times the RAM:  http://wesmckinney.com/blog/apache-arrow-pandas-internals/.  



Correcting the link: Apache Arrow and the "10 Things I Hate About pandas", since the URL above does not work when the trailing period after the last forward slash is included.

Anyway, thanks for your reply, Noah.
 