Specific problem domains in which Mahout is best

Alok Bhandari
Greenhorn

Joined: Jun 11, 2008
Posts: 15

Welcome to JavaRanch.

My question is this: we use Mahout for clustering, but are there specific scenarios in which Mahout gives better performance? In other words, are there particular problem types for which it gives better results than other tools?

Thanks


SCJP 1.6(98%)
Sean Owen
author
Greenhorn

Joined: Nov 08, 2004
Posts: 21
Gives better results than what? And "better" in the sense of faster, or "more accurate"?

The clustering algorithms in Mahout are fairly standard algorithms, not some special approach. So I think they perform as well as any other implementation of these standard algorithms in terms of quality.

In terms of performance, they are implemented on Hadoop. This makes it much easier to scale up to very large data sets, but it also means you incur a lot of Hadoop overhead. For small data sets, you could probably find a faster implementation that runs entirely on one machine, maybe something written in R. For very large data sets, where you can't apply non-distributed tools, I imagine it's about as good as anything else freely available. Honestly, I'm not aware of another distributed clustering package to compare it to.
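To make the "standard algorithm" point concrete, here is a minimal single-machine sketch of k-means (Lloyd's algorithm), the textbook clustering method. This is NOT Mahout's API; the class name `KMeansSketch` and the method `cluster` are hypothetical, and the caller supplies the initial centroids so the result is deterministic. Each pass of the outer loop is one assign-then-update iteration; in a MapReduce-based implementation each such iteration is typically run as its own distributed job, which is one reason the overhead dominates on small data sets.

```java
import java.util.Arrays;

/** Minimal in-memory k-means (Lloyd's algorithm). Illustrative sketch, not Mahout's API. */
public class KMeansSketch {

    /** Squared Euclidean distance between two points of equal dimension. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /**
     * Runs up to maxIterations Lloyd iterations from the given starting centroids
     * (updated in place) and returns the cluster index assigned to each point.
     */
    public static int[] cluster(double[][] points, double[][] centroids, int maxIterations) {
        int k = centroids.length;
        int dim = points[0].length;
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: each point joins its nearest centroid.
            boolean changed = false;
            for (int p = 0; p < points.length; p++) {
                int best = 0;
                double bestDist = squaredDistance(points[p], centroids[0]);
                for (int c = 1; c < k; c++) {
                    double d = squaredDistance(points[p], centroids[c]);
                    if (d < bestDist) {
                        bestDist = d;
                        best = c;
                    }
                }
                if (assignment[p] != best) {
                    assignment[p] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no point switched clusters
            }

            // Update step: move each centroid to the mean of its assigned points.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int p = 0; p < points.length; p++) {
                int c = assignment[p];
                counts[c]++;
                for (int i = 0; i < dim; i++) {
                    sums[c][i] += points[p][i];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // leave an empty cluster's centroid where it was
                }
                for (int i = 0; i < dim; i++) {
                    centroids[c][i] = sums[c][i] / counts[c];
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {0, 1}, {10, 10}, {10, 11}};
        double[][] centroids = {{0, 0}, {10, 10}};
        int[] labels = cluster(points, centroids, 100);
        System.out.println(Arrays.toString(labels));         // [0, 0, 1, 1]
        System.out.println(Arrays.deepToString(centroids));  // [[0.0, 0.5], [10.0, 10.5]]
    }
}
```

Passing the starting centroids explicitly keeps the example reproducible; real implementations usually seed them randomly or with a smarter scheme such as k-means++, which affects result quality more than whether the code runs on one machine or on Hadoop.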
Alok Bhandari
Greenhorn

Joined: Jun 11, 2008
Posts: 15
Hello

Thanks for your reply. Yes, I was asking in terms of both accuracy and performance.

Thanks