Collective Intelligence - Real Time Analysis

Jeff Storey
Ranch Hand

Joined: Apr 07, 2007
Posts: 230
Hi Satnam,

I have previously worked on data mining applications, and one of the biggest problems we ran into was extracting trends and analyzing information in real time. The data we were working with changed rapidly (every couple of hours), so caching it for any longer than that was not really an option. With information available in real time from everywhere using Google, people want quick results.

Does your book discuss techniques for real time analysis? Also, do you use existing data mining frameworks (I believe I saw a weka jar in the source code, but I'm not sure their package is free anymore as it is now part of the Pentaho project)? Another issue we've had is that some of these frameworks, such as GATE and weka, are very heavyweight, and they can involve a lot of memory overhead to use even a small subset of their features.

Thanks, looking forward to hearing back.

Jeff Storey
Satnam Alag
Author
Greenhorn

Joined: May 07, 2008
Posts: 26
Jeff,

Great questions. Let me try to answer each of them.

Real-time analysis:
One of the first things I do in the book -- Section 2.1 -- is to present the architecture for applying collective intelligence in real-world applications. The key to applying these techniques is to precompute as much as possible asynchronously, so that minimal computation is carried out while the user is waiting. It helps to also have an event-driven SOA architecture.
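As a rough illustration of the precompute-asynchronously idea (a minimal sketch, not code from the book), the class below runs an expensive analysis on a background thread and lets user-facing requests read only the latest finished snapshot. The names PrecomputedResults, expensiveAnalysis, refresh, and lookup are hypothetical, and the analysis itself is assumed to be supplied by the caller.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical sketch: serve precomputed results while recomputing in the background.
public class PrecomputedResults {
    // Latest completed snapshot; readers never wait for a computation in progress.
    private final AtomicReference<Map<String, List<String>>> snapshot =
            new AtomicReference<>(Map.of());
    // Assumed to wrap the heavyweight work (e.g. a data mining library call).
    private final Supplier<Map<String, List<String>>> expensiveAnalysis;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public PrecomputedResults(Supplier<Map<String, List<String>>> expensiveAnalysis) {
        this.expensiveAnalysis = expensiveAnalysis;
    }

    // Recompute and atomically swap in an immutable copy of the result.
    public void refresh() {
        snapshot.set(Map.copyOf(expensiveAnalysis.get()));
    }

    // Recompute periodically off the request path.
    public void start(long period, TimeUnit unit) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                refresh();
            } catch (RuntimeException e) {
                // Keep serving the previous snapshot; an uncaught exception
                // would cancel all future scheduled runs.
            }
        }, 0, period, unit);
    }

    // Cheap lookup performed while the user is waiting.
    public List<String> lookup(String userId) {
        return snapshot.get().getOrDefault(userId, List.of());
    }

    public void stop() {
        scheduler.shutdownNow();
    }
}
```

With data that changes every couple of hours, something like start(1, TimeUnit.HOURS) would keep the snapshot reasonably fresh while each lookup stays a plain map read. The trade-off is that users may briefly see results from the previous run.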

One of the case studies I cover (Section 12.4.2) is how these techniques are being applied by Google News for personalization. They have a similar problem of high item churn and a large number of users. To quote a section from the book:


Google News is a good example of building a scalable recommendation system for large number of users (several million unique visitors in a month) and large number of items (several million new stories in a two month period) with constant item churn - this is different from Amazon where the rate of item churn is much smaller.


Typically, the book presents the concepts (showing how the math works) by taking a simple example and working through the math, then a version of the algorithm is implemented in Java, and then I show how to use open-source APIs like WEKA, Lucene, Nutch, and JDM to solve the same problem. If you follow the principle of precomputing the information asynchronously, you should be able to solve the problem of some of the APIs being very heavyweight.

thanks
Satnam
Jeff Storey
Ranch Hand

Joined: Apr 07, 2007
Posts: 230
Satnam,

Thanks for the reply. I'm looking forward to reading the book.

Jeff
Tim Holloway
Saloon Keeper

Joined: Jun 25, 2001
Posts: 15665
    

Pentaho is all open-source if I'm not mistaken. A lot of it was created by combining other open-source projects.


Customer surveys are for companies who didn't pay proper attention to begin with.
Jeff Storey
Ranch Hand

Joined: Apr 07, 2007
Posts: 230
Tim,

You are correct, Pentaho projects are open source. The weka project is licensed under GPL, which makes it difficult to integrate into commercial applications (unless you want to release your source), but they do offer some commercial licensing (which I believe is rather expensive).

Jeff
 