I've done some research in data mining and web mining but have never used R, even though it is free software. I'd like to know how the R language fits into data mining tasks and, specifically, whether it is appropriate for web mining and for building adaptive web applications and recommender systems (for e-commerce or e-learning), acting as a back-end (for web apps built with J2EE, for example).

R has very powerful support for data mining. In fact, after its graphics capabilities, this is what attracted me to the language. Rattle provides a graphical user interface for data mining using R, and there is a very easy-to-use interface to Weka routines called RWeka. There is a nice reference card on data mining with R available from RDataMining. You might also look at the CRAN task view on Machine Learning.

A great book on the subject is The Elements of Statistical Learning, by Hastie, Tibshirani and Friedman. A PDF version of the 5th printing (second edition) is available online. R packages with code for the book are available here.

I think that you will find R appropriate for web mining and web applications. In my own work, I use R as an exploratory data mining tool and do not build adaptive systems, but there are certainly many examples available.

The info you provided is very valuable, especially RWeka, because I used to work with Weka in my research projects (as you can see here, for instance). By the way, the book The Elements of Statistical Learning is fantastic; many thanks!

I think that you will find R appropriate for web mining and web applications. In my own work, I use R as an exploratory data mining tool and do not build adaptive systems, but there are certainly many examples available.

I'll search the Internet for some examples, but if you are aware of any, please let me know.

Besides all this, regarding the book "R in Action": I suppose there are examples to illustrate the theory, but what kind of examples? Classical statistics, or data mining too? Could you provide the table of contents?

Thanks again.

Robert Kabacoff
author

Hi Oriol,

You can get the table of contents and a PDF of the first chapter through the "R in Action" link below. I cover most of the classical and nonparametric statistical methods, along with randomization and bootstrapping approaches for small or highly non-normal data, principal components and factor analysis, linear and logistic regression, power analysis, and advanced methods for dealing with missing data. There is a heavy emphasis on visualizing data, and every technique is accompanied by examples of graphs that are useful for understanding results. My goal is to get you up and running in R quickly, help you avoid the painful learning curve people often experience, and give you a sense of what R can do.

I cover some methods used in data mining (e.g., linear and generalized linear regression techniques, assessing predictor importance, predicting categorical and count outcomes, and visualizing complex multivariate data). Others (e.g., cluster analysis, neural networks, classification and regression trees, support vector machines) are not covered, but should be easy to locate and learn after reading the book.