
Mastering Data Mining by Michael J. A. Berry, Gordon S. Linoff

<pre>Author/s : Michael J. A. Berry, Gordon S. Linoff
Publisher : Wiley
Category : Other
Review by : Margarita Isayeva
Rating : 9 horseshoes
</pre>
I was looking for an introductory overview of what data mining is, and I am more than satisfied.
This book is about data mining in practice rather than in theory. It describes the whole process, from deciding which data columns aren't very useful to testing and tuning the model: the very details they forgot to tell us in my statistics class. There are rules of thumb ("set a minimum node size for a decision tree around 50 or 100") and estimates ("in general, you need at least several thousand records in the model set", "the ratio of the rarer outcome should comprise 15-30%"). Three main techniques are described: cluster detection, decision trees, and neural networks. The principles behind how each one works are explained in plain language. The book also covers when to use each technique (neural networks cannot explain their results, while decision trees can) and what types of data each works best with: decision trees work with categorical variables such as a list of states, whereas neural networks require numerical input and cannot deal with missing values.
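One of the quoted rules of thumb, that the rarer outcome should make up 15-30% of the model set, is commonly met by oversampling: keep every rare record and randomly sample only as many common records as the target ratio allows. The sketch below shows just that arithmetic. It assumes a binary outcome and an in-memory list of records; the function name `oversample_model_set` and its parameters are hypothetical, not taken from the book.

```python
import random


def oversample_model_set(records, is_rare, target_fraction=0.2, seed=0):
    """Hypothetical helper: build a model set in which the rare outcome makes up
    roughly target_fraction of the records. Every rare record is kept; common
    records are sampled at random without replacement."""
    if not 0 < target_fraction < 1:
        raise ValueError("target_fraction must be strictly between 0 and 1")
    rare = [r for r in records if is_rare(r)]
    common = [r for r in records if not is_rare(r)]
    # rare / (rare + n_common) = target  =>  n_common = rare * (1 - target) / target
    n_common = round(len(rare) * (1 - target_fraction) / target_fraction)
    # We cannot keep more common records than actually exist.
    n_common = min(n_common, len(common))
    rng = random.Random(seed)
    model_set = rare + rng.sample(common, n_common)
    rng.shuffle(model_set)  # mix rare and common records
    return model_set
```

For example, with 10,000 records of which 200 (2%) have the rare outcome and a target of 20%, the sketch keeps all 200 rare records plus 800 sampled common ones, giving a 1,000-record model set.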
Almost half of the book is devoted to case studies. They could be boring reading unless you are data-obsessed, and if you are not, you probably shouldn't go into data mining. I was surprised myself that I did not skip this part and instead read it with increasing interest.
My only complaint about the content is that there is no chapter on the software available for data mining.
Almost no formulas are presented, apart from a few simple diversity metrics given in a couple of sidebars. There are, however, plenty of graphics, diagrams, and screenshots. The text is very dense, so I was a little overwhelmed after my first reading; a second pass was needed to improve my understanding.
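The review doesn't name the diversity metrics in those sidebars. For illustration only, the sketch below assumes two widely used measures for judging decision-tree splits, the Gini index and entropy; it is not a claim about which measures the book presents. A pure node has zero diversity, and a good split lowers the weighted diversity of its children compared with the parent. The function names are hypothetical.

```python
import math


def gini(proportions):
    """Gini diversity: 1 - sum(p_i^2). 0.0 for a pure node, 0.5 at most for two classes."""
    return 1.0 - sum(p * p for p in proportions)


def entropy(proportions):
    """Entropy in bits: -sum(p_i * log2(p_i)); zero proportions contribute nothing."""
    return sum(-p * math.log2(p) for p in proportions if p > 0)


def split_diversity(children, measure):
    """Weighted diversity of a split. children is a list of
    (record_count, class_proportions) pairs, one per child node."""
    total = sum(n for n, _ in children)
    return sum(n / total * measure(props) for n, props in children)
```

A parent node split 50/50 has a Gini value of 0.5 and an entropy of 1 bit; a split that sends each class to its own child drops the weighted diversity to 0.0, the largest possible improvement.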
The book is so practically oriented that it is almost "learning by example". To get the most out of it, read it after a more traditional, systematic tutorial; it will then be an indispensable supplement.


[ May 29, 2003: Message edited by: Book Review Team ]