Computing Reviews
Introduction to data mining (first edition)
Tan P., Steinbach M., Kumar V. (ed), Addison-Wesley Longman Publishing Co, Inc., Boston, MA, 2005. Type: Book (9780321321367)
Date Reviewed: Feb 6 2006

Data mining is the science of extracting meaningful knowledge and structures from large volumes of data. This process is an essential component of modern marketing, business decision support, medical research, and computer security. I was pleasantly surprised to discover a book that is both scientifically comprehensive and easy to follow for a reader interested in finding solutions to concrete problems. While some data mining books lean heavily on mathematical background and strive to prove the correctness of the methods they describe, others cover only tool-oriented methods, without describing the underlying theory. This book strikes the right balance. The key concepts and data mining methods are introduced and described thoroughly, with respect to both their applicability to real-world problems and their mathematical foundations.

The authors start with an introduction to the objectives of data mining tasks, data collection, and analysis procedures (data processing and sampling, variable types, and so on), giving a broad overview of the discipline and its context. This part also includes a highly relevant hypothetical dialog between a data miner and a statistician, illustrating the key differences between these two related disciplines.

The remainder of the book is divided into four conceptual parts. The first part, data classification, is addressed in two chapters. The first covers decision trees and their performance evaluation, while the second addresses some state-of-the-art classification methods based on neural networks, support vector machines, and Bayesian networks. The second part addresses the analysis of associations, which can be summarized as inferring relationships among multiple observed variables. This part starts with basic association detection, and ends with advanced issues like dealing with infrequent patterns, subgraph patterns, and timing constraints.

One essential data mining task is the automatic clustering of data, so that an underlying structure can be highlighted. These clusters are composed of items that are close in terms of their characteristics. This issue is covered in two dedicated chapters, and this third part of the book is an excellent overview of existing data clustering methods: k-means, neural network based, and minimum spanning tree based, to name a few. Finally, anomaly detection is covered in the fourth part of the book. Anomaly detection is currently the major building block of modern intrusion detection and process control systems. As such, this part might be of major interest to readers working on such projects.

The mathematical background required to follow the text is limited to basic probability and linear algebra. For those readers wishing to refresh their background in optimization theory, linear algebra, statistics, and probability theory, several well-written appendices make for very enjoyable reading. I started reading this book in search of a solution to some network security problems that required data mining. Besides finding the conceptual solutions, I discovered a superb and thrilling book on data mining and structural knowledge extraction. I strongly recommend this book to all readers interested in data mining techniques and data analysis.

Reviewer: Radu State
Review #: CR132399 (0612-1220)
 
Database Applications (H.2.8)
Data Mining (H.2.8 ...)
Search Process (H.3.3 ...)
Database Administration (H.2.7)