Computing Reviews, the leading online review service for computing literature.


Data science: an introduction to statistics and machine learning
Plaue M., Springer International Publishing, Cham, Switzerland, 2023. 385 pp. Type: Book (3662678810)

Date Reviewed: Mar 1 2024

This comprehensive textbook covers a broad spectrum of essential topics for understanding and working in the field of data science. It is structured in a clear, well-organized manner, divided into three main parts.

Part 1, “Basics,” starts with foundational concepts like data organization, quality, and cleaning. It introduces various data models, including relational, graph-based, and hierarchical models. The focus on data quality, including aspects like validation, standardization, and deduplication, provides thorough groundwork for understanding the importance of data integrity in data science.

Part 2, “Stochastics,” delves into probability theory, exploring concepts like probability measures, random variables, and characteristic measures. It explains key principles like Bayes’ theorem, conditional probability, and various distributions. The section on inferential statistics is particularly valuable, covering topics like statistical models, interval estimation, hypothesis testing, and regression analysis. It’s a robust guide to the statistical underpinnings of data science.

Part 3 addresses the core of modern data science: machine learning. It’s divided into sections on supervised and unsupervised learning, detailing algorithms, neural networks, dimensionality reduction, and cluster analysis. The practical applications section, with examples like text recognition and sentiment analysis, provides real-world context to the methodologies discussed.

The book covers a wide range of topics, from basic statistical concepts to advanced machine learning algorithms. It is both deep and broad, making it a valuable resource for both beginners and experienced practitioners. Each concept is well explained and often accompanied by practical examples, which enhances understanding. The inclusion of real-world examples and applications of machine learning techniques is a major strength.
However, the mathematical rigor might be challenging for beginners and those without a strong background in mathematics and statistics. The book covers foundational concepts that are timeless, but readers should be aware that they may need to supplement it with material on current trends and software tools in the field. An important aspect of data science is the practical implementation of algorithms and models using programming languages like Python or R; the book’s usefulness would have been enhanced by code examples and discussions of software commonly used in the field. Data visualization and the ability to communicate findings are also key skills in data science. The book lacks sections on these aspects, and including them would significantly add to its practical utility. On the other hand, the inclusion of exercises and challenges that encourage critical thinking and problem solving is a strong point: they help readers not only understand the concepts but also apply them to real-world problems. Readers may need additional resources for topics of particular interest or complexity, and supplementary materials for the latest trends, tools, and deeper dives into specific topics [1,2,3,4,5].

Reviewer: Wael Badawy	Review #: CR147719

1)	James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An introduction to statistical learning: with applications in R. Springer, New York, NY, 2013.

2)	Clarke, B.; Fokoue, E.; Zhang, H. H. Principles and theory for data mining and machine learning. Springer, New York, NY, 2009.

3)	Cady, F. The data science handbook. Wiley, Hoboken, NJ, 2017.

4)	Hastie, T.; Tibshirani, R.; Friedman, J. The elements of statistical learning: data mining, inference, and prediction (2nd ed.). Springer, New York, NY, 2009.

5)	Veltri, G. A. Big data is not only about data: the two cultures of modelling. Big Data & Society 4, 1 (2017), https://doi.org/10.1177/2053951717703997.
