Data-based resolution of uncertainty in science must deal with two largely orthogonal issues: doubt (the degree of belief that one has in a scientific proposition) and ambiguity (one’s understanding of the proposition). (This critical distinction is articulated most clearly in [1].) Throughout the 20th century, most of the focus in statistics was on doubt, with emphasis on constructs like correlation coefficients and *p*-values. In the last two decades of the century, researchers such as Pearl, Spirtes, Glymour, and Scheines began to develop formal methods for addressing ambiguity ([2,3] present later syntheses of their work, which began in the 1980s). Relying heavily on graphical models in which arrows from one state variable to another indicate causal influence, they showed how to analyze the causal structures that generate data, thereby attacking the problem of ambiguity. The use of “elements” in the title is misleading, for this volume includes much recently published research and original results by the authors.

Chapter 1 gives a brief (13-page) introduction to probability and statistics and the relation between causal modeling and learning theory. Chapter 2 offers a powerful conceptual integration of three different characterizations of causality: the use of intervention (such as Pearl’s *do* operator), information independence between observed data and inferred conditional distributions, and the independence or otherwise of noise distributions associated with the variables whose causal relation is being explored. Chapter 3 is a brief introduction to the graphical representation of a causal hypothesis and how this model captures the role of interventions and the behavior of counterfactuals.

Readers familiar with previous work will recognize the importance of multiple interacting causal variables that support the definition and analysis of nontrivial conditional probabilities. As chapter 4 shows, under certain conditions involving only two variables, a joint distribution gives all the information necessary to resolve the causal direction between them, even with no intervention. These conditions include linear models with non-Gaussian additive noise, nonlinear additive noise models, discrete additive noise models, post-nonlinear models, the use of information geometry in causal inference, and certain constraints on the trace of the covariance matrix. The chapter also speculates about the implications of algorithmic information theory for causal inference, and then develops methods for causal inference based on these conditions.
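The flavor of these bivariate results can be sketched in a few lines. In the simulation below (a linear model with uniform, hence non-Gaussian, noise; the model, sample sizes, and the crude squared-correlation dependence score are my own illustration, not the book's algorithms), regressing in the true causal direction leaves a residual independent of the regressor, while the anti-causal regression does not:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical ground truth: x causes y, with non-Gaussian (uniform) noise.
x = rng.uniform(-1, 1, n)
e = rng.uniform(-1, 1, n)
y = 2.0 * x + e

def dependence_after_regression(cause, effect):
    """OLS-regress `effect` on `cause`; return a simple dependence score
    between the residual and the regressor (correlation of squares).
    The score is near 0 when residual and regressor are independent."""
    b = np.cov(cause, effect)[0, 1] / np.var(cause)
    resid = effect - b * cause
    return abs(np.corrcoef(resid**2, cause**2)[0, 1])

forward = dependence_after_regression(x, y)   # fit y on x
backward = dependence_after_regression(y, x)  # fit x on y

# Only in the causal direction is the residual independent of the
# regressor; with Gaussian noise the asymmetry would vanish.
print(forward < backward)  # True: the data prefer x -> y
```

No intervention is performed anywhere: the asymmetry in the residuals alone reveals the causal direction, which is the point of the chapter.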

A recurring theme in the book is that machine learning is more powerful if augmented with causal modeling. Chapter 5 introduces this topic, based on the mechanisms in the previous chapter.

While resolving causal relations between only two variables is a tour de force, most real problems are multivariate. Chapter 6, the longest chapter in the book, deals with such models, and chapter 7 discusses how to learn them from data. Chapter 8 then extends the discussion of applying causal modeling to machine learning to the multivariate case.

Chapters 9 and 10 deal with two special cases: hidden variables and time series data.

Three appendices give mathematical preliminaries and proofs of key propositions. The book includes a bibliography with entries extending to 2017, and an overall index.

Starting with chapter 3, each chapter includes problems appropriate for classroom use, but the orientation of the book presumes advanced students who have either been exposed to the basics of causal modeling or can pick them up quickly from the brief summaries offered here. The book would also be a very useful supplement to a more conventional text on causality in a full-year course.
