Computing Reviews

Creating good data: a guide to dataset structure and data representation
Foxwell H., Apress, New York, NY, 2020. 124 pp. Type: Book
Date Reviewed: 09/02/21

The dream of all writers: to be the right person, at the right place, at the right time. Harry Foxwell hit the jackpot:

(1) The right person: an excellent author with a gift of insight.
(2) The right place: the world.
(3) The right time: the pandemic.

Every scientist, every research institute, and for that matter, every vaccination effort longs for clean data upon which to build an effective antidote to one of the planet’s great scourges. And all know that a vaccine’s legitimacy and effectiveness can only be studied and researched through the use of clean data. Harry Foxwell has given us a user’s manual at exactly the needed time, and he has done it using clear, understandable language.

The first chapter, at initial reading, seems to be the obligatory homily that appears in every book of this genre; although it is that, it also lays down specific definitions of the elusive “good data” and needs to be read. Chapter 2 details the types of data and when to use each. The content of this chapter is almost identical to that of any introductory statistics text covering all data types, but again it is worth reading.

Chapters 3 and 4 discuss qualitative data and how to determine appropriate data types when approaching your study. Chapter 4 also starts the discussion of good design and how to achieve it. Chapter 5 defines what is meant by good data and also covers metadata. Chapter 6 continues the design process by discussing data collection and some of the pitfalls in that process. The value of the chapter lies in its warning not to minimize the collection process, which can lead to a disastrous study if it is not kept constantly in mind.

Chapter 7 puts together the points covered in the first six chapters via the use of case studies, both successful and not so successful. Chapter 8 details the process of “cleaning” data and looks at cleaning methods and software, including tools such as R and Python. The text concludes with chapter 9 on what constitutes good data analytics, reinforcing what is covered in the book.

The book is definitely valuable and anyone involved in statistical studies would do well to read it. The text could also be a useful tool in a graduate analysis course. The writing is clear and to the point, with no unnecessary “preaching.”

Reviewer:  James Van Speybroeck Review #: CR147346

Reproduction in whole or in part without permission is prohibited.   Copyright 2021™