Top 15 Books to Master Data Strategy
If you’re a data practitioner with your eye on a leadership role, learning Data Management will be an important step toward getting you where you want to go. In this article, we outline 15 books on topics ranging from Data Architecture (highly technical) to Data Literacy (broadly nontechnical) to help you improve your understanding of end-to-end best practices related to data.
- Authors: various leaders within DAMA International
- Time to read: 20 hrs 5 mins (588 pages)
- Rating: 4.8 / 5 (397 total ratings)
Summary: I’d be remiss if I didn’t begin this list here. This behemoth covers 14 practical topics related to Data Strategy, followed by 3 topics related to implementation.
The 14 different knowledge areas are best represented by the Aiken Pyramid, which outlines how these topics build upon each other. Data Governance forms the basis of the pyramid, with the next layer composed of Data Architecture, Data Quality, and Metadata Management, and so on, until we reach the top with Data Science (which the book naively refers to as “Big Data”).
The DMBOK can be a bit frustrating to read, largely because it was written by 20+ members of DAMA International, seemingly without a connective thread. This trusty book is currently in its second edition; for v3, I’d recommend an editor.
TL;DR: If you read this one (or at least skim most of it and highlight the important parts), you’ll be prepared to take the open-book, open-notes Certified Data Management Professional (CDMP) exam (aka the best Data Science certification you’ve never heard of).
- Author: Laura Sebastian-Coleman
- Time to read: 6 hrs 5 mins (208 pages)
- Rating: 4.4 / 5 (76 total ratings)
Summary: This book is for those who are working on a Data Governance implementation in their organization and struggling to overcome business barriers. If you are in leadership or need to coordinate with executives on Data Management, Navigating the Labyrinth offers a guide to this complex field.
Sebastian-Coleman translates Data Management ideas, frameworks, and procedures into a business-friendly book that bridges the gap between technical subject matter experts and executive decision makers. The guide is widely recognized as a fantastic overview of the overall goals of Data Management, the terminology, and how to implement Data Strategy at a high level.
TL;DR: Navigating the Labyrinth cuts through the complexity with tried and true principles that tie in closely with the Data Management Body of Knowledge to improve your overall understanding of Data Management.
- Author: John Ladley
- Time to read: 8 hrs 45 mins (264 pages)
- Rating: 4.3 / 5 (80 total ratings)
Summary: Why is Data Governance required to maintain an organization’s success? How should a team plan, begin, and execute a Data Governance initiative? How can a program be kept alive? With frameworks and case studies, this book illustrates how to create a successful and cost-effective Data Governance program.
Given the expense associated with growth, privacy, and security, organizations face new sources of risk associated with their data assets. Data Governance represents the solution. A strong charter will help an organization navigate the perilous border between risk and opportunity.
This book is intended for any manager or team lead who intends to create a Data Governance program. The challenge of Data Management continues to expand with difficulties such as storage costs, exponential data growth, and administrative, management, and security concerns. With the right strategy, the organization can provide better services to customers / constituents while saving money.
Like many of the others on this list, Data Governance is relevant to the CDMP Specialist Exams, the next exam tier after the CDMP Fundamentals exam. If you’re excited by the prospect of doing a deep dive on any of the topics discussed in this article, I’d encourage you to read up on the process of getting recognized at the CDMP Practitioner or Master level.
TL;DR: The solution to scaling up data operations while avoiding risks is putting strong Data Governance into place. This book will show you how.
- Author: Laura Sebastian-Coleman
- Time to read: 12 hrs 32 mins (376 pages)
- Rating: 4.1 / 5 (29 total ratings)
Summary: Leading expert Sebastian-Coleman provides guidance on how to monitor and maximize Data Quality over time. It begins with standard measurement concepts and moves toward a detailed framework of different measurement techniques across the dimensions of Data Quality.
The book also provides common conceptual models for the definition and storage of Data Quality results from trend analysis. In addition, it includes generic specifications for ongoing measurement and monitoring, such as comparisons and calculations for making the measurements meaningful.
TL;DR: An organization’s goal for Data Quality should be to promote ongoing measurement, instead of single-instance activities.
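To make the idea of ongoing measurement concrete, here is a minimal Python sketch that scores one Data Quality dimension (completeness) for every incoming batch rather than once. The batch data, the `email` field, and the 0.7 threshold are all hypothetical, and this is an illustration of the general idea rather than the framework from the book.

```python
from datetime import date

# Hypothetical daily batches of customer records; None marks a missing value.
batches = {
    date(2024, 1, 1): [
        {"email": "a@example.com"}, {"email": None},
        {"email": "c@example.com"}, {"email": "d@example.com"},
    ],
    date(2024, 1, 2): [
        {"email": "e@example.com"}, {"email": None},
        {"email": None}, {"email": "h@example.com"},
    ],
}

def completeness(records, field):
    """Share of records where `field` is populated (0.0 to 1.0)."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)

def measure_over_time(batches, field, threshold):
    """Return one (day, score, passed) result per batch, in date order."""
    results = []
    for day in sorted(batches):
        score = completeness(batches[day], field)
        results.append((day, score, score >= threshold))
    return results

# The threshold is an assumed business rule, not a standard value.
results = measure_over_time(batches, "email", threshold=0.7)
for day, score, passed in results:
    print(day.isoformat(), f"{score:.2f}", "ok" if passed else "below threshold")
```

Storing each day's result, instead of overwriting it, is what lets you spot a trend such as completeness slipping from one batch to the next.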
- Author: Danette McGilvray
- Time to read: 11 hrs 45 mins (352 pages)
- Rating: 4.6 / 5 (57 total ratings)
Summary: The Ten Steps refers to a systematic approach that combines a conceptual framework to understand Data Quality with the necessary tools and techniques to improve it. The book makes use of real world projects to highlight how these principles work to enhance Data Quality.
McGilvray emphasizes never addressing Data Quality for its own sake, but instead as a way to advance the organization’s specific mission. The Ten Steps methodology can be scaled up and down and applied to many Data Quality related situations.
TL;DR: Executing Data Quality Projects was updated in 2021. It includes examples, several templates, and practical advice for execution. Readers are guided on how to choose the next best action based on their organization’s unique position.
- Authors: Grant Fleming and Peter Bruce
- Time to read: 10 hrs 8 mins (304 pages)
- Rating: 3.5 / 5 (6 total ratings)
Summary: Look no further for an insightful, practical exploration of ethical issues that arise when the latest technology is applied to the largest and most sensitive records on the planet. This book guides Data Scientists on how to implement and audit machine learning models to mitigate unanticipated harms. This book offers technical implementation of interpretability techniques and other methods to reduce bias and inequity.
TL;DR: Responsible Data Science informs how stakeholders should implement data solutions. Following the guidelines in this book, the role of the Data Scientist is to combine detailed technical analysis with ethical social observation.
- Authors: Mike Loukides, Hilary Mason, and DJ Patil
- Time to read: 1 hr 32 mins (46 pages)
- Rating: 4.4 / 5 (132 total ratings)
Summary: This short guide, available for free as a Kindle ebook, walks through recommendations, checklists, and terminology central to ethical data handling, particularly for Data Scientists. The book cites several cases of unethical data use and outlines recommendations that should be put into place in order to avoid these issues in the future.
The basic premise of the book is that ethical data science requires more than code or oath. Ethics and Data Science recommends a daily practice that implements a checklist. Data Scientists should treat customer data as if it were their own, feel empowered to challenge the assumptions of their organization, and use the “5 Cs” (consent, clarity, consistency, control, and consequences) to create excellent data products.
The book concludes with a case study written by Princeton researchers. Practices such as lean methodology and the use of bug bounties to find potential vulnerabilities are emphasized throughout.
TL;DR: This book is a simple, practical guide to Data Science ethics.
- Author: Dr. Carl Gold
- Time to read: 16 hrs 20 mins (504 pages)
- Rating: 4.3 / 5 (16 total ratings)
Summary: A must-read case study for anyone looking to break into Data Science. Fighting Churn with Data is full of practical examples from Dr. Gold’s career up to his time as the Chief Data Scientist at the subscription services company Zuora.
Churn occurs when a paying customer leaves a subscription service. It is a crucial metric for any business with recurring revenue. As more and more companies move to the subscription economy, this is an important business model for Data Scientists to understand. Therefore, this book represents an excellent practice project for a budding Data Scientist or a skilled practitioner looking to better understand this niche area.
In the book and associated Twitch streamed videos, Dr. Gold provides guidance on the SQL and Python code required to conduct churn analysis. In both resources, Dr. Gold really delves into the process of feature engineering (i.e. finding and/or generating predictive features from a mess of raw data). For the project, Dr. Gold created a realistic simulated social network dataset so that data practitioners can follow his analysis through hands-on coding.
TL;DR: This is a solid and practical guide for all Data Scientists, as well as anyone looking to improve customer retention. The book teaches how to transform raw data into measurable behavioral indicators, calculate customer lifetime value, and use demographics to improve churn predictions.
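If churn and customer lifetime value are new to you, the short Python sketch below shows the arithmetic behind them. The function names and figures are hypothetical, and the lifetime value formula (average monthly revenue divided by monthly churn) is a common simplification that I'm assuming for illustration, not necessarily the calculation Dr. Gold uses.

```python
def churn_rate(customers_at_start, customers_lost):
    """Fraction of paying customers at period start who left during the period."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return customers_lost / customers_at_start

def simple_lifetime_value(avg_monthly_revenue, monthly_churn):
    """Rough CLV: monthly revenue per customer times expected lifetime (1 / churn)."""
    if monthly_churn <= 0:
        raise ValueError("monthly_churn must be positive")
    return avg_monthly_revenue / monthly_churn

# Hypothetical subscription business: 1,000 customers, 50 cancel this month.
monthly_churn = churn_rate(customers_at_start=1000, customers_lost=50)
clv = simple_lifetime_value(avg_monthly_revenue=20.0, monthly_churn=monthly_churn)
print(f"churn: {monthly_churn:.1%}, CLV: {clv:.2f}")
```

Because churn sits in the denominator, even a small reduction in churn raises lifetime value noticeably, which is why retention work gets so much attention in subscription businesses.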
- Authors: Steven N. Brunton and J. Nathan Kutz
- Time to read: 16 hrs 25 mins (492 pages)
- Rating: 4.7 / 5 (186 total ratings)
Summary: This book is a strong intro to Data Science with an emphasis on the underlying mathematical principles. Read this for an advantage in data-driven decision making and data engineering best practices.
Brunton and Kutz deep-dive into data analysis and machine learning. This includes neural networks, the Lorenz system, dimensionality reduction and transforms, singular value decomposition (SVD), the Fourier transform, and sparse sampling.
The book also tackles topics such as dynamic mode decomposition, the sparse identification of nonlinear dynamics (SINDy) algorithm, and control theory. It concludes with a discussion of reduced order models (ROMs) that use the proper orthogonal decomposition (POD) algorithm to approximate solutions of partial differential equations (PDEs).
And if all that meant very little to you, no worries! The book is supported by Steve Brunton’s YouTube channel, which offers another way to enhance your understanding of these concepts.
TL;DR: This is a very good overview of different methods of managing the data pipeline.
10. Data Literacy
- Authors: Peter Aiken and Todd Harbor
- Time to read: 14 hrs 15 mins (429 pages)
- Rating: 5 / 5 (1 total rating)
Summary: A roadmap for expanding data literacy across billions of people, this book defines the knowledge required to operate in today’s business environment and engage constructively in a data-driven society.
It also describes how to build data literacy within the organization, beginning with a 12-step framework. This section outlines a valuable Data Doctrine. It also describes real world problems data practitioners may face as they work to improve the level of data literacy in their organization.
TL;DR: This book is a thorough guide for improving data literacy at a personal level and within your organization.
- Author: Andy Graham
- Time to read: 6 hrs 16 mins (188 pages)
- Rating: 4.6 / 5 (7 total ratings)
Summary: Master Data Management (MDM) represents a microcosm of the overall challenge of managing the consistency and integrity of data in an organization. This book explains MDM, the business rationale, and the numerous strategies that are crucial to its success.
After reading, you’ll have a solid foundation with which to introduce MDM in your organization or improve on existing practices. This book is crucial for anyone going on an “adventure” in this important domain. This book’s intended audience includes data professionals, information technology staff, project/program managers, data architects, business analysts, and technology leaders.
TL;DR: The concept of the “golden record” is core to Graham’s treatise on Master Data. Readers will receive a solid education on this concept, particularly how to identify the data sources that comprise the golden record.
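For readers new to the term, a golden record is the single best version of an entity assembled from several source systems. The Python sketch below shows one simple way to build one, assuming a "most recently updated non-empty value wins" survivorship rule. The source systems, field names, and the rule itself are hypothetical choices for illustration, not Graham's prescribed method.

```python
from datetime import date

# Hypothetical records for the same customer from three source systems.
source_records = [
    {"source": "crm", "updated": date(2023, 5, 1),
     "name": "Ana Diaz", "email": None, "phone": "555-0100"},
    {"source": "billing", "updated": date(2024, 2, 10),
     "name": "Ana M. Diaz", "email": "ana@example.com", "phone": None},
    {"source": "support", "updated": date(2023, 11, 20),
     "name": None, "email": "ana.old@example.com", "phone": "555-0199"},
]

def build_golden_record(records, fields):
    """For each field, keep the most recently updated non-empty value and its source."""
    newest_first = sorted(records, key=lambda r: r["updated"], reverse=True)
    golden, lineage = {}, {}
    for field in fields:
        for record in newest_first:
            if record.get(field) not in (None, ""):
                golden[field] = record[field]
                lineage[field] = record["source"]
                break
        else:
            golden[field] = None  # no source supplied this field
    return golden, lineage

golden, lineage = build_golden_record(source_records, ["name", "email", "phone"])
print(golden)
print(lineage)
```

Keeping the lineage alongside the merged values records which source contributed each field, which is the part of the exercise the TL;DR above highlights.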
- Author: Pushpak Sarkar
- Time to read: 11 hrs 48 mins (354 pages)
- Rating: 4.3 / 5 (4 total ratings)
Summary: Increasingly, data is regarded as an asset that can be profitably offered as a service in and of itself. The book demonstrates how companies may benefit from data as a service (DaaS) through real-world case studies on a variety of architectures and associated patterns.
The book outlines an all-inclusive strategy to implement data as a service in any company, which includes (1) a framework for service oriented architecture (SOA) that is reusable and adaptable, (2) a plan to deliver DaaS to clients, and (3) a detailed description of each component of the DaaS architecture. Sarkar goes into depth on how to successfully collect and distribute data across heterogeneous platforms by using SOA principles, industry best practices, and emerging technologies such as data virtualization, cloud, and Data Science.
TL;DR: Data as a Service discusses how businesses may create income by offering data services in exchange for fee-based subscriptions.
- Authors: Lawrence Corr and Jim Stagnitto
- Time to read: 10 hrs 56 mins (328 pages)
- Rating: 4.6 / 5 (154 total ratings)
Summary: This book offers a great step-by-step guide for capturing data warehousing and business intelligence (DW & BI) requirements and turning them into high performance dimensional models by “modelstorming” (data modeling + brainstorming) with BI stakeholders.
In addition, readers will learn about Business Event Analysis & Modeling (BEAM), an agile approach to dimensional modeling for improving communication between data warehouse designers, BI stakeholders, and the entire DW & BI development team.
TL;DR: With friendly diagrams and useful additional resources, Corr and Stagnitto provide a strong contribution to the Data Management field.
- Author: April Reeve
- Time to read: 6 hrs 48 mins (204 pages)
- Rating: 3.6 / 5 (12 total ratings)
Summary: Readers will learn the strategies, tools, and best practices for managing data transfer. This book discusses approaches for significantly decreasing the complexity of managing system interfaces and promoting scalable designs. Based on over two decades of expertise, Reeve puts forward a vendor-neutral strategy for transporting data between computational environments and data systems.
TL;DR: The typical organization is made up of dozens (if not hundreds) of computing systems that have been constructed, purchased, and acquired over time. Data must be integrated for reporting and analysis, shared for business transaction processing, and transformed as new systems are acquired.
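To see why interface complexity becomes a problem as systems accumulate, here is a small Python sketch comparing two integration layouts. It assumes the worst case where every pair of systems needs its own direct interface, versus a layout where each system connects once to a shared hub. The function names are hypothetical and the comparison is my own illustration, not a model taken from Reeve's book.

```python
def point_to_point_interfaces(n_systems):
    """Interfaces needed if every pair of systems is connected directly."""
    return n_systems * (n_systems - 1) // 2

def hub_interfaces(n_systems):
    """Interfaces needed if each system connects once to a shared hub."""
    return n_systems

# Compare both layouts for a small, medium, and large system landscape.
for n in (5, 20, 100):
    print(n, point_to_point_interfaces(n), hub_interfaces(n))
```

The direct layout grows roughly with the square of the number of systems, while the hub layout grows linearly, so the gap widens quickly once an organization has dozens of systems.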
- Author: Martin Kleppmann
- Time to read: 20 hrs 20 mins (611 pages)
- Rating: 4.8 / 5 (2,440 total ratings)
Summary: If you are interested in distributed systems and scalability, reading this book is a must. It provides a comprehensive understanding of the various technologies in the field with a detailed accounting of the various problems each technology is designed to solve. With this book, you can learn the most important concepts of Data Management quickly and in as entertaining a way as possible.
Kleppmann is on top of the latest techniques in the field. He consistently blends the relevant theories of computer science with real world use cases and applications. The focus is primarily on the core principles and thinking processes that apply when building data services.
TL;DR: If you’re involved in data engineering, systems design, cloud architecture, or DevOps, this is a good guide to have.
And there you have it: 15 books that you should read in order to master Data Strategy. If you are excited by the prospect of learning more about this field, you should consider signing up for my Data Strategy Newsletter. Each month we do a deep dive into one recent story related to Data Management and provide one productivity or health recommendation that is especially relevant to data practitioners.
You might also consider studying for the Certified Data Management Professional Exam. This 100-question test is open-book and open-notes, so it can be studied for and passed relatively quickly. Beyond offering a worthwhile credential in the Data Management field, the CDMP provides practical knowledge and frameworks you can use in your day-to-day work and to structure your advice to colleagues and clients.
Original. Reposted with permission.