6 Books About Open Data Every Data Scientist Should Read

Check out this collection of six books which tackle the hard skills required to make sense of the changing field known as open data and muse on the ethical implications of a digitally connected world.

Data scientists are set to remain in high demand over the coming years. Whether you’re in this field already or hoping to break into it soon, it’s obvious that thriving here requires constant learning. The speed with which the world’s data analysis infrastructures have improved their reach and usefulness has been remarkable, but it has also reinforced the necessity of openness and accountability.

Some of these six books tackle the hard skills required to make sense of the changing field known as open data, while others muse on the ethical implications of a digitally connected world. Each one is a useful and enlightening read for data scientists everywhere.



1. "Open Data Structures: An Introduction" — Pat Morin

Pat Morin’s Open Data Structures: An Introduction, from CreateSpace Independent Publishing, is an excellent place for the uninitiated to begin. It is a freely available, open-source textbook that covers the implementation and analysis of data structures for sequences, stacks, priority queues, graphs, and ordered and unordered dictionaries.

If some of that sounded like Greek to you, that’s OK. This is an introduction, after all — and the book makes sure to include Java source code for each major component. Readers should find this an accessible, practical and mathematically focused way to ease into this complex but fascinating topic. It’s intended for all self-learners as well as undergraduate students.
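
To make those terms a little less Greek, here is a minimal sketch of four of the structures named above, written with Java's standard `java.util` collections. It is not taken from the book, which builds its own implementations from scratch. The class name `DataStructuresTour` and its method names are hypothetical, chosen only for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

// Hypothetical illustration class; not code from the book.
public class DataStructuresTour {

    // Stack: the last element pushed is the first one popped.
    static List<Integer> stackOrder() {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(1);
        stack.push(2);
        stack.push(3);
        List<Integer> popped = new ArrayList<>();
        while (!stack.isEmpty()) {
            popped.add(stack.pop());
        }
        return popped;
    }

    // Priority queue: elements come out smallest first,
    // regardless of the order in which they were added.
    static List<Integer> priorityOrder() {
        PriorityQueue<Integer> queue = new PriorityQueue<>();
        queue.add(5);
        queue.add(1);
        queue.add(4);
        List<Integer> polled = new ArrayList<>();
        while (!queue.isEmpty()) {
            polled.add(queue.poll());
        }
        return polled;
    }

    // Ordered dictionary: a TreeMap keeps its keys in sorted order.
    static List<String> orderedKeys() {
        Map<String, Integer> counts = new TreeMap<>();
        counts.put("stack", 1);
        counts.put("graph", 3);
        counts.put("queue", 2);
        return new ArrayList<>(counts.keySet());
    }

    // Unordered dictionary: a HashMap offers fast lookup by key
    // but makes no promise about iteration order.
    static int lookup(String key) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("stack", 1);
        counts.put("graph", 3);
        counts.put("queue", 2);
        return counts.getOrDefault(key, 0);
    }

    public static void main(String[] args) {
        System.out.println("Stack pops:      " + stackOrder());
        System.out.println("Priority order:  " + priorityOrder());
        System.out.println("Sorted keys:     " + orderedKeys());
        System.out.println("Lookup 'graph':  " + lookup("graph"));
    }
}
```

Using the built-in collections keeps the example short. The book's value lies in showing how structures like these are implemented and why they perform the way they do.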


2. "The Global Impact of Open Data" — Stefaan Verhulst and Andrew Young

Globalization is a fraught and controversial concept, but nobody questions how mobile and global our data has become. For anybody who wants an extremely modern take on the intersection of globalization and open data analysis, it makes sense to turn to thought leaders. Hailing from NYU’s GovLab, authors Verhulst and Young have put together an O’Reilly guide that will appeal to data scientists, policymakers, small business owners and privacy activists alike.

The case studies in this text should be especially compelling for the data science crowd. A future where data is open, shareable and truly useful is a future where technologists work hand in hand with public and private entities to solve shared challenges, devise new standards and APIs, and use data science to better understand weather and climate and even to plot and predict public health crises.


3. "Data for the People: How to Make Our Post-Privacy Economy Work for You" — Andreas Weigend

Andreas Weigend’s Data for the People, available now from Hachette Book Group, is a must-read for privacy-minded data scientists. Gone are the days when citizens remained in the dark about how their digital lives get surveilled, mined for profitable insights and even sold to unknown third parties. It’s not just our web browsing habits, either: the panoply of connected devices we rely on represents, essentially, a global surveillance network. Whether people use this network for pro-social objectives depends on the intentions of those tasked with building it.

That means data scientists. Weigend himself has been a consultant for businesses, financial and health care entities and even the educational community. He argues that big data and data science are tools for positive change, but that we haven’t yet laid down a common framework to reconcile the needs of big business with the right to online privacy. This book proposes ways to “make data work for us.” And “us” means “everybody.”


4. "The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences" — Rob Kitchin

As it explores how quickly the worldwide data landscape is changing and reinventing itself, this book never loses focus on the distinctions between big data and open data. Rob Kitchin’s The Data Revolution, despite its title, cuts through a lot of the hype and hyperbole surrounding data infrastructures and analysis — and supports its arguments about the modern-day landscape with brief looks backward at how the many pieces all came together.

The “revolution” explored here involves the variety, breadth and accessibility of data among businesses, regulatory agencies, local and national governments, lobbyists, journalists and more. The book presents more of a “consequentialist” reading of modern trends in open data, making it an important resource for scientists who want to know more about the civic, ethical and political implications of a world that transmits and analyzes information on an unprecedented scale.


5. "Open Data Now: The Secret to Hot Startups, Smart Investing, Savvy Marketing and Fast Innovation" — Joel Gurin

Author Joel Gurin puts many years of varied experience to work in Open Data Now. Having worked with nonprofits, journalists and governments, and having served as an executive VP at Consumer Reports, Gurin sees open data as a tool that can help organizations of all kinds launch new projects and products, better understand how data can spur innovation and achieve a better connection with their audiences.

Data scientists don’t just need a variety of concrete skills to draw from. They also need to know how those skills can apply to making more data-driven decisions in investing, developing startups, R&D, community organizing, interfacing with the public and much more. This book is ideal for decision-makers and data scientists alike, since it thinks “big picture” and presents credible strategies for engaging in data analysis in a collaborative, ethical and consequential manner.


6. "Data Science for Transport: A Self-Study Guide With Computer Exercises" — Charles Fox

It would be hard to overstate the vital role big data will come to play in the smart city and automotive technologies of the future. The rise of telecommuting, the urgency of displacing combustion engines with electric ones and the challenge of bringing autonomy to commercial and personal vehicles mean the future of urban planning will look vastly different from how it looks today.

Data Science for Transport, from Springer International, presents a look at this future and highlights the vital role data scientists can play in bringing it to fruition. This book delves into the ways researchers and transportation technologists can use databases and mathematical models to better understand the world’s transportation problems and come up with practical, scalable and inclusive solutions.

We hope you’ve enjoyed this look at six essential books for current and aspiring data scientists. If we missed one of your favorites, chime in below and let us know!

Bio: Kayla Matthews discusses technology and big data in publications like The Week, The Data Center Journal and VentureBeat, and has been writing for more than five years. To read more posts from Kayla, subscribe to her blog Productivity Bytes.

Original. Reposted with permission.