7 Books to Grasp Mathematical Foundations of Data Science and Machine Learning

It is vital to have a good understanding of the mathematical foundations to be proficient with data science. With that in mind, here are seven books that can help.



Most people learn Data Science with an emphasis on Programming. However, to be truly proficient with Data Science (and Machine Learning), you cannot ignore the mathematical foundations behind it. In this post, I present seven books that I enjoyed while learning the mathematical foundations of Data Science. ‘Enjoy’ is perhaps not the best word, since this effort is hard going!

So, why should you make the effort to learn the Maths foundations of Data Science?

Here are some reasons which motivated me:

AI is rapidly changing. Geoffrey Hinton already believes we should rethink backpropagation. Understanding the Maths will help you understand the evolution of AI better. It will help you distinguish yourself from others who approach AI at a superficial level. It will also help you to see the Intellectual Property (IP) potential of AI better. Finally, understanding the Maths behind Data Science could also lead you to the higher-end jobs in AI and Data Science.

I have two additional motivations for working with these books.

  1. First, I have included the maths-based approach in the Data Science for Internet of Things course I teach at Oxford University, and in my personal teaching on AI applications.
  2. Second, I am writing a book to simplify AI from a Maths perspective for 14 to 18 year olds. To understand the foundations of Maths for Data Science and AI, you need to know four things: Linear Algebra, Probability Theory, Multivariate Calculus, and Optimization. Most of these are taught (at least partially) in high schools. I am thus trying to relate high school maths to AI and Data Science with an emphasis on Mathematical modelling. Comments are welcome on this approach.
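
To make the link between these four areas concrete, here is a minimal sketch of fitting a straight line by gradient descent. It is purely illustrative and not taken from any of the books below; the function name `fit_line`, the learning rate, the step count and the synthetic data are all assumptions chosen for the example.

```python
import random


def fit_line(xs, ys, lr=0.05, steps=2000):
    """Fit y ≈ w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Linear algebra: each prediction is the dot product of [w, b] with [x, 1].
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        # Multivariate calculus: partial derivatives of the error w.r.t. w and b.
        grad_w = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_b = 2.0 / n * sum(residuals)
        # Optimization: take a small step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Probability: the (hypothetical) data are a true line plus Gaussian noise.
rng = random.Random(0)
xs = [i / 10 for i in range(50)]
ys = [3.0 * x + 1.0 + rng.gauss(0.0, 0.1) for x in xs]

w, b = fit_line(xs, ys)
print(f"estimated slope w={w:.2f}, intercept b={b:.2f}")
```

With these settings the estimates should land close to the true slope of 3 and intercept of 1. Each of the four areas appears in only a line or two, which is roughly the level at which high school maths connects to a simple model.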

Seven Books to Grasp Mathematical Foundations of Data Science

So, here is the list of books with my comments:

1. The Nature Of Statistical Learning Theory
By Vladimir Vapnik.

You cannot create a list of Maths books and not include the great Russian mathematicians! So, the first in my list is The Nature of Statistical Learning Theory by Vladimir Vapnik. Of all the books in this list, Vapnik is the hardest to find. I have an older Indian edition. Vladimir Vapnik is the creator of the Support Vector Machine (SVM). His Wikipedia page gives a lot more about his work.

2. Pattern Classification
By Richard O Duda

Like Dr Vapnik’s book, Duda is another classic from another era. It was first published in 1973, updated 27 years later (2000), and nothing since! Yet it remains a vital resource. The book takes a pattern recognition approach and provides extensive coverage of algorithms.

3. Machine Learning: An Algorithmic Perspective, Second Edition (Chapman & Hall/CRC Machine Learning & Pattern Recognition)
By Stephen Marsland

Stephen Marsland’s book is now in its second edition. It was one of the earliest books I read (I only have the first edition). Both editions are very good. The second edition, I believe, has a lot more code in Python. Like the first two books, this book also places a heavy emphasis on Algorithms.

4. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition
By Trevor Hastie, Robert Tibshirani, Jerome Friedman

Hastie is another classic. The version I have is very well printed with colours. This is another reference book.

5. Pattern Recognition and Machine Learning (Information Science and Statistics)
By Christopher M. Bishop

Pattern Recognition and Machine Learning (Information Science and Statistics) by Christopher M. Bishop is also an in-depth and well-presented reference book.

6. Machine Learning: The Art and Science of Algorithms that Make Sense of Data
By Peter Flach

I like Peter Flach’s book although some Amazon comments call it wordy and point out the lack of code. I like Flach especially for the grouping of algorithms (Logical models, Linear models, Probabilistic models) and the overall treatment of the themes.

Finally, my most recommended book:

7. Deep Learning
By Ian Goodfellow, Yoshua Bengio and Aaron Courville

If there is one book you should read end to end, it’s this one. It is both detailed and modern, covering everything you can think of.

Two more worthy additions

If you can recommend any I have missed, please let me know.

Concluding comments:

  1. Except for possibly the Goodfellow-Bengio book, I would not recommend reading the books cover to cover. I prefer to read the books by topic as needed, i.e. as reference books. I also like the examples from different authors, e.g. Duda's fish sorting, Hastie's advertising data (sales against TV and radio), and Flach's concept of hypothesis space with the sea animals example.
  2. I find that these books taught me a sense of humility, i.e. how little we know and how vast and complex the field is.
  3. These books are timeless. Vladimir Vapnik is now aged 81. Duda was first published in 1973. I expect that 50 years from now, the industry will still be reading them. Like old friends who have stood the test of time. That’s a comforting thought. It shows the longevity of the maths-based approach.
