Learning the mathematics of Machine Learning: bridging the gap
We outline the four key areas of maths in Machine Learning and begin to answer the question: how can we start with high school maths and use that knowledge to bridge the gap with the maths for AI and Machine Learning?
Image source: Glenfinnan Viaduct, aka “The Harry Potter Bridge” (Wikipedia) – an apt analogy for bridging the known to the unknown!
Background
In April this year, I posted about the seven books to grasp the mathematical foundations of data science, which became one of my most popular posts ever. It demonstrated to me that there is a real need to understand the maths foundations behind Data Science. As part of my teaching at the University of Oxford (Data Science for Internet of Things), I have often encountered the same issue when working with participants. I am also personally interested in democratising AI knowledge, especially for the younger generation. As an educator, I see two main problems in teaching the maths behind Data Science:
- Cognitive dependencies: there are many inter-dependent concepts, and to explain something new you need to explain its dependencies, which can be numerous
- Cognitive overload: there is too much to learn in a short timeframe. Related to this is the fact that there is too much content out there; while the content on the Web is excellent in many cases, it can be overwhelming
Hence, a simplified approach is needed to learn the mathematics behind Data Science and Artificial Intelligence. The idea I am working with is: how can we bridge the gap between the maths needed for AI and Machine Learning and that taught in high schools (up to ages 17/18)?
To put this idea into more context: the maths behind Machine Learning comprises four key areas:
- Linear algebra
- Statistics and Probability theory
- Multivariate calculus
- Optimization
So, the question then is: how can we start with high school maths and use that knowledge to bridge the gap with the maths for AI and Machine Learning? I am fascinated by the approach of using maths to bridge this gap. Knowledge of maths is universal, which means we could truly inspire someone with minimal resources to take up Data Science as a profession.
I am especially thankful to Sebastian Raschka for his comments and feedback on my work. The views in this post are my own.
Principles
We work with the following guiding principles:
- We use high school maths as a starting point
- A gentle on-ramp – i.e. introduce complexity at later stages
- An emphasis on Deep learning and AI
- Building on the shoulders of giants – there are excellent sources on the Web. We don’t need to reinvent the wheel if we leverage them
- Concise/minimalist – the approach is a compass, not a map. By taking a minimalist approach, we do not discuss every algorithm; rather, we provide a means to navigate the learning process
- We aim to lay foundations and to inspire. AI and Deep Learning are changing rapidly; new algorithms like GANs, attention networks etc. are already gaining prominence
- Explain using a few examples, in a simple problem context such as Boston housing – predicting the price of a house
- Our goal is to connect the dots and inspire readers to learn more
Approach
The approach is based on working with only seven concepts; minimal code sketches illustrating steps 2 to 6 follow the list:
1) Process as a black box model – any process can be modelled as a black box, i.e. we identify a functional relationship between the inputs and outputs. This allows us to introduce the basic ideas of probability, distributions etc., i.e. statistical learning
2) The simplest functional relationship between inputs and outputs is a linear relationship. We start with linear regression because it is taught in schools (y = mx + c) – see sketch 1 after this list
3) From the linear equation, you can understand the workings of a perceptron and hence the basics of a neural network (sketch 2 below)
4) We then consider ways of finding the best solution, using techniques such as the closed-form solution and optimization, leading to the idea of gradient descent (introducing the concept of defining a loss and minimizing it iteratively). Some problems can be solved exactly using maths (i.e. the closed-form solution), but at some point the maths is not sufficient and we need to optimize iteratively (sketch 3 below). If we extend this approach to cover a range of functions, we can understand neural networks as universal function approximators
5) Linear regression can be extended to logistic regression using the Generalized Linear Model, and hence to classification (sketch 4 below)
6) Evaluating a model – the ROC curve and other techniques (sketch 5 below)
7) A wider classification of algorithms, based on Peter Flach's taxonomy:
- Logical models: use a logical expression to divide the instance space into segments, e.g. tree models
- Geometric models: models described in terms of the geometry of the instance space, even for instances which are not inherently geometric (e.g. age or temperature). Linear models are an example, but so are SVMs, kernel models, and distance-based models such as nearest neighbour and clustering
- Probabilistic models: either discriminative (e.g. logistic regression) or generative (e.g. Naïve Bayes, and deep generative models such as GANs and variational autoencoders)
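To make these steps concrete, the sketches below illustrate steps 2 to 6 in Python. They are minimal, illustrative sketches only: the synthetic data, parameter values and variable names are my own assumptions, not a prescribed implementation.

Sketch 1 (step 2): fitting y = mx + c to noisy data using the closed-form least-squares solution, assuming NumPy is available.

```python
import numpy as np

# Synthetic data: y = 2x + 1 plus noise (illustrative values only)
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(scale=1.0, size=x.shape)

# Closed-form least squares: stack a column of ones so the model is y = m*x + c
A = np.column_stack([x, np.ones_like(x)])
(m, c), *_ = np.linalg.lstsq(A, y, rcond=None)
print(f"fitted slope m = {m:.2f}, intercept c = {c:.2f}")
```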
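Sketch 2 (step 3): a perceptron is the same linear equation passed through a step activation. This toy example learns the AND function with the classic perceptron update rule (the learning rate and epoch count are arbitrary, illustrative choices).

```python
import numpy as np

# Toy data: the logical AND function
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([0, 0, 0, 1])

w = np.zeros(2)  # weights: the "m" of the linear equation, one per input
b = 0.0          # bias: the "c"
lr = 0.1         # learning rate

for epoch in range(20):
    for xi, ti in zip(X, t):
        y = 1 if xi @ w + b > 0 else 0  # linear equation + step activation
        w += lr * (ti - y) * xi         # classic perceptron update
        b += lr * (ti - y)

print("weights:", w, "bias:", b)
print("predictions:", [int(xi @ w + b > 0) for xi in X])
```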
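Sketch 3 (step 4): when the closed-form solution is unavailable or impractical, we define a loss (here, mean squared error) and minimise it iteratively by following its gradient downhill. On the data from sketch 1, this converges to roughly the same m and c as the closed-form answer, which is the point of step 4.

```python
import numpy as np

# Same noisy linear data as sketch 1
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(scale=1.0, size=x.shape)

# Gradient descent on the mean squared error loss
m, c = 0.0, 0.0
lr = 0.01  # learning rate (an arbitrary, illustrative choice)
for step in range(2000):
    error = (m * x + c) - y
    grad_m = 2 * np.mean(error * x)  # d(MSE)/dm
    grad_c = 2 * np.mean(error)      # d(MSE)/dc
    m -= lr * grad_m
    c -= lr * grad_c

print(f"gradient descent: m = {m:.2f}, c = {c:.2f}")
```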
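Sketch 4 (step 5): passing the linear output through a sigmoid link function turns regression into logistic regression, i.e. classification. The 1-D data here is synthetic and purely illustrative.

```python
import numpy as np

def sigmoid(z):
    # Squashes the linear output into (0, 1) so it reads as a probability
    return 1 / (1 + np.exp(-z))

# Synthetic data: class 1 becomes likelier as x grows
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 200)
t = (rng.uniform(size=200) < sigmoid(2 * x)).astype(float)

# Gradient descent on the cross-entropy loss
w, b, lr = 0.0, 0.0, 0.1
for step in range(1000):
    p = sigmoid(w * x + b)          # linear model + sigmoid link
    w -= lr * np.mean((p - t) * x)  # gradient of cross-entropy w.r.t. w
    b -= lr * np.mean(p - t)

print(f"learned w = {w:.2f}, b = {b:.2f}")
```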
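Sketch 5 (step 6): evaluating a classifier with the ROC curve. This sketch assumes scikit-learn is installed; the labels and scores are made-up stand-ins for a real model's output.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Illustrative ground-truth labels and predicted scores
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.9])

# The ROC curve traces the true/false positive trade-off across thresholds
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("false positive rate:", np.round(fpr, 2))
print("true positive rate: ", np.round(tpr, 2))
print("area under the curve:", roc_auc_score(y_true, y_score))
```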
Conclusions
I welcome your comments. I plan to release a concise free eBook using this approach on KDnuggets. If you are interested in this eBook, please comment below (or share any feedback in general regarding the approach). Connect with me (Ajit Jaokar) to learn how we are using new strategies to accelerate the learning of AI maths and coding.