The Gentlest Introduction to Tensorflow – Part 3
This post is the third entry in a series dedicated to introducing newcomers to TensorFlow in the gentlest possible manner. This entry progresses to multi-feature linear regression.
By Soon Hin Khor, Co-organizer for Tokyo Tensorflow Meetup.
Editor's note: You may want to check out part 1 and part 2 of this tutorial before proceeding.
Quick Review
In the previous articles, the premise was: given any house size in square meters (sqm), which is the feature, we want to predict the house price ($), which is the outcome. To do that, we:
 Find a straight line (linear regression) that ‘best fits’ the data points we have. The best fit is the line that minimizes the difference between the actual data points (gray dots) and the predicted values (the gray dots projected vertically onto the straight line); in other words, it minimizes the total length of the blue lines (in practice, the sum of their squared lengths).
 Use this straight line to predict the house price for any house size.
Predicting using Single-feature Linear Regression.
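To make the ‘best fit’ criterion concrete, here is a minimal plain-Python sketch, assuming a squared-error cost; the house sizes, prices, and the two candidate lines are made up for illustration. It sums the squared vertical distances (the blue lines) between each data point and a candidate line y = Wx + b, so the better-fitting line gets the lower cost.

```python
# Made-up data for illustration: house sizes (sqm) and prices ($).
sizes = [50.0, 80.0, 100.0, 120.0]
prices = [150_000.0, 240_000.0, 300_000.0, 360_000.0]

def sum_squared_error(W, b, xs, ys):
    """Sum of squared vertical distances between each data point and the line y = W*x + b."""
    return sum((y - (W * x + b)) ** 2 for x, y in zip(xs, ys))

# Candidate lines: W is the price per sqm, b is the bias.
good_fit = sum_squared_error(3_000.0, 0.0, sizes, prices)      # passes through every point
poor_fit = sum_squared_error(2_500.0, 10_000.0, sizes, prices)
print(good_fit, poor_fit)
```

Gradient descent automates this comparison: rather than checking hand-picked lines, it repeatedly nudges W and b in the direction that lowers the cost.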
Multi-feature Linear Regression Overview
In reality, most predictions rely on multiple features, so we advance from single-feature to 2-feature linear regression. We chose 2 features to keep visualization and comprehension simple, but the concept generalizes to any number of features.
We introduce a new feature, ‘Rooms’ (the number of rooms in the house). When collecting data points, we must now collect values for the new feature ‘rooms’ in addition to the existing feature ‘house size’, as well as the corresponding outcome ‘house price’.
Our chart becomes 3-dimensional.
Data points for the outcome ‘House Price’ in its 2-feature (‘Rooms’ & ‘House Size’) space.
Our goal then becomes predicting ‘house price’, given ‘rooms’ and ‘house size’ (see image below).
For some 2-feature combinations there is no data point, so a prediction cannot be read directly from the data.
In the single-feature scenario, we used linear regression to create a straight line that helps us predict the outcome ‘house price’ for house sizes where we had no data points. In the 2-feature scenario, we can also employ linear regression, but to create a plane (instead of a straight line) that helps us predict (see image below).
Using linear regression on the 2-feature space to create a plane for prediction.
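Predicting with the plane amounts to evaluating a formula with one weight per feature plus a bias. In this plain-Python sketch, the names W1, W2, b and their values are hypothetical (a plane we pretend was already learned); it reads off a price for a (house size, rooms) combination that need not appear in the data.

```python
# Hypothetical, already-learned plane parameters (illustrative values only).
W1 = 2_000.0   # $ per square meter of house size
W2 = 10_000.0  # $ per room
b = 20_000.0   # bias ($)

def predict_price(size_sqm, rooms):
    """Height of the regression plane above the point (size_sqm, rooms)."""
    return W1 * size_sqm + W2 * rooms + b

# A combination we may never have observed: 95 sqm with 3 rooms.
print(predict_price(95.0, 3))
```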
Multi-feature Linear Regression Model
Recall that for a single feature (see left of image below), the linear regression model's outcome (y) is computed from a weight (W), a placeholder (x) for the ‘house size’ feature, and a bias (b).
For 2 features (see right of image below), we introduce another weight, which we call W2, and another placeholder, x2, to hold the ‘rooms’ feature value.
1-feature vs. 2-feature linear regression equations.
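Written out, the two models are as follows. In the 2-feature version, the original weight W and placeholder x are written as W_1 and x_1 so that their indices line up with the new W_2 and x_2:

```latex
\begin{aligned}
\text{1 feature:}  \quad & y = W x + b \\
\text{2 features:} \quad & y = W_1 x_1 + W_2 x_2 + b
\end{aligned}
```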
When we perform linear regression, gradient descent now learns the additional weight W2 on top of W and b, as previously discussed.
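As a sketch of what that learning involves, assume a squared-error cost summed over m data points, where y_i is the actual price of data point i, x_{i,1} and x_{i,2} are its house size and rooms, and η is the learning rate. Each gradient descent step updates all three parameters, and W_2 simply gets its own update rule of the same form as W_1's:

```latex
\begin{aligned}
\hat{y}_i &= W_1 x_{i,1} + W_2 x_{i,2} + b,
\qquad
C = \sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2 \\
\frac{\partial C}{\partial W_1} &= -2 \sum_{i=1}^{m} \left( y_i - \hat{y}_i \right) x_{i,1},
\qquad
\frac{\partial C}{\partial W_2} = -2 \sum_{i=1}^{m} \left( y_i - \hat{y}_i \right) x_{i,2},
\qquad
\frac{\partial C}{\partial b} = -2 \sum_{i=1}^{m} \left( y_i - \hat{y}_i \right) \\
W_1 &\leftarrow W_1 - \eta \frac{\partial C}{\partial W_1},
\qquad
W_2 \leftarrow W_2 - \eta \frac{\partial C}{\partial W_2},
\qquad
b \leftarrow b - \eta \frac{\partial C}{\partial b}
\end{aligned}
```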
Multi-feature Linear Regression in TensorFlow
Quick Review
Our TF code for single-feature linear regression consists of 3 parts (see image below):
 Constructing the model (blue part)
 Constructing the cost function based on the model (red part)
 Minimizing the cost function using gradient descent (green part)
TensorFlow code for 1-feature linear regression.
TensorFlow for 2-feature Linear Regression
The changes to the TF code that support the 2-feature linear regression equation (explained above) are shown in red.
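The same three parts can be mirrored in plain Python to show the extra weight W2 and input x2 at work. This is a sketch, not TF code: it assumes a summed squared-error cost, uses synthetic data generated from price = 2·size + 3·rooms + 1 (small arbitrary units, which keep un-scaled gradient descent stable), and uses a hand-picked learning rate and step count.

```python
# Synthetic 2-feature data in small, arbitrary units: ((size, rooms), price),
# generated from price = 2*size + 3*rooms + 1, so the ideal answer is known.
data = [((1.0, 0.0), 3.0), ((0.0, 1.0), 4.0), ((1.0, 1.0), 6.0),
        ((2.0, 1.0), 8.0), ((1.0, 2.0), 9.0), ((2.0, 2.0), 11.0)]

def model(W1, W2, b, x1, x2):
    """Part 1: the model, with one weight per feature plus a bias."""
    return W1 * x1 + W2 * x2 + b

def cost(W1, W2, b):
    """Part 2: squared error summed over all data points."""
    return sum((y - model(W1, W2, b, x1, x2)) ** 2 for (x1, x2), y in data)

# Part 3: minimize the cost with plain gradient descent, starting from zeros.
W1, W2, b = 0.0, 0.0, 0.0
learning_rate = 0.02  # hand-picked: small enough for this data to converge
for _ in range(2000):
    # Partial derivatives of the summed squared error at the current parameters.
    dW1 = dW2 = db = 0.0
    for (x1, x2), y in data:
        err = y - model(W1, W2, b, x1, x2)
        dW1 += -2.0 * err * x1
        dW2 += -2.0 * err * x2
        db += -2.0 * err
    # Update all parameters together, using gradients from the same step.
    W1 -= learning_rate * dW1
    W2 -= learning_rate * dW2
    b -= learning_rate * db

print(round(W1, 3), round(W2, 3), round(b, 3), cost(W1, W2, b))
```

Because the synthetic data are exactly linear, gradient descent recovers W1 = 2, W2 = 3 and b = 1 up to floating-point error.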
Note that this way of adding new features is inefficient: every new feature needs its own weight variable and its own placeholder, so the number of variables and placeholders grows with the number of features. Real models have many more features, which worsens this problem. How can we represent features efficiently?
Matrices to the Rescue
First, let us generalize the 2-feature model to an n-feature one:
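With one weight per feature (W1, W2, ..., Wn for features x1, x2, ..., xn) and a single bias b, the n-feature formula extends the 2-feature one term by term:

```latex
y = W_1 x_1 + W_2 x_2 + \cdots + W_n x_n + b = \sum_{j=1}^{n} W_j x_j + b
```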
It turns out that the unwieldy n-feature formula can be simplified using matrices, which TF supports natively, for the following reasons:
 Data can be represented in multiple dimensions, which fits the way we want to represent a data point with n features (below left, also known as the feature matrix) and a model with n weights (below right, also known as the weight matrix).
1 data point’s n features and the model’s n weights in matrix form.
In TF, they would be written as:
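import tensorflow as tf  # these snippets use the TF 1.x API (tf.placeholder)
x = tf.placeholder(tf.float32, [1, n])  # 1 data point (row) with n features
W = tf.Variable(tf.zeros([n, 1]))       # n weights (column), initialized to 0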
NOTE: For W we use tf.zeros, which initializes all W1, W2, ..., Wn to zeros.
 Mathematically, each entry of a matrix product is a sum of multiplications: a row of the first matrix is multiplied element by element with a column of the second, and the products are added up. Thus the matrix multiplication of the features matrix (the one in the middle) and the weights matrix (the one on the right) gives the outcome (the one on the left), which is equivalent to the first part of the n-feature linear regression formula (described above), i.e., without the bias.
Matrix multiplication between the features and weights matrices gives the outcome (without the bias added).
In TF, this multiplication would be:
y = tf.matmul(x, W)
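To check that this product really is the weighted sum from the n-feature formula, the plain-Python sketch below multiplies a 1 x 3 feature matrix by a 3 x 1 weight matrix. The `matmul` helper is a hand-written stand-in for what tf.matmul computes, and the numbers are made up.

```python
def matmul(a, b):
    """Multiply matrices stored as lists of rows: entry (i, j) = sum over k of a[i][k] * b[k][j]."""
    assert len(a[0]) == len(b), "inner dimensions must match"
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

x = [[2.0, 3.0, 4.0]]          # 1 data point, n = 3 features (shape [1, n])
W = [[10.0], [20.0], [30.0]]   # n = 3 weights (shape [n, 1])

y = matmul(x, W)               # shape [1, 1]
weighted_sum = 2.0 * 10.0 + 3.0 * 20.0 + 4.0 * 30.0  # W1*x1 + W2*x2 + W3*x3
print(y, weighted_sum)
```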
 Multiplying a multi-row feature matrix (each row representing one data point’s n features) by the weight matrix returns a multi-row outcome matrix (each row representing the outcome/prediction, without bias added, of one data point). Thus a single matrix multiplication applies the linear regression formula to multiple data points and produces multiple predictions, one for each data point, in a single go (see below)!
Note: The x entries in the feature matrix get more complex names, i.e., x1.1, x1.2 instead of x1, x2, etc., because the feature matrix (the one in the middle) has expanded from representing a single data point with n features (1 row x n columns) to representing m data points with n features (m rows x n columns). So we extend x<n>, e.g., x1, to x<m>.<n>, e.g., x1.1, where n is the feature number and m is the data point number.
Multiple-row matrix multiplication with the model weights produces a multiple-row outcome matrix.
In TF, they would be written as:
x = tf.placeholder(tf.float32, [m, n])  # m data points (rows), n features each
W = tf.Variable(tf.zeros([n, 1]))
y = tf.matmul(x, W)                     # shape [m, 1]: one prediction per row
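Extending the same idea to m rows, the following plain-Python sketch (again a hand-written stand-in for tf.matmul, with made-up values) multiplies a 3 x 2 feature matrix (3 data points, 2 features each) by a 2 x 1 weight matrix and checks that each output row equals the prediction computed for that data point on its own.

```python
def matmul(a, b):
    """Multiply matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

# m = 3 data points (rows), n = 2 features each: [house size, rooms].
x = [[50.0, 1.0],
     [80.0, 2.0],
     [120.0, 3.0]]
W = [[2000.0],    # W1: weight for house size
     [10000.0]]   # W2: weight for rooms

y = matmul(x, W)  # shape [m, 1]: one bias-free prediction per data point
one_at_a_time = [[row[0] * W[0][0] + row[1] * W[1][0]] for row in x]
print(y)
```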
 Finally, adding a constant to the outcome matrix adds that constant to every row of the matrix, which is how a single bias b is applied to every prediction.
In TF, with x and W represented as matrices, the model can be written as follows, regardless of the number of features it has or the number of data points we want to handle:
b = tf.Variable(tf.zeros([1]))  # a single bias, added to every row
y = tf.matmul(x, W) + b
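Putting the pieces together, this plain-Python sketch computes y = xW + b for any m x n feature matrix, adding the single bias to every row, which is what TF's broadcasting does when b (shape [1]) is added to the [m, 1] result of tf.matmul. The `predict` function and all values are illustrative, not part of TF.

```python
def predict(x, W, b):
    """Compute y = xW + b for x of shape [m, n], W of shape [n, 1], and a scalar bias b.

    Returns an [m, 1] list of rows; b is added to every row, as broadcasting does in TF.
    """
    n = len(W)
    assert all(len(row) == n for row in x), "each data point needs exactly n features"
    return [[sum(row[k] * W[k][0] for k in range(n)) + b] for row in x]

# One function covers 1 feature or 3, one data point or many.
single = predict([[100.0]], [[3000.0]], 500.0)       # m = 1, n = 1
batch = predict([[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]],  # m = 2, n = 3
                [[1.0], [1.0], [1.0]], 0.5)
print(single, batch)
```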
TensorFlow Multi-feature Cheatsheet
We do a side-by-side comparison to summarize the change from single- to multi-feature linear regression:
1-feature vs. n-feature linear regression model in TensorFlow.
Wrapping Up
We illustrated the concept of multi-feature linear regression and showed how to extend our model and TF code from single-feature to 2-feature linear regression, which generalizes to n-feature models. We concluded by presenting a cheatsheet for the multi-feature TF linear regression model.
Coming Up Next
We will present the concepts of logistic regression, cross-entropy, and softmax, which will enable us to fully understand TensorFlow’s official beginner’s tutorial on MNIST.
References
 GitHub: TF for multi-feature linear regression without matrices
 GitHub: TF for multi-feature linear regression with matrices
 The slides on SlideShare (1–43)
 The video on YouTube (0:00 to 7:18)
Bio: Soon Hin Khor, Ph.D., is using tech to make the world more caring and responsible. Contributor to ruby-tensorflow. Co-organizer for Tokyo Tensorflow meetup.
Original. Reposted with permission.