TensorFlow: Building Feed-Forward Neural Networks Step-by-Step
This article walks through the steps required to build a simple feed-forward neural network in TensorFlow, explaining each step in detail.
Preparing the Neural Network Layers and their Parameters
Our example is very simple: the network has no hidden layers and just a single neuron in the output layer. We will explain how to create this layer, prepare its parameters, and apply its neuron's activation function, which is sigmoid in our case.
Placeholders have a limitation: a placeholder's value can't be changed once it has been assigned. After it is given a value, a placeholder can be regarded as a constant. It is therefore not a suitable option for trainable parameters like the ones used in this example (weight and bias). A trainable parameter is assigned an initial value, and that value is then changed repeatedly until it reaches the value that makes the model produce the least error.
Our neural network has two trainable parameters: the weight and the bias. They cannot be stored in placeholders because we want to keep updating them until they reach their best values. This is why TensorFlow provides the Variable.
A TensorFlow Variable is very similar to a variable in Java, C++, Python, or any other language with the concept of variables. You can assign an initial value to a TensorFlow Variable, and that value can be changed multiple times. To create a TensorFlow Variable, you supply an initial value, which determines its shape and, unless you pass dtype explicitly, its data type. The following example shows how to create a variable in place of the previous example's placeholder:
The changes compared to the previous placeholder code are in lines 4, 11, and 14. The variable is created in line 4 by specifying its initial value and data type. The dtype argument is not required, but the initial_value argument must be specified. The initial_value argument determines both the shape and the data type (if dtype is omitted). Note that Variable is a class, whereas placeholder is an operation.
Line 14 prints the value of the variable. Note that the Variable is not actually initialized until the global_variables_initializer() operation is run, as in line 11. Trying to use the variable without running this operation raises an error.
In CodeSample1, there are two variables created for our two parameters: weight (line 8) and bias (line 9).
Parameters Initial Values
Note that both the weight and the bias have initial values. We are using fixed initial values that were picked arbitrarily; no rule was used to generate them. You may instead use any random number generation operation in TensorFlow, such as tensorflow.truncated_normal(). Keep in mind, however, that the initial values are critical for creating a robust model that predicts the right class after training. Bad initial values for the weights and bias of a neural network can cause its neurons to die. This is why many techniques exist for generating such initial values.
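To make the idea of truncated-normal initialization concrete, here is a minimal plain-Python sketch of what such an operation does: it draws values from a normal distribution and re-draws any value that falls more than two standard deviations from the mean. The function below is a hypothetical stand-in written for illustration, not TensorFlow's implementation, and the shapes (three weights, one bias) and seeds are assumptions.

```python
import random


def truncated_normal(count, mean=0.0, stddev=1.0, seed=None):
    """Draw `count` normal samples, re-drawing any sample that lies
    more than two standard deviations away from the mean."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < count:
        value = rng.gauss(mean, stddev)
        if abs(value - mean) <= 2.0 * stddev:
            samples.append(value)
    return samples


# Assumed shapes: three weights (one per input feature) and one bias.
initial_weights = truncated_normal(3, seed=0)
initial_bias = truncated_normal(1, seed=1)[0]
print(initial_weights, initial_bias)
```

Bounding the samples this way avoids starting training from extreme values, which is one reason this kind of initializer is a common choice.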
In summary, a placeholder is used to store input data that is assigned once and then used multiple times, whereas a Variable is used for trainable parameters that are changed multiple times after being initialized.
After preparing all the parameters required by the output layer's neuron, we are ready to use the activation function, as in line 15 of CodeSample1.
The neuron combines all inputs, weights, and the bias into a single value, and the activation function maps that value to the expected class score of each input.
Normally, there is a weight for each input, and each input is multiplied by its corresponding weight. We don't have to do these multiplications element by element; matrix multiplication is a good solution, as in line 13.
Just prepare one matrix for the inputs and another for the weights, then multiply the two matrices. The tensorflow.matmul() operation does that for you.
Then the bias is added to the sum of the individual input-weight products, as in line 13. The result, af_input in our example, is then passed to the activation function, as in line 15. We are not going to create the function manually because it is already provided by the tensorflow.nn API. There are different types of activation functions, and sigmoid is sufficient for our case. Here, the value returned by the sigmoid activation function represents the expected class score of the current input.
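The computation described above (matrix multiplication, bias addition, then sigmoid) can be sketched in plain Python. This is only an illustration of the arithmetic, not the article's TensorFlow code: the input samples, weights, and bias are made-up values, and the matmul and sigmoid helpers are hypothetical stand-ins for tensorflow.matmul() and the sigmoid in tensorflow.nn.

```python
import math


def sigmoid(x):
    """Logistic function: maps any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m); both are lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


# Assumed data: 2 samples with 3 features each, and one weight per feature.
inputs = [[1.0, 0.0, 0.0],
          [0.0, 0.0, 1.0]]
weights = [[0.5], [-0.2], [-0.5]]
bias = 0.1

# Weighted sum plus bias for every sample, then the activation.
af_input = [[row[0] + bias] for row in matmul(inputs, weights)]
predictions = [[sigmoid(row[0])] for row in af_input]
print(af_input, predictions)
```

Because the weights form a 3 x 1 matrix, multiplying the 2 x 3 input matrix by it yields one weighted sum per sample, so a single matrix product replaces the per-input multiplications.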
Evaluating the model is an essential step after it has been trained. This is why there is a loss function in line 18 of CodeSample1.
It is very simple but works well enough for our case: just find the difference between each desired output and the corresponding output predicted by the model. The goal is to measure how far the predicted outputs of the trained neural network are from their corresponding desired outputs. To get a single value representing the overall error of the network, the individual differences are summed using the tensorflow.reduce_sum() operation.
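As a small worked illustration of this loss, the following plain-Python sketch sums the differences between desired and predicted outputs. The function name and the sample values are assumptions made for the example; Python's built-in sum() plays the role that tensorflow.reduce_sum() plays in the article.

```python
def prediction_error(desired, predicted):
    """Sum of (desired - predicted) over all samples."""
    return sum(d - p for d, p in zip(desired, predicted))


# Assumed values: two samples with desired classes 1 and 0.
desired_outputs = [1.0, 0.0]
predicted_outputs = [0.75, 0.5]

error = prediction_error(desired_outputs, predicted_outputs)
print(error)  # 0.25 + (-0.5) = -0.25
```

Note that the differences are signed, so a positive and a negative difference can partly offset each other in the total. Squaring each difference before summing is a common alternative when that matters.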
Updating Network Parameters
The prediction error of the model is unlikely to be zero on the first trial, and it may be very high. This is why there must be a way to automatically update and optimize the model parameters to reach the least possible error. One of the common optimizers is gradient descent (GD). GD tries to find the relationship between each parameter and the prediction error in order to know how each parameter affects the error. It starts by trying the initial parameters. If they don't do well, GD changes the parameter values, moving in the direction that minimizes the error. The GD optimizer is applied in our example in line 20 to minimize the previously calculated prediction error. The learning_rate of the tensorflow.train.GradientDescentOptimizer is a hyper-parameter that controls how large each update step is.
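The update rule behind gradient descent can be shown on a toy problem. The sketch below is a generic illustration under stated assumptions, not the optimizer used in the article: the error curve error(w) = (w - 3)^2, the starting point, the learning rate, and the step count are all invented for the example.

```python
def gradient_descent(gradient, start, learning_rate, steps):
    """Repeatedly move the parameter against the gradient of the error."""
    value = start
    for _ in range(steps):
        value -= learning_rate * gradient(value)
    return value


# Assumed error curve: error(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
best_w = gradient_descent(lambda w: 2.0 * (w - 3.0),
                          start=0.0, learning_rate=0.1, steps=100)
print(best_w)
```

Each step moves the parameter a fraction of the way toward the minimum at w = 3. A larger learning rate takes bigger steps but can overshoot, while a smaller one converges more slowly.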
Previously, we prepared the steps to follow from accepting the inputs to generating the prediction error of the model. Moreover, we described how the model parameters are to be updated. The remaining step is to go into a loop that updates the parameters automatically.
There is some work to be done before entering the loop, including:
- Creating the Session as in line 24.
- Initializing all Variables as in line 27.
After that, we can run the training loop, as in lines 42 and 43. Note that the only item specified to be fetched is train_op, the training operation produced by tensorflow.train.GradientDescentOptimizer. This is because fetching train_op causes all parameters to be updated. To understand why we fetch train_op and no other Tensors, please read my previous article. https://www.linkedin.com/pulse/tensorflow-what-hyperparameters-optimize-ahmed-gad
The loop runs for 10,000 iterations, and at each iteration the GD optimizer generates new parameter values that decrease the error.
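The training loop described above can be sketched without TensorFlow to show what one optimizer step per iteration amounts to. This is a minimal plain-Python sketch, not the article's code: the four training samples, the zero initial values, the learning rate of 0.1, and the hand-derived gradient are all assumptions, and a squared-error loss is swapped in for the summed difference so that the error has a clear minimum.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predict(sample, weights, bias):
    """Class score of one sample for a single sigmoid neuron."""
    return sigmoid(sum(x * w for x, w in zip(sample, weights)) + bias)


# Assumed data: reddish samples are class 0, bluish samples are class 1.
training_inputs = [[1.0, 0.0, 0.0], [0.9, 0.2, 0.1],
                   [0.0, 0.0, 1.0], [0.2, 0.1, 0.9]]
training_outputs = [0.0, 0.0, 1.0, 1.0]

weights = [0.0, 0.0, 0.0]   # assumed initial values
bias = 0.0
learning_rate = 0.1         # assumed hyper-parameter

for _ in range(10000):
    grad_w = [0.0, 0.0, 0.0]
    grad_b = 0.0
    for sample, desired in zip(training_inputs, training_outputs):
        p = predict(sample, weights, bias)
        # Derivative of (p - desired)^2 with respect to the neuron's input.
        delta = 2.0 * (p - desired) * p * (1.0 - p)
        for i, x in enumerate(sample):
            grad_w[i] += delta * x
        grad_b += delta
    # One gradient descent step: update every parameter at once.
    weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
    bias -= learning_rate * grad_b

red_score = predict([0.95, 0.1, 0.05], weights, bias)
blue_score = predict([0.1, 0.05, 0.95], weights, bias)
print(red_score, blue_score)
```

In TensorFlow the gradient computation and the parameter updates inside the loop body are handled by the optimizer, which is why a single fetch per iteration is enough there.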
Because the data is stored in placeholders, these placeholders must be given values using the feed_dict argument of the tensorflow.Session.run() operation. The data placeholders are fed from previously created NumPy arrays, as in lines 30 and 36; the weight and bias are Variables and are not fed this way.
We can also avoid creating separate NumPy arrays and assign the data to the placeholders directly within the run() operation as follows:
But for code clarity, the NumPy arrays are created separately from the run() operation.
Testing the Trained Neural Network
After getting out of the training loop, the neural network will be trained and ready for predicting unknown samples. In line 48, two new samples were used for testing the network accuracy.
The expected class scores are as follows:
This is not an accurate result, because it says that the expected class of both samples is the one indexed 1, which is the BLUE class. The first sample actually belongs to the class indexed 0, which is RED.
What is the reason for such a weak prediction in such a simple example? The reason is the poor choice of initial values for the network parameters (weights and bias). We can try different initial values and see how the results change. For example, using the tensorflow.truncated_normal() operation, we can generate the initial values for both the weights and the bias as follows:
Then we can train the network again with the new initial values and predict the results again. The predicted class scores are as follows:
The new values for the weights and bias can be printed as follows:
Here is the result:
The results improved considerably: the error is now 0.0, compared to 1.0 with the previous fixed initial values. This illustrates the importance of initializing a neural network's weights well.
This is the end of our simple example. Next, we will explore another example that creates the XOR logic gate using a neural network with one hidden layer containing two neurons.