A Beginner’s Guide to Neural Networks with R!
In this article we will learn how Neural Networks work and how to implement them with the R programming language! We will see how we can easily create Neural Networks with R and even visualize them. Basic understanding of R is necessary to understand this article.
Train and Test Split
Let us now split our data into a training set and a test set. We will train our neural network on the training set and then see how well it performs on the test set.
We will use the caTools package to randomly split the data into a training set and a test set.
# Convert Private column from Yes/No to 1/0
Private = as.numeric(College$Private)-1
data = cbind(Private,scaled.data)

library(caTools)
set.seed(101)

# Create Split (any column is fine)
split = sample.split(data$Private, SplitRatio = 0.70)

# Split based off of split Boolean Vector
train = subset(data, split == TRUE)
test = subset(data, split == FALSE)
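To make the idea of a label-preserving random split more concrete, here is a minimal base R sketch on made-up labels. It illustrates stratified sampling in general and is not meant to reproduce how caTools implements sample.split() internally; the names `labels` and `in_train` and the class counts are hypothetical.

```r
# Toy labels (hypothetical): 30 public (0) and 70 private (1) schools
set.seed(101)
labels <- rep(c(0, 1), times = c(30, 70))

# Mark roughly 70% of the rows in each class as training rows
in_train <- logical(length(labels))
for (cls in unique(labels)) {
  idx <- which(labels == cls)
  n_train <- round(0.70 * length(idx))
  in_train[sample(idx, size = n_train)] <- TRUE
}

# Compare the class balance in the two subsets
prop.table(table(labels[in_train]))
prop.table(table(labels[!in_train]))
```

Sampling within each class keeps the share of private and public schools the same in both subsets, which matters when one class is much more common than the other.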
Neural Network Function
Before we actually call the neuralnet() function we need to create a formula to pass to the model. The neuralnet() function won't accept the typical R dot shorthand for a formula involving all features (e.g. y ~ .). However, we can use a simple script to create the expanded formula and save us some typing:
feats <- names(scaled.data)

# Concatenate strings
f <- paste(feats,collapse=' + ')
f <- paste('Private ~',f)

# Convert to formula
f <- as.formula(f)

f
Private ~ Apps + Accept + Enroll + Top10perc + Top25perc + F.Undergrad + P.Undergrad + Outstate + Room.Board + Books + Personal + PhD + Terminal + S.F.Ratio + perc.alumni + Expend + Grad.Rate
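As an aside, base R also provides reformulate(), which builds the same kind of expanded formula without pasting strings by hand. The sketch below uses a short, hypothetical subset of the feature names so that it runs on its own; with the real data you would pass names(scaled.data) instead.

```r
# Hypothetical subset of the feature names, for illustration only
feats <- c("Apps", "Accept", "Enroll")

# reformulate() builds response ~ term1 + term2 + ... from character vectors
f <- reformulate(feats, response = "Private")
f
```

Either approach yields an ordinary formula object, so the choice is mostly a matter of taste.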
#install.packages('neuralnet')
library(neuralnet)
nn <- neuralnet(f,train,hidden=c(10,10,10),linear.output=FALSE)
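To get a feel for the size of this model, we can count its parameters. The sketch below assumes a standard fully connected network in which every neuron also has one bias term; that is an assumption about the architecture, not something read from the fitted nn object, and the variable names are hypothetical.

```r
# Layer sizes: 17 input features, three hidden layers of 10, one output
layer_sizes <- c(17, 10, 10, 10, 1)

# Each layer receives one weight per incoming neuron plus one bias per neuron
n_in  <- head(layer_sizes, -1)
n_out <- tail(layer_sizes, -1)
weights_per_layer <- (n_in + 1) * n_out

weights_per_layer
sum(weights_per_layer)
```

Under these assumptions hidden=c(10,10,10) gives a few hundred weights to fit, which helps explain why the individual weights are hard to read by eye.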
Predictions and Evaluations
Now let's see how well we performed! We use the compute() function with the test data (just the features) to create predicted values. This returns a list from which we can extract net.result.
# Compute Predictions off Test Set
predicted.nn.values <- compute(nn,test[2:18])

# Check out net.result
print(head(predicted.nn.values$net.result))

                                                [,1]
Adrian College                          1.0000000000
Alfred University                       1.0000000000
Allegheny College                       1.0000000000
Allentown Coll. of St. Francis de Sales 0.9999999891
Alma College                            1.0000000000
Amherst College                         0.9999999994
...
Notice that the results are still values between 0 and 1, which behave more like probabilities of belonging to class 1 (private). We'll use sapply() to round each value to either 0 or 1 so we can evaluate the predictions against the test labels.
predicted.nn.values$net.result <- sapply(predicted.nn.values$net.result,round,digits=0)
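Rounding is equivalent to applying a cutoff of 0.5 to each output. The sketch below makes that cutoff explicit on a few made-up values; `probs` and `pred_class` are hypothetical names and the numbers are not taken from the model.

```r
# Hypothetical network outputs in [0, 1]
probs <- c(0.9999999891, 0.73, 0.12, 0.5)

# Explicit cutoff: anything above 0.5 is class 1
pred_class <- as.integer(probs > 0.5)
pred_class

# round() gives the same classes here; note that R rounds exactly 0.5 to 0
round(probs, digits = 0)
```

Writing the cutoff out makes it easy to try a different threshold later, for example if one type of misclassification is more costly than the other.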
Now let's create a simple confusion matrix:
      0   1
  0  57   7
  1   6 163
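The call that produced this table is not shown here; a cross-tabulation such as table(test$Private, predicted.nn.values$net.result) would be the usual way to build it, but treat that as an assumption. Whatever the orientation of rows and columns, the overall accuracy can be worked out from the four counts, as in this small sketch (the names `cm` and `accuracy` are hypothetical):

```r
# Counts copied from the confusion matrix above (filled column by column)
cm <- matrix(c(57, 6, 7, 163), nrow = 2,
             dimnames = list(c("0", "1"), c("0", "1")))
cm

# Accuracy = correctly classified cases (the diagonal) / all test cases
accuracy <- sum(diag(cm)) / sum(cm)
round(accuracy, 3)
```

That is 220 correct predictions out of 233 test cases, or roughly 94%.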
Visualizing the Neural Net
We can visualize the Neural Network by using the plot(nn) command. The black lines represent the weighted connections between the neurons. The blue lines represent the bias terms added at each layer. Unfortunately, even though the model predicts well on this test set, it is not easy to directly interpret the weights. This means that we usually have to treat Neural Network models more like black boxes.
Hopefully you've enjoyed this brief discussion on Neural Networks! Try playing around with the number of hidden layers and neurons and see how they affect the results!
Bio: Jose Portilla is a Data Science consultant and trainer who currently teaches online courses on Udemy. He also conducts training as the Head of Data Science for Pierian Data Inc.
Editor: Thanks to Cesar Cordova for finding an error in the original version of this post where "data" was used in place of "train", and thanks to Jose Portilla for quickly fixing it.