R Fundamentals: Building a Simple Grade Calculator
In this tutorial, we'll teach you the basics of R by building a simple grade calculator. While we do not assume any R-specific knowledge, you should be familiar with general programming concepts.
Creating comments
In the previous exercises, we made multiple calculations using operators. Later on, when we're writing hundreds of lines of code, it's good programming practice to organize our code. We can organize our code by inserting comments. Comments are notes that help people, including yourself, understand the code. The R interpreter recognizes comments, treats them as plain text, and will not attempt to execute them. There are two main types of comments we can add to our code:
- inline comment
- single-line comment
Inline comment
An inline comment is useful whenever we want to annotate, or add more detail to, a specific statement. To add an inline comment at the end of a statement, start with the hash character (#) and then add the comment:
print(
  (92 + 87 + 85)/3 # Finding the math score
)
While we don't need to add a space after the hash character (#), this is considered good style and makes our comments cleaner and easier to read.
Single-line comment
A single-line comment spans the full line and is useful when we want to separate our code into sections. To specify that we want a line of text to be treated as a comment, start the line with the hash character (#):
# Here, we're finding the average of our scores. Then, subtracting this average from the math score.
print(
  88 - ((88 + 87.66667 + 86 + 91.33333 + 84 + 91 + 89.33333)/7)
)
Let's add comments to our code!
print( # Adding some comments.
  88 - ((88 + 87.66667 + 86 + 91.33333 + 84 + 91 + 89.33333)/7)
) # Adding more comments.
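As a minimal sketch of both comment types side by side (the names math_average and label are ours, chosen only for illustration), note that everything after the hash character on a line is ignored, including a closing parenthesis. That is why the closing parenthesis in the examples above sits on its own line.

```r
# Single-line comment: the interpreter ignores this whole line.
math_average <- (92 + 87 + 85)/3  # Inline comment: ignored from the hash onward.

# A hash inside a quoted string is ordinary text, not the start of a comment.
label <- "score #1"

print(math_average)
print(label)
```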
Assigning values to a variable
Using R to make simple calculations is useful. However, a more robust approach would be to store these values for later use. This process of storing values is called variable assignment. A variable in R is like a named storage unit that can hold values.
The process of assigning a variable requires two steps:
- Naming the variable.
- Assigning the value to the name using <-.
When naming a variable, there are a few rules you must follow:
- A variable name consists of letters, numbers, dots, or underscores.
- We can begin a variable with a letter or a dot. If it's a dot, then we cannot follow it with a number.
- We cannot begin a variable with a number.
- No special characters allowed.
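One way to try these rules out is sketched below. It relies on the base R function make.names(), which rewrites a string into a syntactically valid name, so a string that comes back unchanged was already valid. The candidate names are hypothetical, picked only to exercise each rule.

```r
# Hypothetical candidate names, chosen only to exercise the rules above.
candidates <- c("math", "physical_education", "score.1", ".hidden",
                "2nd_score", ".2way", "my-score")

# make.names() returns a valid version of each string.
# If nothing had to change, the original string was already a valid name.
is_valid <- make.names(candidates) == candidates

print(data.frame(name = candidates, valid = is_valid))
```

The first four names pass. The last three fail because they start with a number, start with a dot followed by a number, or contain a special character.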
For more detail, here is a table detailing what variable names are allowed and which are not:
Let's return to our math score calculation: (92 + 87 + 85)/3. The result of this calculation is 88. To store 88 in a variable called math, let's write the following expression:
math <- 88
And then if we tried to print() math, like this:
print(math)
This would display:
[1] 88
Variables can hold more than a result we've already calculated. We can also assign an expression directly, and the variable stores its result:
math <- (92 + 87 + 85)/3
And then if we tried to print math, like this:
print(math)
This would display the same result as our original calculation:
[1] 88
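A short sketch of what assignment does, under the assumption that we reuse the math variable from above: the right-hand side is evaluated first, only the result is stored, and assigning again replaces the old value. The score of 90 is made up for illustration.

```r
# The expression is evaluated first; only its result is stored in math.
math <- (92 + 87 + 85)/3
print(math)

# Assigning again replaces the stored value.
# 90 is a made-up score, used only to show the replacement.
math <- 90
print(math)

# Restore the score used in the rest of the tutorial.
math <- 88
print(math)
```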
We've stored our math grade in a variable. As a reminder, here are the classes and grades:
- chemistry: 87.66667
- writing: 86
- art: 91.33333
- history: 84
- music: 91
- physical_education: 89.33333
Let's store our other scores in variables.
math <- 88
chemistry <- 87.66667
writing <- 86
art <- 91.33333
history <- 84
music <- 91
physical_education <- 89.33333
Performing calculations using variables
Now that we've stored our grades for each class in a variable, we can use these variables to find the grade point average.
Let's look at our math and chemistry scores:
math <- 88
chemistry <- 87.66667
When performing a calculation, variables and values are treated the same. Using our math and chemistry variables, 88 + 87.66667 is the same as math + chemistry. When performing calculations using variables, the PEMDAS rule still applies.
If we wanted to see how much better you did in math than in chemistry, we could use the subtraction operator (-) to find the difference:
math <- 88
chemistry <- 87.66667
print(math - chemistry)
This displays:
[1] 0.33333
If we wanted to find the average score between math and chemistry, we can use the +, /, and () operators on the two variables:
(math + chemistry)/2
This displays:
[1] 87.83334
After we make these calculations, we can also store the result of these expressions in a variable. If we wanted to store the average of math and chemistry in a variable called average, it would look like this:
average <- (math + chemistry)/2
Displaying the average would return the same value, 87.83334.
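To see why the parentheses matter when averaging, here is a small sketch comparing the two forms. The variable names without_parentheses and with_parentheses are ours, used only for this comparison.

```r
math <- 88
chemistry <- 87.66667

# Division happens before addition, so only chemistry is halved here.
without_parentheses <- math + chemistry/2

# Parentheses force the addition to happen first, giving the average.
with_parentheses <- (math + chemistry)/2

print(without_parentheses)
print(with_parentheses)
```

The first result is far above either score, which is a quick sign that the order of operations was not what we intended.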
Let's calculate your grade point average using the following variables:
- math <- 88
- chemistry <- 87.66667
- writing <- 86
- art <- 91.33333
- history <- 84
- music <- 91
- physical_education <- 89.33333
## Classes
math <- 88
chemistry <- 87.66667
writing <- 86
art <- 91.33333
history <- 84
music <- 91
physical_education <- 89.33333
## Calculation
gpa <- (math + chemistry + writing + art + history + music + physical_education)/7
Then, let's subtract your gpa from history to see if history is below the average. Store this difference in history_difference.
history_difference <- history - gpa
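A negative difference means the history score is below the average. As a sketch, we can let R make that check with the less-than operator (<), which this tutorial has not covered. The name is_below_average is ours.

```r
math <- 88
chemistry <- 87.66667
writing <- 86
art <- 91.33333
history <- 84
music <- 91
physical_education <- 89.33333

gpa <- (math + chemistry + writing + art + history + music + physical_education)/7
history_difference <- history - gpa

# The comparison evaluates to TRUE when history is below the average.
is_below_average <- history_difference < 0

print(history_difference)
print(is_below_average)
```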
Creating vectors
In our previous example, calculating your grade point average using variables was useful. However, in data science, we often work with thousands of data points. If you had the score of each individual homework assignment, exam, or project for each class, our data set would get large. Returning to our math and chemistry example, let's look at the current variables, math (88) and chemistry (87.66667).
Rather than store these two values in two variables, we need a storage unit that can store multiple values. In R, we can use a vector to store these values. A vector is a storage container that can store a sequence of values. We can then name a vector using a variable. Like this:
To create a vector, you'll be using c(). In R, c() is known as a function. Similar to the print() function, the c() function takes in inputs. It accepts multiple inputs and stores these values in one place. The c() function doesn't perform any arithmetic operation on the values; it just stores those values. You can read more about the c() function in R's built-in help (?c).
Here are the steps to creating a vector:
- Identify the values you want to store in a vector and place these values within the c() function. Separate these values using a comma (,).
- Assign the vector to a name of your choice using <-.
Let's create a vector that contains your math and chemistry scores. The math score was 88 and the chemistry score was 87.66667.
math_chemistry <- c(88,87.66667)
We could also create the vector using your variable names as well:
math_chemistry <- c(math,chemistry)
If we were to print(math_chemistry), it would look like this:
[1] 88.00000 87.66667
On the other hand, if we tried to store a sequence of values, like this:
math_chemistry <- 88, 87.66667
The R interpreter will only try to assign 88 to math_chemistry but will not be able to interpret the comma after 88:
Error: unexpected ',' in "math_chemistry <- 88,"
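The c() function also accepts existing vectors as inputs. A minimal sketch, assuming we want to add the writing score of 86 to the math_chemistry vector (the name with_writing is ours):

```r
math_chemistry <- c(88, 87.66667)

# c() combines the existing vector with the new value.
# The result is one flat vector of three values, not a vector inside a vector.
with_writing <- c(math_chemistry, 86)

print(with_writing)
```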
Let's store our final scores in a vector using the following variables:
math <- 88
chemistry <- 87.66667
writing <- 86
art <- 91.33333
history <- 84
music <- 91
physical_education <- 89.33333
final_scores <- c(math, chemistry, writing, art, history, music, physical_education)
Calculating the mean
Now that we've stored your grades in a vector, we can calculate the grade point average. In a previous exercise, you used an arithmetic operator to calculate your grade point average:
(88 + 87.66667 + 86 + 91.33333 + 84 + 91 + 89.33333)/7
While this solution works, it isn't scalable: every new score means editing the expression by hand. Now that you've created a vector, we have an easier way of calculating the grade point average.
To calculate the grade point average using a vector, use the mean() function. The mean() function will take an input (the vector) and calculate the average of that input. The interpreter will then display the result.
Let's apply the mean() function to our math_chemistry vector:
math_chemistry <- c(88,87.66667)
mean(math_chemistry)
This would return:
[1] 87.83334
We can then store the result of mean(math_chemistry) in a variable for later use:
average_score <- mean(math_chemistry)
Let's apply the mean() function on your final grades vector!
## Vector of Final Scores
final_scores <- c(math, chemistry, writing, art, history, music, physical_education)
## Calculating the mean
gpa <- mean(final_scores)
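As a rough check that mean() agrees with the hand-written arithmetic, the sketch below compares the two. It uses the base R function all.equal(), which compares numbers with a small tolerance. The names manual_gpa, vector_gpa, and more_scores, and the extra score of 90, are hypothetical.

```r
# The hand-written calculation from the earlier exercise.
manual_gpa <- (88 + 87.66667 + 86 + 91.33333 + 84 + 91 + 89.33333)/7

# The same scores stored in a vector and averaged with mean().
final_scores <- c(88, 87.66667, 86, 91.33333, 84, 91, 89.33333)
vector_gpa <- mean(final_scores)

# all.equal() allows for tiny floating-point differences.
print(all.equal(manual_gpa, vector_gpa))

# Adding a score needs no change to the mean() call itself.
# 90 is a made-up eighth score, used only for illustration.
more_scores <- c(final_scores, 90)
print(mean(more_scores))
```

With the manual version, an eighth score would mean editing both the sum and the divisor.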
Performing operations on vectors
In the last section, you calculated your grade point average using the mean() function and a vector. In data science, there are usually multiple questions you can answer with the same data.
Let's dig deeper into our final_scores vector and ask it a few more questions:
- What was the highest score?
- What was the lowest score?
- How many classes did you take?
To answer these questions, you'll need a few more functions:
- min(): Finds the smallest value within the vector
- max(): Finds the largest value within the vector
- length(): Finds the total number of values the vector holds
- sum(): Takes the sum of all the values in the vector (Note: Will not be used in this tutorial.)
You can apply these functions the same way you applied the mean() function. To find the highest score in our math_chemistry vector, we'll apply the max() function to this vector:
math_chemistry <- c(88,87.66667)
max(math_chemistry)
This displays:
[1] 88
Let's answer a few more questions about your grades!
- What was your highest score? Use max().
- What was your lowest score? Use min().
- How many classes did you take? Use length().
final_scores <- c(math, chemistry, writing, art, history, music, physical_education)
## Highest Score
highest_score <- max(final_scores)
print(highest_score)
## Lowest Score
lowest_score <- min(final_scores)
print(lowest_score)
## Number of Classes
num_classes <- length(final_scores)
print(num_classes)
[1] 91.33333
[1] 84
[1] 7
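Note that max() and min() return the scores themselves, not the class they belong to. One possible way to recover the class is sketched below. It goes beyond what this tutorial covers: it assumes we give each value a name inside c(), then uses the base R functions names(), which.max(), and which.min(). The names best_class and worst_class are ours.

```r
# Each value is given a name inside c(), producing a named vector.
final_scores <- c(math = 88, chemistry = 87.66667, writing = 86,
                  art = 91.33333, history = 84, music = 91,
                  physical_education = 89.33333)

# which.max() and which.min() return the position of the largest and
# smallest value; names() lets us look up the class at that position.
best_class <- names(final_scores)[which.max(final_scores)]
worst_class <- names(final_scores)[which.min(final_scores)]

print(best_class)
print(worst_class)
```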
Next steps
If you'd like to learn more, this tutorial is based on our R Fundamentals course, which is part of our Data Analyst in R track. Building upon the concepts in this tutorial, you'll learn:
- More complex ways to manipulate a vector:
  - Indexing into a vector
  - Filtering out different values in a vector
  - Different behaviors of a vector
- Make university recommendations using matrices
  - Creating your own matrix
  - Slicing and re-organizing a matrix
  - Sorting a matrix
- Analyze college graduate data using a dataframe
  - The different data types that go into a dataframe
  - Selecting and subsetting specific values in a dataframe
  - Adding conditions into dataframe selections
- Using lists to store a variety of values
  - Indexing into a list
  - Adding and subtracting values from a list
  - Merging lists
Bio: Jeffrey M Li is a Data Scientist at Dataquest.
Original. Reposted with permission.