Reinforcement Learning for Newbies

A simple guide to reinforcement learning for a complete beginner. The blog includes definitions with examples, real-life applications, key concepts, and various types of learning resources.



What Is Reinforcement Learning (RL)?


Reinforcement Learning (RL) is a machine learning approach in which an agent learns by trial and error to reach a goal. It is goal-oriented: the agent receives a reward when it performs the correct action, and these rewards help it navigate a complex environment toward the final goal. Just as a toddler learns to walk on its own by trial and error, a machine can learn to perform complex tasks without human intervention.

RL is quite different from other machine learning approaches. Supervised and unsupervised learning models depend on existing data collected by humans, which ties them to human knowledge. An RL agent instead learns by interacting with its environment, and in some tasks it can outperform humans. For example, DeepMind’s AlphaGo developed strategies of its own and defeated a world champion at the board game Go.


How Does Reinforcement Learning Work?


Let’s take the example of a Mario game. At the start of the game, the agent (Mario) is in state zero, and based on its state it takes an action. In this case, Mario moves forward. The agent is now in a new state (a new frame) and receives a reward because it survived the move. The agent keeps making moves until it finishes the stage or dies in the process. The main goal in RL is to maximize the total reward collected, ideally in as few steps as possible.
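
The loop described above can be sketched in a few lines of Python. This is only a minimal illustration, not the real Mario game: the `ToyLevel` class, its reward values, and the two policies are all made up for this example.

```python
import random


class ToyLevel:
    """Hypothetical stand-in for a Mario stage: a straight track of `length` tiles."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0  # state zero: the start of the stage

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        """Apply an action and return (next_state, reward, done)."""
        if action == "forward":
            self.position += 1
        # Assumed reward scheme: +1 for moving forward, 0 for standing still.
        reward = 1 if action == "forward" else 0
        done = self.position >= self.length  # the stage is finished
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode and return (total_reward, steps_taken)."""
    state = env.reset()
    total_reward, steps = 0, 0
    for _ in range(max_steps):
        action = policy(state)                  # agent picks an action from the state
        state, reward, done = env.step(action)  # environment returns new state and reward
        total_reward += reward
        steps += 1
        if done:
            break
    return total_reward, steps


def random_policy(state):
    """An untrained agent: picks an action at random."""
    return random.choice(["forward", "stay"])


def always_forward(state):
    """A policy that always moves forward."""
    return "forward"


if __name__ == "__main__":
    print(run_episode(ToyLevel(), always_forward))  # (5, 5)
    print(run_episode(ToyLevel(), random_policy))   # varies from run to run
```

Both policies can collect the same reward on this toy track, but the random one usually needs more steps to do it. A real RL algorithm would adjust the policy from the rewards it sees instead of using a fixed one.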


What Are RL Applications?


Right now, most machine learning applications are limited to a single task and depend on existing data. This is likely to change: by combining RL with computer vision, machine translation, and other types of models, we can aim for superhuman performance. For example:

  1. Self-driving cars: making travel safer and faster
  2. Industry automation: warehouse management
  3. Trading and finance: stock price prediction
  4. NLP (Natural Language Processing): text summarization, question answering, and machine translation
  5. Healthcare: effective detection and treatment of diseases
  6. Engineering: optimizing large-scale production
  7. Recommendation systems: better news, movie, and product recommendations
  8. Gaming: designing better game levels to optimize player engagement
  9. Marketing and advertising: identifying individuals and targeting them with ads based on their needs
  10. Robotics: performing complex and repetitive tasks


Key Components of Reinforcement Learning


There is a lot to learn about RL before we start building our own agents. In this section, we will go over the key components of Reinforcement Learning and how they interact with one another.

  • Agent: the algorithm that takes actions. It can be a game character, a robot, or a car. In real life, the agent is a human.
  • Action (A): the set of all possible moves that an agent can perform. For example, Mario can jump, move left, move right, and duck.
  • Discount factor (γ): a number between 0 and 1 that scales down future rewards so that they are worth less than immediate rewards. The lower the value, the more it imposes short-term hedonism on the agent.
  • Environment: the world that the agent interacts with. In Mario, the environment is the map. It takes the current state and the agent’s action as input and returns the reward and the next state.
  • State (S): the agent’s current situation, like a frame in a Mario game. When the agent takes an action, the state changes from the current frame to the next frame. Both the current and the next state are provided by the environment.
  • Reward (R): feedback given to an agent based on its previous action. It can be positive if the agent completes the task and negative if it fails. Rewards can be immediate or delayed.
  • Policy (π): the strategy that an agent employs to collect the highest possible reward. In simple words, it defines which action the agent takes in the current state.
  • Value (V): the expected long-term return, with future rewards discounted.
  • Trajectory: a sequence of states and the actions taken in those states.
  • Episode: one complete run of an agent, from start to end. For example, an episode begins when Mario starts the stage and ends when the stage is complete. An episode also ends when Mario dies.
  • Exploit: taking the best-known action to maximize the reward collected.
  • Explore: taking a random action to explore the environment without considering the rewards.
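
Two of the ideas in this list, the discount factor and the choice between exploring and exploiting, are easier to see in code. The sketch below is only illustrative: the function names, the default values of `gamma` and `epsilon`, and the action-value numbers are assumptions made for this example. The explore/exploit rule shown is the common "epsilon-greedy" scheme, which is one of several ways to balance the two.

```python
import random


def discounted_return(rewards, gamma=0.9):
    """Sum of rewards where the reward at step t is scaled by gamma**t."""
    total = 0.0
    for t, reward in enumerate(rewards):
        total += (gamma ** t) * reward
    return total


def epsilon_greedy(action_values, epsilon=0.1, rng=random):
    """Explore with probability epsilon, otherwise exploit the best-known action."""
    if rng.random() < epsilon:
        return rng.choice(list(action_values))        # explore: random action
    return max(action_values, key=action_values.get)  # exploit: highest estimate


if __name__ == "__main__":
    # Three rewards of 1 with gamma = 0.5: 1 + 0.5 + 0.25
    print(discounted_return([1, 1, 1], gamma=0.5))  # 1.75

    # Hypothetical value estimates for Mario's actions in some state.
    estimates = {"jump": 0.2, "left": -0.1, "right": 0.8, "duck": 0.0}
    print(epsilon_greedy(estimates, epsilon=0.0))  # right (always exploits)
    print(epsilon_greedy(estimates, epsilon=1.0))  # a random action (always explores)
```

With `gamma` close to 1 the agent cares almost as much about distant rewards as about immediate ones. With `gamma` close to 0 it becomes short-sighted. A small `epsilon` means the agent mostly exploits what it already knows and only occasionally tries something new.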




Learning Resources


This is just a start. If you want to learn more about Reinforcement Learning, begin with the basics: watch a YouTube tutorial or complete a course. After that, start working on a project or participate in a competition. I learned everything I know about RL by participating in Kaggle competitions, and whenever I got stuck, I read blogs or tutorials to expand my knowledge.






Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a Master's degree in Technology Management and a bachelor's degree in Telecommunication Engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.