Reinforcement Learning
(Image credit: David Silver)
- Reinforcement learning (RL) is a branch of machine learning that focuses on finding the best possible behavior or path in a given situation to maximize rewards.
- Unlike supervised learning, reinforcement learning does not have a training dataset with correct answers. Instead, the reinforcement agent learns from its own experience and trial and error.
- It uses algorithms to learn from outcomes and make decisions based on feedback, making it suitable for automated systems that need to make numerous small decisions without human guidance.
What makes reinforcement learning different from other machine learning paradigms?
- There is no supervisor, only a reward signal
- Feedback is delayed, not instantaneous
- Time matters (data is sequential, not independent and identically distributed)
- Agent’s actions affect the subsequent data it receives
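The points above can be illustrated with a minimal agent-environment interaction loop. The toy environment and reward scheme below are hypothetical, made up purely for illustration: the agent repeatedly guesses a coin flip, receives only a scalar reward (no "correct answer" is ever revealed), and its running total is the cumulative reward it tries to maximize.

```python
import random

class CoinFlipEnv:
    """Toy environment: reward +1 if the agent's guess matches a coin flip."""
    def step(self, action):
        coin = random.choice(["heads", "tails"])
        return 1 if action == coin else 0  # scalar reward, no supervisor

env = CoinFlipEnv()
total_reward = 0
for t in range(100):
    action = random.choice(["heads", "tails"])  # a (very naive) random policy
    total_reward += env.step(action)            # feedback arrives only after acting
print(total_reward)
```

Note that the agent never sees the "right" label for any step; it only observes the reward signal, which is exactly what distinguishes this loop from supervised learning.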
Rewards:
- A reward $R_t$ is a scalar feedback signal.
- It indicates how well the agent is doing at step $t$.
- The agent's job is to maximize cumulative reward.
Reinforcement learning is based on the reward hypothesis:
Reward Hypothesis
All goals can be described by the maximization of the expected cumulative reward.
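The "expected cumulative reward" in the hypothesis is usually formalized as the discounted return $G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots$, where $\gamma \in [0, 1]$ weights near-term rewards above distant ones. A short sketch of that computation (the reward sequence is an arbitrary example):

```python
def discounted_return(rewards, gamma=0.9):
    """Compute G = r_1 + gamma*r_2 + gamma^2*r_3 + ... by folding backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# With gamma = 0.5: 1 + 0.5*1 + 0.25*1 = 1.75
print(discounted_return([1, 1, 1], gamma=0.5))
```

Folding from the end of the sequence avoids recomputing powers of $\gamma$ and mirrors how returns are accumulated in practice.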
In an example of training a dog to fetch a ball, the dog is the agent and its movements are the actions, while the environment includes the ground, person, and ball. The dog perceives the reward (a treat) in response to its action. Reinforcement Learning (RL) is a goal-directed learning and decision-making process through environmental interaction. The agent senses the environment's state and performs actions that lead to a change in the state and ideally generate a reward.
In RL, we assume the environment's state follows the Markov property: the next state (and reward) depends only on the current state and the action taken, not on the full history of earlier states and actions.
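The Markov property can be made concrete with a small transition table. The two-state weather MDP below is entirely hypothetical; the key point is that the distribution over next states is indexed only by the pair (current state, action), so no history is needed:

```python
import random

# Hypothetical MDP: next-state probabilities depend only on (state, action).
P = {
    ("sunny", "water"): {"sunny": 0.8, "rainy": 0.2},
    ("sunny", "wait"):  {"sunny": 0.6, "rainy": 0.4},
    ("rainy", "water"): {"sunny": 0.3, "rainy": 0.7},
    ("rainy", "wait"):  {"sunny": 0.5, "rainy": 0.5},
}

def step(state, action):
    """Sample the next state; earlier history plays no role (Markov property)."""
    dist = P[(state, action)]
    states, probs = zip(*dist.items())
    return random.choices(states, weights=probs)[0]

state = "sunny"
for _ in range(5):
    state = step(state, "water")
print(state)
```

Because each row of the table sums to 1, `P` defines a valid transition kernel, which is all an algorithm needs to reason about this environment.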
In contrast to supervised learning, the agent isn't given training examples and doesn't know the correct action. Unlike unsupervised learning, its goal isn't to find structure in the input but to maximize long-term rewards and minimize punishments.
Let's break down the key concepts:
- State: The current situation the agent is in.
- Action: What the agent does in response to the state.
- Reward: A signal (positive or negative) the agent receives for taking an action in a state.
- Policy: A set of rules that determines how the agent selects actions based on the state it's in.
- Return: The total amount of reward expected in the future, starting from a particular state.
- Value function: An estimate of how good it is for the agent to be in a specific state, considering future rewards.
- Environment: Everything the agent interacts with; it provides the states, and the rewards for the agent's actions.
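The concepts above fit together in tabular Q-learning, a classic RL algorithm. The sketch below runs it on a hypothetical five-state corridor (an illustration, not any specific benchmark): the state is the agent's position, the actions are left/right moves, the reward is +1 only at the goal, the Q-table holds value estimates, and the epsilon-greedy rule is the policy.

```python
import random
from collections import defaultdict

random.seed(0)  # fixed seed so the sketch is reproducible

N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]  # move left / move right

def step(state, action):
    """Environment: deterministic corridor with reward only at the goal."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

Q = defaultdict(float)              # action-value estimates Q(s, a)
alpha, gamma, eps = 0.5, 0.9, 0.1   # learning rate, discount, exploration rate

for episode in range(200):
    s, done = 0, False
    while not done:
        # epsilon-greedy policy: mostly exploit current values, sometimes explore
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r, done = step(s, a)
        # update toward the observed reward plus discounted best future value
        best_next = max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# greedy policy after training: the best action in each non-goal state
print([max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(GOAL)])
```

After training, the greedy policy should move right in every state, showing how value estimates learned from delayed rewards induce a good policy.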
Deep reinforcement learning (DRL) is a subfield of machine learning that merges reinforcement learning (RL) and deep learning. RL involves computational agents learning to make decisions through trial and error. Deep RL integrates deep learning, enabling agents to make decisions from unstructured data without manually engineering the state space. These algorithms can process large inputs (like every pixel in a video game) to optimize an objective (like maximizing the game score). Deep RL has diverse applications, including robotics, video games, natural language processing, computer vision, education, transportation, finance, and healthcare.
- AlphaGo Zero. AlphaGo Zero is a program developed by DeepMind that achieved superhuman ability to play the game Go, without relying on human game data for training.
- Frigatebird, AI-controlled sailplanes. Microsoft's controller system, compatible with various autopilot hardware like ArduPilot and Raspberry Pi 3, enables sailplanes to autonomously utilize naturally occurring thermals for sustained flight, eliminating the need for a motor or human intervention.
- Locomotion Behavior. DeepMind researchers gave agents diverse environments with varying difficulty levels. This process led the agents to learn advanced locomotion skills without any reward engineering.
- Data center cooling using model-predictive control. Data centers, crucial to the digital revolution, handle data storage, transfer, and processing, accounting for ~1.5% of global energy use, and this consumption will rise if left unchecked. In 2016, DeepMind and Google Research used reinforcement learning models to cut the energy used for cooling their data centers by 40%.
- Controlling nuclear fusion plasma. A recent (2022) and interesting application of RL is in controlling nuclear fusion plasma with the help of reinforcement learning.
📔 See Jupyter Notebook with Examples
- Introduction to Reinforcement Learning. David Silver.
- Algorithms for Reinforcement Learning. Csaba Szepesvári.
Created: 03/01/2024 (C. Lizárraga); Last update: 03/14/2024 (C. Lizárraga)
UArizona DataLab, Data Science Institute, 2024