
A hands-on introduction to deep reinforcement learning using Unity ML-Agents

August 26, 2021 • Joy Zhang • Tutorial • 2 minutes

If you're new to reinforcement learning (RL), there are some great introductory courses out there. Just to name a few:

But if you're anything like me, you might prefer a 'learning by doing' approach. With hands-on experience upfront, it may be easier for you to grasp the theory behind the algorithms later.

In this series, I'll walk you through how to use Unity ML-Agents to build a volleyball environment and train agents to play in it using deep RL.

Ultimate Volleyball

Why ML-Agents?

ML-Agents is an add-on for Unity (a game development platform).

It lets us create complex physics-rich environments without needing to build any of the physics simulation logic ourselves. It also lets us experiment with state-of-the-art RL algorithms without having to set up any boilerplate code or install additional libraries. The nice graphics and interface are a plus.

A (very brief) overview of reinforcement learning

Let's use volleyball as an example. Our players (agents) initially know nothing about how to play volleyball. They'll start out taking actions completely at random. Through trial and error, they'll realise:

  • When they hit the ball and it goes over the net, they sometimes score points (positive feedback) ✔️
  • When they let the ball hit the floor, they lose a point (negative feedback) ❌

By continuing to do things that lead to positive outcomes, the agents will eventually learn to hit the ball over the net whenever it's on their side of the court.

Reinforcement learning is a subdomain of machine learning that involves training an ‘agent’ (the volleyball player) to learn the correct sequence of actions to take (hitting the ball over the net) in a given state of its environment (the volleyball game) in order to maximize its reward (scoring points).
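To make the trial-and-error idea concrete, here is a minimal Python sketch. It is not how ML-Agents trains agents: it is a toy, single-state learner that uses epsilon-greedy exploration with running-average reward estimates. The action names, the 60% scoring chance, and the `step` and `train` functions are all assumptions made up for illustration.

```python
import random

# Hypothetical toy setup (not part of ML-Agents): one state, two actions.
ACTIONS = ["hit_ball", "let_ball_drop"]


def step(action, rng):
    """Return a reward for the chosen action.

    Assumption for illustration: hitting the ball scores a point 60% of
    the time, while letting the ball drop always loses a point.
    """
    if action == "hit_ball":
        return 1.0 if rng.random() < 0.6 else 0.0
    return -1.0


def train(episodes=2000, epsilon=0.1, seed=0):
    """Learn action values by trial and error (epsilon-greedy)."""
    rng = random.Random(seed)
    values = {a: 0.0 for a in ACTIONS}  # running average reward per action
    counts = {a: 0 for a in ACTIONS}    # how often each action was tried

    for _ in range(episodes):
        # Occasionally explore at random; otherwise pick the best estimate.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: values[a])

        reward = step(action, rng)

        # Incremental mean: nudge the estimate toward the observed reward.
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]

    return values


if __name__ == "__main__":
    learned = train()
    print(learned)
    print("Preferred action:", max(learned, key=learned.get))
```

The agent is never told that hitting the ball is good. It only sees rewards, and its estimates drift toward the action that pays off more often. Deep RL replaces the small `values` table with a neural network so the same idea can scale to environments with many states, like the volleyball court.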

This can be illustrated more formally as:

Sutton and Barto example

Source: Sutton & Barto
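In the notation of Sutton & Barto's textbook, at each time step t the agent observes a state S_t, chooses an action A_t, and then receives a reward R_{t+1} along with the next state S_{t+1}. "Maximizing reward" is then usually taken to mean maximizing the expected discounted return. The discount factor γ is not mentioned in this post and is introduced here only for illustration:

```latex
G_t = R_{t+1} + \gamma R_{t+2} + \gamma^{2} R_{t+3} + \cdots
    = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1},
\qquad 0 \le \gamma \le 1
```

In the volleyball example, a γ close to 1 would make the agent value a point scored several hits from now almost as much as one scored immediately.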

For more on the theory, check out:


Note: This series is up-to-date with ML-Agents Release 18
