
A hands-on introduction to deep reinforcement learning using Unity ML-Agents

August 26, 2021 • Joy Zhang • Tutorial • 3 minutes



If you're new to reinforcement learning (RL), there are some great introductory courses out there. Just to name a few:

But if you're anything like me, you might prefer a 'learning by doing' approach. With hands-on experience upfront, it may be easier for you to grasp the theory behind the algorithms later.

In this series, I'll walk you through how to use Unity ML-Agents to build a volleyball environment and train agents to play in it using deep RL. For a bit of fun and extra incentive, you'll be able to submit your trained agent to the Ultimate Volleyball leaderboard and have it compete against other agents.

Ultimate Volleyball

Why ML-Agents?

ML-Agents is an add-on for Unity (a game development platform).

It lets us create complex physics-rich environments without needing to build any of the physics simulation logic ourselves. It also lets us experiment with state-of-the-art RL algorithms without having to set up any boilerplate code or install additional libraries. The nice graphics and interface are a plus.

A (very brief) overview of reinforcement learning

Let's use volleyball as an example. Our players (agents) initially know nothing about how to play volleyball. They'll start out taking actions completely at random. Through trial and error, they'll realise:

  • When they hit the ball and it goes over the net, they sometimes score points (positive feedback) ✔️
  • When they let the ball hit the floor, they lose a point (negative feedback) ❌

By continuing to do things that lead to positive outcomes, the agents will eventually learn to hit the ball over the net whenever it's on their side of the court.

Reinforcement learning is a subdomain of machine learning that involves training an 'agent' (the volleyball player) to learn the right sequence of actions to take (hitting the ball over the net) in a given state of its environment (the volleyball game) in order to maximize its reward (scoring points).

This can be illustrated more formally as:

Sutton and Barto example

Source: Sutton & Barto
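If you'd rather see that loop as code, here's a minimal sketch in plain Python. It is not ML-Agents and not deep RL: `ToyVolleyballEnv`, its two states, and the `train` function are all hypothetical names I've made up for illustration, and the agent is a simple epsilon-greedy learner that keeps a running average of the reward for each (state, action) pair. It only looks at the immediate reward, which is a big simplification of the volleyball game, but the observe, act, get reward, update cycle is the same idea.

```python
import random

STATES = ("ball_near", "ball_far")
ACTIONS = ("hit", "wait")


class ToyVolleyballEnv:
    """Hypothetical two-state stand-in for the volleyball game (not ML-Agents)."""

    def __init__(self, rng):
        self.rng = rng
        self.state = "ball_far"

    def reset(self):
        self.state = self.rng.choice(STATES)
        return self.state

    def step(self, action):
        # Reward depends on the current state and the action taken in it.
        if self.state == "ball_near":
            # Hitting sends the ball over the net; waiting lets it hit the floor.
            reward = 1.0 if action == "hit" else -1.0
        else:
            reward = 0.0  # nothing happens while the ball is on the other side
        self.state = self.rng.choice(STATES)
        return self.state, reward


def train(steps=2000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    env = ToyVolleyballEnv(rng)
    # Running average of the reward seen for each (state, action) pair.
    value = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    count = {(s, a): 0 for s in STATES for a in ACTIONS}

    state = env.reset()
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)  # explore: try something at random
        else:
            # exploit: pick the action with the best average reward so far
            action = max(ACTIONS, key=lambda a: value[(state, a)])
        next_state, reward = env.step(action)
        count[(state, action)] += 1
        value[(state, action)] += (reward - value[(state, action)]) / count[(state, action)]
        state = next_state
    return value


if __name__ == "__main__":
    for key, estimate in train().items():
        print(key, round(estimate, 2))
```

After training, the estimate for hitting when the ball is near should end up higher than the estimate for waiting, which is the toy version of "learning to hit the ball over the net". In the real series, ML-Agents handles the environment and the learning algorithm for us.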

For more on the theory, check out:


Note: This series is up to date with ML-Agents Release 18.

P.S. If you enjoyed this article, check out Bomberland: an open machine learning challenge for the community.