TeachMyAgent is a testbed platform for Automatic Curriculum Learning (ACL) methods in Deep Reinforcement Learning (Deep RL). We leverage procedurally generated environments to assess the performance of teacher algorithms in continuous task spaces. We provide tools for the systematic study and comparison of ACL algorithms using both skill-specific and global performance assessment.
We release our platform as an open-source repository along with APIs allowing one to extend our testbed. We currently provide the following elements:
We introduce new embodiments for our Deep RL agents to interact with environments. We create three types of morphology:
Additionally, we introduce new physics (water, graspable surfaces) in our environments, giving each type of embodiment a preferred "milieu" (e.g. walkers cannot survive underwater). These new embodiments, combined with the new physics, require agents to learn novel locomotion skills such as climbing or swimming. We give a detailed presentation of our morphologies on this page, as well as some policies we managed to learn with a Deep RL learner.
We introduce a new parkour track environment which uses CPPN-encoded procedural generation, producing a highly uneven terrain composed of a ground and a ceiling. We add water, controlled through a level parameter, as well as creepers, both tied to our new physics (a simulation of some water dynamics and the introduction of possible grasping actions on creepers and the ground). Task generation is controlled with a 6D vector.
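To make this parameterization concrete, here is a minimal sketch of sampling such a 6D task vector. The split of the dimensions into terrain-shaping CPPN inputs and water/creeper parameters, as well as the names and bounds, are our own illustrative assumptions, not TeachMyAgent's actual code:

```python
import numpy as np

# Hypothetical layout of the 6D task vector: terrain-shaping CPPN inputs plus
# water and creeper parameters. Names and bounds are illustrative only.
TASK_BOUNDS = {
    "cppn_input_0":    (-1.0, 1.0),
    "cppn_input_1":    (-1.0, 1.0),
    "cppn_input_2":    (-1.0, 1.0),
    "water_level":     (0.0, 1.0),
    "creeper_height":  (0.0, 1.0),
    "creeper_spacing": (0.0, 1.0),
}

def sample_task(rng: np.random.Generator) -> np.ndarray:
    """Sample one 6D task vector uniformly within the bounds above."""
    lows = np.array([lo for lo, _ in TASK_BOUNDS.values()])
    highs = np.array([hi for _, hi in TASK_BOUNDS.values()])
    return rng.uniform(low=lows, high=highs)

task = sample_task(np.random.default_rng(0))
# 'task' would then be passed to the environment's generation routine.
```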
The difficulty of the terrain, combined with the characteristics of our embodiments (e.g. swimmers die out of water), creates unfeasible regions in the Parkour's task space that depend on the morphology used. This makes our parkour track a challenging learning problem requiring an efficient ACL method, as most of the tasks are unfeasible for all of our embodiments.
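To illustrate what such morphology-dependent unfeasibility can look like, here is a toy feasibility check under the task layout sketched above; the thresholds and embodiment names are hypothetical:

```python
# Toy illustration (not TeachMyAgent code): some regions of the task space are
# unfeasible for a given morphology, e.g. a swimmer in a dry track or a walker
# in a fully flooded one. Thresholds and names are hypothetical.
def obviously_unfeasible(water_level: float, embodiment: str) -> bool:
    if embodiment == "swimmer" and water_level < 0.1:
        return True   # almost no water: the swimmer dies out of water
    if embodiment in ("walker", "climber") and water_level > 0.9:
        return True   # almost fully flooded: walkers and climbers drown
    return False

print(obviously_unfeasible(0.05, "swimmer"))  # True
print(obviously_unfeasible(0.5, "walker"))    # False
```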
Such a hard environment requires strong locomotion skills to make progress on its tasks. We show below some of the behaviours that emerged in the Deep RL policies we trained.
The roughness of our CPPN-generated terrain forces the Deep RL student to learn robust policies.
With the fish embodiment, agents can learn realistic swimming policies in our water physics in order to pass obstacles.
Some climbers learned an efficient (yet risky) climbing policy, jumping from creeper to creeper.
We added the possibility to use non-rigid creepers. Our chimpanzee-like agent learned to swing on them.
Walkers die after more than 600 consecutive steps underwater (their head turns red and their actions no longer have any effect).
Similarly, swimmers die after 600 consecutive steps out of water. Hence they have to hurry to reach water.
Climbers drown the same way walkers do.
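As an illustration of this survival rule, below is a minimal sketch of how the 600-step counter could be implemented; the names and structure are our own, not TeachMyAgent's actual code:

```python
# Minimal sketch of the survival rule described above: an embodiment dies after
# more than 600 consecutive steps spent in the wrong milieu (underwater for
# walkers and climbers, out of water for swimmers). Illustrative code only.
MAX_STEPS_IN_WRONG_MILIEU = 600

class SurvivalTracker:
    def __init__(self, prefers_water: bool):
        self.prefers_water = prefers_water      # True for swimmers
        self.steps_in_wrong_milieu = 0
        self.dead = False

    def step(self, head_underwater: bool) -> bool:
        """Update the counter; return True while the agent is still alive."""
        in_wrong_milieu = head_underwater != self.prefers_water
        self.steps_in_wrong_milieu = self.steps_in_wrong_milieu + 1 if in_wrong_milieu else 0
        if self.steps_in_wrong_milieu > MAX_STEPS_IN_WRONG_MILIEU:
            self.dead = True   # from here on, actions have no effect on the agent
        return not self.dead
```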
We created an amphibious bipedal walker able to go both underwater and out of the water. This resulted in a very realistic swimming policy.
We leverage the Stump Tracks environment to create experiments, each designed to assess teachers' performance on one of the ACL challenges we identified.
We use our Random teacher as a baseline (each teacher's results are reported as a ratio of the Random teacher's performance).
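As a rough illustration of this normalization, the post-processing could look like the following sketch; the teacher names and scores are hypothetical placeholders, not actual benchmark results:

```python
import numpy as np

# Hypothetical raw scores (e.g. % of mastered test tasks per seed);
# placeholder values only, not actual results.
raw_scores = {
    "Random":    [12.0, 10.5, 11.2],
    "Teacher A": [25.3, 27.1, 24.8],
    "Teacher B": [18.4, 17.9, 19.2],
}

random_mean = np.mean(raw_scores["Random"])
ratios = {name: np.mean(scores) / random_mean
          for name, scores in raw_scores.items()}

for name, ratio in sorted(ratios.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {ratio:.2f}x the Random teacher")
```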
This study introduces a skill-specific comparison between ACL methods, allowing one to better understand the strengths and weaknesses of each teacher algorithm. We hope such a comparison will both ease the selection of teacher methods for engineers and researchers and act as a basis to help design new ACL algorithms and thoroughly assess their performance.
We provide visualizations to help understand the curricula generated here.
Using the challenging Parkour environment along with three different embodiments (bipedal walker, fish and chimpanzee), we make it possible to assess the global performance of an ACL method on a learning problem that gathers most of the challenges studied in the challenge-specific comparison. For now, our parkour track problem largely remains open, and we hope future ACL algorithms will bring better results.
We provide visualizations to help understand the curricula generated here.