TeachMyAgent is a testbed platform for Automatic Curriculum Learning (ACL) methods in Deep Reinforcement Learning (Deep RL). We leverage procedurally generated environments to assess the performance of teacher algorithms in continuous task spaces. We provide tools for the systematic study and comparison of ACL algorithms using both skill-specific and global performance assessment.
We release our platform as an open-source repository along with APIs allowing one to extend our testbed. We currently provide the following elements:
We introduce new embodiments for our Deep RL agents to interact with environments. We create three types of morphology:
Additionally, we introduce new physics (water, graspable surfaces) in our environments, giving each type of embodiment a preferred "milieu" (e.g. walkers cannot survive underwater). These new embodiments, combined with the new physics, require agents to learn novel locomotion skills such as climbing or swimming. We give a detailed presentation of our morphologies on this page, as well as some policies we managed to learn with a Deep RL learner.
We introduce a new parkour track environment which uses CPPN-encoded procedural generation, producing a highly uneven terrain composed of a ground and a ceiling. We add water, controlled through a level parameter, as well as creepers, both tied to our new physics (a simulation of some water dynamics and the introduction of possible grasping actions on creepers and the ground). Task generation is controlled with a 6D vector.
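To make this parameterization concrete, here is a minimal sketch of sampling such a 6D task vector. The split of the dimensions into terrain-shaping CPPN inputs and water/creeper parameters, as well as the names and bounds, are our own illustrative assumptions, not TeachMyAgent's actual code:

```python
import numpy as np

# Hypothetical layout of the 6D task vector: terrain-shaping CPPN inputs plus
# water and creeper parameters. Names and bounds are illustrative only.
TASK_BOUNDS = {
    "cppn_input_0":    (-1.0, 1.0),
    "cppn_input_1":    (-1.0, 1.0),
    "cppn_input_2":    (-1.0, 1.0),
    "water_level":     (0.0, 1.0),
    "creeper_height":  (0.0, 1.0),
    "creeper_spacing": (0.0, 1.0),
}

def sample_task(rng: np.random.Generator) -> np.ndarray:
    """Sample one 6D task vector uniformly within the bounds above."""
    lows = np.array([lo for lo, _ in TASK_BOUNDS.values()])
    highs = np.array([hi for _, hi in TASK_BOUNDS.values()])
    return rng.uniform(low=lows, high=highs)

task = sample_task(np.random.default_rng(0))
# 'task' would then be passed to the environment's generation routine.
```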
The difficulty of the terrain, combined with the characteristics of our embodiments (e.g. swimmers die out of water), creates unfeasible regions in the Parkour's task space that depend on the morphology used. This makes our parkour track a challenging learning problem requiring an efficient ACL method, as most of the tasks are unfeasible for all of our embodiments.
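To illustrate what such morphology-dependent unfeasibility can look like, here is a toy feasibility check under the task layout sketched above; the thresholds and embodiment names are hypothetical:

```python
# Toy illustration (not TeachMyAgent code): some regions of the task space are
# unfeasible for a given morphology, e.g. a swimmer in a dry track or a walker
# in a fully flooded one. Thresholds and names are hypothetical.
def obviously_unfeasible(water_level: float, embodiment: str) -> bool:
    if embodiment == "swimmer" and water_level < 0.1:
        return True   # almost no water: the swimmer dies out of water
    if embodiment in ("walker", "climber") and water_level > 0.9:
        return True   # almost fully flooded: walkers and climbers drown
    return False

print(obviously_unfeasible(0.05, "swimmer"))  # True
print(obviously_unfeasible(0.5, "walker"))    # False
```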
Such a hard environment requires strong locomotion skills to make progress on its tasks. We show below some of the behaviours that emerged in the Deep RL policies we trained.
The roughness of our CPPN-generated terrain forces the Deep RL student to learn robust policies.
With the fish embodiment, agents can learn realistic swimming policies in our water physics in order to pass obstacles.
Some climbers learned an efficient (yet risky) climbing policy, jumping from creeper to creeper.
We added the possibility to use non-rigid creepers. Our chimpanzee-like agent learned to swing on them.
Walkers die after more than 600 consecutive steps underwater (their head turns red and their actions no longer have any effect).
Similarly, swimmers die after 600 consecutive steps out of water. Hence they have to hurry to reach water.
Climbers drown the same way walkers do.
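As an illustration of this survival rule, below is a minimal sketch of how the 600-step counter could be implemented; the names and structure are our own, not TeachMyAgent's actual code:

```python
# Minimal sketch of the survival rule described above: an embodiment dies after
# more than 600 consecutive steps spent in the wrong milieu (underwater for
# walkers and climbers, out of water for swimmers). Illustrative code only.
MAX_STEPS_IN_WRONG_MILIEU = 600

class SurvivalTracker:
    def __init__(self, prefers_water: bool):
        self.prefers_water = prefers_water      # True for swimmers
        self.steps_in_wrong_milieu = 0
        self.dead = False

    def step(self, head_underwater: bool) -> bool:
        """Update the counter; return True while the agent is still alive."""
        in_wrong_milieu = head_underwater != self.prefers_water
        self.steps_in_wrong_milieu = self.steps_in_wrong_milieu + 1 if in_wrong_milieu else 0
        if self.steps_in_wrong_milieu > MAX_STEPS_IN_WRONG_MILIEU:
            self.dead = True   # from here on, actions have no effect on the agent
        return not self.dead
```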
We created an amphibious bipedal walker able to go both underwater and out of the water. This resulted in a very realistic swimming policy.
We leverage the Stump Tracks environment to create experiments, each designed to assess teachers' performance on one of the ACL challenges we identified.
We use our Random teacher as a baseline (each teacher's results are reported as a ratio of the Random teacher's performance).
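As a rough illustration of this normalization, the post-processing could look like the following sketch; the teacher names and scores are hypothetical placeholders, not actual benchmark results:

```python
import numpy as np

# Hypothetical raw scores (e.g. % of mastered test tasks per seed);
# placeholder values only, not actual results.
raw_scores = {
    "Random":    [12.0, 10.5, 11.2],
    "Teacher A": [25.3, 27.1, 24.8],
    "Teacher B": [18.4, 17.9, 19.2],
}

random_mean = np.mean(raw_scores["Random"])
ratios = {name: np.mean(scores) / random_mean
          for name, scores in raw_scores.items()}

for name, ratio in sorted(ratios.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {ratio:.2f}x the Random teacher")
```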
This study introduces a skill-specific comparison between ACL methods, allowing one to better understand the strengths and weaknesses of each teacher algorithm. We hope such a comparison will both ease the selection of teacher methods for engineers and researchers and act as a basis to help design new ACL algorithms and thoroughly assess their performance.
We provide visualizations to help understand the curricula generated here.
Using the challenging Parkour environment along with three different embodiments (bipedal walker, fish and chimpanzee), we make it possible to assess the global performance of an ACL method on a learning problem that gathers most of the challenges studied in the challenge-specific comparison. For now, our parkour track problem largely remains open, and we hope future ACL algorithms will bring better results.
We provide visualizations to help understand the curricula generated here.