Summary

Cellular automata (CAs) generate rich, lifelike dynamics but are often highly sensitive to initial conditions. Most AI-assisted discovery methods for exploring complexity in CAs have operated in an open-loop manner—setting only the initial state—which tends to yield brittle phenomena. We motivate a shift towards closed-loop discovery, where algorithms can intervene during system evolution to stabilize and guide dynamics. To this end, we develop a framework based on goal-conditioned reinforcement learning agents that can guide CA evolution through a series of minimal interventions, such as adding or removing cell values between CA updates.

We evaluate our approach in two environments built on Lenia, a continuous extension of the Game of Life: (1) steering orbiums (self-organizing moving patterns akin to gliders in the Game of Life) towards target directions, and (2) maintaining the state population (sum of the CA grid) at a given target value. The trained agents learn successful policies in both environments and learn to exploit the intrinsic dynamics of the environment to their advantage. Furthermore, the agents discover self-sustaining patterns that were not present in the initial training data, and their policies transfer well to noisy environments. Together, these results extend AI-assisted discovery beyond initial-condition search, with potential real-world applications in the discovery and control of "agential materials".

Method

Architecture of the goal-conditioned RL system

We frame closed-loop discovery as a reinforcement learning problem, where each step of an episode alternates between the agent's intervention and CA updates. At each step, the agent observes both the grid state and a target goal, then chooses an action: it may modify a small number of cells by adding or removing values around a specific location, or it may opt to take no action. Afterward, the CA rules are applied to evolve the system. In some experiments, actions carry an explicit cost, encouraging the agent to balance intervention with restraint. This design promotes reliance on the CA's intrinsic dynamics rather than excessive external control.
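
The sketch below illustrates this loop as a Gym-style environment, with one agent decision followed by a fixed number of CA updates. The class name, the action encoding, the Gaussian-bump intervention, and all parameter values are illustrative assumptions; the CA update rule and the goal-conditioned reward are injected as functions, since their exact form depends on the task.

```python
import numpy as np

class GoalConditionedCAEnv:
    """Minimal sketch of the closed-loop setup: the agent intervenes, then the
    CA's own dynamics run for several steps before the next decision.
    Not the exact implementation used in the experiments."""

    def __init__(self, ca_update_fn, reward_fn, ca_steps_per_action=5,
                 action_cost=0.0, sigma=2.0):
        self.ca_update_fn = ca_update_fn    # one CA step: grid -> grid
        self.reward_fn = reward_fn          # goal-conditioned reward: (grid, goal) -> float
        self.ca_steps_per_action = ca_steps_per_action
        self.action_cost = action_cost      # > 0 penalizes every intervention
        self.sigma = sigma                  # spatial extent of a single intervention

    def reset(self, initial_grid, goal):
        self.grid = np.clip(initial_grid.astype(np.float32), 0.0, 1.0)
        self.goal = goal
        return {"grid": self.grid.copy(), "goal": self.goal}

    def step(self, action):
        # action = (no_op, row, col, amplitude); amplitude > 0 adds cell value
        # around (row, col), amplitude < 0 removes it, no_op leaves the grid alone.
        no_op, row, col, amplitude = action
        cost = 0.0
        if not no_op:
            self.grid = self._intervene(row, col, amplitude)
            cost = self.action_cost

        # Let the CA's intrinsic dynamics unfold before the next intervention.
        for _ in range(self.ca_steps_per_action):
            self.grid = self.ca_update_fn(self.grid)

        reward = self.reward_fn(self.grid, self.goal) - cost
        return {"grid": self.grid.copy(), "goal": self.goal}, reward, False, {}

    def _intervene(self, row, col, amplitude):
        # Add (or subtract) a small Gaussian bump centred on the chosen cell,
        # keeping values in the CA's [0, 1] range.
        h, w = self.grid.shape
        rr, cc = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
        bump = np.exp(-((rr - row) ** 2 + (cc - col) ** 2) / (2 * self.sigma ** 2))
        return np.clip(self.grid + amplitude * bump, 0.0, 1.0)
```

Setting action_cost to zero corresponds to the unconstrained setting in the first steering experiment below, while a positive value encourages the sparser interventions seen in the second one.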

Steering Orbiums

We train agents to direct orbiums towards a target direction. The orbium's initial position and direction, as well as the target direction, are randomly sampled for each episode.
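
One simple way to score progress on this task is to track the displacement of the pattern's centre of mass between agent steps and reward its alignment with the target direction. The function below is a minimal sketch of such a reward; the exact shaping is an assumption, and the centre-of-mass estimate ignores wrap-around at the grid edges.

```python
import numpy as np

def direction_reward(prev_grid, grid, target_direction):
    """Sketch of a directional reward: cosine alignment between the pattern's
    centre-of-mass displacement and the target unit vector (illustrative;
    toroidal wrap-around is ignored for simplicity)."""
    def center_of_mass(a):
        rows, cols = np.indices(a.shape)
        total = a.sum() + 1e-8
        return np.array([(rows * a).sum(), (cols * a).sum()]) / total

    displacement = center_of_mass(grid) - center_of_mass(prev_grid)
    speed = np.linalg.norm(displacement)
    if speed < 1e-6:
        return 0.0  # the pattern did not move, so there is no directional signal
    return float(np.dot(displacement / speed, np.asarray(target_direction)))
```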

The legend below explains the overlays shown in the videos: the arrows indicate the orbium's current direction and the target direction, and the X markers indicate the agent's interventions.

Legend: current direction; target direction; positive intervention (adding a positive value to the cells); negative intervention (adding a negative value to the cells).

Training with no action cost. The agent achieves the goal by constantly perturbing the state, breaking the orbium's coherence.

Training with action costs. Compared to the previous experiment, the agent acts less frequently and eventually constructs a correctly oriented orbium.

Adaptive Control in Novel, Noisy Environments

Agents trained to steer orbiums generalize to environments with random perturbations, completing the task despite never encountering such conditions during training.

Maintaining and steering an orbium in a noisy environment.

Resulting dynamics in a noisy environment when the agent's actions are muted. The random perturbations quickly drive the system into an exploding Turing pattern.
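
The sketch below illustrates one plausible noise model for such an environment: a few random local bumps injected into the grid between CA updates. The bump count, magnitude, and shape here are assumptions, not the exact perturbations used in these runs.

```python
import numpy as np

def perturb(grid, rng, num_bumps=3, amplitude=0.2, sigma=2.0):
    """Hypothetical noise model: add a few random positive or negative
    Gaussian bumps to the grid between CA updates."""
    h, w = grid.shape
    rr, cc = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    for _ in range(num_bumps):
        row, col = rng.integers(0, h), rng.integers(0, w)
        sign = rng.choice([-1.0, 1.0])
        bump = np.exp(-((rr - row) ** 2 + (cc - col) ** 2) / (2 * sigma ** 2))
        grid = grid + sign * amplitude * bump
    return np.clip(grid, 0.0, 1.0)
```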

Controlling the State Population

The examples below show an environment in which the agents are tasked with maintaining the population size (sum of the CA grid) at a given target value. The initial state and the target population size are randomly sampled for each episode.
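
A natural reward for this task is the negative gap between the grid's total mass and the target. The snippet below is a minimal sketch of that idea; the normalization constant is an arbitrary assumption.

```python
import numpy as np

def population_reward(grid, target_population, scale=100.0):
    """Sketch of a population-maintenance reward: penalize the absolute
    difference between the grid sum and the target (scale is arbitrary)."""
    return -abs(float(grid.sum()) - target_population) / scale
```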

Evolution without Intervention

Without an agent guiding the system, it typically evolves into a Turing pattern or an empty grid, as shown below.
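
For reference, the sketch below runs a standard single-channel Lenia update with no agent in the loop. The ring kernel and Gaussian growth function follow the usual Lenia formulation, but the specific parameter values are illustrative rather than the exact ones used in the environments above.

```python
import numpy as np

def lenia_update(grid, R=13, dt=0.1, mu=0.15, sigma=0.015):
    """One standard single-channel Lenia step on a toroidal grid: convolve
    with a smooth ring kernel, apply a Gaussian growth function, and clip
    to [0, 1]. Parameter values are illustrative."""
    h, w = grid.shape

    # Ring-shaped kernel peaking at half the kernel radius R (recomputed here
    # for clarity; a real implementation would precompute it and its FFT).
    # This ogrid construction assumes even grid dimensions.
    y, x = np.ogrid[-h // 2: h // 2, -w // 2: w // 2]
    r = np.sqrt(x * x + y * y) / R
    kernel = np.zeros_like(r)
    ring = (r > 0) & (r < 1)
    kernel[ring] = np.exp(4 - 1 / (r[ring] * (1 - r[ring])))
    kernel /= kernel.sum()

    # Circular convolution via FFT (the grid wraps around at the edges).
    potential = np.real(np.fft.ifft2(np.fft.fft2(grid) * np.fft.fft2(np.fft.ifftshift(kernel))))

    # Gaussian growth mapping in [-1, 1], centred at mu with width sigma.
    growth = 2.0 * np.exp(-((potential - mu) ** 2) / (2 * sigma ** 2)) - 1.0
    return np.clip(grid + dt * growth, 0.0, 1.0)

# Free-running rollout from a random sparse initial state: with no interventions
# the grid typically either dies out or settles into a Turing pattern.
rng = np.random.default_rng(0)
grid = rng.uniform(0.0, 1.0, size=(128, 128)) * (rng.uniform(size=(128, 128)) < 0.2)
for _ in range(300):
    grid = lenia_update(grid)
```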

Evolution with Interventions

The agents learn to maintain the population size for a range of target values and initial conditions.

Behavior when the target population size is high.

Behavior with an intermediate target population size.

Discovery of Self-Sustaining Patterns

Interestingly, the agents learn to construct self-sustaining patterns that maintain the desired population size. These patterns were not present in the initial states of the training environments and were discovered entirely by the agents through interaction with the environment. These promising results suggest that the proposed framework could be used both to explore and to control CAs and other complex systems with rich, unknown dynamics.

The agent constructs a static ring.

The agent constructs a dynamic rotating pattern.