Summary
Cellular automata (CAs) generate rich, lifelike dynamics but are often highly sensitive to initial conditions. Most AI-assisted discovery methods for exploring complexity in CAs have operated in an open-loop manner, setting only the initial state, which tends to yield brittle phenomena. We motivate a shift towards closed-loop discovery, where algorithms can intervene during system evolution to stabilize and guide dynamics. To this end, we develop a framework based on goal-conditioned reinforcement learning, in which agents guide CA evolution through a series of minimal interventions, such as adding or removing cell values between CA updates.
We evaluate our approach in two environments built on Lenia, a continuous extension of the Game of Life: (1) steering orbiums (self-organizing moving patterns akin to gliders in the Game of Life) towards target directions, and (2) maintaining the state population (the sum of all cell values in the grid) at a given target value. The trained agents learn successful policies in both environments and exploit the intrinsic dynamics of the environment to their advantage. Furthermore, the agents discover self-sustaining patterns that were not present in the initial training data, and their policies transfer well to noisy environments. Together, these results extend AI-assisted discovery beyond initial-condition search, with potential real-world applications in the discovery and control of "agential materials".
Method

We frame closed-loop discovery as a reinforcement learning problem in which each step of an episode alternates between an agent intervention and CA updates. At each step, the agent observes both the grid state and a target goal, then chooses an action: it may modify a small number of cells by adding or removing values around a specific location, or it may take no action. Afterward, the CA rules are applied to evolve the system. In some experiments, actions carry an explicit cost, encouraging the agent to balance intervention with restraint. This design promotes reliance on the CA's intrinsic dynamics rather than excessive external control.
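To make the loop concrete, the sketch below shows one possible gym-style environment step for the population-maintenance task. The grid size, brush radius, number of CA updates per agent step, action cost, placeholder update rule, and reward shape are all illustrative assumptions, not the paper's exact specification; a real implementation would substitute a Lenia update (kernel convolution followed by a growth function).

```python
import numpy as np

class ClosedLoopCAEnv:
    """Minimal sketch of the closed-loop intervention loop described above.

    Assumptions (not from the paper): grid size, brush radius, number of
    CA updates per agent step, action cost, and the population-target
    reward are illustrative placeholders.
    """

    def __init__(self, size=64, ca_steps_per_action=4, action_cost=0.01):
        self.size = size
        self.ca_steps_per_action = ca_steps_per_action
        self.action_cost = action_cost
        self.grid = np.zeros((size, size))
        self.goal = 0.0  # e.g. a target population value

    def reset(self, goal):
        self.grid = np.random.rand(self.size, self.size) * 0.1
        self.goal = goal
        return self._observation()

    def _observation(self):
        # The agent conditions on both the current grid and the goal.
        return self.grid.copy(), self.goal

    def _ca_update(self, grid):
        # Placeholder local smoothing rule standing in for a real Lenia
        # step (convolution kernel + growth function).
        neighbors = sum(np.roll(np.roll(grid, dy, 0), dx, 1)
                        for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
        return np.clip(grid + 0.1 * (neighbors - grid), 0.0, 1.0)

    def step(self, action):
        # action: None (no-op) or (row, col, delta), adding or removing
        # value in a small neighborhood around (row, col).
        cost = 0.0
        if action is not None:
            r, c, delta = action
            rr, cc = np.ogrid[:self.size, :self.size]
            brush = ((rr - r) ** 2 + (cc - c) ** 2) <= 2 ** 2  # radius-2 brush
            self.grid = np.clip(self.grid + delta * brush, 0.0, 1.0)
            cost = self.action_cost  # explicit cost for intervening
        # The CA then evolves under its own dynamics.
        for _ in range(self.ca_steps_per_action):
            self.grid = self._ca_update(self.grid)
        # Illustrative goal-conditioned reward: negative distance between
        # the grid population and the target, minus the action cost.
        reward = -abs(self.grid.sum() - self.goal) - cost
        return self._observation(), reward
```

An agent would then alternate observation and intervention, e.g. `obs = env.reset(goal=200.0)` followed by repeated calls to `env.step(action)`; because the action cost is subtracted from the reward, a policy is pushed towards no-ops whenever the CA's own dynamics already carry the state towards the goal.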