'Can machines think?' When Turing posed this question in his seminal paper Computing Machinery and Intelligence in 1950, he inadvertently laid the groundwork for what was to become the field of Artificial Intelligence (AI). The ground was set firmly: generations of researchers have been searching for the cognitive architectures and optimization objectives that, when evaluated on carefully selected benchmarks, will lead to agents whose behavior is similar to, or even improves upon, human behavior. Chess, video games and Go are the carrots on a stick that have been driving the AI community forward.

We call this the cognition-centric approach and want to contrast it with the ecological approach. Under this alternative attitude towards creating AI, intelligence is viewed as an emergent product of adaptive systems interacting with their environments. While cognition-centric approaches attempt to reverse-engineer intelligent behavior by searching in the space of cognitive functions, ecological approaches search in the space of environmental properties to reverse-engineer the conditions that drive intelligent behavior.

Ecological perspectives abound in the study of biological organisms. From the emergence of vision systems to that of religion, environments encountered in the evolutionary trajectory of a species can help us understand why certain functions persist over others. If we look at the evolution of our own species, we will find that many behaviors that we consider simple, like using language and tools, took the longest time to evolve. Once a new skill was acquired, it also acted as a driver and enabler for new skills. Our ability to continuously acquire new skills by learning through others led to an explosively large set of behaviors: our cultural repertoire. It is this cultural repertoire that makes human societies seemingly open-ended and qualitatively different from the societies of other species, complex as those may be.

The cognition-centric approach has recently given us many successful algorithms and applications. But it may have kept us busy on a research agenda that does not lead to artificial intelligence with the distinctive characteristics of a natural one. Machine learning algorithms are today criticised for being data-hungry, brittle when encountering new problems and morally disconnected from the human society that gave birth to them.

In this blog post, we will discuss how we could pursue an ecological approach to AI. Our first objective is to discover the similarities between studies of natural and artificial intelligence. Our second objective is to see how leveraging these similarities can be useful in practice for AI studies. In particular, we present a conceptual framework that links human and artificial ecologies and then describe two computational studies that follow this approach.

A conceptual framework for linking human and artificial ecologies

We recently proposed that a possible pathway to getting artificial agents that exhibit the ability to continuously acquire skills in an open-ended way is to find inspiration in a biological species with an impressively open-ended behavioral repertoire: our own.

Environmental complexity, and its evolution, plays a key role in skill acquisition. It bootstraps the emergence of skills at both an individual and a collective level. At an individual level, it modulates reproduction pressures and creates the need for cognitive mechanisms. At a collective level, cooperation and competition pressures drive the need for social skills. The skills themselves then act as drivers for further skill acquisition in two ways: a) they interact with each other to give rise to the uniquely complex human cultural repertoire, and b) they modulate environmental complexity through the process of niche construction. It is this latter process that makes skill acquisition open-ended, as it creates a positive feedback loop that continuously complexifies the environment.

This framework, which we call ORIGINS, can be a useful tool for both the AI and the Human Behavioral Ecology (HBE) communities: AI researchers can borrow existing hypotheses from HBE about which environmental conditions affect which skills, and use them to shape their environments appropriately. At the same time, HBE researchers can use AI as a computational tool for studying their hypotheses.

In the rest of the post we describe two of our projects that can be seen as following the ecological approach we are proposing. We will zoom in on different parts of the conceptual framework, discuss the ecological hypotheses that inspired them, and explain the computational models we designed to study them.

Evolution of adaptability in complex environments

_{Earth: the apparent diversity of species on our planet could not have existed without an equally impressive diversity in environmental conditions.}

What makes us human? For HBE researchers, this is not a philosophical inquiry, but a rather pressing scientific question: why did the first hominin species appear about 5 million years ago and which skills did they evolve that allowed them to expand into a population that today holds a powerful position in the Earth's ecosystem?

When we trace back the history of our planet, we find climatic records of very "busy" periods: many rapid and large-amplitude climatic cycles have taken place in the last few million years, leading to high levels of climate change that must have significantly shaped populations at a global and local scale. Our own species appeared and dispersed during such periods.

Could there be a link between this climatic complexity and the birth of one of the most generalist species? Understanding the mechanisms with which temporal and spatial diversity in an environment influences individuals and populations is a crucial step towards answering this question.

A similar story has been unfolding in AI, where the first impressive agents were extremely specialized, solving a single task like Chess. But recently, the focus is on generalist agents that can adapt to changes in tasks that they have never encountered during their design.

The consensus of the community is that, when it comes to creating generalist agents, the environments used for training play a big part. In meta-learning, the focus is on ensuring a wide diversity of environments. In curriculum learning, the focus is on the order in which environments are encountered, as the training procedure needs to ensure that agents are continuously challenged but are able to improve.

_{Benchmarks used to evaluate agents are becoming increasingly complex and diverse: chess and Go
require strategic reasoning in a large-dimensional space. Atari games
pose diverse challenges such as partial observability and sparse rewards, and were one of the first benchmarks where agents
operated solely based on pixel values.
Multi-agent environments initially tested for a single skill, such as cooperation in foraging tasks.
XLand is a vast world of single-agent and multi-agent tasks that require navigation and object manipulation in 3D space.
Minecraft comes very close to our need for open-ended environments, as the agent can craft items and
continuously complexify the environment.}

These are the ideas that motivated the computational study we describe next, where we attempt to understand how the complexity of environments, both in terms of their temporal dynamics and task diversity, acts as a driver for generalism.

An eco-evo-devo study of the emergence of generalist agents

_{An agent with low plasticity (on the left) has small σ and a high peak at their preferred niche, while a
plastic individual (on the right) has large σ and a lower peak at their preferred niche. Fitness in a
certain environmental state is computed as the value of the probability density function of the distribution at that point.
If the environmental state matches the preferred niche, the plastic individual has lower fitness (cost of plasticity). If the actual environmental state differs
significantly from the preferred one, the plastic individual has higher fitness.}

The model

In this project we designed an environment that exhibits both temporal and spatial diversity in terms of resource availability. The amount of resources depends on the latitude of a niche and a climate function that changes with time. Agents have three ways to adapt to their environment, all encoded in their genome: a preferred environmental state, phenotypic plasticity and a mutation rate. Thus, agents can adapt at two time-scales: phenotypic plasticity is a developmental mechanism that enables survival in diverse environments within a single lifetime, while mutations enable adaptation at a slower, evolutionary time-scale.
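To make the ingredients above concrete, here is a minimal sketch of how they could be represented. It is not the code used in the study: the additive combination of climate and latitude, the sinusoidal climate, the Gaussian mutation noise and all names and default values (`Genome`, `climate`, `niche_state`, `mutate`, `gradient`) are assumptions made for illustration.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Genome:
    """The three evolvable traits described in the text."""
    preferred_state: float  # environmental state the agent is best adapted to
    plasticity: float       # width (sigma) of the agent's tolerance curve
    mutation_rate: float    # scale of the noise applied at reproduction


def climate(t, amplitude=1.0, period=50.0):
    """Hypothetical climate function: a sinusoid over generations."""
    return amplitude * math.sin(2.0 * math.pi * t / period)


def niche_state(latitude, t, gradient=0.1):
    """Assumed environmental state of a niche: shared climate plus a latitude offset."""
    return climate(t) + gradient * latitude


def mutate(parent, rng):
    """Perturb every gene, including the mutation rate itself (an assumption)."""
    s = parent.mutation_rate
    return Genome(
        preferred_state=parent.preferred_state + rng.gauss(0.0, s),
        plasticity=max(1e-3, parent.plasticity + rng.gauss(0.0, s)),
        mutation_rate=max(1e-3, parent.mutation_rate + rng.gauss(0.0, s)),
    )


rng = random.Random(0)
parent = Genome(preferred_state=0.0, plasticity=0.5, mutation_rate=0.1)
child = mutate(parent, rng)
states = [niche_state(latitude, t=0) for latitude in range(5)]
```

Because the mutation rate is itself part of the genome, a lineage can become more or less evolvable over generations, which is what lets the slower, evolutionary time-scale adapt alongside plasticity.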

To model phenotypic plasticity we have adopted tolerance curves, a tool originally developed in ecology. Tolerance curves have the form of a Gaussian whose mean corresponds to the preferred environmental state of an individual and whose variance corresponds to its plasticity, i.e., its ability to survive under different environmental conditions.

Tolerance curves elegantly capture the cost and benefit of plasticity: if a plastic and non-plastic agent compete in an environment that is identical to their preferred niche, the plastic one will lose as its peak is lower. But if, for some reason, the environment changes in the next generation so that it differs significantly from the preferred niche of the two individuals, the plastic one will be at an advantage.
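The trade-off can be checked numerically with a small sketch. Fitness is the Gaussian probability density evaluated at the current environmental state, as described above; the specific σ values and environmental states below are arbitrary choices for illustration, and `tolerance_fitness` is a hypothetical helper name.

```python
import math


def tolerance_fitness(env_state, preferred_state, sigma):
    """Gaussian probability density of the tolerance curve at env_state."""
    z = (env_state - preferred_state) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


SPECIALIST_SIGMA = 0.2  # low plasticity: narrow, tall curve
PLASTIC_SIGMA = 1.0     # high plasticity: wide, flat curve

# Environment identical to the preferred niche: the specialist wins.
specialist_at_niche = tolerance_fitness(0.0, 0.0, SPECIALIST_SIGMA)
plastic_at_niche = tolerance_fitness(0.0, 0.0, PLASTIC_SIGMA)

# Environment far from the preferred niche: the plastic agent wins.
specialist_far = tolerance_fitness(1.5, 0.0, SPECIALIST_SIGMA)
plastic_far = tolerance_fitness(1.5, 0.0, PLASTIC_SIGMA)

print(specialist_at_niche > plastic_at_niche)  # cost of plasticity
print(plastic_far > specialist_far)            # benefit of plasticity
```

Since the area under a probability density is fixed, widening the curve necessarily lowers its peak, so the cost of plasticity does not need to be modelled separately.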

At each generation, an agent surviving in a niche can reproduce until the capacity of the niche is filled. Agents are chosen based on their fitness, which depends on their preferred state, their plasticity and the state of the niche. If an agent can survive in multiple niches, it will be eligible for reproduction in all of them and will thus have higher chances of reproducing. We refer to this selection mechanism as niche-limited competition. We also compared against a condition where selection does not happen independently in each niche, but agents compete against everyone based on their average fitness across all niches. This condition corresponds to survival-of-the-fittest, the most widely used selection mechanism in evolutionary algorithms.
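The difference between the two selection mechanisms can be sketched on a toy fitness table. This is an illustration rather than the study's implementation: the survival threshold, the use of truncation (keep the top-ranked agents) and the function names are assumptions.

```python
def niche_limited_selection(fitness, capacity, survival_threshold=0.05):
    """Select parents independently in each niche.

    fitness[a][n] is the fitness of agent a in niche n. An agent that
    survives in several niches can be selected once per niche.
    """
    n_agents, n_niches = len(fitness), len(fitness[0])
    parents = []
    for n in range(n_niches):
        survivors = [a for a in range(n_agents)
                     if fitness[a][n] >= survival_threshold]
        survivors.sort(key=lambda a: fitness[a][n], reverse=True)
        parents.extend(survivors[:capacity])  # fill the niche up to capacity
    return parents


def global_selection(fitness, n_parents):
    """Survival-of-the-fittest: rank everyone by average fitness across niches."""
    mean_fitness = [sum(row) / len(row) for row in fitness]
    ranked = sorted(range(len(fitness)),
                    key=lambda a: mean_fitness[a], reverse=True)
    return ranked[:n_parents]


# Agent 0 specializes in a rich niche, agent 1 in a poor niche,
# agent 2 is a generalist that survives in both.
fitness = [
    [0.9, 0.0],
    [0.0, 0.3],
    [0.5, 0.2],
]

local_parents = niche_limited_selection(fitness, capacity=2)
global_parents = global_selection(fitness, n_parents=2)
```

In this toy case the generalist is selected in both niches under niche-limited competition, and the specialist of the poor niche still reproduces. Under global competition the poor-niche specialist is eliminated because its average fitness is the lowest.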

Results

Here is an illustration of how the population behaves in a world with 100 niches and a climate function that has the form of a sinusoid when the selection mechanism is niche-limited competition:

_{Illustration of a simulation in our environment. The population evolves under niche-limited competition in an environment where southern niches have higher
quality than northern niches.}

_{Visualization of the environment we used to study foraging behaviors at scale.
Resources are in green and their regeneration frequency is higher in the south, while agents are in black. (See this video for a visualization
of the whole duration of training.)}

In this study (which you can read more about in our paper), we saw that adaptability emerges when the number of niches is large or when there is temporal variability. We also saw that some environmental conditions and selection pressures may favor plasticity, while others favor evolvability. Finally, we saw that limiting competition within niches is an important driver of generalism, a notable result at a time when most meta-RL works use survival-of-the-fittest.

If there is one take-away message that we would like to distill from this study for the AI community, it is that there is no such thing as a generalist agent. Discussions of the generality of an agent need to consider its ecological niche. This niche is necessarily bounded because, to occupy it, an agent needs to spend time and energy. What is more, an agent will be only as generalist as its personal history and environment require. Even if we design an agent with powerful cognitive mechanisms, such as a large artificial neural network, generalisation will not emerge unless the spatiotemporal dynamics of the environment require it.

Do our conclusions generalize to meta-RL agents? Transferring our computational study to a grounded environment, where plasticity is not hard-coded but emerges out of a behavioral policy, is an important next step towards showcasing the importance of ecological dynamics. For this reason we implemented a large-scale environment, where populations of thousands of agents can forage resources in a grid-world with resource density that varies in space and time.

We now zoom out from the link between environmental dynamics and adaptability in our conceptual framework and move our gaze to the right of the framework schematic above. There we meet the cultural repertoire, a set of rather advanced skills that presuppose the existence of both individual cognitive mechanisms and social dynamics. Next, we will describe a study of how multi-agent dynamics influence the formation of a cultural repertoire.

The effect of social connectivity on collective innovation

_{Learning through and alongside others is a behavior encountered in many individuals, including artificial ones.
What role do social dynamics like group connectivity play and can we find similarities between biological and artificial social learning?}

Culture, the ability to create and spread traditions through social learning, is often seen as a monopoly of our own species. But there are many species that learn through others. A few of them can change their cultural skills with time. Some species can even accumulate changes with time, which leads to a continuous "complexification" of their skills. What is unique in humans, however, is the intensity of accumulation. From programming languages to musical instruments, the fossil record of human innovations has a rather intricate tree structure, with new innovations arising out of the recombination of existing ones.

_{Social networks characterizing human collective innovation often have a clustered structure.
On the top, reconstructed regional networks of cultural artifacts in Africa about 350 thousand years ago.
On the bottom, the citation network of a recent research paper (Image credit: Connected Papers).
Does this pattern suggest a link between social connectivity and the ability to innovate collectively?}

Why do humans innovate to such an unprecedented degree? Some theories point to our increased cognitive capacity or sociality. But others point to the benefits of our social connectivity: human societies often self-organize into small-world, hierarchical or dynamic topologies. The common feature underlying those is their ability to protect information within sub-parts of the group. In this way, the group maintains its cultural diversity, and it is this diversity that may be driving the continuous appearance of innovations.

Reinforcement learning, the sub-field of AI concerned with learning by interacting with an environment, is also interested in the benefits of collective exploration. In distributed RL, multiple agents solve a task in parallel and exchange information along the way. This has shown benefits both in terms of the quality of the final solution and the speed at which it is discovered.

Yet this community has considered a single type of group connectivity, the star topology: all agents interact with the environment to collect information and then share it with a single central node responsible for processing the new information and updating the behavior of the agents. Thus, artificial groups look very different from partially-connected human groups. If this structure reduces their collective diversity, it may negatively impact the ability of these groups to solve innovation challenges.

This is the idea that motivated the computational study we describe next, where groups of RL agents solve innovation tasks under different topologies.

SAPIENS: a distributed RL framework where social connectivity matters

The model

To study innovation we employed Wordcraft, an RL text-world inspired by the Little Alchemy 2 game. The player starts with a set of elements, for example "fire", "water" and "earth", and can combine them in pairs to craft new elements. Not all combinations give rise to new elements; the description of a task includes a list of the possible combinations.
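The crafting mechanic can be pictured as a lookup in a recipe table. The sketch below is a toy model and does not use the actual Wordcraft API; the two recipes and the names `RECIPES` and `combine` are made up for illustration.

```python
# Hypothetical recipe table: an unordered pair of elements maps to a new element.
RECIPES = {
    frozenset(("water", "fire")): "steam",
    frozenset(("earth", "water")): "mud",
}


def combine(first, second):
    """Return the crafted element, or None if the pair is not a valid combination."""
    return RECIPES.get(frozenset((first, second)))


inventory = {"fire", "water", "earth"}
crafted = combine("fire", "water")
if crafted is not None:
    inventory.add(crafted)  # new elements become available for further crafting
```

Using a `frozenset` as the key makes the order of the two elements irrelevant, so `combine("fire", "water")` and `combine("water", "fire")` give the same result.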

We were interested in studying a variety of challenges a group can encounter when solving innovation tasks, so we designed three different tasks: a single-path task, a merging-paths task and a best-of-ten-paths task.

In our experiments, a game is played by a group of RL agents. An RL agent embodies the paradigm of learning through trial and error. It interacts with an environment by executing actions and receiving observations and rewards. For example, in a Little Alchemy 2 task, an agent defines which elements it will combine and the environment returns the newly created elements and the points added to the player's score. The main idea in RL is that an agent can learn a policy that solves a task optimally if it is exposed to multiple trials of it and uses this experience to maximize the rewards it receives.

The RL algorithm that we employed was DQN, a deep RL algorithm where an agent has an explicit memory of past experiences that is periodically sampled to update its policy. These experiences have the form [observation, action, reward] and are also employed for sharing information with others. When we say that an agent shares an experience with another agent, we mean that it samples a random experience from its memory and directly inserts it into the other's memory.
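The sharing step described above can be sketched as follows. This is a simplified illustration, not the SAPIENS implementation: the `Agent` class, the `share_experiences` function, the memory capacity and the example neighbor map are all hypothetical, and the DQN policy update is omitted.

```python
import random
from collections import deque


class Agent:
    """Holds only the replay memory; the learning part of DQN is left out."""

    def __init__(self, name, capacity=1000):
        self.name = name
        self.memory = deque(maxlen=capacity)

    def store(self, observation, action, reward):
        self.memory.append((observation, action, reward))


def share_experiences(agents, neighbors, rng):
    """Each agent sends one randomly sampled experience to each of its neighbors."""
    outgoing = []
    for agent in agents.values():
        if not agent.memory:
            continue
        for other in neighbors[agent.name]:
            outgoing.append((other, rng.choice(list(agent.memory))))
    # Deliver after sampling so that nobody forwards an experience
    # it received during the same sharing round.
    for receiver, experience in outgoing:
        agents[receiver].memory.append(experience)


agents = {name: Agent(name) for name in ("a", "b", "c")}
for name, agent in agents.items():
    agent.store(observation=f"obs_{name}", action=0, reward=1.0)

# A hypothetical partially-connected group: "a" and "c" only talk to "b".
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
share_experiences(agents, neighbors, random.Random(0))
```

After one round, agent "a" holds an experience from "b" but none from "c": information has to travel through the group's connections, which is what allows sub-groups to keep exploring differently for a while.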

An agent shares experiences only with its neighbors, which are determined by the social connectivity of the group. In our study, we considered the following social connectivities:

Results

Our empirical evaluation of the four types of social connectivity showed that fully-connected groups perform worse in all three tasks. In the single-path task, they are the slowest to find the optimal solution. In the merging-paths task, they converge to a local optimum. In the best-of-ten-paths task, they explore too slowly to discover the optimal path.

To explain these behaviors we measured the diversity of memories that agents have. We observed that agents in fully-connected groups had high diversity as individuals, but the group as a whole had the lowest diversity. Low diversity indicates that the group is collectively exploring a small part of the search space and can thus explain the failure of fully-connected groups.
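One simple way to picture the two levels of diversity is to count unique experiences per agent and across the whole group. The metric below is an assumption chosen for illustration and may differ from the one used in the study; the memories are made-up toy data.

```python
def individual_diversity(memories):
    """Average number of unique experiences held by a single agent."""
    return sum(len(set(memory)) for memory in memories) / len(memories)


def group_diversity(memories):
    """Number of unique experiences across the union of all memories."""
    return len(set().union(*memories))


# Toy experiences, written as (observation, action, reward) tuples.
e = [(f"obs_{i}", i, 0.0) for i in range(6)]

# Fully-connected group: everything is shared, so memories look alike.
fully_connected = [[e[0], e[1], e[2]], [e[0], e[1], e[2]], [e[0], e[1], e[2]]]

# Partially-connected group: each agent keeps some experiences to itself.
partially_connected = [[e[0], e[1]], [e[2], e[3]], [e[4], e[5]]]

for group in (fully_connected, partially_connected):
    print(individual_diversity(group), group_diversity(group))
```

In this toy example the fully-connected agents each hold more unique experiences than the partially-connected ones, yet the group as a whole covers fewer of them, mirroring the pattern described above.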

The animation below illustrates how a group with partial connectivity and a group with full connectivity behave in the merging-paths task. As we see, agents in the fully-connected group all end up exploring the same branch and fail to discover the global optimum:

For more detailed results, including performances of the different baselines on all tasks, take a look at our paper, written in collaboration with Ida Momennejad from Microsoft Research.

Do our empirical conclusions generalize to more complex environments, such as grid-worlds and Atari games? And how can we use our new understanding to guide the design of social connectivity for improving performance in certain tasks? These are examples of the research questions we plan to explore in future work.

Connecting the dots

With these two works we showed how our research methodology can benefit two very different fields: we started from hypotheses in ecology, designed computational studies inspired by them and derived empirical conclusions that can feed future studies both in ecology and AI.

At first glance our two studies may seem disconnected. One is concerned with the effect of ecological dynamics on a population's dispersal and adaptability and the other with the effect of social connectivity on collective innovation. But isn't the social connectivity of a population related to its dispersal? This observation allows us to draw a link between ecological dynamics and collective innovation. Can certain ecological conditions favor certain social connectivities and what would this mean for the evolution of our species?

This connection may not be far-fetched. Ecological studies posit that elements in our environment such as the Sahara desert act as ecological barriers that prevent migration but can disappear during periods of extreme climatic variability (did you know the Sahara was green a few tens of thousands of years ago?). And studies of human cultural innovation posit that such barriers played an important part in the spread of cultural artifacts.

_{Would simulation environments with RL agents navigating a world where diverse continents are separated
by oceans lead to artificial societies that remind us of our own?}

An ecological approach to Artificial Intelligence

How the field of ecology helped us conceptualize skill acquisition and design studies in AI
