An ecological approach to Artificial Intelligence
How the field of ecology helped us conceptualize skill acquisition and design studies in AI
'Can machines think?' When Turing posed this question in his seminal 1950 paper "Computing Machinery and Intelligence", he inadvertently laid the ground for what was to
become the field of Artificial Intelligence (AI). The ground was set firmly: generations of researchers have been
searching for the cognitive architectures and optimization objectives that, when evaluated on carefully selected benchmarks, will lead to agents
whose behavior is similar to, or even improves upon, human behavior. Chess, video games and Go are the carrots on a stick that
have been driving the AI community forward.
We call this the cognition-centric approach and want to contrast it to the ecological
approach. Under this alternative attitude towards creating AI, intelligence is viewed as an emergent product
of adaptive systems interacting with their environments. While cognition-centric approaches attempt to reverse-engineer
intelligent behavior by searching in the space of cognitive functions, ecological approaches search in the space of
environmental properties to reverse-engineer the conditions that drive intelligent behavior.
Ecological perspectives abound in the study of biological organisms.
From the emergence of vision systems
to that of religion, environments encountered in the
evolutionary trajectory of a species can help us understand why certain functions persist over others.
If we look at the evolution of our own species, we will find that many behaviors that we consider simple,
like using language and tools, took the longest time to evolve.
Once a new skill was acquired, it also acted as a driver and enabler for new skills.
Our ability to continuously acquire new skills by learning through others led to an explosively large set of behaviors: our cultural repertoire.
It is this cultural repertoire that makes human societies seemingly open-ended and qualitatively different from the societies of other species, complex as those may be.
The cognition-centric approach has recently given us many successful algorithms and applications.
But it may have kept us busy on a research agenda that does not lead to artificial intelligence with the distinctive
characteristics of a natural one.
Machine learning algorithms are today criticised for being data-hungry, brittle when encountering new problems
and morally disconnected from the human society that gave birth to them.
In this blog post, we will discuss how we could pursue an ecological approach to AI.
Our first objective is to discover the similarities between studies of natural and artificial intelligence.
Our second objective is to see how leveraging these similarities can be useful in practice for AI studies.
In particular we:
- present a conceptual framework for grounding an ecological approach to AI in Human
Behavioral Ecology (HBE), a research field studying the evolution of humans in interaction with their environments
- illustrate how an ecological perspective on AI looks in practice through two of our recent studies:
  - the study of the emergence of adaptability in environments with temporal and spatial variability using a simple eco-evolutionary model
  - the study of the effect of social connectivity on collective innovation with distributed groups of reinforcement learning agents
- discuss promising perspectives on how these two apparently disconnected contributions can shed light on the complex interactions between ecological and socio-cultural dynamics
A conceptual framework for linking human and artificial ecologies
We recently proposed that a possible pathway
to getting artificial agents that exhibit the ability to continuously acquire skills in an open-ended way is to find inspiration in a biological
species with an impressively open-ended behavioral repertoire: our own.
Our methodology was three-step:
- skim the Human Behavioral Ecology (HBE) literature to identify the key factors that this
community proposes as conducive for the evolution of the uniquely diverse human cultural repertoire
- identify similar research questions and hypotheses in AI, in particular the sub-fields of
meta and multi-agent reinforcement learning
- abstract away the particularities of the two fields to arrive at a conceptual framework that
unifies their terminologies and research agendas.
Here is ORIGINS, the conceptual framework that this work led to:
ORIGINS is a conceptual framework for studying skill acquisition in both human and artificial ecologies
The story that the framework tells, put succinctly, is:
Environmental complexity, and its evolution, has a key role in skill acquisition.
It bootstraps the emergence of skills at both an individual and a collective level.
At an individual level, it modulates reproduction pressures and creates the need for cognitive mechanisms.
At a collective level, cooperation and competition pressures drive the need for social skills.
The skills themselves then act as drivers for further skill acquisition in two ways:
a) they interact with each other to give rise to the uniquely complex human cultural repertoire b) they modulate
environmental complexity through the process of niche construction.
It is this latter process that makes skill acquisition open-ended, as it creates a positive feedback loop that
continuously complexifies the environment.
ORIGINS can be a useful tool for both the AI and HBE communities: AI researchers can borrow existing hypotheses from
HBE about which environmental conditions affect which skills to appropriately shape their environments. At the same time, HBE researchers can use AI as a computational tool
for studying their hypotheses.
In the rest of the post we describe two of our projects that can be seen as following the ecological approach we are proposing.
We will zoom in on different parts of the conceptual framework, discuss the ecological hypotheses that inspired them,
and explain the computational models we designed to study them.
Evolution of adaptability in complex environments
In this study we focus on the effect of environmental complexity on individual adaptation.
Earth: the apparent diversity of species on our planet could not have existed without an equally impressive diversity in environmental conditions.
What makes us human? For HBE researchers, this is not a philosophical inquiry, but a rather pressing scientific
question: why did the first hominin species appear about 5 million years ago and which skills did they evolve
that allowed them to expand into a population that today holds a powerful position in the Earth's ecosystem?
When we trace back the history of our planet, we find climatic records of very "busy" periods: many rapid and
large-amplitude climatic cycles have taken place in the last few million years, leading to high levels of climate
change that must have significantly shaped populations at a global and local scale. Our own species appeared
and dispersed during such periods.
Could there be a link between this climatic complexity and the birth of one of the most generalist species?
Understanding the mechanisms with which temporal and spatial diversity in an environment influences individuals and populations
is a crucial step towards answering this question.
A similar story has been unfolding in AI, where the first impressive agents were extremely specialized, solving a single task like Chess.
But recently, the focus is on generalist agents that can adapt to changes in tasks that they have never encountered during their design.
The consensus of the community is that, when it comes to creating generalist agents,
the environments used for training play a big part.
In meta-learning, the focus is on ensuring a wide diversity of environments.
In curriculum learning the focus is on the order in which environments are encountered,
as the training procedure needs to ensure that agents are continuously challenged but are able to improve.
Benchmarks used to evaluate agents are becoming increasingly complex and diverse: chess and Go
require strategic reasoning in a high-dimensional space, while Atari games
pose diverse challenges such as partial observability and sparse rewards, and were one of the first benchmarks where agents
operated solely based on pixel values.
Multi-agent environments initially tested for a single skill, such as cooperation in foraging tasks.
XLand is a vast world of single-agent and multi-agent tasks that require navigation and object manipulation in 3D space.
Minecraft comes very close to our need for open-ended environments, as the agent can craft items and
continuously complexify the environment.
These are the ideas that motivated the computational study we describe next,
where we attempt to understand how the complexity of environments,
both in terms of their temporal dynamics and task diversity, acts as a driver for generalism.
An eco-evo-devo study of the emergence of generalist agents
An agent with low plasticity (on the left) has a small σ and a high peak at its preferred niche, while a
plastic individual (on the right) has a large σ and a lower peak at its preferred niche. Fitness in a
certain environmental state is computed as the probability density function of the distribution at that point.
If the environmental state matches the preferred one, the plastic individual has lower fitness (cost of plasticity). If the actual environmental state differs
significantly from the preferred one, the plastic individual has higher fitness.
In this project we designed an environment that exhibits both temporal and spatial diversity in terms of resource availability.
The amount of resources depends on the latitude of a niche and a climate function that changes with time.
Agents have three ways to adapt to their environment, all encoded in their genome: a preferred environmental state,
phenotypic plasticity and a mutation rate.
Thus, agents can adapt at two time-scales: phenotypic plasticity is a developmental mechanism that enables survival
in diverse environments within
a single lifetime, while mutations enable adaptation at a slower, evolutionary time-scale.
To model phenotypic plasticity we have adopted tolerance curves, a tool originally developed in ecology.
Tolerance curves have the form of a Gaussian whose mean corresponds to the preferred environmental state of an individual and
variance to its plasticity, i.e., its ability to survive under different environmental conditions.
Tolerance curves elegantly capture the cost and benefit of plasticity: if a plastic and non-plastic agent compete
in an environment that is identical to their preferred niche, the plastic one will lose as its peak is lower. But
if, for some reason, the environment changes in the next generation so that it differs significantly from the preferred niche
of the two individuals, the plastic one will be at an advantage.
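The trade-off described above can be sketched in a few lines of Python. This is only an illustration, not the code of the study: the function name `tolerance_fitness` and the numeric values are made up, and we assume that σ denotes the standard deviation of the Gaussian.

```python
import math


def tolerance_fitness(env_state: float, preferred_state: float, sigma: float) -> float:
    """Gaussian probability density with mean `preferred_state` and
    standard deviation `sigma`, evaluated at `env_state`."""
    z = (env_state - preferred_state) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Illustrative values only (assumptions, not taken from the study).
SPECIALIST_SIGMA = 0.1  # low plasticity: narrow curve, high peak
PLASTIC_SIGMA = 1.0     # high plasticity: wide curve, low peak
PREFERRED_STATE = 0.0   # both agents prefer the same environmental state

# Environment identical to the preferred niche: the specialist wins.
specialist_at_niche = tolerance_fitness(0.0, PREFERRED_STATE, SPECIALIST_SIGMA)
plastic_at_niche = tolerance_fitness(0.0, PREFERRED_STATE, PLASTIC_SIGMA)

# Environment far from the preferred niche: the plastic agent wins.
specialist_far = tolerance_fitness(1.0, PREFERRED_STATE, SPECIALIST_SIGMA)
plastic_far = tolerance_fitness(1.0, PREFERRED_STATE, PLASTIC_SIGMA)

print(specialist_at_niche > plastic_at_niche)  # cost of plasticity
print(plastic_far > specialist_far)            # benefit of plasticity
```

Because the area under a density is always one, widening the curve necessarily lowers its peak, which is what encodes the cost of plasticity.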
At each generation, an agent surviving in a niche can reproduce until the capacity of the niche is filled.
Agents are chosen based on their fitness, which depends on their preferred state, their plasticity and the state of the niche.
If an agent can survive in multiple niches it will be eligible for reproduction in all of them, and thus, will have higher chances of reproducing.
We refer to this selection mechanism as niche-limited competition.
We also compared against a condition where selection does not happen independently in each niche, but agents compete against everyone based on their average
fitness across all niches.
This condition corresponds to survival-of-the-fittest, the most widely used evolutionary algorithm.
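To make the difference between the two selection mechanisms concrete, here is a deliberately simplified sketch. It is not the implementation used in the study: the two-gene `Agent` tuple, the function names, and the use of deterministic truncation selection (the best agents fill the available slots) are all assumptions made for illustration, and mutation is left out entirely.

```python
import math
from typing import List, Tuple

# Hypothetical minimal genome: (preferred_state, sigma).
Agent = Tuple[float, float]


def fitness(agent: Agent, env_state: float) -> float:
    """Tolerance-curve fitness: Gaussian density evaluated at the niche state."""
    preferred_state, sigma = agent
    z = (env_state - preferred_state) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def niche_limited_parents(agents: List[Agent], niche_states: List[float],
                          capacity: int) -> List[Agent]:
    """Agents compete separately in every niche; each niche fills its own
    `capacity` reproduction slots with its locally fittest agents."""
    parents: List[Agent] = []
    for state in niche_states:
        ranked = sorted(agents, key=lambda a: fitness(a, state), reverse=True)
        parents.extend(ranked[:capacity])
    return parents


def global_parents(agents: List[Agent], niche_states: List[float],
                   n_parents: int) -> List[Agent]:
    """Agents compete against everyone based on fitness averaged over niches."""
    def mean_fitness(agent: Agent) -> float:
        return sum(fitness(agent, s) for s in niche_states) / len(niche_states)

    return sorted(agents, key=mean_fitness, reverse=True)[:n_parents]


# Toy population: two specialists and one plastic agent (illustrative values).
population = [(0.0, 0.1), (1.0, 1.0), (2.0, 0.1)]
niches = [0.0, 1.0, 2.0]

local = niche_limited_parents(population, niches, capacity=1)
best_global = global_parents(population, niches, n_parents=1)
print(local)
print(best_global)
```

In this toy example every niche selects a different parent, so the plastic agent keeps a reproduction slot, whereas ranking by average fitness puts a specialist first. The sketch only illustrates the mechanics of the two schemes and says nothing about long-term evolutionary outcomes.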
Here is an illustration of how the population behaves in a world with 100 niches and a climate function that has the
form of a sinusoid when the selection mechanism is niche-limited competition:
Illustration of a simulation in our environment. The population evolves under niche-limited competition in an environment where southern niches have higher
quality than northern niches.
Visualization of the environment we used to study foraging behaviors at scale.
Resources are in green and their regeneration frequency is higher in the south, while agents are in black. (See this video for a visualization
of the whole duration of training.)
In this study (which you can read more about in our paper) we saw that adaptability emerges when the number of niches is large or when there is temporal
We also saw that some environmental conditions and selection pressures may favor plasticity, while others favor
We also saw that limiting competition within niches is an important driver of generalism, at a time when most meta-RL works use survival-of-the-fittest.
If there is one take-away message that we would like the AI community to draw from this study, it is that there is no such thing as a generalist agent.
Discussions of the generality of an agent need to consider its ecological niche.
This niche is necessarily bounded because, to occupy it, an agent needs to spend time and energy.
What is more, an agent will be only as generalist as its personal history and environment require.
Even if we design an agent with powerful cognitive mechanisms, such as a large artificial neural network, generalisation will not emerge
unless the spatiotemporal dynamics of the environment require it.
Do our conclusions generalize to meta-RL agents?
Transferring our computational study to a grounded environment, where plasticity
is not hard-coded but emerges out of a behavioral policy is an important next step towards showcasing the importance of
For this reason we implemented a large-scale environment, where populations of thousands of agents can forage resources
in a grid-world with resource density that varies in space and time.
We now zoom out from the link between environmental dynamics and adaptability in our conceptual framework and move our gaze to the right of the framework schematic above.
There we meet the cultural repertoire, a set of rather advanced skills that presuppose the existence of both individual cognitive mechanisms and social dynamics.
Next, we will describe a study of how multi-agent dynamics influence the formation of a cultural repertoire.
The effect of social connectivity on collective innovation
In this study we focus on the effect of social dynamics on the collective innovation of learning agents
Learning through and alongside others is a behavior encountered in many individuals, including artificial ones.
What role do social dynamics like group connectivity play and can we find similarities between biological and artificial social learning?
Culture, the ability to create and spread traditions through social learning, is often seen as a monopoly of our own species.
But there are many species that learn through others.
A few of them can change their cultural skills with time.
Some species can even accumulate changes with time, which leads to a continuous "complexification" of their skills.
What is unique in humans, however, is the intensity of accumulation.
From programming languages to musical instruments, the fossil record of human innovations has a rather intricate tree structure, with new innovations arising out
of recombination of existing ones.
Social networks characterizing human collective innovation often have a clustered structure.
On the top, reconstructed regional networks of cultural artifacts in Africa about 350 thousand years ago.
On the bottom, the citation network of a recent research paper (image credit: Connected Papers).
Does this pattern suggest a link between social connectivity and the ability to innovate collectively?
Why do humans innovate to such an unprecedented degree?
Some theories point to our increased cognitive capacity or sociality.
But others point to the benefits of our social connectivity: human societies often self-organize into small-world, hierarchical or dynamic
topologies. The common feature underlying those is their ability to protect information within sub-parts of the group.
In this way, the group maintains its cultural diversity, and it is this diversity that may be driving the continuous appearance of innovations.
Reinforcement learning, the sub-field of AI concerned with learning by interacting with an environment,
is also interested in the benefits of collective exploration.
In distributed RL, multiple agents solve a task in parallel and exchange information along the way.
This has shown benefits both in terms of the quality of the final solution and the speed at which it is discovered.
Yet this community has considered a single type of group connectivity, the star topology:
all agents interact with the environment to collect information and then share it with a single central node responsible for
processing the new information and updating the behavior of the agents.
Thus, artificial groups look very different from the partially-connected human groups.
If this structure reduces their collective diversity, it may negatively impact the ability of these groups to solve innovation challenges.
This is the idea that motivated the computational study we describe next,
where groups of RL agents solve innovation tasks under different topologies.
SAPIENS: a distributed RL framework where social connectivity matters
To study innovation we employed Wordcraft,
an RL text-world inspired by the Little Alchemy 2 game.
The player starts with a set of elements, for example "fire", "water" and "earth" and can combine them in pairs to craft new elements.
Not all combinations give rise to new elements; the description of a task includes a list of the possible combinations.
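The crafting mechanic can be pictured with a small lookup table. The sketch below is not the Wordcraft API: `RECIPES`, `combine` and the two recipes are placeholders chosen for illustration.

```python
from typing import Dict, FrozenSet, Optional, Set

# Hypothetical recipe book: an unordered pair of elements maps to a new element.
RECIPES: Dict[FrozenSet[str], str] = {
    frozenset({"water", "fire"}): "steam",
    frozenset({"earth", "water"}): "mud",
}


def combine(inventory: Set[str], first: str, second: str) -> Optional[str]:
    """Try to craft a new element from two elements the player already owns.

    Returns the crafted element and adds it to `inventory`, or returns None
    when an element is missing or the pair is not a valid combination.
    """
    if first not in inventory or second not in inventory:
        return None
    result = RECIPES.get(frozenset({first, second}))
    if result is not None:
        inventory.add(result)
    return result


inventory = {"fire", "water", "earth"}
crafted = combine(inventory, "fire", "water")   # a valid combination
failed = combine(inventory, "fire", "earth")    # not in the recipe book
print(crafted, failed, sorted(inventory))
```

Using a `frozenset` as the key makes the combination order-independent, so combining "fire" with "water" and "water" with "fire" hit the same recipe.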
Our environment is a text-based version of the Little Alchemy 2 game, where a player combines elements to form new elements.
Visualization of our innovation tasks
We were interested in studying a variety of challenges a group can encounter when solving innovation tasks, so we designed
three different tasks:
- the single-path task, where new innovations arise as modifications of the most recent past innovations.
Search is easy in this type of task.
The challenge lies in solving it as quickly as possible.
We are therefore interested in the speed at which groups innovate.
This task can be used to model arms races, such as the invention of aircraft and their continuous improvement to machine gun-mounted planes like
the Fokker during the First World War.
- the merging-paths task, where two identical paths merge into a more rewarding path.
This task contains a strong local optimum: to find the more rewarding path one needs to explore both initial paths instead of only reaching the end of one of them.
It can be useful to model the recombination of innovations that often happens in technological progress, as for example in the evolution of gasoline.
The distillation of petroleum can be seen as one innovation path, that gave us the inventions of kerosine and gasoline.
Kerosine attracted a lot of attention but gasoline was ignored as a volatile by-product.
This was until another innovation path gave us the internal combustion engine.
In itself this engine was not very popular, but when combined with gasoline, it changed the world.
- the best-of-ten paths task, with nine identical paths and one path that is the most rewarding.
Here, the challenge lies in exploring a large search space.
Large search spaces are inherent in many fields of innovation, such as biology and medicine.
Visualization of two RL agents interacting with the environment and sharing experiences
In our experiments, a game is played by a group of RL agents.
An RL agent embodies the paradigm of learning through trial and error.
It interacts with an environment by executing actions and receiving observations and rewards.
For example, in a Little Alchemy 2 task, an agent defines which elements it will combine and the environment returns the newly created elements and the points added to the player's score.
The main idea in RL is that an agent can learn a policy that solves a task optimally if it is exposed to multiple trials of it and uses this experience to maximize
the rewards it receives.
The RL algorithm that we employed was DQN , a deep RL algorithm where
an agent has an explicit memory of past experiences that is periodically sampled
to update its policy.
These experiences have the form [observation, action, reward] and are also employed for sharing information with others.
When we say that an agent shares an experience with another agent we mean that it samples a random experience from its memory
and directly inserts it into the other's memory.
An agent shares experiences only with its neighbors, which are determined by the social connectivity of the group.
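The sharing step described above can be sketched as follows. This is a minimal illustration rather than the SAPIENS implementation: the function names are hypothetical, experiences are reduced to plain tuples, we assume one independently sampled experience per neighbor and per round, and the ring is just one example of a partially-connected group.

```python
import random
from typing import Dict, List, Tuple

# An experience in the form used in the text: (observation, action, reward).
Experience = Tuple[str, str, float]


def fully_connected(n_agents: int) -> Dict[int, List[int]]:
    """Every agent is a neighbor of every other agent."""
    return {i: [j for j in range(n_agents) if j != i] for i in range(n_agents)}


def ring(n_agents: int) -> Dict[int, List[int]]:
    """Hypothetical partially-connected group: two neighbors per agent
    (intended for three or more agents)."""
    return {i: [(i - 1) % n_agents, (i + 1) % n_agents] for i in range(n_agents)}


def share_experiences(memories: Dict[int, List[Experience]],
                      neighbors: Dict[int, List[int]],
                      rng: random.Random) -> None:
    """Each agent samples a random experience per neighbor and inserts it
    directly into that neighbor's memory."""
    outgoing: List[Tuple[int, Experience]] = []
    # Sample first, insert afterwards, so that an experience received in this
    # round cannot be forwarded again within the same round.
    for agent, memory in memories.items():
        if not memory:
            continue
        for neighbor in neighbors.get(agent, []):
            outgoing.append((neighbor, rng.choice(memory)))
    for neighbor, experience in outgoing:
        memories[neighbor].append(experience)


memories = {i: [(f"obs-{i}", "combine", 0.0)] for i in range(4)}
share_experiences(memories, ring(4), random.Random(0))
print({agent: len(memory) for agent, memory in memories.items()})
```

Swapping `ring(4)` for `fully_connected(4)` is the only change needed to compare the two group structures in this sketch.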
In our study, we considered the following social connectivities:
The social connectivities we studied empirically. The dynamic connectivity is inspired from a previous study in human
Our empirical evaluation of the four types of social connectivities showed that fully-connected groups perform worse in all three tasks.
In the single-path task, they are the slowest to find the optimal solution.
In the merging-paths task, they converge to the local optimum.
In the best-of-ten paths, they explore too slowly to discover the optimal path.
To explain these behaviors we measured the diversity of memories that agents have.
We observed that agents in fully-connected groups had high diversity as individuals, but the group as a whole had the lowest diversity.
Low diversity indicates that the group is collectively exploring a small part of the search space and can thus explain the failure of fully-connected groups.
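One simple way to picture the two notions of diversity is to count distinct experiences, per agent and across the whole group. The metric below is an assumption made for illustration and may differ from the one used in the paper; the function names and the toy memories are hypothetical.

```python
from typing import Dict, List, Tuple

Experience = Tuple[str, str, float]  # (observation, action, reward)


def individual_diversity(memory: List[Experience]) -> int:
    """Number of distinct experiences in a single agent's memory."""
    return len(set(memory))


def group_diversity(memories: Dict[int, List[Experience]]) -> int:
    """Number of distinct experiences across all agents' memories."""
    return len(set().union(*memories.values()))


a, b, c, d, e, f = [(f"obs-{k}", "combine", 0.0) for k in "abcdef"]

# Heavy sharing: every agent holds the same three experiences.
shared = {0: [a, b, c], 1: [a, b, c], 2: [a, b, c]}
# Little sharing: each agent holds two experiences nobody else has.
separate = {0: [a, b], 1: [c, d], 2: [e, f]}

print(individual_diversity(shared[0]), group_diversity(shared))
print(individual_diversity(separate[0]), group_diversity(separate))
```

In the toy data, the heavily-sharing group has the higher diversity per agent but the lower diversity as a group, which is the pattern described above for fully-connected groups.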
The animation below illustrates how a group with partial connectivity and full connectivity behave in the merging-paths task.
As we see, agents in the fully-connected group all end up exploring the same branch and fail to discover the global optimum:
Illustration of how a dynamic group is better at avoiding local optima compared to a fully-connected group.
For more detailed results, including performances of the different baselines on all tasks, take a look at our paper, written
in collaboration with Ida Momennejad from Microsoft Research.
Do our empirical conclusions generalize to more complex environments, such as grid-worlds and Atari games?
And how can we use our new understanding to guide the design of social connectivity for improving performance in certain tasks?
These are examples of the research questions we plan to explore in future work.
Connecting the dots
With these two works we showed how our research methodology can benefit two very different fields: we started from
hypotheses in ecology, designed computational studies inspired from them and derived empirical conclusions that can feed
future studies both in ecology and AI.
At a first glance our two studies may seem disconnected.
One is concerned with the effect of ecological dynamics on a population's dispersal and adaptability and the other with
the effect of social connectivity on collective innovation.
But isn't the social connectivity of a population related to its dispersal?
This observation allows us to draw a link between ecological dynamics and collective innovation.
Can certain ecological conditions favor certain social connectivities and what would this mean for the evolution of our species?
This connection may not be far-fetched. Ecological studies posit that elements in our environment such as the Sahara desert act as
ecological barriers that prohibit immigration but can disappear during periods of extreme climatic variability
(did you know the Sahara was green a few tens of thousands
of years ago?).
And studies of human cultural innovation posit that such barriers played an important part
in the spread of cultural artifacts.
Would simulation environments with RL agents navigating a world where diverse continents are separated
by oceans lead to artificial societies that remind us of our own?