AOH :: TIERRA1.DOX The evolution and optimization of digital organisms 1/2

Please let me know if you receive this.

The enclosed manuscript is a draft, which contains more than any
published paper.  Papers that have been published or are in press are:

Ray, T. S.  1991.  Is it alive, or is it GA?''
Proceedings of the 1991 International Conference on Genetic Algorithms,
Eds. Belew, R. K., and L. B. Booker, San Mateo, CA: Morgan Kaufmann, 527--534.

Ray, T. S.  1991.  An approach to the synthesis of life.''
Artificial Life II, Santa Fe Institute Studies in the Sciences of
Complexity, vol. XI, Eds. C. Langton, C. Taylor, J. D. Farmer, & S. Rasmussen,
Redwood City, CA: Addison-Wesley, 371--408.

Ray, T. S.  1991.  Population dynamics of digital organisms.''
Artificial Life II Video Proceedings,  Ed. C. G. Langton,
Redwood City, CA: Addison Wesley.

Ray, T. S.  1991.  Evolution and optimization of digital organisms.''
Scientific Excellence in Supercomputing: The IBM 1990 Contest Prize Papers,
Eds. Keith R. Billingsley, Ed Derohanes, Hilton Brown, III.
Athens, GA, 30602, The Baldwin Press, The University of Georgia.
Publication date: December 1991.

I don't have a paper in it, but you might be interested in the
following book:

Langton, Christopher G. [ed.].  1989.  Artificial life: proceedings of an
interdisciplinary workshop on the synthesis and simulation of living systems.
Vol. VI in the series: Santa Fe Institute studies in the sciences of
complexity.  Addison-Wesley.

I will be at the Santa Fe Institute Feb. 1 thru Aug. 31, 1992.
This work will also be presented in the following upcoming seminars:

National Science Foundation, Washington, September 20, 1991
University of Rochester, Biology, October 18, 1991
University of Illinois, Complex Systems Colloquia, October 25, 1991
University of Maryland, Zoology, October 29, 1991
University of Kentucky, Lexington, Biology, October 31, 1991
University of Delaware, Entomology, November 5, 1991
Stony Brook, Department of Ecology and Evolution, November 6, 1991
Drexel University, Electrical Engineering, November 8, 1991
The University of the Arts, Philadelphia, Design in Cyberspace lectures,
November 12, 1991
IBM, T. J. Watson Research Center, Yorktown Heights, NY, November 13, 1991
Thinking Machines Corp., Cambridge, November 14, 1991
Digital Equipment Corp., Hudson, MA, November 15, 1991
American Society of Information Science, New Jersey, November 19, 1991
Texas Instruments, Dallas, November 21, 1991
Harvard University, Biology (Lewontin's lab), December 2, 1991
Boston University, Computational Sciences Center, December 3, 1991
MIT Nanotechnology Study Group, December 3, 1991
University of Massachusetts Boston, Biology, December 5, 1991
Yale University, Biology, December 6, 1991
University of Arizona, Ecology & Evolutionary Biology, March 10, 1992
Cornell University, Mathematical Sciences Institute, CA Workshop, May 1992
Gordon Conference on Theoretical Biology, New Hampshire, June 8-12, 1992

A DOS version of the Tierra software with a decent front end will be ready
for distribution by December.  The source code is available by ftp at:
tierra.slhs.udel.edu or life.slhs.udel.edu  in a directory named tierra

The source code in the ftp site will compile and run on DOS (tested with
Turbo C) but does not have a nice front end).

There is an Artificial Life mailing list.  It is distributed as a digest
to avoid chaos.  You may subscribe to the list by sending a message to:

alife-request@cognet.ucla.edu

and you may post to the list by sending a message to:

alife@cognet.ucla.edu

Tom Ray
University of Delaware
School of Life & Health Sciences
Newark, Delaware  19716
ray@tierra.slhs.udel.edu
ray@life.slhs.udel.edu
ray@brahms.udel.edu
302-451-2281 (FAX)
302-451-2753

The figures for this paper are in available on request in Generic CADD
batch file format.  This paper is being sent in two parts.  You will have
to cut the headers of the two parts at the scissors marks, then cat them
together.

O /
================================= x -------------------------------------
o \
\documentstyle[12pt]{article}

\flushbottom
\textheight 9in
\textwidth 6.5in
\textfloatsep 30pt plus 3pt minus 6pt
\parskip 7.5pt plus 1pt minus 1pt
\oddsidemargin 0in
\evensidemargin 0in
\topmargin 0in
\headheight 0in
\headsep 0in

% Hanging Paragraph
\def\XP{\par\begingroup\parindent 0in\everypar{\hangindent .3in}}
\def\eXP{\par\endgroup}

% Left Justified Paragraph
\def\LP{\par\begingroup\parindent 0in\everypar{\hangindent 0in}}
\def\eLP{\par\endgroup}

\begin{document}
\thispagestyle{empty}

\LP
\bf Thomas S. Ray\rm \\
School of Life \& Health Sciences, University of Delaware, Newark, Delaware
19716,\\
ray@brahms.udel.edu\\
\rule[6pt]{6.5in}{1pt}
\Large \bf Evolution and Optimization of Digital Organisms\rm \normalsize\\
\rule[6pt]{6.5in}{2pt}
\eLP

Digital organisms have been synthesized based on a computer metaphor of
organic life in which CPU time is the energy'' resource and memory is
the material'' resource.  Memory is organized into informational genetic''
patterns that exploit CPU time for self-replication.  Mutation generates
new forms, and evolution proceeds by natural selection as different
genotypes'' compete for CPU time and memory space.  In addition, new
genotypes appear which exploit other creatures'' for informational or
energetic resources.

The digital organisms are self-replicating computer programs, however,
they can not escape because they run exclusively on a virtual computer
in its unique machine language.  From a single ancestral creature''
there have evolved tens of thousands of self-replicating genotypes of
hundreds of genome size classes.  Parasites evolved, then creatures that
were immune to parasites, and then parasites that could circumvent the
immunity.  Hyper-parasites evolved which subvert parasites to their own
reproduction and drive them to extinction.  The resulting genetically
uniform communities evolve sociality in the sense of creatures that can
only reproduce in cooperative aggregations, and these aggregations are
then invaded by cheating hyper-hyper-parasites.

Diverse ecological communities have emerged.  These digital communities
have been used to experimentally study ecological and evolutionary processes:
e.g., competitive exclusion and coexistance, symbiosis, host/parasite density
dependent population regulation, the effect of parasites in enhancing
community diversity, evolutionary arms races, punctuated equilibrium, and
the role of chance and historical factors in evolution.  It is possible to
extract information on any aspect of the system without disturbing it, from
phylogeny or community structure through time to the genetic makeup'' and
metabolic processes'' of individuals.  Digital life demonstrates the
power of the computational approach to science as a complement to the
traditional approaches of experiment, and theory based on analysis through
calculus and differential equations.

Optimization experiments have shown that freely evolving digital organisms
can optimize their algorithms by a factor of 5.75 in a few hours of real
time.  In addition, evolution discovered the optimization technique of
unrolling the loop''.  Evolution may provide a new method for the
optimization or generation of application programs.  This method may
prove particularly useful for programming massively parallel machines.

\LP
\rule[6pt]{6.5in}{1pt}
evolution, ecology, artificial life, synthetic life, emergence,
self-replication, diversity, adaptation, coevolution, optimization
\eLP

\newpage

% \baselineskip = 20pt

\LP
\bf Thomas S. Ray\rm \\
School of Life \& Health Sciences, University of Delaware, Newark, Delaware
19716,\\
ray@brahms.udel.edu\\
\rule[6pt]{6.5in}{1pt}
\Large \bf Evolution and Optimization of Digital Organisms\rm \normalsize\\
\rule[6pt]{6.5in}{2pt}
\vspace{6cm}

\begin{quote}
Marcel, a mechanical chessplayer... his exquisite 19th-century brainwork
--- the human art it took to build which has been flat lost, lost as the
dodo bird ...  But where inside Marcel is the midget Grandmaster, the
little Johann Allgeier?  where's the pantograph, and the magnets?  Nowhere.
Marcel really is a mechanical chessplayer.  No fakery inside to give
him any touch of humanity at all.\\
\hspace*{2in}--- Thomas Pynchon, \it Gravity's Rainbow\rm .
\end{quote}

\large \bf 1 INTRODUCTION\rm \normalsize
\eLP

Ideally, the science of biology should embrace all forms of life.  However
in practice, it has been restricted to the study of a single instance of
life, life on earth.  Life on earth is very diverse, but it is presumably
all part of a single phylogeny.  Because biology is based on a sample size
of one, we can not know what features of life are peculiar to earth, and
what features are general, characteristic of all life.  A truly comparative
natural biology would require inter-planetary travel, which is light
years away.  The ideal experimental evolutionary biology would involve
creation of multiple planetary systems, some essentially identical,
others varying by a parameter of interest, and observing them for billions
of years.

A practical alternative to an inter-planetary or mythical biology is to
create synthetic life in a computer.  The objective is not necessarily
to create life forms that would serve as models for the study of natural
life, but rather to create radically different life forms, based on a
completely different physics and chemistry, and let these life forms
evolve their own phylogeny, leading to whatever forms are natural to their
unique physical basis.  These truly independent instances of life may
then serve as a basis for comparison, to gain some insight into what is
general and what is peculiar in biology.  Those aspects of life that prove
to be general enough to occur in both natural and synthetic systems can then
be studied more easily in the synthetic system.  Evolution in a bottle''
provides a valuable tool for the experimental study of evolution and ecology.

The intent of this work is to synthesize rather than simulate life.  This
approach starts with hand crafted organisms already capable of replication
and open-ended evolution, and aims to generate increasing diversity and
complexity in a parallel to the Cambrian explosion.  To state such a goal
leads to semantic problems, because life must be defined in a way that does
not restrict it to carbon based forms.  It is unlikely that there could be
general agreement on such a definition, or even on the proposition that life
need not be carbon based.  Therefore, I will simply state my conception of
life in its most general sense.  I would consider a system to be living if
it is self-replicating, and capable of open-ended evolution.  Synthetic life
should self-replicate, and evolve structures or processes that were not
designed-in or pre-conceived by the creator (Pattee [30]; Cariani [5]).

Core Wars programs, computer viruses, and worms (Cohen [6]; Dewdney [10,
11, 13, 14]; Denning [9]; Rheingold [32]; Spafford et al. [33]) are capable
of self-replication, but fortunately, not evolution.  It is unlikely that
such programs will ever become fully living, because they are not likely
to be able to evolve.

Most evolutionary simulations are not open-ended.  Their potential is limited
by the structure of the model, which generally endows each individual with a
genome consisting of a set of pre-defined genes, each of which may exist
in a pre-defined set of allelic forms (Holland [20]; Dewdney [12]; Dawkins
[7, 8]; Packard [29]; Ackley \& Littman [1]).  The object being evolved
is generally a data structure representing the genome, which the simulator
program mutates and/or recombines, selects, and replicates according to
criteria designed into the simulator.  The data structures do not contain the
mechanism for replication, they are simply copied by the simulator if they
survive the selection phase.

Self-replication is critical to synthetic life because without it, the
mechanisms of selection must also be pre-determined by the simulator.  Such
artificial selection can never be as creative as natural selection.  The
organisms are not free to invent their own fitness functions.  Freely
evolving creatures will discover means of mutual exploitation and
associated implicit fitness functions that we would never think of.
Simulations constrained to evolve with pre-defined genes, alleles and fitness
functions are dead ended, not alive.

The approach presented here does not have such constraints.  Although the
model is limited to the evolution of creatures based on sequences of machine
instructions, this may have a potential comparable to evolution based on
sequences of organic molecules.  Sets of machine instructions similar to
those used in the Tierra Simulator have been shown to be capable of
universal computation'' (Aho et al. [2]; Minsky [26]; Langton [24]).
This suggests that evolving machine codes should be able to generate any
level of complexity.

Other examples of the synthetic approach to life can be seen in the work of
Holland [21], Farmer et al.\ [16], Langton [22], Rasmussen et al.\
[31], and Bagley et al.\ [3].  A characteristic these efforts
generally have in common is that they parallel the origin of life event by
attempting to create prebiotic conditions from which life may emerge
spontaneously and evolve in an open ended fashion.

While the origin of life is generally recognized as an event of the first
order, there is another event in the history of life that is less well known
but of comparable significance: the origin of biological diversity and
macroscopic multicellular life during the Cambrian explosion 600 million
years ago.  This event involved a riotous diversification of life forms.
Dozens of phyla appeared suddenly, many existing only fleetingly, as
diverse and sometimes bizarre ways of life were explored in a relative
ecological void (Gould [18]; Morris [27]).

The work presented here aims to parallel the second major event in the
history of life, the origin of diversity.  Rather than attempting to create
prebiotic conditions from which life may emerge, this approach involves
engineering over the early history of life to design complex evolvable
organisms, and then attempting to create the conditions that will set off
a spontaneous evolutionary process of increasing diversity and complexity
of organisms.  This work represents a first step in this direction, creating
an artificial world which may roughly parallel the RNA world of
self-replicating molecules (still falling far short of the Cambrian explosion).

The approach has generated rapidly diversifying communities of self-replicating
organisms exhibiting open-ended evolution by natural selection.  From a single
rudimentary ancestral creature containing only the code for self-replication,
interactions such as parasitism, immunity, hyper-parasitism, sociality and
cheating have emerged spontaneously.  This paper presents a methodology and
some first results.

Apart from its value as a tool for the study or teaching of ecology and
evolution, synthetic life may have commercial applications.  Evolution of
machine code provides a new approach to the design and optimization of
computer programs.  In an analogy to genetic engineering, pieces of
application code may be inserted into the genomes of digital organisms,
and then evolved to new functionality or greater efficiency.

\LP
\rule[6pt]{6.5in}{1pt}

\begin{quote}
Here was a world of simplicity and certainty...
a world based on the one and zero of life and
death.  Minimal, beautiful.  The patterns of lives and deaths....
weightless, invisible chains of electronic presence or absence.  If
patterns of ones and zeros were like'' patterns of human lives and
deaths, if everything about an individual could be represented in a
computer record by a long string of ones and zeros, then what kind of
creature would be represented by a long string of lives and deaths?
It would have to be up one level at least --- an angel, a minor god,
something in a UFO.\\
\hspace*{2in} --- Thomas Pynchon, \it Vineland\rm .
\end{quote}

\large \bf 2 METHODS\rm \normalsize

\bf 2.1 THE METAPHOR\rm
\eLP

Organic life is viewed as utilizing energy, mostly derived
from the sun, to organize matter.  By analogy, digital life can be
viewed as using CPU (central processing unit) time, to organize memory.
Organic life evolves through natural selection as individuals compete for
resources (light, food, space, etc.) such that genotypes which leave the
most descendants increase in frequency.  Digital life evolves through the
same process, as replicating algorithms compete for CPU time and memory
space, and organisms evolve strategies to exploit one another.  CPU time is
thought of as the analog of the energy resource, and memory as the analog
of the spatial resource.

The memory, the CPU and the computer's operating system are viewed as elements
of the abiotic'' (physical) environment.  A creature'' is then designed
to be specifically adapted to the features of the computational environment.
The creature consists of a self-replicating assembler language program.
Assembler languages are merely mnemonics for the machine codes that are
directly executed by the CPU.  These machine codes have the characteristic
that they directly invoke the instruction set of the CPU and services provided
by the operating system.

All programs, regardless of the language they are written in, are converted
into machine code before they are executed.  Machine code is the natural
language of the machine, and machine instructions are viewed by this
author as the atomic units'' of computing.  It is felt that machine
instructions provide the most natural basis for an artificial chemistry
of creatures designed to live in the computer.

In the biological analogy, the machine instructions are considered to be
more like the amino acids than the nucleic acids, because they are
chemically active''.  They actively manipulate bits, bytes, CPU registers,
and the movements of the instruction pointer (see below).  The digital
creatures discussed here are entirely constructed of machine instructions.
They are considered analogous to creatures of the RNA world, because the
same structures bear the genetic'' information and carry out the
metabolic'' activity.

A block of RAM memory (random access memory, also known as main'' or
core'' memory) in the computer is designated as a soup'' which can
be inoculated with creatures.  The genome'' of the creatures consists of
the sequence of machine instructions that make up the creature's
self-replicating algorithm.  The prototype creature consists of 80 machine
instructions, thus the size of the genome of this creature is 80 instructions,
and its genotype'' is the specific sequence of those 80 instructions
(Appendix C).

\LP
\bf 2.2 THE VIRTUAL COMPUTER --- TIERRA SIMULATOR\rm
\eLP

The computers we use are general purpose computers, which means, among
other things, that they are capable of emulating through software, the
behavior of any other computer that ever has been built or that could be
built (Aho et al. [2]; Minsky [26]; Langton [24]).  We can utilize
this flexibility to design a computer that would be especially hospitable
to synthetic life.

There are several good reasons why it is not wise to attempt to synthesize
digital organisms that exploit the machine codes and operating systems of
real computers.  The most urgent is the potential threat of natural evolution
of machine codes leading to virus or worm type of programs that could
be difficult to eradicate due to their changing genotypes''.  This potential
argues strongly for creating evolution exclusively in programs that run only
on virtual computers and their virtual operating systems.  Such programs
would be nothing more than data on a real computer, and therefore would
present no more threat than the data in a data base or the text file of a
word processor.

Another reason to avoid developing digital organisms in the machine code of
a real computer is that the artificial system would be tied to the hardware
and would become obsolete as quickly as the particular machine it was
developed on.  In contrast, an artificial system developed on a virtual
machine could be easily ported to new real machines as they become available.

A third issue, which potentially makes the first two moot, is that
the machine languages of real machines are not designed to be evolvable,
and in fact might not support significant evolution.  Von Neuman type
machine languages are considered to be brittle'', meaning that the
ratio of viable programs to possible programs is virtually zero.  Any
mutation or recombination event in a real machine code is almost certain
to produce a non-functional program.  The problem of brittleness can be
mitigated by designing a virtual computer whose machine code is designed
with evolution in mind.  Farmer \& Belin [17] have suggested that
overcoming this brittleness and Discovering how to make such self-replicating
patterns more robust so that they evolve to increasingly more complex states
is probably the central problem in the study of artificial life.''

The work described here takes place on a virtual computer known as Tierra
(Spanish for Earth).  Tierra is a parallel computer of the MIMD (multiple
instruction, multiple data) type, with a processor (CPU) for each creature.
Parallelism is imperfectly emulated by allowing each CPU to execute a small
time slice in turn.  Each CPU of this virtual computer contains two address
registers, two numeric registers, a flags register to indicate error
conditions, a stack pointer, a ten word stack, and an instruction pointer.
Each virtual CPU is implemented via the C structure listed in Appendix A.
Computations performed by the Tierran CPUs are probabilistic due to flaws
that occur at a low frequency (see Mutation below).

The instruction set of a CPU typically performs simple arithmetic
operations or bit manipulations, within the small set of registers contained
in the CPU.  Some instructions move data between the registers in the CPU,
or between the CPU registers and the RAM (main) memory.  Other instructions
control the location and movement of an instruction pointer'' (IP).  The
IP indicates an address in RAM, where the machine code
of the executing program (in this case a digital organism) is located.

The CPU perpetually performs a fetch-decode-execute-increment\_IP
cycle:  The machine code instruction currently addressed by the IP
is fetched into the CPU, its bit pattern is decoded to determine which
instruction it corresponds to, and the instruction is executed.  Then
the IP is incremented to point sequentially to the next position in RAM,
from which the next instruction will be fetched.  However, some instructions
like JMP, CALL and RET directly manipulate the IP, causing execution to
jump to some other sequence of instructions in the RAM.  In the Tierra
Simulator this CPU cycle is implemented through the time\_slice routine
listed in Appendix B.

\LP
\bf 2.3 THE TIERRAN LANGUAGE\rm
\eLP

Before attempting to set up a synthetic life system, careful thought must
be given to how the representation of a programming language affects its
adaptability in the sense of being robust to genetic operations such as
mutation and recombination.  The nature of the virtual computer is defined
in large part by the instruction set of its machine language.  The approach
in this study has been to loosen up the machine code in a virtual
bio-computer'', in order to create a computational system based on a hybrid
between biological and classical von Neumann processes.

In developing this new virtual language, which is called Tierran'', close
attention has been paid to the structural and functional properties of the
informational system of biological molecules: DNA, RNA and proteins.  Two
features have been borrowed from the biological world which are considered
to be critical to the evolvability of the Tierran language.

First, the instruction set of the Tierran language has been defined to be
of a size that is the same order of magnitude as the genetic
code.  Information is encoded into DNA through 64 codons, which are
translated into 20 amino acids.  In its present manifestation, the Tierran
language consists of 32 instructions, which can be represented by five bits,
\it operands included\rm.

Emphasis is placed on this last point because some instruction sets are
deceptively small.  Some versions of the redcode language of Core Wars
(Dewdney [10, 13]; Rasmussen et al. [31]) for example are defined to have
ten operation codes.  It might appear on the surface then that the
instruction set is of size ten.  However, most of the ten instructions have
one or two operands.  Each operand has four addressing modes, and then an
integer.  When we consider that these operands are embedded into the machine
code, we realize that they are in fact a part of the instruction set, and
this set works out to be about $10^{11}$ in size.  Similarly, RISC machines
may have only a few opcodes, but they probably all use 32 bit instructions,
so from a mutational point of view, they really have $2^{32}$ instructions.
Inclusion of numeric operands will make any instruction set extremely large
in comparison to the genetic code.

In order to make a machine code with a truly small instruction set, we must
eliminate numeric operands.  This can be accomplished by allowing the CPU
registers and the stack to be the only operands of the instructions.  When
we need to encode an integer for some purpose, we can create it in a numeric
register through bit manipulations: flipping the low order bit and shifting
left.  The program can contain the proper sequence of bit flipping and shifting
instructions to synthesize the desired number, and the instruction set need
not include all possible integers.

A second feature that has been borrowed from molecular biology in the design
of the Tierran language is the addressing mode, which is called address
by template''.  In most machine codes, when a piece of data is addressed, or
the IP jumps to another piece of code, the exact numeric address of the data
or target code is specified in the machine code.  Consider that in the
biological system by contrast, in order for protein molecule A in the cytoplasm
of a cell to interact with protein molecule B, it does not
specify the exact coordinates where B is located.  Instead, molecule A
presents a template on its surface which is complementary to some surface on
B.  Diffusion brings the two together, and the complementary conformations
allow them to interact.

Addressing by template is illustrated by the Tierran JMP (jump) instruction.
Each JMP instruction is followed by a sequence of NOP (no-operation)
instructions, of which there are two kinds: NOP\_0 and NOP\_1.  Suppose we
have a piece of code with five instruction in the following order:
JMP NOP\_0 NOP\_0 NOP\_0 NOP\_1.  The system will search outward in both
directions from the JMP instruction looking for the nearest occurrence of the
complementary pattern: NOP\_1 NOP\_1 NOP\_1 NOP\_0.  If the pattern is found,
the instruction pointer will move to the end of the complementary pattern
and resume execution.  If the pattern is not found, an error condition (flag)
will be set and the JMP instruction will be ignored (in practice, a limit
is placed on how far the system may search for the pattern).

The Tierran language is characterized by two unique features: a truly small
instruction set without numeric operands, and addressing by template.
Otherwise, the language consists of familiar instructions typical of most
machine languages, e.g., MOV, CALL, RET, POP, PUSH etc.  The complete
instruction set is listed in Appendix B.

\LP
\bf 2.4 THE TIERRAN OPERATING SYSTEM\rm
\eLP

The Tierran virtual computer needs a virtual operating system that will be
hospitable to digital organisms.  The operating system will determine the
mechanisms of interprocess communication, memory allocation, and the
allocation of CPU time among competing processes.  Algorithms will
evolve so as to exploit these features to their advantage.  More than being
a mere aspect of the environment, the operating system together with the
instruction set will determine the
topology of possible interactions between individuals, such as the ability
of pairs of individuals to exhibit predator-prey, parasite-host or
mutualistic relationships.

\LP
\bf 2.4.1 Memory Allocation --- Cellularity\rm
\eLP

The Tierran computer operates on a block of RAM of the real computer which
is set aside for the purpose.  This block of RAM is referred to as the
soup''.  In most of the work described here the soup consisted of about
60,000 bytes, which can hold the same number of Tierran machine instructions.
Each creature'' occupies some block of memory in this soup.

Cellularity is one of the fundamental properties of organic life, and can
be recognized in the fossil record as far back as 3.6 billion years (Barbieri
[4]).  The cell is the original individual, with the cell membrane defining
its limits and preserving its chemical integrity.  An analog to the cell
membrane is needed in digital organisms in order to preserve the integrity of
the informational structure from being disrupted easily by the activity of
other organisms.  The need for this can be seen in Artificial Life models
such as cellular automata where virtual state machines pass through one
another (Langton [22, 23]), or in core wars type simulations where
coherent structures demolish one another when they come into contact
(Dewdney [10, 13]; Rasmussen et al. [31]).

Tierran creatures are considered to be cellular in the sense that they are
protected by a semi-permeable membrane'' of memory allocation.  The Tierran
operating system provides memory allocation services.  Each creature has
exclusive write privileges within its allocated block of memory.  The size''
of a creature is just the size of its allocated block (e.g., 80 instructions).
This usually corresponds to the size of the genome.  This membrane'' is
described as semi-permeable'' because
while write privileges are protected, read and execute privileges are not.
A creature may examine the code of another creature, and even execute it,
but it can not write over it.  Each creature may have exclusive write
privileges in at most two blocks of memory: the one that it is born with
which is referred to as the mother cell'', and a second block which it
may obtain through the execution of the MAL (memory allocation) instruction.
The second block, referred to as the daughter cell'', may be used to grow
or reproduce into.

When Tierran creatures divide'', the mother cell loses write privileges
on the space of the daughter cell, but is then free to allocate another
block of memory.  At the moment of division, the daughter cell is given
its own instruction pointer, and is free to allocate its own second block of
memory.

\LP
\bf 2.4.2 Time Sharing --- The Slicer\rm
\eLP

The Tierran operating system must be multi-tasking (or parallel) in order for
a community of individual creatures to live in the soup simultaneously.  The
system doles out small slices of CPU time to each creature in the soup in turn.
The system maintains a circular queue called the slicer queue''.  As each
creature is born, a virtual CPU is created for it, and it enters the slicer
queue just ahead of its mother, which is the active creature at that time.
Thus the newborn will be the last creature in the soup to get another time
slice after the mother, and the mother will get the next slice after its
daughter.  As long as the slice size is small relative to the generation
time of the creatures, the time sharing system causes the world to approximate
parallelism.  In actuality, we have a population of virtual CPUs, each of
which gets a slice of the real CPU's time as it comes up in the queue.

The number of instructions to be executed in each time slice may be set
proportional to the size of the genome of the creature being executed, raised
to a power.  If the slicer power'' is equal to one, then the slicer is size
neutral, the probability of an instruction being executed does not depend on
the size of the creature in which it occurs.  If the power is greater than one,
large creatures get more CPU cycles per instruction than small creatures.
If the power is less than one, small creatures get more CPU cycles per
instruction.  The power determines if selection favors large or small
creatures, or is size neutral.  A constant slice size selects for small
creatures.

\LP
\bf 2.4.3 Mortality --- The Reaper\rm
\eLP

Self-replicating creatures in a fixed size soup would rapidly fill the
soup and lock up the system.  To prevent this from occurring, it is
necessary to include mortality.  The Tierran operating system includes a
reaper'' which begins killing'' creatures from a queue when the memory
fills to some specified level (e.g., 80\%).  Creatures are killed by
deallocating their memory, and removing them from both the reaper and
slicer queues.  Their dead'' code is not removed from the soup.

In the present system, the reaper uses a linear queue.  When a creature is
born it enters the bottom of the queue.  The reaper always kills the creature
at the top of the queue.  However, individuals may move up or down in the
reaper queue according to their success or failure at executing certain
instructions.  When a creature executes an instruction that generates an
error condition, it moves one position up the queue, as long as the
individual ahead of it in the queue has not accumulated a greater number
of errors.  Two of the instructions are somewhat difficult to execute
without generating an error, therefore successful execution of these
instructions moves the creature down the reaper queue one position, as long
as it has not accumulated more errors than the creature below it.

The effect of the reaper queue is to cause algorithms which are fundamentally
flawed to rise to the top of the queue and die.  Vigorous algorithms have a
greater longevity, but in general, the probability of death increases with age.

\LP
\bf 2.4.4 Mutation\rm
\eLP

In order for evolution to occur, there must be some change in the genome
of the creatures.  This may occur within the lifespan of an individual,
or there may be errors in passing along the genome to offspring.  In order
to insure that there is genetic change, the operating system randomly flips
bits in the soup, and the instructions of the Tierran language are
imperfectly executed.

Mutations occur in two circumstances.  At some background rate, bits are
randomly selected from the entire soup (e.g., 60,000 instructions totaling
300,000 bits) and flipped.  This is analogous to mutations caused by
cosmic rays, and has the effect of preventing any creature from being
immortal, as it will eventually mutate to death.  The background mutation
rate has generally been set at about one bit flipped for every 10,000
Tierran instructions executed by the system.

In addition, while copying instructions during the replication
of creatures, bits are randomly flipped at some rate in the copies.  The copy
mutation rate is the higher of the two, and results in replication errors.
The copy mutation rate has generally been set at about one bit flipped for
every 1,000 to 2,500 instructions moved.  In both classes of mutation,
the interval between mutations varies randomly within a certain range to
avoid possible periodic effects.

In addition to mutations, the execution of Tierran instructions is flawed
at a low rate.  For most of the 32 instructions, the result is off by plus
or minus one at some low frequency.  For example, the increment instruction
normally adds one to its register, but it sometimes adds two or zero.  The
bit flipping instruction normally flips the low order bit, but it sometimes
flips the next higher bit or no bit.  The shift left instruction normally
shifts all bits one bit to the left, but it sometimes shifts left by two
bits, or not at all.  In this way, the behavior of the Tierran instructions
is probabilistic, not fully deterministic.

It turns out that bit flipping mutations and flaws in instructions are not
necessary to generate genetic change and evolution, once the community
reaches a certain state of complexity.  Genetic parasites evolve which are
sloppy replicators, and have the effect of moving pieces of code around
between creatures, causing rather massive rearrangements of the genomes.
The mechanism of this ad hoc sexuality has not been worked out, but is
likely due to the parasites' inability to discriminate between live, dead
or embryonic code.

Mutations result in the appearance of new genotypes, which are watched
by an automated genebank manager.  In one implementation of the manager,
when new genotypes replicate twice, producing a genetically identical
offspring at least once, they are given a unique name and saved to disk.
Each genotype name contains two parts, a number and a three letter code.
The number represents the number of instructions in the genome.  The three
letter code is used as a base 26 numbering system for assigning a unique
label to each genotype in a size class.  The first genotype to appear in
a size class is assigned the label aaa, the second is assigned the label
aab, and so on.  Thus the ancestor is named 80aaa, and the first mutant
of size 80 is named 80aab.  The first creature of size 45 is named 45aaa.

The genebanker saves some additional information with each genome: the
genotype name of its immediate ancestor which makes possible the
reconstruction of the entire phylogeny; the time and date of origin;
metabolic'' data including the number of instructions executed in the
first and second reproduction, the number of errors generated in the first
and second reproduction, and the number of instructions copied into the
daughter cell in the first and second reproductions (see Appendix C, D); some
environmental parameters at the time of origin including the search limit
for addressing, and the slicer power, both of which affect selection for size.

\LP
\bf 2.5 THE TIERRAN ANCESTOR\rm
\eLP

I have used the Tierran language to write a single self-replicating program
which is 80 instructions long.  This program is referred to as the
ancestor'', or alternatively as genotype 0080aaa (Fig.\ 1).  The ancestor
is a minimal self-replicating algorithm which was originally written for use
during the debugging of the simulator.  No functionality was designed into
the ancestor beyond the ability to self-replicate, nor was any specific
evolutionary potential designed in.  The commented Tierran assembler and
machine code for this program is presented in Appendix C.

The ancestor examines itself to determine where in memory it begins and ends.
The ancestor's beginning is marked with the four no-operation template:
1 1 1 1, and its ending is marked with 1 1 1 0.  The ancestor locates its
beginning with the five instructions: ADRB, NOP\_0, NOP\_0, NOP\_0, NOP\_0.
This series of instructions causes the system to search backwards
from the ADRB instruction for a template complementary to the four NOP\_0
instructions, and to place the address of the complementary template
(the beginning) in the ax register of the CPU (see Appendix A).  A similar
method is used to locate the end.

Having determined the address of its beginning and its end, it subtracts
the two to calculate its size, and allocates a block of memory of this size
for a daughter cell.  It then calls the copy procedure which copies the entire
genome into the daughter cell memory, one instruction at a time.
The beginning of the copy procedure is marked by the four no-operation
template: 1 1 0 0.  Therefore the call to the copy procedure is accomplished
with the five instructions: CALL, NOP\_0, NOP\_0, NOP\_1, NOP\_1.

When the genome has been copied, it executes the DIVIDE instruction, which
causes the creature to lose write privileges on the daughter cell memory,
and gives an instruction pointer to the daughter cell (it also enters the
daughter cell into the slicer and reaper queues).  After this first
replication, the mother cell does not examine itself again; it proceeds
directly to the allocation of another daughter cell, then the copy procedure
is followed by cell division, in an endless loop.

Fourty-eight of the eighty instructions in the ancestor are no-operations.
Groups of four no-operation instructions are used as complementary templates
to mark twelve sites for internal addressing, so that the creature can locate
its beginning and end, call the copy procedure, and mark addresses for loops
and jumps in the code, etc.  The functions of these templates are commented
in the listing in Appendix C.

\pagebreak

\LP
\rule[6pt]{6.5in}{1pt}

\large \bf 3 RESULTS\rm \normalsize

\bf 3.1 EVOLUTION\rm
\eLP

Evolutionary runs of the simulator are begun by inoculating the soup of about
60,000 instructions with a single individual of the 80 instruction ancestral
genotype.  The passage of time in a run is measured in terms of how many
Tierran instructions have been executed by the simulator.  The original
ancestral cell executes 839 instructions in its first replication, and 813 for
each additional replication.  The initial cell and its replicating daughters
rapidly fill the soup memory to the threshold level of 80\% which starts the
reaper.  Typically, the system executes about 400,000 instructions in filling
up the soup with about 375 individuals of size 80 (and their gestating
daughter cells).  Once the reaper begins, the memory remains roughly 80\%
filled with creatures for the remainder of the run.

\LP
\bf 3.1.1 Micro-Evolution\rm
\eLP

If there were no mutations at the outset of the run, there would be no
evolution.  However, the bits flipped as a result of copy errors or background
mutations result in creatures whose list of 80 instructions (genotype) differs
from the ancestor, usually by a single bit difference in a single instruction.

Mutations in and of themselves, can not result in a change in the size of
a creature, they can only alter the instructions in its genome.  However,
by altering the genotype, mutations may affect the process whereby the
creature examines itself and calculates its size, potentially causing it
to produce an offspring that differs in size from itself.

Four out of the five possible mutations in a no-operation instruction convert
it into another kind of instruction, while one out of five converts it into
the complementary no-operation.  Therefore 80\% of mutations in templates
destroy or change the size of the template, while one in five alters the
template pattern.  An altered template may cause the creature to make
mistakes in self examination, procedure calls, or looping or jumps of the
instruction pointer, all of which use templates for addressing.

\LP
\bf 3.1.1.1 parasites\rm
\eLP

An example of the kind of error that can result from a mutation in a
template is a mutation of the low order bit of instruction 42 of the
ancestor (Appendix C).  Instruction 42 is a NOP\_0, the third component
of the copy procedure template.  A mutation in the low order bit would
convert it into NOP\_1, thus changing the template from 1 1 0 0 to: 1 1 1 0.
This would then be recognized as the template used to mark the end of the
creature, rather than the copy procedure.

A creature born with a mutation in the low order bit of instruction 42 would
calculate its size as 45.  It would allocate a daughter cell of size 45 and
copy only instructions 0 through 44 into the daughter cell.  The daughter
cell then, would not include the copy procedure.  This daughter genotype,
consisting of 45 instructions, is named 0045aaa.

Genotype 0045aaa (Fig.\ 1) is not able to self-replicate in isolated culture.
However, the semi-permeable membrane of memory allocation only protects write
privileges.  Creatures may match templates with code in the allocated memory
of other creatures, and may even execute that code.  Therefore, if creature
0045aaa is grown in mixed culture with 0080aaa, when it attempts to call the
copy procedure, it will not find the template within its own genome, but if
it is within the search limit (generally set at 200--400 instructions) of the
copy procedure of a creature of genotype 0080aaa, it will match templates, and
send its instruction pointer to the copy code of 0080aaa.  Thus a parasitic
relationship is established (see ECOLOGY below).  Typically, parasites begin
to emerge within the first few million instructions of elapsed time in a run.

\LP
\bf 3.1.1.2 immunity to parasites\rm
\eLP

At least some of the size 79 genotypes demonstrate some measure of
resistance to parasites.  If genotype 45aaa is introduced into a soup,
flanked on each side with one individual of genotype 0079aab, 0045aaa will
initially reproduce somewhat, but will be quickly eliminated from the soup.
When the same experiment is conducted with 0045aaa and the ancestor, they
enter a stable cycle in which both genotypes coexist indefinitely.  Freely
evolving systems have been observed to become dominated by size 79 genotypes
for long periods, during which parasitic genotypes repeatedly appear, but
fail to invade.

\LP
\bf 3.1.1.3 circumvention of immunity to parasites\rm
\eLP

Occasionally these evolving systems dominated by size 79 were successfully
invaded by parasites of size 51.  When the immune genotype 0079aab was tested
with 0051aao (a direct, one step, descendant of 0045aaa in which instruction
39 is replaced by an insertion of seven instructions of unknown origin), they
were found to enter a stable cycle.  Evidently 0051aao has evolved some way to
circumvent the immunity to parasites possessed by 0079aab.  The fourteen
genotypes 0051aaa through 0051aan were also tested with 0079aab, and none were
able to invade.

\LP
\bf 3.1.1.4 hyper-parasites\rm
\eLP

Hyper-parasite have been discovered, (e.g., 0080gai, which differs by 19
instructions from the ancestor, Fig.\ 1).  Their ability to subvert the energy
metabolism of parasites is based on two changes.  The copy procedure does not
return, but jumps back directly to the proper address of the reproduction loop.
In this way it effectively seizes the instruction pointer from the parasite.
However it is another change which delivers the coup de gr\^{a}ce: after
each reproduction, the hyper-parasite re-examines itself, resetting the bx
register with its location and the cx register with its size.  After the
instruction pointer of the parasite passes through this code, the CPU of the
parasite contains the location and size of the hyper-parasite and the
parasite thereafter replicates the hyper-parasite genome.

\LP
\bf 3.1.1.5 social hyper-parasites\rm
\eLP

Hyper-parasites drive the parasites to extinction.  This results in a
community with a relatively high level of genetic uniformity, and therefore
high genetic relationship between individuals in the community.  These are
the conditions that support the evolution of sociality, and social
hyper-parasites soon dominate the community.  Social hyper-parasites (Fig.\ 2)
appear in the 61 instruction size class.  For example, 0061acg is social in
the sense that it can only self-replicate when it occurs in aggregations.
When it jumps back to the code for self-examination, it jumps to a template
that occurs at the end rather than the beginning of its genome.  If the
creature is flanked by a similar genome, the jump will find the target
template in the tail of the neighbor, and execution will then pass into
the beginning of the active creature's genome.  The algorithm will fail unless
a similar genome occurs just before the active creature in memory.  Neighboring
creatures cooperate by catching and passing on jumps of the instruction
pointer.

It appears that the selection pressure for the evolution of sociality is that
it facilitates size reduction.  The social species are 24\% smaller than the
ancestor.  They have achieved this size reduction in part by shrinking their
templates from four instructions to three instructions.  This means that there
are only eight templates available to them, and catching each others jumps
allows them to deal with some of the consequences of this limitation as well
as to make dual use of some templates.

\LP
\bf 3.1.1.6 cheaters: hyper-hyper-parasites\rm
\eLP

The cooperative social system of hyper-parasites is subject to cheating,
and is eventually invaded by hyper-hyper-parasites (Fig.\ 2).  These cheaters
(e.g., 0027aab) position themselves between aggregating hyper-parasites so
that when the instruction pointer is passed between them, they capture it.

\LP
\bf 3.1.1.7 a novel self-examination\rm
\eLP

All creatures discussed thus far mark their beginning and end with templates.
They then locate the addresses of the two templates and determine their genome
size by subtracting them.  In one run, creatures evolved without a template
marking their end.  These creatures located the address of the template
marking their beginning, and then the address of a template in the middle of
their genome.  These two addresses were then subtracted to calculate half of
their size, and this value was multiplied by two (by shifting left) to
calculate their full size.

\LP
\bf 3.1.1.8 an intricate adaptation\rm
\eLP

The arms race described in the paragraphs above took place over a period of
a billion instructions executed by the system.  Another run was allowed to
continue for fifteen billion instructions, but was not examined in detail.
A creature present at the end of the run was examined and found to have
evolved an intricate adaptation.  The adaptation is an optimization technique
known as unrolling the loop''.

The central loop of the copy procedure performs the following operations:
1) copies an instruction from the mother to the daughter, 2) decrements the
cx register which initially contains the size of the parent genome, 3) tests
to see if cx is equal to zero, if so it exits the loop, if not it remains
in the loop, 4) increment the ax register which contains the address in the
daughter where the next instruction will be copied to, 5) increment the
bx register which contains the address in the mother where the next instruction
will be copied from, 6) jump back to the top of the loop.

The work of the loop is contained in steps 1, 2, 4 and 5.  Steps 3 and 6 are
overhead.  The efficiency of the loop can be increased by duplicating the
work steps within the loop, thereby saving on overhead.  The creature from
the end of the long run had repeated the work steps three times within the
loop, as illustrated in Appendix E, which compares the copy loop of the
ancestor with that of its decendant.

\LP
\bf 3.1.2 Macro-Evolution\rm
\eLP

When the simulator is run over long periods of time, hundreds of millions or
billions of instructions, various patterns emerge.  Under selection for small
sizes there is a proliferation of small parasites and a rather interesting
ecology (see below).  Selection for large creatures has usually lead to
continuous incrementally increasing sizes (but not to a trivial concatenation
of creatures end-to-end) until a plateau in the upper hundreds is reached.
In one run, selection for large size lead to apparently open ended size
increase, evolving genomes larger than 23,000 instructions in length.
These evolutionary patterns might be described as phyletic gradualism.

The most thoroughly studied case for long runs is where selection, as
determined by the slicer function, is size neutral.  The longest runs to date
(as much as 2.86 billion Tierran instructions) have been in a size neutral
environment, with a search limit of 10,000, which would allow large creatures
to evolve if there were some algorithmic advantage to be gained from larger
size.  These long runs illustrate a pattern which could be described as
periods of stasis punctuated by periods of rapid evolutionary change, which
appears to parallel the pattern of punctuated equilibrium described by
Eldredge \& Gould [15] and Gould \& Eldredge [19].

Initially these communities are dominated by creatures with genome sizes
in the eighties.  This represents a period of relative stasis, which has
lasted from 178 million to 1.44 billion instructions in the several long
runs conducted to date.  The systems then very abruptly (in a span of 1 or
2 million instructions) evolve into communities dominated by sizes ranging
from about 400 to about 800.  These communities have not yet been seen to
evolve into communities dominated by either smaller or substantially larger
size ranges.

The communities of creatures in the 400 to 800 size range also show a
long-term pattern of punctuated equilibrium.  These communities regularly come
to be dominated by one or two size classes, and remain in that condition for
long periods of time.  However, they inevitably break out of that stasis
and enter a period where no size class dominates.  These periods of rapid
evolutionary change may be very chaotic.  Close observations indicate that
at least at some of these times, no genotypes breed true.  Many
self-replicating genotypes will coexist in the soup at these times, but at
the most chaotic times, none will produce offspring which are even their same
size.  Eventually the system will settle down to another period of stasis
dominated by one or a few size classes which breed true.

\LP
\bf 3.2 DIVERSITY\rm
\eLP

Most observations on the diversity of Tierran creatures have been based on
the diversity of size classes.  Creatures of different sizes are clearly
genetically different, as their genomes are of different sizes.  Different
sized creatures would have some difficulty engaging in recombination if they
were sexual, thus it is likely that they would be different species.
In a run of 526 million instructions, 366 size classes were generated, 93
of which achieved abundances of five or more individuals.  In a run of
2.56 billion instructions, 1180 size classes were generated, 367 of which
achieved abundances of five or more.

Each size class consists of a number of distinct genotypes which also vary
over time.  There exists the potential for great genetic diversity within a
size class.  There are 32$^{80}$ distinct genotypes of size 80, but how many
of those are viable self-replicating creatures?  This question remains
unanswered, however some information has been gathered through the use
of the automated genebank manager.

In several days of running the genebanker, over 29,000 self-replicating
genotypes of over 300 size classes accumulated.  The size classes and
the number of unique genotypes banked for each size are listed in Table 1.
The genotypes saved to disk can be used to inoculate new soups individually,
or collections of these banked genotypes may be used to assemble ecological
communities''.  In ecological'' runs, the mutation rates can be set to zero
in order to inhibit evolution.





The entire AOH site is optimized to look best in Firefox® 3 on a widescreen monitor (1440x900 or better).
Site design & layout copyright © 1986- AOH
We do not send spam. If you have received spam bearing an artofhacking.com email address, please forward it with full headers to abuse@artofhacking.com.