cpu_tour.html


<div style="width:700px; height:400px">
<!-- from http://devolab.cse.msu.edu/software/avida/doc/cpu_tour.html -->
<div class="revision" style="text-align:right">
Revised 2006-09-05 DMB
</div>

<div align="center">
<h1>A Guided Tour of an Ancestor and its Hardware</h1>
</div>

<p>
This document describes the structure of the classic virtual CPU and an
example organism running on it.
</p>

<h2>The Virtual CPU Structure</h2>

<p>
The virtual CPU, which is the default "body" or "hardware" of the organisms,
contains the following set of components, (as further illustrated in the
figure below).
</p>

<ul>
<li>A <strong style="color: #006666">memory</strong> that consists of a sequence
    of instructions, each associated with a set of flags to denote if the
    instruction has been executed, copied, mutated, etc.</li>
<li>An <strong style="color: #880088">instruction pointer</strong> (IP) that
    indicates the next site in the memory to be executed.</li>
<li>Three <strong style="color: #FF8888">registers</strong> that can be used
    by the organism to hold data currently being manipulated.  These are often
    operated upon by the various instructions, and can contain arbitrary
    32-bit integers.</li>
<li>Two <strong style="color: #008800">stacks</strong> that are used for storage.
    The organism can theoretical store an arbitrary amount of data in the
    stacks, but for practical purposes we currently limit the maximum stack
    depth to ten.</li>
<li>An <strong style="color: #CCCC00">input buffer</strong> and an
    <strong style="color: #CCCC00">output buffer</strong> that the organism
    uses to receive information, and return the processed results.</li>
<li>A <strong style="color: #8888FF">Read-Head</strong>, a
    <strong style="color: #8888FF">Write-Head</strong>, and a
    <strong style="color: #8888FF">Flow-Head</strong> which are used to
    specify positions in the CPU's memory.  A copy command reads from the
    Read-Head and writes to the Write-Head.  Jump-type statements move the IP
    to the Flow-Head.</li>
</ul>

<div align="center">
<img src="images/cpu2.gif" />
</div>


<p>&nbsp;</p>
<h2>Instruction Set Configuration</h2>

<p>
The instruction set in Avida is loaded on startup from a configuration file
specified in the <kbd>avida.cfg</kbd> file.  This allows selection of different
instruction sets without recompiling the source code, as well as allowing different
sized instruction sets to be specified.  It is not possible to alter the behavior
of individual instructions or add new instructions without recompiling Avida;
such activities have to be done directly in the source code.

<p>
The available instructions are listed in the <kbd>inst_set.*</kbd> files with
a <code>1</code> or a <code>0</code> next to an instruction to indicate if it
should or should not be included.  Changing the instruction set to be used
simply involves adjusting these flags.

<p>
The instructions were created with three things in mind:

<ul>
<li>
To be as complete as possible (both in a &quot;Turing complete&quot; sense -- that
is, it can compute any computable function -- and, more practically, to ensure that
simple operations only require a few instructions).
</li>
<li>
For each instruction to be as robust and versatile as possible; all instructions
should take an &quot;appropriate&quot; action in any situation where they can be
executed.
</li>
<li>
To have as little redundancy as possible between instructions. (Several instructions
have been implemented that are redundant, but such combinations will typically not
be turned on simultaneously for a run.)
</li>
</ul>

<p>
One major concept that differentiates this virtual assembly language from its
real-world counterparts is in the additional uses of <code>nop</code> instructions
(no-operation commands).
These have no direct effect on the virtual CPU when executed, but often
modify the effect of any instruction that precedes them.  In a sense, you
can think of them as purely regulatory genes.  The default instruction set
has three such <code>nop</code> instructions: <code>nop-A</code>, <code>nop-B</code>,
and <code>nop-C</code>.
</p>
<p>
The remaining instructions can be seperated into three classes.  The first
class is those few instructions that are unaffected by nops.  Most of these
are the &quot;biological&quot; instructions involved directly in the replication
process. The second class of instructions is those for which a <code>nop</code>
changes the head or register affected by the previous command.  For example,
an <code>inc</code> command followed by the instruction <code>nop-A</code> would
cause the contents of the <code>AX</code> register to be incremented, while an
<tt>inc</tt> command followed by a <tt>nop-B</tt> would increment <code>BX</code>.
</p>
<p>
The notation we use in instruction definitions to describe that a default
component (that is, a register or head) can be replaced due to a nop command
is by surrounding the component name with <code>?</code>'s.  The component listed is the
default one to be used, but if a <code>nop</code> follows the command, the
component it represents in this context will replace this default.
If the component between the question marks is a register than a
subsequent <code>nop-A</code> represents the <code>AX</code> register,
<code>nop-B</code> is <code>BX</code>, and <code>nop-C</code> is <code>CX</code>.
If the component listed is a head (including the instruction pointer) then a
<code>nop-A</code> represents the Instruction Pointer, <code>nop-B</code> represents
the Read-Head, and <code>nop-C</code> is the Write-Head.  Currently
the Flow-Head has no <code>nop</code> associated with it.
</p>
<p>
The third class of instructions are those that use a series of <tt>nop</tt>
instructions as a template (label) for a command that needs to reference
another position in the code, such as
<code>h-search</code>.  If <code>nop-A</code> follows a search command, it scans for
the first complementary template (<code>nop-B</code>) and moves the Flow-Head
there.  Templates may be composed of more than a single <code>nop</code>
instruction.  A series of nops is typically abbreviated to the associated
letter and separated by colons.  This the sequence &quot;<code>nop-A nop-A nop-C</code>&quot;
would be displayed as &quot;<code>A:A:C</code>&quot;.
</p>
<p>
The label system used in Avida allows for an arbitrary number of nops.  By
default, we have three: <code>nop-A</code>'s complement is <code>nop-B</code>,
<code>nop-B</code>'s is <code>nop-C</code>, and <code>nop-C</code>'s is <code>nop-A</code>.
Likewise, some instructions talk about the complement of a register or head
-- the same pattern is used in those cases.  So if an instruction tests if
<code>?BX?</code> is equal to its complement, it will test if
<code>BX&nbsp;==&nbsp;CX</code> by default, but if it is followed by a
<code>nop-C</code> it will test if <code>CX&nbsp;==&nbsp;AX</code>.
</p>


<p>&nbsp;</p>
<h2>Instruction Set Reference</h2>

<p>
An abbreviated description of the 26 
default instructions is below.

<table>
<tr><th valign=top>(a-c) <td><code>nop-A</code>, <code>nop-B</code>,&nbsp;<br> and <code>nop-C</code>
    <td valign=top>No-operation instructions; these modify other instructions.</tr>
<tr><th>(d) <td><code>if-n-equ</code>
    <td>Execute next instruction only-if ?BX? does not equal its complement</tr>
<tr><th>(e) <td><code>if-less</code>
    <td>Execute next instruction only if ?BX? is less than its complement</tr>
<tr><th>(f) <td><code>if-label</code>
    <td>Execute the next instruction only if the given template complement was just copied</tr>
<tr><th>(g) <td><code>mov-head</code>
    <td>Move the ?IP? to the same position as the Flow-Head</tr>
<tr><th>(h) <td><code>jmp-head</code>
    <td>Move the ?IP? by a fixed amount found in CX</tr>
<tr><th>(i) <td><code>get-head</code>
    <td>Write the position of the ?IP? into CX</tr>
<tr><th>(j) <td><code>set-flow</code>
    <td>Move the Flow-Head to the memory position specified by ?CX?</tr>
<tr><th>(k) <td><code>shift-r</code>
    <td>Shift all the bits in ?BX? one to the right</tr>
<tr><th>(l) <td><code>shift-l</code>
    <td>Shift all the bits in ?BX? one to the left</tr>
<tr><th>(m) <td><code>inc</code>
    <td>Increment ?BX?</tr>
<tr><th>(n) <td><code>dec</code>
    <td>Decrement ?BX?</tr>
<tr><th>(o) <td><code>push</code>
    <td>Copy the value of ?BX? onto the top of the current stack</tr>
<tr><th>(p) <td><code>pop</code>
    <td>Remove a number from the current stack and place it in ?BX?</tr>
<tr><th>(q) <td><code>swap-stk</code>
    <td>Toggle the active stack</tr>
<tr><th>(r) <td><code>swap</code>
    <td>Swap the contents of ?BX? with its complement.</tr>
<tr><th>(s) <td><code>add</code>
    <td>Calculate the sum of BX and CX; put the result in ?BX?</tr>
<tr><th>(t) <td><code>sub</code>
    <td>Calculate the BX minus CX; put the result in ?BX?</tr>
<tr><th>(u) <td><code>nand</code>
    <td>Perform a bitwise NAND on BX and CX; put the result in ?BX?</tr>
<tr><th>(v) <td><code>h-copy</code>
    <td>Copy an instruction from the Read-Head to the Write-Head and advance both.</tr>
<tr><th>(w) <td><code>h-alloc</code>
    <td>Allocate memory for an offspring</tr>
<tr><th>(x) <td><code>h-divide</code>
    <td>Divide off an offspring located between the Read-Head and Write-Head.</tr>
<tr><th>(y) <td><code>IO</code>
    <td>Output the value ?BX? and replace it with a new input</tr>
<tr><th>(z) <td><code>h-search</code>
    <td>Find a complement template and place the Flow-Head after it.</tr>
</table>


<h3>An Example Ancestor</h3>

The following organism is stored in the file <kbd>organism.heads.15</kbd>,
which you should find in the <kbd>support/config/misc/</kbd> directory.  This is a
simplified version of <kbd>organism.default</kbd> and
<kbd>organism.heads.100</kbd>, of lengths 50 and 100 respectively (each has
additional instructions placed before the copy loop)

<p>
<table>
<tr><td colspan=2><font color="#886600"># ---  Setup  ---</font><br>
<tr><td><b><code>h-alloc</code></b>    <td><font color="#886600"># Allocate extra space at the end of the genome to copy the offspring into.</font><br>
<tr><td><b><tt>h-search&nbsp;</tt></b>   <td><font color="#886600"># Locate an
    <tt>A:B</tt> template (at the end of the organism) and
    place the Flow-Head after it</font><br>
<tr><td><b><tt>nop-C</tt></b>      <td><font color="#886600">#</font><br>
<tr><td><b><tt>nop-A</tt></b>      <td><font color="#886600">#</font><br>
<tr><td><b><tt>mov-head</tt></b>   <td><font color="#886600"># Place the
    Write-Head at the Flow-Head (which is at beginning of
    offspring-to-be).</font><br>
<tr><td><b><tt>nop-C</tt></b>      <td><font color="#886600"># [ Extra
    <tt>nop-C</tt> commands can be placed here w/o harming the
    organism! ]</font><br>
<tr><td>&nbsp;
<tr><td colspan=2><font color="#886600"># ---  Copy Loop  ---</font><br>
<tr><td><b><tt>h-search</tt></b>   <td><font color="#886600"># No template,
    so place the Flow-Head on the next line code</font><br>
<tr><td><b><code>h-copy</code></b>     <td><font color="#886600"># Copy a single
    instruction from the read head to the write head (and advance both
    heads!)</font><br>
<tr><td><b><code>if-label</code></b>   <td><font color="#886600"># Execute the
    line following this template <i>only if</i> we have just copied an
    <code>A:B</code> template.</font><br>
<tr><td><b><code>nop-C</code></b>      <td><font color="#886600">#</font><br>
<tr><td><b><code>nop-A</code></b>      <td><font color="#886600">#</font><br>
<tr><td><b><code>h-divide</code></b>   <td><font color="#886600">#&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;...Divide off offspring! (note if-statement above!)</font><br>
<tr><td><b><code>mov-head</code></b>   <td><font color="#886600"># Otherwise,
    move the IP back to the Flow-Head at the beginning of the copy
    loop.</font><br>
<tr><td><b><code>nop-A</code></b>      <td><font color="#886600"># End label.</font><br>
<tr><td><b><code>nop-B</code></b>      <td><font color="#886600"># End label.</font><br>
</table>

<p>
This program begins by allocating extra space for its offspring.  The
exact amount of space does not need to be specified -- it will allocate as
much as it is allowed to.  The organism will then do a search for the end
of its genome (where this new space was just placed) so that it will know
where to start copying.  First the Flow-Head is placed there, and then the
Write-Head is moved to the same point.
</p>
<p>
It is after this initial setup and before the actual copying process
commences that extra <code>nop</code> instructions can be included.  The only
caveat is that you need to make sure that you don't duplicate any templates
that the program will be searching for, or else it will no longer function
properly.  The easiest thing to do is insert a long sequence of <code>nop-C</code>
instructions.
</p>
<p>
Next we have the beginning of the "copy loop".  This segement of code starts
off with an <code>h-search</code> command with no template following it.  In
such as case, the Flow-Head is placed on the line immediately following the
search.  This head will be used to designate the place that the IP keeps
returning to with each cycle of the loop.
</p>
<p>
The <code>h-copy</code> command will copy a single instruction from the Read-Head
(still at the very start of the genome, where it begins) to the Write-Head
(which we placed at the beginning of the offspring).  With any copy command
there is a user-specified chance of a copy mutation.  If one occurs, the
Write-Head will place a random instruction rather than the one that it
gathered from the Read-Head.  After the copy occurs (for better or worse),
both the Read-Head and the Write-Head are advanced to the next instruction
in the genome.  It is for this reason that a common mutation we see happening
will place a long string of h-copy instruction one after another.
</p>
<p>
The next command, <code>if-label</code> (followed by a <code>nop-C</code> and a
<code>nop-A</code>) tests to see if the complement of <code>C:A</code> is the most
thing copied.  That is, if the two most
recent instructions copied were a <code>nop-A</code> followed by a <code>nop-B</code>
as is found at the end of the organism.  If so, we are done!  Execute the
next instruction which is <code>h-divide</code> (when this occurs, the read and
write heads will surround the portion of memory to be split off as the
offspring's genome).  If not, then we need to keep
going.  Skip the next instruction and move on to the <code>mov-head</code> which
will move the head specified by the <code>nop</code> that follows (in this case
<code>nop-A</code> which is the Instruction Pointer) to the Flow-Head at the
beginning of the copy loop.
</p>
<p>
This process will continue until all of the lines of code have been copied,
and an offspring is born.
</p>


<h3>An Example Logic Gene</h3>

<p>
Here is a short example program to demonstrate one way for an organism to
perform the "OR" logic operation.  This time I'm only going to show the
contents of the registers after each command because the functionality of
the individual instructions should be clear, and the logic itself won't
be helped much by a line-by-line explanation in English.

<p>
<center>
<table width=75%>
<tr><th bgcolor="#CCCCFF" width=11%>Line #
    <th bgcolor="#CCCCFF" width=34%>Instruction
    <td bgcolor="#CCCCFF" width=11%><b>AX</b>
    <td bgcolor="#CCCCFF" width=11%><b>BX</b>
    <td bgcolor="#CCCCFF" width=11%><b>CX</b>
    <td bgcolor="#CCCCFF" width=11%><b>Stack</b>
    <td bgcolor="#CCCCFF" width=11%><b>Output</b>
<tr><th bgcolor="#FF88FF"> 1 <th bgcolor="#88FFFF">IO
    <td bgcolor="#FFBBBB">?  <td bgcolor="#FFBBBB">X  <td bgcolor="#FFBBBB">?  
    <td bgcolor="#AAFFAA">? <td bgcolor="#FFFF88"> ?
<tr><th bgcolor="#FF88FF"> 2 <th bgcolor="#88FFFF">push  
    <td bgcolor="#FFBBBB">?  <td bgcolor="#FFBBBB">X  <td bgcolor="#FFBBBB">?  
    <td bgcolor="#AAFFAA">X, ?       <td bgcolor="#FFFF88">&nbsp;
<tr><th bgcolor="#FF88FF"> 3 <th bgcolor="#88FFFF">pop   
    <td bgcolor="#FFBBBB">?  <td bgcolor="#FFBBBB">X  <td bgcolor="#FFBBBB">X  
    <td bgcolor="#AAFFAA">? <td bgcolor="#FFFF88">&nbsp;
<tr><th bgcolor="#FF88FF"> 4<th bgcolor="#88FFFF">nop-C <td colspan=5>&nbsp;
<tr><th bgcolor="#FF88FF"> 5 <th bgcolor="#88FFFF">nand  
    <td bgcolor="#FFBBBB">~X <td bgcolor="#FFBBBB">X  <td bgcolor="#FFBBBB">X  
    <td bgcolor="#AAFFAA">? <td bgcolor="#FFFF88">&nbsp;
<tr><th bgcolor="#FF88FF"> 6<th bgcolor="#88FFFF">nop-A <td colspan=5>&nbsp;
<tr><th bgcolor="#FF88FF"> 7 <th bgcolor="#88FFFF">IO    
    <td bgcolor="#FFBBBB">~X <td bgcolor="#FFBBBB">Y  <td bgcolor="#FFBBBB">X  
    <td bgcolor="#AAFFAA">? <td bgcolor="#FFFF88"> X
<tr><th bgcolor="#FF88FF"> 8 <th bgcolor="#88FFFF">push  
    <td bgcolor="#FFBBBB">~X <td bgcolor="#FFBBBB">Y  <td bgcolor="#FFBBBB">X  
    <td bgcolor="#AAFFAA">Y, ?       <td bgcolor="#FFFF88">&nbsp;
<tr><th bgcolor="#FF88FF"> 9 <th bgcolor="#88FFFF">pop   
    <td bgcolor="#FFBBBB">~X <td bgcolor="#FFBBBB">Y  <td bgcolor="#FFBBBB">Y  
    <td bgcolor="#AAFFAA">? <td bgcolor="#FFFF88">&nbsp;
<tr><th bgcolor="#FF88FF">10<th bgcolor="#88FFFF">nop-C <td colspan=5>&nbsp;
<tr><th bgcolor="#FF88FF">11 <th bgcolor="#88FFFF">nand  
    <td bgcolor="#FFBBBB">~X <td bgcolor="#FFBBBB">~Y <td bgcolor="#FFBBBB">Y  
    <td bgcolor="#AAFFAA">? <td bgcolor="#FFFF88">&nbsp;
<tr><th bgcolor="#FF88FF">12 <th bgcolor="#88FFFF">swap  
    <td bgcolor="#FFBBBB">Y  <td bgcolor="#FFBBBB">~Y <td bgcolor="#FFBBBB">~X 
    <td bgcolor="#AAFFAA">? <td bgcolor="#FFFF88">&nbsp;
<tr><th bgcolor="#FF88FF">13<th bgcolor="#88FFFF">nop-C <td colspan=5>&nbsp;
<tr><th bgcolor="#FF88FF">14 <th bgcolor="#88FFFF">nand  
    <td bgcolor="#FFBBBB">Y  <td bgcolor="#FFBBBB">X or Y <td bgcolor="#FFBBBB">~X  
    <td bgcolor="#AAFFAA">? <td bgcolor="#FFFF88">&nbsp;
<tr><th bgcolor="#FF88FF">15 <th bgcolor="#88FFFF">IO    
    <td bgcolor="#FFBBBB">Y  <td bgcolor="#FFBBBB">Z      <td bgcolor="#FFBBBB">~X  
    <td bgcolor="#AAFFAA">? <td bgcolor="#FFFF88"> X or Y
</table>
</center>

<hr />
</div>