Genetic programming (GP) is a technique where a set of genes are modified (evolved) using an evolutionary algorithm. The io.jenetics.prog
modules contains a ProgramGene
, together with its corresponding ProgramChromosome
, which allows to create operation (Op
) trees.
The io.jenetics.prog
module contains the classes which are needed for doing Genetic programming with the Jenetics library. It introduces a new Chromosome
/Gene
pair, which are the main entry points.
public interface Op<T> {
public String name();
public int arity();
public T apply(T[] args);
}
The generic type of the Op
enforces data-type constraints for the created program tree and makes the implementation a strongly typed GP algorithm.
When creating a new program tree, it is not necessary to implement own instance of the ProgramGene
or ProgramChromosome
class. The extension point for own programs is the Op
interface.
final Op<Double> myop = Op.of("myop", 3, v -> v[0]*v[1] + v[2]);
In the example above, a new operation with the "myop" and arity 3 is defined. Whenever the operation is evaluated, the function f(x, y, z) = x*y + z is executed.
Note
|
The class MathOp in the io.jenetics.prog.op package defines a set of mathematical standard operations/functions.
|
When creating a new ProgramChromosome
we must distinguish two different kind of operations:
-
Non-terminal operations have an arity greater than zero and form their own sub-tree.
-
Terminal operations have an arity of zero and form the leaves of a program tree.
There are currently three different types of terminal operations implemented, Var
, Const
and the EphemeralConst
class.
Var
The Var
operation defines a variable of a program, which are set from the program arguments.
final ISeq<Op<Double>> terminals = ISeq.of(
Var.of("x", 0), Var.of("y", 1), Var.of("z", 2)
);
The terminal operation list in the example code above will lead to a program which takes three input parameters, x, y and z, with the argument indices 0, 1 and 2.
Const
The Const
operation will always return the same, constant, value when evaluated.
final Op<Double> one = Const.of(1.0);
final Op<Double> pi = Const.of("π", Math.PI);
We can create a constant operation in to flavours, with a value only and with a dedicated name. If a constant has a name, the symbolic name is used, instead of the value, when the program tree is printing.
EphemeralConst
An ephemeral constant is a special constant, which is only constant within an tree. If a new tree is created, a new constant is created, by the Supplier
function the ephemeral constant is created with.
final Op<Double> rand1 = EphemeralConst.of(Math::random);
final Op<Double> rand2 = EphemeralConst.of("R", Math::random);
The ProgramChromosome
comes with some factory methods, which lets you easily create program trees with a given depth and a given set of operations and terminals.
final int depth = 5;
final Op<Double> operations = ISeq.of(...);
final Op<Double> terminals = ISeq.of(...);
final ProgramChromosome<Double> program =
ProgramChromosome.of(5, operations, terminals);
The code snippet above will create a perfect program tree of depth 5. All non-leaf nodes will contain operations, randomly selected from the operations set. Whereas all leaf nodes are filled with operations from the terminals set.
Note
|
The created program tree is perfect, which means that all leaf nodes have the same depth. If new trees needs to be created during evolution, they will be created with the depth, operations and terminals defined by the template program tree. |
The GP Engine
is created in the exact same way as the GA Engine
.
final Engine<ProgramGene<Double>, Double> engine = Engine
.builder(Main::error, program)
.minimizing()
.alterers(
new SingleNodeCrossover<>(),
new Mutator<>())
.build();
For a detailed description on the correct Engine
setup have a look at the Example section.
The specialized Alterer
classes for the ProgramChromosome
guarantees that the program tree after the alter operation is still valid. They obey the tree structure of the chromosome, although they are altering the flattened ProgramGene
sequence. General alterers, not written for ProgramChromosome
, will most likely destroy the tree property of the altered chromosome. There are essentially two possibility for handling invalid tree chromosomes: 1) marking the chromosome as invalid or 2) try to repair the invalid chromosome. Possibility 1) would be easier, but it would also lead to a possible large number of invalid chromosomes.
Note
|
Jenetics allows the usage of arbitrary Alterer implementations. Even alterers not implemented for ProgramChromosomes . Chromosomes destroyed by the alterer are repaired.
|
Symbolic regression involves finding a mathematical expression, in symbolic form, that provides a good, best, or perfect fit between a given finite sampling of values of the independent variables and the associated values of the dependent variables.–John R. Koza, Genetic Programming I
The following example shows how to solve a GP problem with Jenetics. We are trying to find a polynomial (or an arbitrary mathematical function) which approximates a given data set.
x |
y |
0.00 |
0.0000 |
0.10 |
0.0740 |
0.20 |
0.1120. |
0.30 |
0.1380 |
… |
… |
The sample points has been created with the function f(x) = 4*x^3 - 3*x^2 + x. The knowledge of the creating function makes it easier to compare the quality of the evolved function. For the example we created 21 data points.
Note
|
The function which created the sample points is not needed in the error function we have to define for the GP. It just let us verify the final, evolved result. |
As first step, we have to define the set of allowed non-terminal and the terminal operations the GP is working with. Selecting the right set of operation has a big influence on the performance of the GP. If the operation set is bigger than necessary, we will expand the potential search space, and the execution time for finding a solution. For our polynomial example we will chose the following operations and terminals.
static final ISeq<Op<Double>> OPERATIONS = ISeq.of(
MathOp.ADD,
MathOp.SUB,
MathOp.MUL
);
static final ISeq<Op<Double>> TERMINALS = ISeq.of(
Var.of("x", 0),
EphemeralConst.of(() ->
(double)RandomRegistry.getRandom().nextInt(10))
);
The chosen non-terminal operation set is sufficient to create any polynomial. For the terminal operations, we added a variable "x", with index zero, and an ephemeral int constant. The purpose of the ephemeral constant is to create constant values, which will differ for every tree, but stay constant within a tree.
In the next step define the fitness function for the GP, which will be an error function we will minimize.
// The lookup table where the data points are stored.
static final double[][] SAMPLES = new double[][] {
{-1.0, -8.0000},
{-0.9, -6.2460},
...
};
static double error(final ProgramGene<Double> program) {
return Arrays.stream(SAMPLES).mapToDouble(sample -> {
final double x = sample[0];
final double y = sample[1];
final double result = program.eval(x);
return abs(y - result) + program.size()*0.0001;
})
.sum();
}
The error function calculates the sum of the (absolute) difference between the sample value and the value calculated the by the evolved program (ProgramGene
). Since we prefer compact programs over complex one, we will add a penalty for the program size (the number of nodes of the program tree).
Caution
|
The penalty for the tree size must be small enough to not dominate the error function. We still want to find an approximating function and not the smallest possible one. |
After we have defined the error function, we need to define the proper Codec
.
static final Codec<ProgramGene<Double>, ProgramGene<Double>> CODEC =
Codec.of(
Genotype.of(ProgramChromosome.of(
// Program tree depth.
5,
// Chromosome validator.
ch -> ch.getRoot().size() <= 50,
OPERATIONS,
TERMINALS
)),
Genotype::getGene
);
There are two particularities in the definition of the ProgramChromosome
:
-
Since we want to narrow the search space, we are limiting the depth of newly created program trees to 5.
-
Because of crossover operations performed during evolution, the resulting programs can grow quite big. To prevent an unlimited growth of the program trees , we mark programs with more than 50 nodes as invalid.
Now we are ready to put everything together:
public static void main(final String[] args) {
final Engine<ProgramGene<Double>, Double> engine = Engine
.builder(Polynomial::error, CODEC)
.minimizing()
.alterers(
new SingleNodeCrossover<>(),
new Mutator<>())
.build();
final ProgramGene<Double> program = engine.stream()
.limit(500)
.collect(EvolutionResult.toBestGenotype())
.getGene();
System.out.println(Tree.toString(program));
}
The GP is capable of finding the polynomial which created the sample data. After a few tries, we got the following (correct) output program:
add ├── mul │ ├── x │ └── sub │ ├── 0.0 │ └── mul │ ├── x │ └── sub │ ├── sub │ │ ├── sub │ │ │ ├── sub │ │ │ │ ├── 3.0 │ │ │ │ └── x │ │ │ └── x │ │ └── x │ └── x └── x
This program can be reduced to 4*x^3 - 3*x^2 + x, which is exactly the polynomial, which created the sample data.