(py)Brush v1.0: islands! #55

gAldeia · 2024-04-25T18:45:58Z

(py)Brush v1.0: islands!

What's new

Essentially, this PR implements all the evolutionary steps in C++, conveniently wrapped in a scikit-learn estimator. The C++ implementation uses taskflow to run several islands on separate threads.
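To make the island idea concrete, here is a minimal Python sketch, not the Brush implementation. It assumes a toy one-dimensional objective and no migration between islands, and it uses Python's `ThreadPoolExecutor` in place of taskflow purely to mirror the one-island-per-thread structure. The names `fitness`, `evolve_island` and `run_islands` are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def fitness(x):
    # Toy objective with its maximum at x = 3.
    return -(x - 3.0) ** 2


def evolve_island(seed, generations=20, pop_size=30):
    """Evolve one isolated island and return its best individual."""
    rng = random.Random(seed)  # one RNG per island, so threads share no state
    population = [rng.uniform(-10.0, 10.0) for _ in range(pop_size)]
    for _ in range(generations):
        offspring = []
        for _ in range(pop_size):
            # Binary tournament selection followed by Gaussian mutation.
            parent = max(rng.sample(population, 2), key=fitness)
            offspring.append(parent + rng.gauss(0.0, 0.5))
        # Survival: keep the best pop_size of parents plus offspring.
        merged = sorted(population + offspring, key=fitness, reverse=True)
        population = merged[:pop_size]
    return population[0]


def run_islands(n_islands=4):
    """Run every island on its own worker thread and collect the bests."""
    with ThreadPoolExecutor(max_workers=n_islands) as pool:
        return list(pool.map(evolve_island, range(n_islands)))
```

Calling `run_islands(4)` returns one best individual per island. A real island model would also exchange individuals between islands from time to time, which this sketch leaves out.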

There are several new classes, and I have restructured the source code.

I named the C++ implementation Brush, while the Python library is called Pybrush.

How it is designed

The main entry point is what I call the Brush Engine, which does all the work of running the evolutionary algorithm. The Engine is configured through a struct called Parameters, which contains all hyper-parameters for the EA. The Dataset class handles every operation on the data (splitting train and test partitions, inferring data types, etc.). You can run the Engine with a pre-constructed Dataset instance, or you can call fit(X, y), and it will try to create the dataset for you using the configuration you specified in the Parameters class.
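The relationship between the three pieces can be sketched in plain Python. This is an illustration of the design only, not the pybrush API: the field names (`pop_size`, `max_gens`, `validation_size`, `seed`), the `run` method and the shuffled split are assumptions made for the example.

```python
import random
from dataclasses import dataclass


@dataclass
class Parameters:
    # Hypothetical hyper-parameters; the real struct has its own fields.
    pop_size: int = 100
    max_gens: int = 50
    validation_size: float = 0.25
    seed: int = 0


class Dataset:
    """Owns the data and what is derived from it (here, only a split)."""

    def __init__(self, X, y, validation_size=0.25, seed=0):
        if len(X) != len(y):
            raise ValueError("X and y must have the same number of rows")
        idx = list(range(len(X)))
        random.Random(seed).shuffle(idx)
        n_val = int(len(idx) * validation_size)
        self.val_idx, self.train_idx = idx[:n_val], idx[n_val:]
        self.X, self.y = X, y


class Engine:
    def __init__(self, params):
        self.params = params
        self.dataset = None

    def run(self, dataset):
        # Placeholder for the evolutionary loop.
        self.dataset = dataset
        return self

    def fit(self, X, y):
        # Convenience path: build the Dataset from the Parameters, then run.
        data = Dataset(X, y, self.params.validation_size, self.params.seed)
        return self.run(data)
```

Both entry points end in the same `run` call, so `fit(X, y)` is only a shortcut for building the `Dataset` from the configured `Parameters`.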

Prototyping with Brush

Brush is still compatible with DEAP for prototyping different evolutionary algorithms. In fact, this compatibility is extended by creating bindings for all Brush classes: Evaluator, Selector, Population, Individual, etc. Brush implements hashes for the individual and fitness classes in C++, making it possible to use the DEAP toolbox. The old NSGA2 implemented with the DEAP API is still there, and in the future I hope to create a notebook explaining how to prototype with Brush.
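A small Python sketch of why hashing matters for this kind of tooling, assuming pure-Python stand-ins for the bound classes. The `Fitness` and `Individual` classes and the `group_by_fitness` helper below are hypothetical and only show the contract: objects that compare equal must hash equally, so they can be used as dictionary keys or stored in sets.

```python
from collections import defaultdict


class Fitness:
    def __init__(self, values):
        self.values = tuple(values)

    def __eq__(self, other):
        return isinstance(other, Fitness) and self.values == other.values

    def __hash__(self):
        # Equal fitness values must hash equally.
        return hash(self.values)


class Individual:
    def __init__(self, expression, fitness):
        self.expression = expression
        self.fitness = fitness

    def __eq__(self, other):
        return (isinstance(other, Individual)
                and self.expression == other.expression
                and self.fitness == other.fitness)

    def __hash__(self):
        return hash((self.expression, self.fitness))


def group_by_fitness(population):
    """Map each distinct fitness to the individuals that share it."""
    groups = defaultdict(list)
    for ind in population:
        groups[ind.fitness].append(ind)
    return groups
```

Grouping individuals by fitness in a dictionary is the kind of operation that multi-objective selection code tends to rely on, and it only works when the fitness object is hashable.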

Final remarks

While there is much work to do, I think it is already time to move all these significant changes to the master branch. We now have a fully functional evolutionary algorithm implementation in C++ with taskflow to handle parallelism. There is also a convenient wrapper for this implementation, and we are still compatible with DEAP. There are many TODOs written all over the place, and I intend to work on these in the coming weeks.

This commit changes how mutation works by selecting the spot **after** selecting the mutation. Previously, the spot was selected without taking into account which mutation was going to happen. Now, mutations derive from a base class that implements at least two methods: find_spots and mutate. find_spots returns the weights of the nodes that can be selected to apply the mutation; the idea is to make mutation more robust by avoiding a lot of nullopt returns. mutate actually changes the expression at a given spot. The mutation function now takes care of selecting the node and checking whether the search space holds an alternative to apply the mutation. It also takes care of writing the mutation trace (which I'm thinking about making an official feature, instead of just a debugging tool).
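The two-method contract can be sketched in Python as follows. This is an illustration under stated assumptions, not the C++ code: the program is flattened to a list of node names, the `ALTERNATIVES` table stands in for the search space, and `PointMutation` and `apply_mutation` are hypothetical names.

```python
import random


class Mutation:
    """Base class: every mutation says where it can act and how it acts."""

    def find_spots(self, nodes):
        raise NotImplementedError

    def mutate(self, nodes, spot, rng):
        raise NotImplementedError


class PointMutation(Mutation):
    # Hypothetical stand-in for the search space: allowed replacements.
    ALTERNATIVES = {"add": ["sub", "mul"], "sub": ["add"], "mul": ["add"]}

    def find_spots(self, nodes):
        # Weight 1 for nodes that have an alternative, 0 otherwise.
        return [1.0 if n in self.ALTERNATIVES else 0.0 for n in nodes]

    def mutate(self, nodes, spot, rng):
        child = list(nodes)  # never modify the parent in place
        child[spot] = rng.choice(self.ALTERNATIVES[nodes[spot]])
        return child


def apply_mutation(mutation, nodes, rng):
    """Pick the spot after the mutation is known; None if nothing applies."""
    weights = mutation.find_spots(nodes)
    if not any(w > 0 for w in weights):
        return None
    spot = rng.choices(range(len(nodes)), weights=weights, k=1)[0]
    return mutation.mutate(nodes, spot, rng)
```

Because the weights come from the chosen mutation, a node that the mutation cannot handle is never sampled, so the only failure case left is a program with no valid spot at all.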

lacava

mostly minor comments. thanks a lot guilherme! i will merge once i hear back about the questions.

docs/examples/datasets/d_analcatdata_happiness.csv

README.md

lacava · 2024-05-02T12:50:34Z

pybrush/BrushEstimator.py

ideally BrushEstimator and DeapEstimator which have a lot of overlap would share some structure but this is fine for now. at some point they should both inherit from a class that shares their parameters / functions.

gAldeia · 2024-06-04

This is actually a good idea. It will help me develop the two classes in parallel without worrying about changing one and forgetting the other.
I am going to create a Python base class (something like EstimatorInterface) and make both inherit from it. Does that sound good?
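A minimal sketch of what such a shared base class could look like, assuming only two parameters and a single overridable search step. Apart from `EstimatorInterface`, which is the name suggested above, every name here (`EngineBackedEstimator`, `DeapBackedEstimator`, `_search`, `best_`) is hypothetical and does not describe the pybrush code.

```python
class EstimatorInterface:
    """Shared parameters and fit logic; subclasses supply the search."""

    def __init__(self, pop_size=100, max_gens=100):
        self.pop_size = pop_size
        self.max_gens = max_gens

    def get_params(self):
        return {"pop_size": self.pop_size, "max_gens": self.max_gens}

    def fit(self, X, y):
        if len(X) != len(y):
            raise ValueError("X and y must have the same number of samples")
        self.best_ = self._search(X, y)
        return self

    def _search(self, X, y):
        raise NotImplementedError


class EngineBackedEstimator(EstimatorInterface):
    def _search(self, X, y):
        return "model found by the compiled engine"  # placeholder


class DeapBackedEstimator(EstimatorInterface):
    def _search(self, X, y):
        return "model found by the DEAP loop"  # placeholder
```

With this layout, adding a parameter or changing input validation happens once in the base class, and each estimator only implements the part that actually differs.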

pybrush/_versionstr.py

pybrush/__init__.py

src/engine.h

src/eval/evaluation.cpp

src/eval/metrics.cpp

src/ind/fitness.h

src/ind/individual.h

lacava · 2024-05-31T13:31:04Z

@gAldeia checking back on this merge

gAldeia · 2024-06-11T02:01:55Z

@lacava

I finished working on your comments. There is documentation for many of the new things I implemented in this PR, mostly done with the help of GitHub Copilot (I was exploring its capabilities, and it turns out it can write documentation, sometimes). I also cleaned up a lot of TODOs that I left in the code. Clearing these TODOs also makes it easier to implement the MABs, which I'll be working on now.

Some new things:

Saving/loading population;
Archive and predict with archive;
New example notebooks https://github.com/cavalab/brush/tree/island_GA/docs/guide;
Average precision score metric for binary classification.

These additions were staged locally on my machine, and while I was cleaning some TODOs, I decided to include them in the PR as well.
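For reference, the average precision metric listed above can be sketched in a few lines of Python. This is the textbook definition (the mean of the precision values at the rank of each positive sample), written under the assumption that labels are 0 or 1 and that no two scores are tied; the C++ implementation in this PR may handle ties and edge cases differently. The function name is hypothetical.

```python
def average_precision(y_true, y_score):
    """Average precision for binary labels, assuming no tied scores."""
    # Rank samples from the highest to the lowest score.
    order = sorted(range(len(y_true)), key=lambda i: y_score[i], reverse=True)
    n_pos = sum(y_true)
    if n_pos == 0:
        return 0.0  # convention chosen for this sketch
    true_positives = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            true_positives += 1
            total += true_positives / rank  # precision at this rank
    return total / n_pos
```

For labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, the positives land at ranks 1 and 3, giving precisions 1 and 2/3 and an average precision of 5/6.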

gAldeia added 30 commits August 8, 2023 15:43

Merge pull request #45 from cavalab/initializing_weights

6159a28

Initializing weights

Regressor now uses MSE (instead of squashed version of the metric)

ddc280c

Fixed wrong use of validation partition and MDCM

9939d07

Merge pull request #48 from cavalab/toggle_weight_on_off

38c3c6a

Toggle weight on off

Added mutation trace back

dc328bb

Im not good at merging stuff

Uniform weight initialization between mutation options and cx

c6ccde4

If mutation/cx fails, then the parent is inserted in offspring

f29d94d

Marked get_model functions as const

9d1d153

Additional check before doing PTC2

29ae79e

Updated tests

1e35e2d

Switched append(value) to extend([value])

2aac8b0

Better way to make sure PTC2, make_program and subtree will work

d6e9778

Improved find_spot and removed template in base class

9044d5a

Printing more informations for debug

1082720

Bug fixes. Avoiding re-fit. cloning expressions

cb8e3ec

Simple interface to clone a program and return a copy

732a579

Testing clone method

20ef18d

Implementation of simple GA with tournament selection

e9424a7

Initial structure to implement an Island GA. taskflow added as depend.

c1e4366

Changed GA so it doesnt require a new file

058579b

Fixed include of non-existing file

099d9bf

Changed get_params to have same arguments as base class from sklearn

f238ebd

Fixed bug fit with a dataframe but predict with an np.array

91ba367

Predict now use types from training data

9b2d716

Now we can make predictions with a single sample, and also without having to specify the feature names used in training. These changes were made because Brush's search space and dispatch table use this information to evaluate an expression tree.

predict_proba returns 2d array for binary classification

d4324a0

This aligns with the standard scikit-learn interface for predict_proba.
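The shape convention this commit refers to can be shown with a short Python sketch. scikit-learn classifiers return one row per sample and one column per class from predict_proba, so a binary model that internally produces only the probability of the positive class needs a second column. The helper name below is hypothetical and uses plain lists instead of arrays.

```python
def as_two_column_proba(p_positive):
    """Turn P(class 1) per sample into rows of [P(class 0), P(class 1)]."""
    return [[1.0 - p, p] for p in p_positive]
```

For two samples with positive-class probabilities 0.25 and 0.5, the result has two rows and two columns, and each row sums to 1.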

Fixed wrong comparison when throwing an error while copying from a re…

52611bb

…ference dataset

Bug fix - binary clf programs being created without logistic as root

f42590d

New fix. Some classification programs still being modified

9a6c7f1

Improved counting nodes

74f9d3a

lacava reviewed May 2, 2024

gAldeia added 2 commits May 3, 2024 18:53

Archive implementation. Individual ids. New TODOs to solve

12a8772

fixed missing val_from_arch parameter and pybrush breaking

78314fc

gAldeia added 24 commits June 5, 2024 10:35

Updated examples in readme

7ea90c2

removing unused file

4381134

cleaning a lot of comments and TODOs

cd9436c

adding some new TODOs

30e7bad

Instructions to build docs locally

29afbe7

Updated example notebooks and started writing new ones

62aff7a

Fixed bug when passing a list of functions

f036a2d

more cleaning. Implemented string representation for fitness

9375c88

Final adjustments to save/load pop and use archive

cbcfcbe

Fix print statement when using archive

e021f13

Removed useless lambda function

17de2b0

Spacing

6fd411d

new example notebooks!!

3112a18

Added new files. Now it is time to write documentation

6be28de

Todo: version for docs should be set automatically

452ad67

Implemented average_precision_score metric in cpp

3506898

Fixed python docs not being found

6af5ce0

Documentation. setting n_classes in cpp side

d675707

Documentation

ae0a886

Improved variation. Fixed lots of TODOs. implemented get_size in tree…

e1b2f19

…s, so we dont have several reimplementations of same function

Updated tedts to work with new variation

dd1eb30

Fixed bad doxygen instructions

f5d26eb

lots of docs

5210dd7

Documentation for Engine class (just the class definition, not its me…

f70d32e

…thods)

lacava merged commit ee44ac3 into master Jun 11, 2024
4 checks passed
