(py)Brush v1.0: islands! #55

Merged
merged 211 commits into from
Jun 11, 2024

Collaborator

@gAldeia gAldeia commented Apr 25, 2024

(py)Brush v1.0: islands!


What's new

Essentially, this PR implements all the evolutionary steps in C++, conveniently wrapped in a scikit-learn estimator. The C++ implementation uses Taskflow to run several islands on different threads.
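
To illustrate the island idea in general terms, here is a minimal Python sketch. It is not Brush's implementation: the real code is C++ and uses Taskflow, while this stand-in uses `ThreadPoolExecutor`, a toy objective, and hypothetical function names (`evolve_island`, `migrate`, `run_islands`). Each island evolves on its own between migrations, and a ring migration then copies each island's best individual to its neighbor.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def fitness(x: float) -> float:
    """Toy objective: minimize the squared distance to 3.0."""
    return (x - 3.0) ** 2


def evolve_island(population: list[float], generations: int, seed: int) -> list[float]:
    """Evolve one island independently with mutation and truncation selection."""
    rng = random.Random(seed)  # per-island RNG keeps islands independent
    for _ in range(generations):
        offspring = [x + rng.gauss(0.0, 0.5) for x in population]
        # Keep the best half of parents + offspring (lower fitness is better).
        population = sorted(population + offspring, key=fitness)[: len(population)]
    return population


def migrate(islands: list[list[float]]) -> list[list[float]]:
    """Ring migration: each island's best replaces the next island's worst."""
    best = [min(pop, key=fitness) for pop in islands]
    migrated = []
    for i, pop in enumerate(islands):
        newcomer = best[(i - 1) % len(islands)]
        survivors = sorted(pop, key=fitness)[:-1]  # drop the worst individual
        migrated.append(survivors + [newcomer])
    return migrated


def run_islands(n_islands: int = 4, pop_size: int = 10, epochs: int = 5) -> float:
    """Alternate independent evolution on threads with a migration step."""
    rng = random.Random(0)
    islands = [[rng.uniform(-10, 10) for _ in range(pop_size)] for _ in range(n_islands)]
    with ThreadPoolExecutor(max_workers=n_islands) as pool:
        for epoch in range(epochs):
            futures = [
                pool.submit(evolve_island, pop, 10, seed=epoch * n_islands + i)
                for i, pop in enumerate(islands)
            ]
            islands = migrate([f.result() for f in futures])
    return min((x for pop in islands for x in pop), key=fitness)
```

Python threads share the interpreter lock, so this sketch shows the structure of the island model rather than the speedup a C++ thread pool would give.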

There are several new classes, and I have restructured the source code.

I named the C++ implementation Brush, while the Python library is called Pybrush.

How it is designed

The main entry point is what I call the Brush Engine, which does all the work. The Engine is configured through a struct called Parameters, which contains all hyper-parameters for the EA. The Dataset class handles every operation regarding the data (splitting train and test partitions, inferring data types, etc.). You can run the Engine using a pre-constructed Dataset instance, or you can conveniently call fit(X, y), and it will try to create the dataset for you using settings you can specify in Parameters.
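
The relationship between these three pieces can be sketched roughly as follows. This is a pure-Python stand-in, not the Brush API: the field names in `Parameters`, the `Dataset` constructor arguments, the `run` method name, and the `best_` attribute are all assumptions made for illustration, and the "evolution" is replaced by fitting a constant.

```python
import random
from dataclasses import dataclass


@dataclass
class Parameters:
    """Hypothetical hyper-parameter container (field names are assumptions)."""
    pop_size: int = 20
    generations: int = 10
    validation_size: float = 0.25
    seed: int = 0


class Dataset:
    """Hypothetical data holder that splits rows into train and validation parts."""

    def __init__(self, X, y, validation_size=0.25, seed=0):
        if len(X) != len(y):
            raise ValueError("X and y must have the same number of rows")
        idx = list(range(len(X)))
        random.Random(seed).shuffle(idx)
        n_val = int(len(idx) * validation_size)
        self.val_idx, self.train_idx = idx[:n_val], idx[n_val:]
        self.X, self.y = X, y

    def train(self):
        """Return the training rows and their targets."""
        return [self.X[i] for i in self.train_idx], [self.y[i] for i in self.train_idx]


class Engine:
    """Hypothetical driver: configured by Parameters, runs on a Dataset."""

    def __init__(self, params: Parameters):
        self.params = params
        self.best_ = None

    def run(self, data: Dataset) -> "Engine":
        # Placeholder for the evolutionary loop: fit a constant model (the mean).
        _, y_train = data.train()
        self.best_ = sum(y_train) / len(y_train)
        return self

    def fit(self, X, y) -> "Engine":
        # Convenience path: build the Dataset from the configured parameters.
        data = Dataset(X, y, self.params.validation_size, self.params.seed)
        return self.run(data)
```

Both entry points end in the same place: `Engine(params).run(dataset)` takes a dataset you built yourself, while `Engine(params).fit(X, y)` builds one from the parameters first.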

Prototyping with Brush

Brush is still compatible with DEAP for prototyping different evolutionary algorithms. In fact, this compatibility is extended by creating bindings to all classes from Brush: Evaluator, Selector, Population, Individual, etc. Brush implements hashes for the individual and fitness classes in C++, making it possible to use the DEAP toolbox. The old NSGA2 implemented using the DEAP API is still there, and in the future, I hope I can create a notebook explaining how to prototype with Brush.
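
As a rough picture of what hashable individual and fitness objects look like from the Python side, here is a pure-Python stand-in. In Brush the hashes live in C++ and are exposed through the bindings; the class layout, attribute names, and the choice of what goes into the hash below are assumptions for illustration only.

```python
class Fitness:
    """Hypothetical fitness wrapper holding a tuple of objective values."""

    def __init__(self, values):
        self.values = tuple(values)

    def __eq__(self, other):
        return isinstance(other, Fitness) and self.values == other.values

    def __hash__(self):
        return hash(self.values)


class Individual:
    """Hypothetical individual identified by its expression string and fitness."""

    def __init__(self, expression, fitness):
        self.expression = expression
        self.fitness = fitness

    def __eq__(self, other):
        return (
            isinstance(other, Individual)
            and self.expression == other.expression
            and self.fitness == other.fitness
        )

    def __hash__(self):
        return hash((self.expression, self.fitness))


a = Individual("add(x1, x2)", Fitness([0.1, 3]))
b = Individual("add(x1, x2)", Fitness([0.1, 3]))
unique = {a, b}  # equal individuals collapse into one set entry
```

Defining `__eq__` and `__hash__` together is what lets such objects be stored in sets or used as dictionary keys in Python code built around them.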

Final remarks

While there is much work to do, I think it is already time to move all these significant changes to the master branch. We now have a fully functional evolutionary algorithm implementation in C++ with Taskflow to handle parallelism. There is also a convenient wrapper for this implementation, and we are still compatible with DEAP. There are many TODOs written all over the place, and I intend to work on these over the next few weeks.

gAldeia added 30 commits August 8, 2023 15:43
I'm not good at merging stuff
This commit changes how mutation works by selecting the spot
**after** selecting the mutation. Previously, spot was being selected
without taking into account which mutation was going to happen.

Now, mutations derive from a base class that implements
at least two methods: find_spots and mutate.

find_spots will return the weights of nodes that can be selected
to apply the mutation. The idea here is to have more robust mutation,
avoiding a lot of nullopt returns.

mutate will actually change the expression, based on a given spot.

Now, the mutation function takes care of selecting the node
and performing the checks to determine if the search space holds
an alternative to apply the mutation. It also takes care of
writing the mutation trace (which I'm thinking about making
an official feature, instead of just a debugging tool).
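
A minimal Python sketch of the design described in the commit message above. The actual implementation is C++; here programs are flat lists of node names, and `PointMutation`, `OPERATORS`, and `apply_mutation` are hypothetical names. The search-space check and the mutation trace are left out.

```python
import random
from abc import ABC, abstractmethod
from typing import Optional


class Mutation(ABC):
    """Base class: each mutation reports where it can apply, then applies itself."""

    @abstractmethod
    def find_spots(self, program: list[str]) -> list[float]:
        """Return one weight per node; zero means the node is not eligible."""

    @abstractmethod
    def mutate(self, program: list[str], spot: int) -> list[str]:
        """Return a changed copy of the program, modified at the given spot."""


class PointMutation(Mutation):
    """Hypothetical example: swap a binary operator for a different one."""

    OPERATORS = ["add", "sub", "mul"]

    def find_spots(self, program):
        return [1.0 if node in self.OPERATORS else 0.0 for node in program]

    def mutate(self, program, spot):
        choices = [op for op in self.OPERATORS if op != program[spot]]
        child = list(program)
        child[spot] = random.choice(choices)
        return child


def apply_mutation(program: list[str], mutations: list[Mutation]) -> Optional[list[str]]:
    """Pick the mutation first, then a spot weighted by that mutation's find_spots."""
    mutation = random.choice(mutations)
    weights = mutation.find_spots(program)
    if not any(w > 0 for w in weights):
        return None  # no valid spot for this mutation
    spot = random.choices(range(len(program)), weights=weights, k=1)[0]
    return mutation.mutate(program, spot)
```

Because the spot is drawn only from nodes the chosen mutation accepts, the empty result is limited to programs with no eligible node at all.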
Now we can make predictions with a single sample, and also
without having to specify the feature names used in training.

These changes are made since Brush's search space and dispatch
table use this information to evaluate an expression tree.
This is aligned with the standard scikit-learn interface for predict_proba
Member

@lacava lacava left a comment

mostly minor comments. thanks a lot guilherme! i will merge once i hear back about the questions.

docs/examples/datasets/d_analcatdata_happiness.csv (outdated, resolved)
README.md (outdated, resolved)
Member

ideally BrushEstimator and DeapEstimator which have a lot of overlap would share some structure but this is fine for now. at some point they should both inherit from a class that shares their parameters / functions.

Collaborator Author

This is actually a good idea. It will help me develop parallel classes without worrying about changing one but not the other.
I am going to create a Python base class (something like EstimatorInterface) and make them inherit from it. Does that sound good?
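
One possible shape for that shared base class, as a sketch only. The classes below are stand-ins that reuse the names from this thread; the parameter names (`pop_size`, `max_gens`, `max_depth`), the `get_params` helper, and the `backend_` attribute are assumptions, not the actual pybrush code.

```python
class EstimatorInterface:
    """Hypothetical base class holding the parameters both estimators share."""

    def __init__(self, pop_size=100, max_gens=100, max_depth=3):
        self.pop_size = pop_size
        self.max_gens = max_gens
        self.max_depth = max_depth

    def get_params(self):
        # Shared helper: every constructor argument is stored under its own name.
        return {
            "pop_size": self.pop_size,
            "max_gens": self.max_gens,
            "max_depth": self.max_depth,
        }

    def fit(self, X, y):
        raise NotImplementedError("subclasses choose the backend")


class BrushEstimator(EstimatorInterface):
    def fit(self, X, y):
        self.backend_ = "cpp-engine"  # placeholder for delegating to the C++ engine
        return self


class DeapEstimator(EstimatorInterface):
    def fit(self, X, y):
        self.backend_ = "deap"  # placeholder for the DEAP-based loop
        return self
```

With this layout a new hyper-parameter is added once in the base class, and only `fit` differs between the two estimators.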

pybrush/_versionstr.py (outdated, resolved)
pybrush/__init__.py (outdated, resolved)
src/engine.h (outdated, resolved)
src/eval/evaluation.cpp (outdated, resolved)
src/eval/metrics.cpp (resolved)
src/ind/fitness.h (outdated, resolved)
src/ind/individual.h (outdated, resolved)
Member

lacava commented May 31, 2024

@gAldeia checking back on this merge

Collaborator Author

gAldeia commented Jun 11, 2024

@lacava

I finished working on your comments. There is documentation for many of the new things I implemented in this PR, mostly done with the help of GitHub Copilot (I was exploring its capabilities, and it turns out it can write documentation, sometimes). I also cleaned up a lot of TODOs that I left in the code. Clearing these TODOs also makes it easier to implement the MABs, which I'll be working on now.

Some new things:

These additions were staged locally on my machine, and while I was cleaning up some TODOs, I decided to include them in the PR as well.

@lacava lacava merged commit ee44ac3 into master Jun 11, 2024
4 checks passed
3 participants