(py)Brush v1.0: islands! #55
Conversation
Initializing weights
Toggle weight on/off
I'm not good at merging stuff
This commit changes how mutation works by selecting the spot **after** selecting the mutation. Previously, the spot was selected without taking into account which mutation was going to happen. Now, mutations are derived from a base class that implements at least two methods: find_spots and mutate. find_spots returns the weights of nodes that can be selected for the mutation; the idea is to make mutation more robust, avoiding lots of nullopt returns. mutate actually changes the expression, based on a given spot. The mutation function now takes care of selecting the node and checking whether the search space holds an alternative to apply the mutation. It also takes care of writing the mutation trace (which I'm thinking about making an official feature, instead of just a debugging tool).
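To illustrate the design: the real code is C++, and only find_spots and mutate are names from this PR; everything else below is a hypothetical Python sketch of "pick the mutation first, then weight the spots it can act on".

```python
import random
from abc import ABC, abstractmethod

class MutationBase(ABC):
    """Hypothetical sketch of the mutation base class described above."""

    @abstractmethod
    def find_spots(self, tree):
        """Return per-node weights; zero means this mutation cannot use that node."""

    @abstractmethod
    def mutate(self, tree, spot):
        """Apply the mutation at the chosen spot, returning a new tree."""

def apply_mutation(mutation, tree, rng=random):
    # The spot is selected *after* the mutation: only nodes this mutation
    # can act on receive nonzero weight, so fewer attempts come back empty.
    weights = mutation.find_spots(tree)
    if not any(weights):
        return None  # no valid spot; the caller can try another mutation
    spot = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    return mutation.mutate(tree, spot)
```

A concrete mutation then only has to describe where it applies and what it does; the selection and validity checks live in one place.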
Now we can make predictions with a single sample, and without having to specify the feature names used in training. These changes were needed because Brush's search space and dispatch table use this information to evaluate an expression tree.
This is aligned with the standard scikit-learn interface for predict_proba.
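Neither helper below is pybrush code; they are minimal sketches of the two conventions mentioned: promoting a single flat sample to a batch of one, and the scikit-learn predict_proba column layout (one column per class, ordered by class label).

```python
def as_batch(X):
    """Hypothetical helper: accept one flat sample or a list of samples,
    and always return a list of samples."""
    if X and not isinstance(X[0], (list, tuple)):
        return [list(X)]
    return [list(row) for row in X]

def predict_proba_layout(p_class1):
    """scikit-learn's predict_proba convention for binary problems:
    rows are samples, columns are [P(y=0), P(y=1)]."""
    return [[1.0 - p, p] for p in p_class1]
```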
mostly minor comments. thanks a lot guilherme! i will merge once i hear back about the questions.
pybrush/BrushEstimator.py
ideally BrushEstimator and DeapEstimator, which have a lot of overlap, would share some structure, but this is fine for now. at some point they should both inherit from a class that shares their parameters / functions.
This is actually a good idea. It will help me develop parallel classes without worrying about changing one but not the other.
I am going to create a Python base class (something like EstimatorInterface) and make them inherit from it. Does that sound good?
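A rough sketch of that proposal: the name EstimatorInterface comes from the comment above, but the parameters shown here are placeholders, not the actual ones.

```python
class EstimatorInterface:
    """Hypothetical shared base class holding the parameters and helpers
    that BrushEstimator and DeapEstimator currently duplicate."""

    def __init__(self, pop_size=100, max_gens=100, random_state=None):
        self.pop_size = pop_size
        self.max_gens = max_gens
        self.random_state = random_state

class BrushEstimator(EstimatorInterface):
    pass  # C++-engine-backed estimator logic would go here

class DeapEstimator(EstimatorInterface):
    pass  # DEAP-based prototype estimator logic would go here
```

With this layout, adding a parameter in one place keeps the two estimators in sync automatically.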
@gAldeia checking back on this merge
…s, so we don't have several reimplementations of the same function
I finished working on your comments. There is documentation for many of the new things I implemented in this PR, mostly written with the help of GitHub Copilot (I was exploring its capabilities, and it turns out it can write documentation, sometimes). I also cleaned up a lot of TODOs that I had left in the code; clearing these out also makes it easier to implement the MABs, which I'll be working on next. Some new things:
These additions were staged locally on my machine, and while I was cleaning up some TODOs, I decided to include them in this PR as well.
What's new
Essentially, this PR implements all the evolutionary steps in C++, conveniently wrapped into a scikit-learn estimator. The C++ implementation uses Taskflow to distribute several islands across different threads.
There are several new classes, and I have restructured the source code.
I named the C++ implementation Brush, while the Python library is called Pybrush.
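The actual parallelism lives in C++ via Taskflow, but the island idea can be sketched in Python with a thread pool: each island advances one generation independently, on its own worker. The evolve_island body below is a placeholder, not Brush's actual generation step.

```python
from concurrent.futures import ThreadPoolExecutor

def evolve_island(island):
    # Placeholder for one island's generation loop (selection, variation,
    # evaluation); here each individual's fitness just increases by one.
    return [fitness + 1 for fitness in island]

def step_all_islands(islands, max_workers=4):
    """Advance every island by one generation, each on its own thread,
    a rough Python analogue of what Brush does in C++ with Taskflow."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(evolve_island, islands))
```

Because islands only synchronize at generation boundaries (e.g., for migration), they parallelize cleanly across threads.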
How it is designed
The main entry point is what I call the Brush Engine, which does all the work to get the job done. The Engine is configured through a struct called Parameters, which contains all hyper-parameters for the EA. The Dataset class handles every operation regarding the data (splitting train and test partitions, inferring data types, etc.). You can run the Engine using a pre-constructed Dataset instance, or you can conveniently call fit(X, y)
, and it will try to create the dataset for you using some configurations you may specify in the Parameters class.
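A toy Python sketch of that flow, using stand-in classes: the real Engine, Parameters, and Dataset are C++, so every detail below is illustrative, not the actual API.

```python
from dataclasses import dataclass

@dataclass
class Parameters:
    """Stand-in for the hyper-parameter struct (fields are hypothetical)."""
    pop_size: int = 100
    max_gens: int = 100
    validation_size: float = 0.2

@dataclass
class Dataset:
    """Stand-in for the class that owns data handling."""
    X: list
    y: list

class Engine:
    """Toy engine showing the two entry points described above: a
    pre-built Dataset, or fit(X, y), which builds the Dataset for you."""
    def __init__(self, params=None):
        self.params = params or Parameters()
        self.dataset = None
    def run(self, dataset):
        self.dataset = dataset  # evolutionary loop would go here
        return self
    def fit(self, X, y):
        return self.run(Dataset(X, y))
```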
Prototyping with Brush
There is still compatibility with DEAP to prototype different evolutionary algorithms. In fact, this compatibility is extended by creating bindings to all classes from Brush: Evaluator, Selector, Population, Individual
, etc. Brush implements hashes for the individual and fitness classes in C++, making it possible to use the DEAP toolbox. The old NSGA2 implemented with the DEAP API is still there, and in the future I hope to write a notebook explaining how to prototype with Brush.
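The hashing point can be illustrated with a Python analogue (the fields below are hypothetical; the real classes are C++ bound to Python): individuals that are equal hash alike, so they can be deduplicated in sets and used as dict keys, which DEAP-style tooling can then rely on.

```python
class Individual:
    """Toy individual whose identity is its program (here just a string).
    Consistent __eq__ and __hash__ make instances usable in sets/dicts."""

    def __init__(self, program, fitness=None):
        self.program = program
        self.fitness = fitness

    def __eq__(self, other):
        return isinstance(other, Individual) and self.program == other.program

    def __hash__(self):
        return hash(self.program)
```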
Final remarks
While there is still much work to do, I think it is time to move these significant changes to the master branch. We now have a fully functional evolutionary algorithm implemented in C++, with Taskflow handling parallelism. There is also a convenient wrapper for this implementation, and we remain compatible with DEAP. There are many TODOs scattered around the code, and I intend to work on them over the next weeks.