
Multi-armed bandits, solving some issues, and more #59

Open

wants to merge 146 commits into master
Conversation

@gAldeia (Collaborator) commented Aug 20, 2024

This PR implements multi-armed bandits to learn sampling probabilities for terminals, operators, and variation operations.
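The actual bandits live in the cpp library; as a minimal Python sketch of the idea only (class, arm names, and the epsilon-greedy scheme here are illustrative assumptions, not Brush's implementation), each arm is one candidate choice — a terminal, an operator, or a variation op — and the reward could be whether the sampled choice produced an improved offspring:

```python
import random

class EpsilonGreedyBandit:
    """Hypothetical epsilon-greedy bandit over sampling choices."""

    def __init__(self, arms, epsilon=0.1):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.counts = {a: 0 for a in self.arms}
        self.values = {a: 0.0 for a in self.arms}  # running mean reward per arm

    def choose(self):
        if random.random() < self.epsilon:
            return random.choice(self.arms)                       # explore
        return max(self.arms, key=lambda a: self.values[a])       # exploit

    def update(self, arm, reward):
        # incremental mean: v += (r - v) / n
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

    def probabilities(self):
        """Sampling distribution derived from the current reward estimates."""
        total = sum(self.values.values()) or 1.0
        return {a: v / total for a, v in self.values.items()}

bandit = EpsilonGreedyBandit(["point_mutation", "subtree_mutation", "crossover"])
bandit.update("crossover", 1.0)       # e.g. offspring beat its parent
bandit.update("point_mutation", 0.0)  # e.g. offspring did not improve
```

In the PR the same loop happens inside variation: choose an arm, apply it, score the outcome, and feed the reward back so the sampling probabilities adapt over the run.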

It also implements an archive (Issue #58).

A new classification metric, average_precision_score, is also implemented (partially solving Issue #57). This was tricky, mainly because it had to work with lexicase selection, which is based on single-test evaluations. Currently, lexicase still uses log loss, but survival, the archive, and the final individuals are picked with the average precision score.
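For reference, average precision summarizes the precision-recall curve as the sum over n of (R_n - R_{n-1}) * P_n, ranking samples by predicted score. A small self-contained sketch of the metric itself (not the PR's C++ implementation, and ignoring score ties):

```python
def average_precision(y_true, y_score):
    """AP = sum over positive hits of (recall delta) * precision at that rank."""
    order = sorted(range(len(y_score)), key=lambda i: -y_score[i])
    n_pos = sum(y_true)
    tp = 0
    ap = 0.0
    prev_recall = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i]:
            tp += 1
            precision = tp / rank
            recall = tp / n_pos
            ap += (recall - prev_recall) * precision
            prev_recall = recall
    return ap

print(average_precision([1, 0, 1, 1], [0.9, 0.8, 0.7, 0.6]))  # 29/36 ~ 0.806
```

This matches sklearn's definition of `average_precision_score` for untied scores, which is convenient for checking the cpp implementation against a reference.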

It also implements complexity and linear complexity.
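The exact definitions live in the cpp library; purely as an illustration (the node weights and tree encoding below are hypothetical, not Brush's), a recursive complexity might multiply a node's weight by the summed complexity of its children, while a linear complexity simply sums weights over all nodes:

```python
# Hypothetical node weights for illustration only; Brush defines its own
# operator weights in the C++ library.
WEIGHTS = {"add": 2, "mul": 3, "sin": 5, "x": 1, "c": 1}

def complexity(tree):
    """Recursive complexity: node weight times the sum of child complexities.
    Nesting expensive operators is penalized multiplicatively."""
    op, children = tree[0], tree[1:]
    if not children:
        return WEIGHTS[op]
    return WEIGHTS[op] * sum(complexity(ch) for ch in children)

def linear_complexity(tree):
    """Linear complexity: plain sum of node weights over the whole tree."""
    op, children = tree[0], tree[1:]
    return WEIGHTS[op] + sum(linear_complexity(ch) for ch in children)

# sin(x + c): recursive = 5 * (2 * (1 + 1)) = 20; linear = 5 + 2 + 1 + 1 = 9
expr = ("sin", ("add", ("x",), ("c",)))
print(complexity(expr), linear_complexity(expr))
```

The design difference is what gets penalized: the multiplicative form grows fast for deeply nested expressions, while the linear form only tracks overall size.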

I added a feature on the Python side to abide by feature types from pandas data frames (Issue #56). However, this feature has exposed some undesired conversions; sometimes I find myself passing X.values to avoid copying dtypes from pandas.
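The kind of conversion at play can be seen with plain pandas: `.values` collapses per-column dtypes into a single common array dtype, which is why passing it sidesteps the dtype-copying feature entirely:

```python
import pandas as pd

# A frame mixing a float feature with an integer feature.
X = pd.DataFrame({"x1": [0.5, 1.5], "x2": [1, 0]})

# Per-column dtypes are preserved in the frame...
print(list(map(str, X.dtypes)))  # e.g. ['float64', 'int64']

# ...but .values flattens everything to one common dtype,
# silently upcasting the integer column to float:
print(X.values.dtype)  # float64
```

So when the estimator reads dtypes from the frame it sees per-column types, but from `X.values` it only ever sees one homogeneous dtype.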

There is a progress bar now (Issue #20) and different verbosity levels.

I fixed the mean label so it is no longer weighted, and made many improvements to the logistic nodes for classification problems.

Some new unit tests (mainly in cpp) were implemented.

Performance is not my main concern right now; I recognize there are several new TODOs and possible improvements, but I will focus on those later.

gAldeia added 30 commits June 14, 2024 15:35
… api

Now Brush can work with different metrics, as long as they are implemented in the cpp library. You can set them with the scorer parameter.
But we currently only have the dummy bandit (which does nothing).
@lacava (Member) left a comment


It should be based on the type of the SplitBest child argument. The threshold could be zero for a continuous variable, in which case you don't want to hide it.

@gAldeia (Collaborator, Author) commented Oct 29, 2024

I’ll add a new attribute to the split nodes to store the data type of the split feature. This seems to be the most straightforward way to quickly access the dtype of the split feature.

@lacava (Member) commented Oct 29, 2024

The type of the attribute for SplitBest is stored in arg_types[0].

Wait, you already have the logic for SplitOn on line 48. Can you just repeat that for SplitBest nodes?

    else if (Is<NodeType::SplitOn>(data.node_type)){
        if (data.arg_types.at(0) == DataType::ArrayB)
        {
            // booleans don't use thresholds (they are used directly as mask in split)
            return "If" + child_outputs;
        }
    }

@lacava (Member) commented Oct 29, 2024

FYI, there are three signatures for SplitBest nodes defined here, so you just have to look at the first arg type:

using type = std::tuple<

… fit

Also fixed a Python interface problem when setting the scorer for regression problems