
Multi-armed bandits, solving some issues, and more #59

Open

wants to merge 146 commits into master
Conversation

@gAldeia (Collaborator) commented Aug 20, 2024

This PR implements multi-armed bandits to learn sampling probabilities for terminals, operators, and variation operations.
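The actual bandits live in the cpp library; as a minimal Python sketch of the idea only (class, arm names, and the epsilon-greedy scheme here are illustrative assumptions, not Brush's implementation), each arm is one candidate choice — a terminal, an operator, or a variation op — and the reward could be whether the sampled choice produced an improved offspring:

```python
import random

class EpsilonGreedyBandit:
    """Hypothetical epsilon-greedy bandit over sampling choices."""

    def __init__(self, arms, epsilon=0.1):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.counts = {a: 0 for a in self.arms}
        self.values = {a: 0.0 for a in self.arms}  # running mean reward per arm

    def choose(self):
        if random.random() < self.epsilon:
            return random.choice(self.arms)                       # explore
        return max(self.arms, key=lambda a: self.values[a])       # exploit

    def update(self, arm, reward):
        # incremental mean: v += (r - v) / n
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

    def probabilities(self):
        """Sampling distribution derived from the current reward estimates."""
        total = sum(self.values.values()) or 1.0
        return {a: v / total for a, v in self.values.items()}

bandit = EpsilonGreedyBandit(["point_mutation", "subtree_mutation", "crossover"])
bandit.update("crossover", 1.0)       # e.g. offspring beat its parent
bandit.update("point_mutation", 0.0)  # e.g. offspring did not improve
```

In the PR the same loop happens inside variation: choose an arm, apply it, score the outcome, and feed the reward back so the sampling probabilities adapt over the run.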

It also implements an archive (Issue #58).

A new classification metric, average_precision_score, is also implemented (partially solving Issue #57). This was tricky, mainly because it had to work with lexicase selection, which is based on single-test evaluations. Currently, lexicase still uses log loss, but survival, the archive, and the final individuals are picked with the average precision score.
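For reference, average precision summarizes the precision-recall curve as the sum over n of (R_n - R_{n-1}) * P_n, ranking samples by predicted score. A small self-contained sketch of the metric itself (not the PR's C++ implementation, and ignoring score ties):

```python
def average_precision(y_true, y_score):
    """AP = sum over positive hits of (recall delta) * precision at that rank."""
    order = sorted(range(len(y_score)), key=lambda i: -y_score[i])
    n_pos = sum(y_true)
    tp = 0
    ap = 0.0
    prev_recall = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i]:
            tp += 1
            precision = tp / rank
            recall = tp / n_pos
            ap += (recall - prev_recall) * precision
            prev_recall = recall
    return ap

print(average_precision([1, 0, 1, 1], [0.9, 0.8, 0.7, 0.6]))  # 29/36 ~ 0.806
```

This matches sklearn's definition of `average_precision_score` for untied scores, which is convenient for checking the cpp implementation against a reference.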

It also implements complexity and linear complexity.
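The exact definitions live in the cpp library; purely as an illustration (the node weights and tree encoding below are hypothetical, not Brush's), a recursive complexity might multiply a node's weight by the summed complexity of its children, while a linear complexity simply sums weights over all nodes:

```python
# Hypothetical node weights for illustration only; Brush defines its own
# operator weights in the C++ library.
WEIGHTS = {"add": 2, "mul": 3, "sin": 5, "x": 1, "c": 1}

def complexity(tree):
    """Recursive complexity: node weight times the sum of child complexities.
    Nesting expensive operators is penalized multiplicatively."""
    op, children = tree[0], tree[1:]
    if not children:
        return WEIGHTS[op]
    return WEIGHTS[op] * sum(complexity(ch) for ch in children)

def linear_complexity(tree):
    """Linear complexity: plain sum of node weights over the whole tree."""
    op, children = tree[0], tree[1:]
    return WEIGHTS[op] + sum(linear_complexity(ch) for ch in children)

# sin(x + c): recursive = 5 * (2 * (1 + 1)) = 20; linear = 5 + 2 + 1 + 1 = 9
expr = ("sin", ("add", ("x",), ("c",)))
print(complexity(expr), linear_complexity(expr))
```

The design difference is what gets penalized: the multiplicative form grows fast for deeply nested expressions, while the linear form only tracks overall size.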

I added a feature on the Python side to abide by feature types from pandas data frames (Issue #56). However, this feature has exposed some undesired conversions; sometimes I find myself passing X.values to avoid copying dtypes from pandas.
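The kind of conversion at play can be seen with plain pandas: `.values` collapses per-column dtypes into a single common array dtype, which is why passing it sidesteps the dtype-copying feature entirely:

```python
import pandas as pd

# A frame mixing a float feature with an integer feature.
X = pd.DataFrame({"x1": [0.5, 1.5], "x2": [1, 0]})

# Per-column dtypes are preserved in the frame...
print(list(map(str, X.dtypes)))  # e.g. ['float64', 'int64']

# ...but .values flattens everything to one common dtype,
# silently upcasting the integer column to float:
print(X.values.dtype)  # float64
```

So when the estimator reads dtypes from the frame it sees per-column types, but from `X.values` it only ever sees one homogeneous dtype.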

There is a progress bar now (Issue #20) and different verbosity levels.

I fixed the mean label so it is no longer weighted, and made many improvements to the logistic nodes for classification problems.

Some new unit tests (mainly in cpp) were implemented.

Performance is not my main concern right now; I recognize there are several new TODOs and possible improvements, but I will focus on those later.

gAldeia added 30 commits June 14, 2024 15:35
… api

Now Brush can work with different metrics, as long as they are implemented in the cpp library. You can set them with the scorer parameter.
But we currently only have the dummy bandit (which does nothing).
@lacava (Member) left a comment


It should be based on the type of the SplitBest child argument. The threshold could be zero for a continuous variable, in which case you don't want to hide it.

@gAldeia (Collaborator, Author) commented Oct 29, 2024

I’ll add a new attribute to the split nodes to store the data type of the split feature. This seems to be the most straightforward way to quickly access the dtype of the split feature.

@lacava (Member) commented Oct 29, 2024

The type of the attribute for SplitBest is stored in arg_types[0].

Wait, you already have the logic for SplitOn on line 48. Can you just repeat that for SplitBest nodes?

    else if (Is<NodeType::SplitOn>(data.node_type)){
        if (data.arg_types.at(0) == DataType::ArrayB)
        {
            // booleans don't use thresholds (they are used directly as mask in split)
            return "If" + child_outputs;
        }
    }

@lacava (Member) commented Oct 29, 2024

FYI, there are three signatures for SplitBest nodes defined here, so you just have to look at the first arg type:

using type = std::tuple<

… fit

Also fixed a Python interface problem when setting the scorer for regression problems