Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!: update experimentalists grid_pool, random_pool, random_sample to use the new State mechanism #33

Merged
merged 132 commits into from
Aug 25, 2023

Conversation

hollandjg
Copy link
Member

@hollandjg hollandjg commented Jul 12, 2023

Updates:

  • Breaking Change, Refactor: remove structural pooler vs. sampler split in autora.experimentalist
  • Breaking Change, Feature: Update standard way of setting up experimentalists so that every function should be wrapped by the end-user, reducing load on contributors and hopefully making reuse easier
  • Breaking Change, Refactor: Gather standard State and wrapper functions in a single submodule to allow from autora.state.standard import StandardState, Delta, on_state, state_fn_from_estimator
  • Breaking Change, Feature: convert existing example experimentalists grid_pool, random_pool, and random_sample to use the State + Delta mechanism with the standard field names from StandardState
  • CI: Update Pre-Commit Hooks

Resolutions:

See also:

Discussion – moving poolers/samplers to top-level of experimentalist

There are several reasons for moving the poolers and samplers into files at the top level of experimentalist:

poolers and samplers are now less relevant

The strict difference between poolers and samplers now has less meaning

Conceptually: Now we just have functions which sit in a pipeline or cycle and operate on the state. Some of them still act like poolers and samplers of old, but they look more similar.

Technically: In the old setup, poolers took no positional arguments by design, whereas samplers took one (conditions). Now all of the poolers and samplers might take any number of arguments, like the experiment_data, model, conditions from a step before, and we achieve that using only named arguments in the functions which act on State.

It makes sense to gather related functions in one file

Some of the poolers and samplers are obviously related – like the random pooler/sampler, falsification pooler/sampler.

It is simpler to have these in the same file, so that you can share functions more easily between them.

This will also remove one level of complexity for contributors – they'll just be able to make an experimentalist with full freedom to use whatever they want.

This will also remove one chunk of complexity from the cookiecutter template.

Allows for backward compatibility whilst freeing up the good names we want

There are two things we really want to achieve here:

  • no breaking changes in this release
  • free up names like grid_pool, random_sample, random_pool for our new paradigm

The proposed method was to make the existing functions into singledispatch functions which had different behavior depending on what the inputs were. However, this relies on there always being at least one positional argument to the function, an assumption which is broken for the old poolers which had no positional arguments by design.

An attempt was made to remedy this, but it was super complicated and didn't work because of the positional arguments thing.

Both in this case, and in future cases where we'll want to update all of the existing contributions to use the new setup, it's just much easier to start over in a new file than to try to make it work in the same file.

@hollandjg hollandjg changed the base branch from main to feat/default-state-from-main July 12, 2023 14:23
@hollandjg hollandjg changed the title feat/default-state-from-main-experimentalists feat!: rename grid_pool, random_pool, random_sample to use the new naming convention for experimentalists Jul 12, 2023
@hollandjg hollandjg changed the title feat!: rename grid_pool, random_pool, random_sample to use the new naming convention for experimentalists feat!: update grid_pool, random_pool, random_sample to use the new State mechanism Jul 12, 2023
@hollandjg hollandjg changed the title feat!: update grid_pool, random_pool, random_sample to use the new State mechanism feat!: update grid_pool, random_pool, random_sample to use the new State mechanism Jul 12, 2023
@hollandjg hollandjg self-assigned this Jul 12, 2023
@hollandjg
Copy link
Member Author

Hi all, esp @musslick @younesStrittmatter @benwandrew @blinodelka @chadcwilliams @benwandrew
I've updated this PR to include the core changes from the mini-hackathon. I think this might be ready to merge, so please let me know if there's something broken.

If anything is missing, I suggest we make that a new PR, because this one is pretty large and already does a lot of stuff.

  • I didn't include the align_dataframe_to_ivs function from https://github.com/AutoResearch/autora-core/blob/mini-hackaton-before-refactor-state/src/autora/utils/conversion.py because I don't think this should be part of the on_state function.
    • The functions in autora.state.delta are by design completely agnostic about the contents of the state objects, other than assuming that they can be treated as State objects and are fundamentally dataclasses. That means that they don't do any transformations on the data beyond extending DataFrames and renaming things according to the alias rules.
    • I think it's best to include the align_dataframe_to_ivs in the places where you need it in your experimentalist, for instance, or to define a new function wrapper which you can use if you need it.
    • Remember that if you're using data frames you can just pass a list of column names and you've already got exactly those columns in that order, and I think that's probably more understandable to the end-user.

@hollandjg hollandjg marked this pull request as ready for review August 22, 2023 12:33
@hollandjg hollandjg mentioned this pull request Aug 22, 2023
@musslick
Copy link
Contributor

Hi all, esp @musslick @younesStrittmatter @benwandrew @blinodelka @chadcwilliams @benwandrew I've updated this PR to include the core changes from the mini-hackathon. I think this might be ready to merge, so please let me know if there's something broken.

If anything is missing, I suggest we make that a new PR, because this one is pretty large and already does a lot of stuff.

  • I didn't include the align_dataframe_to_ivs function from https://github.com/AutoResearch/autora-core/blob/mini-hackaton-before-refactor-state/src/autora/utils/conversion.py because I don't think this should be part of the on_state function.

    • The functions in autora.state.delta are by design completely agnostic about the contents of the state objects, other than assuming that they can be treated as State objects and are fundamentally dataclasses. That means that they don't do any transformations on the data beyond extending DataFrames and renaming things according to the alias rules.
    • I think it's best to include the align_dataframe_to_ivs in the places where you need it in your experimentalist, for instance, or to define a new function wrapper which you can use if you need it.
    • Remember that if you're using data frames you can just pass a list of column names and you've already got exactly those columns in that order, and I think that's probably more understandable to the end-user.

So far, we have been using it in both experimentalists and synthetic models (autora). Any other place where we can put it in the core?

@hollandjg hollandjg changed the title feat!: update grid_pool, random_pool, random_sample to use the new State mechanism feat!: update experimentalists grid_pool, random_pool, random_sample to use the new State mechanism Aug 23, 2023
@hollandjg
Copy link
Member Author

So far, we have been using it in both experimentalists and synthetic models (autora). Any other place where we can put it in the core?

I can't think of any.

@hollandjg
Copy link
Member Author

Hi all, esp. @musslick @benwandrew and @younesStrittmatter – I'd appreciate your re-review of this PR! Please let me know if there's anything else needing changing.

src/autora/state.py Outdated Show resolved Hide resolved
src/autora/state.py Outdated Show resolved Hide resolved
src/autora/state.py Outdated Show resolved Hide resolved
Copy link
Contributor

@musslick musslick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great but docstrings need to be fixed

@hollandjg hollandjg requested a review from musslick August 25, 2023 15:49
Copy link
Contributor

@musslick musslick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@hollandjg hollandjg dismissed benwandrew’s stale review August 25, 2023 17:35

Covered in today's meeting. Thanks, Ben!

@hollandjg hollandjg added this pull request to the merge queue Aug 25, 2023
Merged via the queue into main with commit 2636a27 Aug 25, 2023
@hollandjg hollandjg deleted the feat/default-state-from-main-experimentalists branch August 25, 2023 17:36
hollandjg added a commit that referenced this pull request Nov 29, 2023
test: fix broken tests (random_sample uses num_samples now)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants