feat: add StandardState #32
Co-authored-by: benwandrew <[email protected]>
…state # Conflicts: # src/autora/state/delta.py
# Conflicts: # src/autora/state/delta.py
…es new each time through the field cycle
Looks great!
this is good, cool stuff! my only hesitation so far regarding names is `models` vs `model` (and more generally, `things` (full list) vs `thing` (last element)); can we somehow more clearly differentiate the last model and the List?
>>> (s + dm1 + dm2).models
[DummyClassifier(constant=1), DummyClassifier(constant=2), DummyClassifier(constant=3)]

The last model is available under the `model` property:
i wonder if we want a clearer distinction between the `models` and `model` properties? could get a little confusing for users since they're so close in name... although, admittedly hard to think of an alternative that keeps single words
Hmm... how about `model` vs. `model_list` or `model_set`? Are those better than `models`?
@musslick i'm fine with either of those! i'm also fine with `models` and `model` if others think it's not actually going to be an issue; just trying to be maximally clear while still being Pythonic :)
On reflection, I feel like I still prefer `models` rather than anything longer – it feels a bit more natural to me. Combined with the type annotations (like `BaseEstimator` vs `List[BaseEstimator]`) and type checking, I feel like the risk of accidental confusion is sufficiently small to be unproblematic. I'm open to persuasion there though.
sounds good, i'm happy with `models`! it was only a slight hesitation on my part, and i think it is indeed quite natural.
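For readers following the naming discussion, here is a minimal sketch of the `models` (full list) vs `model` (last element) pattern. `ModelHistory` and `Estimator` are hypothetical stand-ins, not the `StandardState` added in this PR or scikit-learn's `BaseEstimator`, and returning `None` for an empty list is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class Estimator:
    """Hypothetical stand-in for sklearn.base.BaseEstimator (keeps the sketch stdlib-only)."""

    def __init__(self, constant: int) -> None:
        self.constant = constant

    def __repr__(self) -> str:
        return f"Estimator(constant={self.constant})"


@dataclass(frozen=True)
class ModelHistory:
    """Hypothetical state holding every model produced so far."""

    # The plural name carries the list type ...
    models: List[Estimator] = field(default_factory=list)

    # ... and the singular name carries the element type.
    @property
    def model(self) -> Optional[Estimator]:
        """Return the most recently added model, or None if there is none (assumed behaviour)."""
        return self.models[-1] if self.models else None


s = ModelHistory(models=[Estimator(1), Estimator(2), Estimator(3)])
print(s.models)  # [Estimator(constant=1), Estimator(constant=2), Estimator(constant=3)]
print(s.model)   # Estimator(constant=3)
```

With the annotations in place, a type checker would flag code that treats `s.model` as a list or `s.models` as a single estimator, which is the safeguard mentioned above.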
Looks great, just had a question regarding numpy<>pandas conversion.
... metadata={"delta": "replace",
...           "converter": np.asarray})

Here we pass a dataframe, but expect a numpy array:
Quick question: Do we also allow for casting from pandas to numpy? I think we should allow that since most people will be feeding the theorists with numpy arrays (it's just simpler).
Yes! Casting from pandas to numpy is totally possible – using the `np.asarray` converter like on line 170, if the Delta includes a DataFrame, it will be converted to a numpy array.
You can always put a `pd.DataFrame` through `np.asarray` to get a numpy array.
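As a rough illustration of the converter idea, the sketch below stores `np.asarray` in a dataclass field's metadata and applies it when new data arrives. `TinyState` and its `update` method are hypothetical and much simpler than the `State` and `Delta` machinery in `autora.state.delta`; only the metadata keys are taken from the snippet under review.

```python
from dataclasses import dataclass, field, fields, replace

import numpy as np
import pandas as pd


@dataclass(frozen=True)
class TinyState:
    """Hypothetical state with one field that should always hold an ndarray."""

    data: np.ndarray = field(
        default_factory=lambda: np.empty((0, 2)),
        metadata={"delta": "replace", "converter": np.asarray},
    )

    def update(self, **delta):
        """Return a new state; each supplied value is passed through its field's converter."""
        converted = {}
        for f in fields(self):
            if f.name in delta:
                # Fall back to the identity function when no converter is declared.
                converter = f.metadata.get("converter", lambda x: x)
                converted[f.name] = converter(delta[f.name])
        return replace(self, **converted)


df = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
s = TinyState().update(data=df)  # pass a DataFrame ...
print(type(s.data).__name__)     # ndarray  (... and get a numpy array back)
print(s.data.shape)              # (2, 2)
```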
My recommendation would be to have the functions and classes which want to represent the data internally as numpy arrays accept `np.typing.ArrayLike` as the input type, which allows for lots of input types – lists of values, or `DataFrame`s or `np.ndarray`s – and then do a `np.asarray()` call at the top of the function to turn the input into the array you want.
It's really hard to do the casting properly from outside these functions, as we'd have to inspect the function signatures and work out whether the type we have is compatible. It's much simpler and more efficient to do this within the function itself, especially if we have a general interchange format like pd.DataFrame as our standard.
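The recommendation above could look something like this. `column_means` is a made-up helper for illustration, not a function from autora; the point is only the `ArrayLike` annotation plus the `np.asarray` call on the first line of the body.

```python
import numpy as np
import numpy.typing as npt


def column_means(data: npt.ArrayLike) -> np.ndarray:
    """Hypothetical helper: accept anything array-like, work on an ndarray internally."""
    # Convert once at the top; this is a no-op copy-wise if data is already a float ndarray.
    arr = np.asarray(data, dtype=float)
    return arr.mean(axis=0)


print(column_means([[1, 2], [3, 4]]))            # nested lists work
print(column_means(np.array([[1, 2], [3, 4]])))  # so do ndarrays
```

A `pd.DataFrame` with numeric columns can be passed the same way, since `np.asarray` accepts it; the caller never has to inspect the function signature to decide whether to convert.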
Co-authored-by: benwandrew <[email protected]>
…to feat/default-state-from-main
They were. Thanks for catching this! It should be in order now. Using generators is tricky; perhaps it's not something we should really recommend.
per the end of our conversation in group today, and with all previous requested changes/questions addressed, i'm happy to get this merged!
ci: update nb-clean hook to remove empty cells
Description
Add a new `StandardState` object which can be used as the default for future experimentalists, experiment runners and theorists.
Type of change
Features
- Add a new `StandardState` object with:
  - `variables`
  - `experiment_data`
  - `conditions`
  - `models`
  - `model`, which returns the last model.
- Make some fixes and refactors in the `autora.state.delta` module.
- Allow `State` to coerce datatypes to the correct type with the `metadata["converter"]` parameter; supersedes feat: add ability to coerce datatypes in the state object on a Delta update #27
- Allow `State` to declare aliases which convert a particular `Delta` parameter into a format understood by the `State`; supersedes feat: add "aliases" to State fields #31
Questions (Optional)
Details
Aliases
Aliases work like this:
...and get the following back as `t`:
This is required for our `Theorist` interface which may by default return a single model:
Without this feature, this couldn't be handled without requiring each theorist to always return a list of models, which is a pain and feels wrong:
return Result(models=[model])
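To make the alias idea concrete, here is a minimal sketch under stated assumptions. `AliasState`, the `"aliases"` metadata layout and the `+` operator on plain dicts are all hypothetical simplifications, not the implementation in this PR, and the sketch only handles list fields that extend. Strings stand in for fitted models.

```python
from dataclasses import dataclass, field, fields, replace
from typing import List


@dataclass(frozen=True)
class AliasState:
    """Hypothetical state with a list field that also accepts a single item."""

    models: List[str] = field(
        default_factory=list,
        # Assumed layout: alias name -> function turning the aliased value
        # into something the "models" field can absorb (here, a one-item list).
        metadata={"delta": "extend", "aliases": {"model": lambda m: [m]}},
    )

    def __add__(self, delta: dict) -> "AliasState":
        updates = {}
        for f in fields(self):
            # Values supplied under the field's own name ...
            new_items = list(delta.get(f.name, []))
            # ... plus values supplied under any alias, after conversion.
            for alias, wrap in f.metadata.get("aliases", {}).items():
                if alias in delta:
                    new_items += wrap(delta[alias])
            if new_items:
                updates[f.name] = getattr(self, f.name) + new_items
        return replace(self, **updates)


s = AliasState(models=["m1"])
t = s + {"model": "m2"}            # a theorist returning a single model
u = t + {"models": ["m3", "m4"]}   # a theorist returning a list of models
print(t.models)  # ['m1', 'm2']
print(u.models)  # ['m1', 'm2', 'm3', 'm4']
```

In this sketch the theorist can return either key, and the wrapping into a list happens once, inside the state, rather than in every theorist.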