Modify arg default #512
@@ -57,6 +58,9 @@ def load(self) -> dd.DataFrame:
(subset_field_name, subset_field_name)
columns.append(column_name)

if self.index_column is not None:
This is needed to keep the index column, which will be dropped later on (if it exists). We weren't picking it up since it was not in the component spec or the remapping dict.
Thanks @PhilippeMoussalli!
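A minimal sketch of the fix discussed in this thread, assuming the loader first collects the columns named in the component spec and then has to add the index column separately. The function name `columns_to_load` and its parameters are hypothetical and are not Fondant's actual API.

```python
from typing import List, Optional


def columns_to_load(spec_columns: List[str], index_column: Optional[str]) -> List[str]:
    """Return the columns to read, keeping the index column if one is set."""
    columns = list(spec_columns)
    # The index column is not part of the component spec or the remapping dict,
    # so it has to be added explicitly or it is lost before it can be dropped.
    if index_column is not None and index_column not in columns:
        columns.append(index_column)
    return columns


print(columns_to_load(["images_data", "images_width"], "id"))
print(columns_to_load(["images_data"], None))
```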
src/fondant/data_io.py
Outdated
f" to maximize worker usage",
)

elif self.input_partition_rows > 1:
- elif self.input_partition_rows > 1:
+ elif self.input_partition_rows >= 1:
@@ -47,7 +47,8 @@ def python_type(self) -> t.Any:
"dict": json.loads,
"list": json.loads,
}
- return lookup[self.type]
+ map_fn = lookup[self.type]
+ return lambda value: map_fn(value) if value != "None" else None  # type: ignore
Is this needed? I see that we only use this to register the arguments. Setting the type as None feels strange.
This can be a bit confusing, so I'll recap a bit:
- In Kubeflow, optional types that default to None are defined as optional in the spec with no constant runtime value. See input3 here and here. They also should not be passed to the componentOp, which is why we remove them here.
- In Docker, all arguments are passed as strings. Above, we're defining a map function per type that converts values back to their original type. If the value is the string "None", we convert it back to a None type. We actually had this in the v1 implementation, not sure why it was removed.
Ok, I was a bit confused by the notation, but looking at it again, it's clear to me why this is needed. Thanks!
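The two cases from the recap above can be sketched roughly as follows. Only the "dict" and "list" lookup entries appear in the diff; the other entries, the module-level `LOOKUP` name, and the `drop_none_arguments` helper are assumptions made for illustration.

```python
import json
import typing as t

# Assumption: "str", "int" and "float" are illustrative; the diff only shows
# the "dict" and "list" entries.
LOOKUP: t.Dict[str, t.Callable[[str], t.Any]] = {
    "str": str,
    "int": int,
    "float": float,
    "dict": json.loads,
    "list": json.loads,
}


def python_type(type_name: str) -> t.Callable[[str], t.Any]:
    """Docker case: every argument arrives as a string and is parsed back."""
    map_fn = LOOKUP[type_name]
    # The literal string "None" stands for an absent value.
    return lambda value: map_fn(value) if value != "None" else None


def drop_none_arguments(arguments: t.Dict[str, t.Any]) -> t.Dict[str, t.Any]:
    """Kubeflow case (hypothetical helper): do not pass None-valued arguments."""
    return {name: value for name, value in arguments.items() if value is not None}


print(python_type("list")("[1, 2]"))
print(python_type("list")("None"))
print(drop_none_arguments({"batch_size": 8, "image_column_names": None}))
```

One consequence of this approach is that a string argument whose real value is the text "None" would also be converted to None.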
@@ -19,11 +19,11 @@ args:
description: Optional argument, a list containing the original image column names in case the
dataset on the hub contains them. Used to format the image from HF hub format to a byte string.
type: list
- default: []
+ default: None
I think for a list, [] makes more sense than None.
Also in the other components.
- a normal argument with [] as default and no longer optional
If a default is defined, it is optional. It's just not KFP's isOptional. I don't see any issue with this.
In Python this is indeed an issue; however, we don't need to define the default in Python. We can just define the default in the fondant_component.yaml:
fondant_component.yaml
default: []
main.py
image_column_names: list,
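A sketch of why the YAML-side default avoids the Python issue, assuming the framework fills in missing arguments from the spec before constructing the component. `SPEC_DEFAULTS`, `resolve_arguments` and `LoadComponent` are hypothetical stand-ins, not Fondant's API.

```python
import copy
from typing import Any, Dict

# Hypothetical stand-in for the defaults parsed from fondant_component.yaml.
SPEC_DEFAULTS: Dict[str, Any] = {"image_column_names": []}


def resolve_arguments(user_args: Dict[str, Any]) -> Dict[str, Any]:
    """Merge user-supplied arguments over the defaults from the spec."""
    # Deep-copy so every component gets its own list instead of a shared one.
    resolved = copy.deepcopy(SPEC_DEFAULTS)
    resolved.update(user_args)
    return resolved


class LoadComponent:
    # No Python-side default, which sidesteps the mutable-default-argument
    # pitfall of writing `image_column_names: list = []` in the signature.
    def __init__(self, image_column_names: list):
        self.image_column_names = image_column_names


component = LoadComponent(**resolve_arguments({}))
print(component.image_column_names)
```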
We're just trying to indicate the absence of a value, which is what None is better suited for. The empty list in this case won't be used for appending or modifying a certain behavior. Is there any added advantage compared to None?
It also seems like we're making an arbitrary choice on which data types to define as having empty values. Should we also include dictionaries?
The advantage is that you don't need to handle the None case in the code, and can always assume that the value is of the type defined in the argument.
I would indeed include dictionaries as well. I don't think that's arbitrary.
This is btw just component implementation. Fondant supports None for list and dict as well. I just don't think we need to use it 😛
src/fondant/component_spec.py
Outdated
arg_type_dict["isOptional"] = True
- if arg.default is not None:
+ if arg.default is not None and arg.default != "None":
Do we need this? Shouldn't this PR make sure that "None" is no longer needed, but None can be used instead?
Good catch, it's indeed not needed here, since this should be triggered at compilation.
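For context, the spec-building step under review might look roughly like the sketch below once the string comparison is dropped. The function name, the `optional` flag, and the "parameterType" and "defaultValue" keys are assumptions modeled on the diff; only `arg_type_dict`, "isOptional" and the `is not None` check come from the PR.

```python
from typing import Any, Dict


def kubeflow_parameter(
    parameter_type: str, default: Any = None, optional: bool = False
) -> Dict[str, Any]:
    """Build a hypothetical Kubeflow parameter entry for one argument."""
    arg_type_dict: Dict[str, Any] = {"parameterType": parameter_type}
    # Assumption: an argument is optional if flagged so or if it has a default.
    if optional or default is not None:
        arg_type_dict["isOptional"] = True
    # A None default yields an optional parameter with no default value set.
    if default is not None:
        arg_type_dict["defaultValue"] = default
    return arg_type_dict


print(kubeflow_parameter("LIST", default=[]))
print(kubeflow_parameter("STRING", optional=True))
```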
Thanks @PhilippeMoussalli!
As mentioned, I would still change the defaults for collections in our own components to empty ones. But if we do or don't, the fondant functionality looks good to me!
Thanks Robbe, I reverted the collections to their empty representations. So just to recap both arguments with defaults and arguments with default as Could you have one last look please?
PR that replaces default data types to None