Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modify arg default #512

Merged
merged 13 commits into from
Oct 14, 2023
Merged

Modify arg default #512

merged 13 commits into from
Oct 14, 2023

Conversation

PhilippeMoussalli
Copy link
Contributor

PR that replaces default data types to None

@PhilippeMoussalli PhilippeMoussalli linked an issue Oct 11, 2023 that may be closed by this pull request
@@ -57,6 +58,9 @@ def load(self) -> dd.DataFrame:
(subset_field_name, subset_field_name)
columns.append(column_name)

if self.index_column is not None:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed to keep the index column that will be dropped later on (if it exists). We weren't picking up on it since it was not in the component spec or remapping dict

Copy link
Member

@RobbeSneyders RobbeSneyders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

f" to maximize worker usage",
)

elif self.input_partition_rows > 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
elif self.input_partition_rows > 1:
elif self.input_partition_rows >= 1:

src/fondant/component_spec.py Show resolved Hide resolved
@@ -47,7 +47,8 @@ def python_type(self) -> t.Any:
"dict": json.loads,
"list": json.loads,
}
return lookup[self.type]
map_fn = lookup[self.type]
return lambda value: map_fn(value) if value != "None" else None # type: ignore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed? I see that we only use this to register the arguments. Setting the type as None feels strange.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be a bit confusing so i'll recap a bit:

  • In kubeflow optional types that default to None are defined as optional in the spec with no constant runtime value. See input3 here and here. They also should not be passed to the componentOp so that's why we remove them here
  • In docker, all arguments are passed as strings. Above we're defining a map function above a type that converts back to their original type. In case that value is a None string, we're converting it back to a Nonetype. We actually has this in the v1 implementation, not sure why it was removed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I was a bit confused by the notation, but looking at it again, it's clear to me why this is needed. Thanks!

@@ -19,11 +19,11 @@ args:
description: Optional argument, a list containing the original image column names in case the
dataset on the hub contains them. Used to format the image from HF hub format to a byte string.
type: list
default: []
default: None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for a list, [] makes more sense than None.

Also in the other components.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I agree, then by definition it's either:

  • a normal argument with [] as default and no longer optional
  • Optional argument with [] as default which is not really common since Optional types don't really mix well with mutable data types
    image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • a normal argument with [] as default and no longer optional

If a default is defined, it is optional. It's just not KFP's isOptional. I don't see any issue with this.

  • Optional argument with [] as default which is not really common since Optional types don't really mix well with mutable data types
    image

In Python this is an issue indeed, however we don't need to define the default in Python. We can just define the default in the fondant_component.yaml:

fondant_component.yaml

    default: []

main.py

    image_column_names: list,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're just trying to indicate the absence of a value which is what None is better suited for. The empty list in this case won't be used for appending or modifying a certain behavior. Is there any added advantage compared to None?

It also seems like we're making an arbitrary choice on which data types to define as having empty values. Should we also include dictionaries?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The advantage is that you don't need to handle the None case in the code, and can always assume that the value is the of the type defined in the argument.

I would indeed include dictionaries as well. I don't think that's arbitrary.

This is btw just component implementation. Fondant supports None for list and dict as well. I just don't think we need to use it 😛

arg_type_dict["isOptional"] = True
if arg.default is not None:
if arg.default is not None and arg.default != "None":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this? Shouldn't this PR make sure that "None" is no longer needed, but None can be used instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, it's not needed here indeed since this should be triggered at compilation

Copy link
Member

@RobbeSneyders RobbeSneyders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @PhilippeMoussalli!

As mentioned, I would still change the defaults for collections in our own components to empty ones. But if we do or don't, the fondant functionality looks good to me!

@PhilippeMoussalli
Copy link
Contributor Author

Thanks @PhilippeMoussalli!

As mentioned, I would still change the defaults for collections in our own components to empty ones. But if we do or don't, the fondant functionality looks good to me!

Thanks Robbe, I reverted the collections to their empty representations. So just to recap both arguments with defaults and arguments with default as None (IsOptional by kfp) are now marked at Optional in the python function without a defined data type. This makes it easy to have a single source of truth for the defaults which is in the component spec. I also reformatted both load_from_parquet and load_from_hub component since they were getting cluttered.

Could you have one last look please?

@RobbeSneyders RobbeSneyders merged commit 15f875e into main Oct 14, 2023
4 checks passed
@RobbeSneyders RobbeSneyders deleted the modify-arg-default branch October 14, 2023 12:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Find a way to handle args with combined datatypes
2 participants