Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate to KfpV2 #477

Merged
merged 34 commits into from
Oct 11, 2023
Merged

Migrate to KfpV2 #477

merged 34 commits into from
Oct 11, 2023

Conversation

GeorgesLorre
Copy link
Collaborator

No description provided.

@GeorgesLorre GeorgesLorre changed the title Feature/kfp v2 KfpV2 and vertex Sep 29, 2023
@GeorgesLorre GeorgesLorre changed the title KfpV2 and vertex KfpV2 and Vertex Sep 29, 2023
@GeorgesLorre GeorgesLorre force-pushed the feature/kfp-v2 branch 4 times, most recently from ae4d1c2 to 146e728 Compare October 2, 2023 13:49
GeorgesLorre and others added 9 commits October 2, 2023 16:30
PR that adds option to set hardware accelerators to KFP V2 (nodepool

Notes:
* Each component was previously configured as a sub-pipeline because of
an error in detecting a certain field in the V2 spec
(`exec-<sanitizied_component_name`) which forces to kubeflow to compile
component to a pipeline spec instead of a component spec
[link](https://github.com/kubeflow/pipelines/blob/ad058b5321f5fe74bbd10845778e6627fb9aa5cb/sdk/python/kfp/dsl/structures.py#L462).
This was resolved by adding the same custom method used by KFP to format
the component name to Fondant.

This makes it possible to assign resources to each component since KFP
does not allow assigning resources on a pipeline level
* Setting up and running nodepools is still to be tested on native KFP
PR that fixes the v2 default /optional types. KFP v2 has two different
optional types:

- Optional values that default to None -> defined by setting
`isOptional` to True and not passing the default value
- Optional values that default to another values -> defined by setting
the value in `defaultValue` and having `isOptional` False or empty.

This PR adjusts the componentSpec accordingly. Another nice win is that
we do not have to define bools/dicts/lists as strings in the component
spec

Few things no note:

* Due to strict type. Mixing different data types is not allowed -> we
had to change the type of `input_partition_rows` to int. Which means it
does not accept the value `disable` anymore. Need to fix this in a
seperate PR
* 'inf` as float is not allowed
Copy link
Member

@RobbeSneyders RobbeSneyders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @GeorgesLorre.

Mostly looks good to me! Having a bit of a hard time understanding some of the argument parsing / converting, but the argument usage in the tests seems logical.

Left some comments.

examples/pipelines/datacomp/pipeline.py Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
pyproject.toml Show resolved Hide resolved
.github/workflows/pipeline.yaml Show resolved Hide resolved
src/fondant/executor.py Show resolved Hide resolved
src/fondant/compiler.py Show resolved Hide resolved
src/fondant/component_spec.py Outdated Show resolved Hide resolved
) -> "KubeflowComponentSpec":
"""Create a Kubeflow component spec from a Fondant component spec."""
def from_fondant_component_spec(cls, fondant_component: ComponentSpec):
"""Generate a Kubeflow component spec from a ComponentOp."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""Generate a Kubeflow component spec from a ComponentOp."""
"""Generate a Kubeflow component spec from a Fondant component spec."""

Copy link
Contributor

@PhilippeMoussalli PhilippeMoussalli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Robbe! Left a few comments. Can we maybe merge this one into another branch and then tackle all other issues on other branches or would you prefer to merge directly to main?

n_rows_to_load:
description: Optional argument that defines the number of rows to load. Useful for testing pipeline runs on a small scale
type: int
default: None
default: -1
index_column:
description: Column to set index to in the load component, if not specified a default globally unique index will be set
type: str
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to later on revert back to None and handle the conversion internally? It could be something like setting an arbitrary value if it's None in the fondant component and parsing it back to None again when the component is run

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it's such a bad thing that the default needs to match the argument type. We would also need an arbitrary value for each type to make this work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before we would assign a value as an optional argument if the default is set to None. Now the notion of optional arguments is missing and optional arguments are checked implicitly against arbitrary values defined in the default which can be a bit confusing.

Was more thinking from the perspective of the end user. So if they would assign a default to None, we would handle it as optional and return it as None to the end user. Implementing this is still not very clear to me but we could just assign arbitrary known values. For example, if the users sets an list argument to None, we would pass it as [] and the convert it back to None during parsing

image_column_names: t.Optional[list],
n_rows_to_load: t.Optional[int],
index_column: t.Optional[str],
image_column_names: list,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we keep them as optional since they have a default or does optional only apply when the default is None by convention?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional means that a None value can be passed in or set as a default. This is no longer the case now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think we should have optional argument (see comment above)

docs/components/component_spec.md Show resolved Hide resolved
examples/pipelines/datacomp/pipeline.py Outdated Show resolved Hide resolved
examples/pipelines/datacomp/pipeline.py Show resolved Hide resolved
examples/pipelines/datacomp/pipeline.py Show resolved Hide resolved
# entry: python scripts/component_readme/generate_readme.py
# files: ^components/.*/fondant_component.yaml
# additional_dependencies: ["fondant"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to deactivate this since it installs the latest version of fondant on PyPI, which breaks on the changes we made to the component spec schema. We can re-enable this later.

@RobbeSneyders RobbeSneyders marked this pull request as ready for review October 10, 2023 16:21

# add ops to pipeline
pipeline.add_op(load_from_hub_op)
# pipeline.add_op(download_images_op, dependencies=load_from_hub_op)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you uncomment here?

Copy link
Contributor

@PhilippeMoussalli PhilippeMoussalli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Robbe, left one comment regarding arguments but I think we can discuss it separately

Copy link
Member

@RobbeSneyders RobbeSneyders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, let's merge!

@RobbeSneyders RobbeSneyders merged commit 6b46324 into main Oct 11, 2023
4 checks passed
@RobbeSneyders RobbeSneyders deleted the feature/kfp-v2 branch October 11, 2023 09:50
@RobbeSneyders RobbeSneyders changed the title KfpV2 and Vertex Migrate to KfpV2 Oct 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants