Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add kfp compiler #291

Merged
merged 8 commits into from
Aug 10, 2023
Merged

Add kfp compiler #291

merged 8 commits into from
Aug 10, 2023

Conversation

GeorgesLorre
Copy link
Collaborator

@GeorgesLorre GeorgesLorre commented Jul 12, 2023

Untested -> looking for feedback on the structure not the content

pipeline_versions = self.client.list_pipeline_versions(pipeline_id).versions
return [version.id for version in pipeline_versions]

def delete_pipeline(self, pipeline_name: str):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do you see all the other functionalities that are not related to compiling and running? user might still want to list pipeline runs, delete pipelines, ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we need to wrap this functionality for every platform we want to run on. At least at this point in time, I would be ok with removing it and having the user work with the underlying platform directly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense, we'll re-add it later if we notice that it's needed


def compile(
self,
pipeline: Pipeline,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the Pipeline is the unifying interface between all runners right? if so, would add it to the abstract class

Copy link
Contributor

@PhilippeMoussalli PhilippeMoussalli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the overall structure and separation between the Compiler and Runner!

I think there are still some things that don't fall quite between like Client related functionalities. Not the highest priority but we should figure out where those land

Copy link
Member

@RobbeSneyders RobbeSneyders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @GeorgesLorre!

The lines along you split the behavior look good to me!

src/fondant/compiler.py Show resolved Hide resolved
except ImportError:
raise ImportError(
"You need to install kfp to use the Kubeflow compiler, "
/ "you can install it with `poetry install --extras kfp`"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/ "you can install it with `poetry install --extras kfp`"
/ "you can install it with `pip install fondant[kfp]`"

poetry install is only for fondant developers. Users can also use poetry, but then they'll install their own project and need to indicate the fondant extras with the fondant dependency in their pyproject.toml.

Copy link
Contributor

@PhilippeMoussalli PhilippeMoussalli Jul 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 42 to 43
"You need to install kfp to use the Kubeflow compiler, "
/ "you can install it with `poetry install --extras kfp`"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"You need to install kfp to use the Kubeflow compiler, "
/ "you can install it with `poetry install --extras kfp`"
"You need to install kfp to use the Kubeflow runner, "
/ "you can install it with `pip install fondant[kfp]`"

pipeline_versions = self.client.list_pipeline_versions(pipeline_id).versions
return [version.id for version in pipeline_versions]

def delete_pipeline(self, pipeline_name: str):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we need to wrap this functionality for every platform we want to run on. At least at this point in time, I would be ok with removing it and having the user work with the underlying platform directly.

def run(cls, input_spec: str, host: str, *args, **kwargs):
"""Run a kubeflow pipeline."""
cls._resolve_imports()
client = kfp.Client(host=host)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably doesn't change a lot to the current functionality, but it would make more sense to me to instantiate this in __init__().

Copy link
Member

@RobbeSneyders RobbeSneyders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @GeorgesLorre!

The general direction looks good. Left some comments.

self.kfp = kfp
except ImportError:
msg = """You need to install kfp to use the Kubeflow compiler,\n
you can install it with `pip install --extras kfp`"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
you can install it with `pip install --extras kfp`"""
you can install it with `pip install fondant[pipelines]`"""

Although we probably want to split this up for the different runners so it becomes fondant[kfp].

Comment on lines 243 to 244
self.pipeline.sort_graph()
self.pipeline._validate_pipeline_definition("{{workflow.name}}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be done in the DockerCompiler as well, no?

@@ -18,8 +17,6 @@
from fondant.schema import validate_partition_number, validate_partition_size

if is_kfp_available():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we can get rid of this and make this file completely runner independent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only needed for setting p_volumes at the componentOp level, that was only needed when we were initially planning to download the Laion dataset locally to use it. I think we can remove both ephemeral_storage_size and p_volumes from the ComponenOp. We can always re-add them later if we notice they're needed

def _resolve_imports(self):
"""Resolve imports for the Kubeflow compiler."""
try:
global kfp
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we assign to self.kfp here as in the compiler?

except ImportError:
raise ImportError(
"You need to install kfp to use the Kubeflow compiler, "
/ "you can install it with `pip install --extras kfp`",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment on same line above.

Comment on lines 53 to 62
# client = kfp.Client(host=host)
# # TODO add logic to see if pipeline exists
# pipeline_spec = client.run_pipeline(
# experiment_id=experiment.id,
# job_name=run_name,
# pipeline_package_path=pipeline.package_path,
# )

# pipeline_url = f"{self.host}/#/runs/details/{pipeline_spec.id}"
# logger.info(f"Pipeline is running at: {pipeline_url}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason this is still commented out?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should not have committed this, will be taken up in a separate PR addressing the kfp runner

"""
Test that an exception is raised when the passed invalid argument name or type to the fondant
component does not match the ones specified in the fondant specifications.
Test that an InvalidPipelineDefinition exception is raised when attempting to compile
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're no longer compiling in the test :)

],
)
def test_invalid_pipeline_compilation(
default_pipeline_args,
invalid_pipeline_example,
tmp_path,
):
"""
Test that an InvalidPipelineDefinition exception is raised when attempting to compile
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're no longer compiling in the test :)

@@ -169,12 +165,12 @@ def test_invalid_pipeline_dependencies(
[
("example_1", ["first_component", "second_component"]),
("example_2", ["first_component", "second_component"]),
("example_3", ["first_component", "second_component"]),
],
)
def test_invalid_pipeline_compilation(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def test_invalid_pipeline_compilation(
def test_invalid_pipeline_validation(

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be removed

Copy link
Contributor

@PhilippeMoussalli PhilippeMoussalli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Georges! Looks much cleaner already ✨

self.kfp.compiler.Compiler().compile(wrapped_pipeline, output_path) # type: ignore
logger.info("Pipeline compiled successfully")

def kf_pipeline(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def kf_pipeline(self):
def kfp_pipeline(self):

metadata=metadata,
**component_args,
)
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No else statement needed, the two statements are the same, I think now they can be the same since we're passing the metadata to every components

src/fondant/compiler.py Show resolved Hide resolved
@@ -18,8 +17,6 @@
from fondant.schema import validate_partition_number, validate_partition_size

if is_kfp_available():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only needed for setting p_volumes at the componentOp level, that was only needed when we were initially planning to download the Laion dataset locally to use it. I think we can remove both ephemeral_storage_size and p_volumes from the ComponenOp. We can always re-add them later if we notice they're needed

@GeorgesLorre GeorgesLorre changed the title Add draft structure for kfp refactor Add kfp compiler Aug 8, 2023
@GeorgesLorre
Copy link
Collaborator Author

@RobbeSneyders updated!

@GeorgesLorre
Copy link
Collaborator Author

GeorgesLorre commented Aug 8, 2023

@PhilippeMoussalli Ok I'll remove the ephemeral_storage_size and p_volumes in a new PR. This would indeed be cleaner. #342

Copy link
Member

@RobbeSneyders RobbeSneyders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Can you just update the error message for the kfp import before merging?

@GeorgesLorre GeorgesLorre merged commit 262fd88 into main Aug 10, 2023
@GeorgesLorre GeorgesLorre deleted the feature/kf-compiler branch August 10, 2023 08:59
Hakimovich99 pushed a commit that referenced this pull request Oct 16, 2023
Move all kfp related stuff from the Pipeline to a separate kfp compiler
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants