Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(sdk): Add comments to IR YAML file #8467

Merged
merged 20 commits into from
Dec 5, 2022

Conversation

JOCSTAA
Copy link
Contributor

@JOCSTAA JOCSTAA commented Nov 17, 2022

Description of your changes:
Implements a feature for the KFP SDK DSL compiler to include a summary of the pipeline details( including name, description and input/output signatures) as a comment to the IR YAML file

Checklist:

@JOCSTAA
Copy link
Contributor Author

JOCSTAA commented Nov 17, 2022

/retest

Copy link
Member

@connor-mccarthy connor-mccarthy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @JOCSTAA!

One high level consideration: we'll want to also parse the description when we read load a component/pipeline from YAML, since this information is not preserved in PipelineSpec. This is so that the description isn't lost when a user authors a pipeline, compiles it, loads it, then compiles it again. We can discuss this offline.

description = pipeline_func.description or None

else:
description = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible I'm missing something, but it looks like this is to handle the fact that GraphComponents have a .description, but other BaseComponents (YamlComponent, PythonComponent) don't. Do you think we could put .description on the BaseComponent abstract base class and implement for all concrete classes, that way all component/pipeline types can be treated the same way by the compiler?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would also allow us to avoid expanding the interface of write_pipeline_spec_to_file to support comments.

sdk/python/kfp/compiler/pipeline_spec_builder.py Outdated Show resolved Hide resolved
sdk/python/kfp/compiler/pipeline_spec_builder.py Outdated Show resolved Hide resolved
if pipeline_description:
comment += '# Description: ' + pipeline_description + '\n'
comment += add_inputs()
comment += add_outputs()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Consider obtaining all the different sections in a list, then using a string joining method. I think this would slightly improve readability and reduce the likelihood that a \n is omitted on one of the strings somewhere.

'\n'.join(sections)

The same pattern might be helpful in add_inputs and add_outputs

}

def add_inputs():
if 'inputDefinitions' in pipeline_spec['root']:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: What do you think about separating the information collection from the information presentation in add_inputs and add_outputs for readability [ref]?

sdk/python/kfp/compiler/pipeline_spec_builder.py Outdated Show resolved Hide resolved
sdk/python/kfp/compiler/compiler_test.py Show resolved Hide resolved
sdk/python/kfp/compiler/compiler_test.py Outdated Show resolved Hide resolved
@@ -1490,5 +1490,161 @@ def pipeline_with_input(boolean: bool = False):
.default_value.bool_value, True)


class TestYamlComments(unittest.TestCase):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests look good! A couple more test cases come to mind:

  • compiling a Python component @dsl.component
  • compiling a container component @dsl.container_component
  • compiling a pipeline or component with inputs that don't have default values
  • compiling a pipeline with an explicit name argument
  • compiling a pipeline or component with a float, int, dict, and list
  • compiling a pipeline or component with an artifact input and output

I don't think these each necessarily must be their own test (though that's fine too -- it's more specific but also more verbose).

Bundling these characteristics together and doing full comment assertions (comparing the actual full comment against an expected full comment) could make tackling these many cases a bit more manageable and similar to real-world components we'll be handling.

@JOCSTAA
Copy link
Contributor Author

JOCSTAA commented Nov 17, 2022

/retest

@JOCSTAA
Copy link
Contributor Author

JOCSTAA commented Nov 21, 2022

/retest

@JOCSTAA
Copy link
Contributor Author

JOCSTAA commented Nov 22, 2022

/retest

sdk/python/kfp/components/structures.py Outdated Show resolved Hide resolved
@@ -816,14 +816,28 @@ def load_from_component_yaml(cls, component_yaml: str) -> 'ComponentSpec':
Component spec in the form of V2 ComponentSpec.
"""

def extract_description(component_yaml: str) -> str:
heading = '# Description: '
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this redundant with extract_description_from_command [ref]? If so, can we remove one?

I think extracting from the comment (this function) is a better approach since it works for both components and pipelines.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion but I tested and confirmed that there are some cases where the description would not be in the pipeline spec yaml file but would be in the comments (for example the user defines the description using my_comp.description = 'description' and not as a functions docstring ) and vice versa (pipeline that was compiled previously without a comment) so i think both are necessary

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. Thanks for explaining. The support for existing IR YAML is a great point. Am I correct that this only applies for compiled components? I believe any pipeline function docstrings would be lost during compilation.

I'm not sure I've ever seen description set via my_comp.description = 'description'. Do you have any example of this?


return comment_strings

if 'root' not in pipeline_spec:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any instances where this condition would be true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I came across a test case when running the python execution tests which had no 'root'. [ref]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the link is broken. Can you provide another?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into this. It's attributed to these tests: sdk/python/kfp/compiler/pipeline_spec_builder_test.py::TestWriteIrToFile, which attempt to write an empty PipelineSpec.

Can you update these these? It's preferable to not program around our tests in the library code.

sdk/python/kfp/compiler/pipeline_spec_builder.py Outdated Show resolved Hide resolved
sdk/python/kfp/components/graph_component.py Outdated Show resolved Hide resolved
def extract_description(component_yaml: str) -> str:
heading = '# Description: '
if heading in component_yaml:
description = component_yaml.splitlines()[2]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll also need to be able to handle multiline docstrings

name_string = '# Name: my-pipeline'

# test name is in comments
self.assertTrue(name_string in yaml_content)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: What you have is perfectly valid, but perhaps use the approach of comparing the full comment string throughout (what you've done in the last test method). It's easier to reason about the correctness of the tests that compare the full comment string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do feel like having it this way for these first few tests gives more specificity as to what exactly we are searching for in the comments, but if you do have a strong opinion against it, i could always implement it that way

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a strong opinion. We can leave it as is if you prefer.

sdk/python/kfp/compiler/compiler_test.py Outdated Show resolved Hide resolved
sdk/python/kfp/compiler/compiler_test.py Show resolved Hide resolved
@JOCSTAA
Copy link
Contributor Author

JOCSTAA commented Nov 23, 2022

/retest

@connor-mccarthy
Copy link
Member

@JOCSTAA, ignore kubeflow-pipelines-tfx-python37 test failure for now. I believe this is unrelated to your changes.

@chensun
Copy link
Member

chensun commented Nov 23, 2022

@JOCSTAA, ignore kubeflow-pipelines-tfx-python37 test failure for now. I believe this is unrelated to your changes.

Once GoogleCloudPlatform/oss-test-infra#1843 is merged, tfx test should no longer be required on master branch.

@google-oss-prow google-oss-prow bot added size/XL and removed size/L labels Nov 28, 2022
@JOCSTAA
Copy link
Contributor Author

JOCSTAA commented Nov 28, 2022

retest/

1 similar comment
@JOCSTAA
Copy link
Contributor Author

JOCSTAA commented Nov 28, 2022

retest/

@JOCSTAA
Copy link
Contributor Author

JOCSTAA commented Nov 29, 2022

retest/

@JOCSTAA
Copy link
Contributor Author

JOCSTAA commented Nov 29, 2022

/test kubeflow-pipelines-sdk-python39

Copy link
Member

@connor-mccarthy connor-mccarthy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you again for your work on this, @JOCSTAA. The comments are mostly nitpicks related to maintainability.

sdk/python/kfp/compiler/pipeline_spec_builder.py Outdated Show resolved Hide resolved
sdk/python/kfp/compiler/pipeline_spec_builder.py Outdated Show resolved Hide resolved
sdk/python/kfp/components/component_factory.py Outdated Show resolved Hide resolved
sdk/python/kfp/components/structures.py Outdated Show resolved Hide resolved
# sample_input3: system.Model
# sample_input4: float [Default: 3.14]
# sample_input5: list [Default: [1.0, 2.0, 3.0]]
# sample_input6: dict [Default: {'one': 1.0, 'three': 3.0, 'two': 2.0}]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: String quotes are " above for a default of type string but ' here in a dict

comment_sections.append('# PIPELINE DEFINITION')
comment_sections.append('# Name: ' + pipeline_spec['pipelineInfo']['name'])
if pipeline_description:
pipeline_description = '\n# '.join(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the length of this string is based on the size of the '# Description' string. Can you implement that programmatically to make the coupling explicit?

# test comments work on compiled container components
self.assertIn(predicted_comment, yaml_content)

def test_comments_indempotency(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: indempotency -> idempotency

pipeline_spec_path = os.path.join(tmpdir, 'output.yaml')
compiler.Compiler().compile(
pipeline_func=my_pipeline, package_path=pipeline_spec_path)
comp = components.load_component_from_file(pipeline_spec_path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: You can move this up to the original temporary directory to avoid having to compile again.

sdk/python/kfp/compiler/compiler_test.py Show resolved Hide resolved
@JOCSTAA
Copy link
Contributor Author

JOCSTAA commented Dec 1, 2022

/retest


return comment_strings

if 'root' not in pipeline_spec:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into this. It's attributed to these tests: sdk/python/kfp/compiler/pipeline_spec_builder_test.py::TestWriteIrToFile, which attempt to write an empty PipelineSpec.

Can you update these these? It's preferable to not program around our tests in the library code.

@JOCSTAA
Copy link
Contributor Author

JOCSTAA commented Dec 5, 2022

/test kubeflow-pipeline-e2e-test

@JOCSTAA
Copy link
Contributor Author

JOCSTAA commented Dec 5, 2022

/retest

@connor-mccarthy
Copy link
Member

We can ignore kubeflow-pipelines-tfx-python37. It's not supposed to run on merges into master anymore: https://github.com/GoogleCloudPlatform/oss-test-infra/blob/8e2972aa3e72cdd221f03447b10f0d2358e88f9d/prow/prowjobs/kubeflow/pipelines/kubeflow-pipelines-presubmits.yaml#L130. Prow is trying to get it to pass since it once failed on this PR. It will not gate merge.

Copy link
Member

@connor-mccarthy connor-mccarthy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for all your hard work on this, @JOCSTAA! This is a really great feature.

/lgtm
/approve

@google-oss-prow google-oss-prow bot added the lgtm label Dec 5, 2022
@google-oss-prow
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: connor-mccarthy
Once this PR has been reviewed and has the lgtm label, please assign ironpan for approval by writing /assign @ironpan in a comment. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow
Copy link

@JOCSTAA: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
kubeflow-pipelines-tfx-python37 16e1740 link true /test kubeflow-pipelines-tfx-python37
kubeflow-pipeline-e2e-test f4a7fa5 link true /test kubeflow-pipeline-e2e-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@connor-mccarthy
Copy link
Member

kubeflow-pipeline-e2e-test are unrelated to these changes. A fix is in progress.

@connor-mccarthy connor-mccarthy merged commit 49db63c into kubeflow:master Dec 5, 2022
jlyaoyuli pushed a commit to jlyaoyuli/pipelines that referenced this pull request Jan 5, 2023
* base

* add tests

* fix bug

* adress comments

* address comments 2

* sort comments

* sort signatures

* add indempotent test

* add indempotent test2

* support multiline docstring

* review

* docformatter presubmit exclude

* docformatter presubmit exclude

* docformatter

* docformatter

* merge 1

* update readme

* nit .items()

* remove reduntant test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants