Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(sdk): fix output type of importer #8172

Merged
merged 12 commits into from
Aug 24, 2022

Conversation

chongyouquan
Copy link
Contributor

@chongyouquan chongyouquan commented Aug 19, 2022

Description of your changes:

  • use the same logic as component_factory._annotation_to_type_struct for determining type_name

Checklist:

@google-oss-prow
Copy link

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@chongyouquan
Copy link
Contributor Author

/test all

@chongyouquan chongyouquan changed the title fix(sdk): change output type of importer to artifact_class.TYPE_NAME fix(sdk): fix output type of importer Aug 19, 2022
@chongyouquan
Copy link
Contributor Author

/retest

@chongyouquan
Copy link
Contributor Author

/retest

@chongyouquan
Copy link
Contributor Author

/retest

@chongyouquan chongyouquan marked this pull request as ready for review August 20, 2022 00:02
@chongyouquan chongyouquan marked this pull request as draft August 20, 2022 00:05
@google-oss-prow google-oss-prow bot added size/L and removed size/M labels Aug 20, 2022
@chongyouquan
Copy link
Contributor Author

/retest

@chongyouquan chongyouquan marked this pull request as ready for review August 20, 2022 00:29
@connor-mccarthy
Copy link
Member

/retest

Copy link
Member

@connor-mccarthy connor-mccarthy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @chongyouquan! I have just a few questions.

TYPE_NAME = 'custom.MyDataset'


@component
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, the bug you mentioned was uncovered using the @dsl.container_component decorator. Is that correct?

Currently, we do not support custom artifacts in the lightweight Python component. This pipeline will not run currently since the MyDataset symbol definition is unavailable at runtime.

WDYT? Do you think we can switch this example to a @dsl.container_component function instead?

Copy link
Contributor Author

@chongyouquan chongyouquan Aug 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bug that involves the importer in KFP v2, separate from use of a container component.

For instance, the following pipeline should work:

from kfp import dsl
from google_cloud_pipeline_components import aiplatform as gcpc_aip
from google_cloud_pipeline_components.types import artifact_types

@dsl.pipeline(name=_PIPELINE_NAME)
def pipeline():
  importer_spec = importer(
      artifact_uri=f'gs://managed-pipeline-gcpc-e2e-test/automl-tabular/model',
      artifact_class=artifact_types.UnmanagedContainerModel,
      metadata={
          'containerSpec': {
              'imageUri':
                  'us-docker.pkg.dev/vertex-ai/automl-tabular/prediction-server:prod'
          }
      })

  model_upload_with_artifact_op = gcpc_aip.ModelUploadOp(
      project=_GCP_PROJECT_ID,
      location=_GCP_REGION,
      display_name=_MODEL_DISPLAY_NAME,
      unmanaged_container_model=importer_spec.outputs['artifact'])

However, when compiling the above pipeline in KFP v2, the following error is encountered:
type name UnmanagedContainerModel is different from expected: google.UnmanagedContainerModel.

I believe this type mismatch occurs due to the logic of the output of the importer not matching the logic in component_factory._annotation_to_type_struct, hence the need for this PR.

I think for the purposes of making sure compilation passes for using a custom artifact type with the importer and an ensuing consumer component, this test would suffice.

While custom artifacts are currently unsupported in the lightweight Python component, the above example can compile and run in Managed Pipelines using KFP v1, as we're using v1 component YAML (instead of lightweight Python components). This PR is simply to make sure that the above example also runs in KFP v2 (as it should).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for clarifying. I do see what you're saying about v1 component YAML and the purpose of the change.

I think that the compilation test is slightly misleading because it isn't executable (while the example in your comment is), but perhaps that's okay since it's not a sample test.

Do you think you could add a comment to the file explaining that the purpose is to test type compatibility at component interfaces, not to show usage of a lightweight component? And that the lightweight component would fail at runtime?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see your point about the pipeline being compilable but not actually executable.

I've changed the test to use a dummy YAML component to consume the output from importer (so that it's a pipeline meant to be both compilable and executable), and changed the name of the test to include "gcpc" (this fix works for all custom artifact types that uses TYPE_NAME, but perhaps it's better not to overstate the scope here).

) and not artifact_class.TYPE_NAME.startswith('system.'):
# For artifact classes not under the `system` namespace,
# use its TYPE_NAME as-is.
type_name = artifact_class.TYPE_NAME
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, I have a proposal to convert the .TYPE_NAME field to .schema_title in KFP v2 to align with Vertex artifact attributes. GCPC is the only library that should have a dependency on this.

I think we can proceed with this change for now, but want to confirm that this will not pose a problem for GCPC. Can we align GCPC v2 to this new attribute name once KFP v2 makes this change (assuming proposal is approved)?

Copy link
Contributor Author

@chongyouquan chongyouquan Aug 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the original PR that @chensun created to resolve a bug in use of GCPC artifact types: #6702.

This current PR is meant to resolve the issue for use of the importer with GCPC components in KFP v2. The new pipeline tests in this PR confirms that the importer works for the case of a custom artifact type with a TYPE_NAME attribute (GCPC or otherwise).

Will defer to @chensun on the above-mentioned proposal.

# use its TYPE_NAME as-is.
type_name = artifact_class.TYPE_NAME
else:
type_name = artifact_class.__name__
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious what the value of using/permitting artifact_class.__name__ is. It seems like we'd want to allow (and even enforce) artifact class authors to set the schema_title using an attribute rather than the class name, especially for non-system artifacts.

This is thinking ahead a little bit, but I think we want to soon move away from checking if a class is an artifact class by checking if it is a subclass of dsl.Artifact and instead just checking that it adheres to the KFP artifact class protocol (which includes having a .TYPE_NAME attribute), that way third-party libraries containing KFP artifacts do not need to depend on KFP. This would require that artifact classes have a TYPE_NAME attribute.

cc: @chensun

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following up on my own question after more investigation.

Use of artifact_class.__name__ effectively strips the 'system.' for KFP-native artifacts (since the class name should be equal to .TYPE_NAME after 'system.' is stripped. This is likely done this way for compatibility with KFP v1: an artifact is encoded in v1 component YAML as 'Artifact', not 'system.Artifact'.

My preference is to preserve the full type name (with 'system.') to reduce any ambiguity in naming and enable backward-compatibility with v1 through logic in v2, not through the compressed artifact name representation.

@chongyouquan
Copy link
Contributor Author

/retest

@chongyouquan
Copy link
Contributor Author

/retest

2 similar comments
@chongyouquan
Copy link
Contributor Author

/retest

@chongyouquan
Copy link
Contributor Author

/retest

@connor-mccarthy
Copy link
Member

/lgtm
/approve

Thanks for this, @chongyouquan!

I want to highlight this comment for myself and other future reviewers of this PR (#8172 (comment)). I think we'll change this implementation even more very shortly to add comprehensive support custom artifact types. Specifically, removing the following and making changes to other parts of the load/compile logic for consistency:

    else:
        type_name = artifact_class.__name__

@google-oss-prow
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: connor-mccarthy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit 5ccf53a into kubeflow:master Aug 24, 2022
jlyaoyuli pushed a commit to jlyaoyuli/pipelines that referenced this pull request Jan 5, 2023
* change output type of importer to artifact_class.TYPE_NAME

* use the same logic as component_factory._annotation_to_type_struct for determining type_name

* format using yapf

* fix type_name

* format using yapf

* add yaml file

* changes with regards to PR comments

* format pipeline_with_importer_and_gcpc_types.py

* remove unused imports

* change name of test component

* add pipeline_with_importer_and_gcpc_types to config

* update yaml file
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants