-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(sdk): fix output type of importer #8172
Conversation
Skipping CI for Draft Pull Request. |
/test all |
…r determining type_name
/retest |
/retest |
/retest |
/retest |
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @chongyouquan! I have just a few questions.
sdk/python/kfp/compiler/test_data/pipelines/pipeline_with_importer_and_custom_artifact_type.py
Outdated
Show resolved
Hide resolved
TYPE_NAME = 'custom.MyDataset' | ||
|
||
|
||
@component |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, the bug you mentioned was uncovered using the @dsl.container_component
decorator. Is that correct?
Currently, we do not support custom artifacts in the lightweight Python component. This pipeline will not run currently since the MyDataset
symbol definition is unavailable at runtime.
WDYT? Do you think we can switch this example to a @dsl.container_component
function instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bug that involves the importer
in KFP v2, separate from use of a container component.
For instance, the following pipeline should work:
from kfp import dsl
from google_cloud_pipeline_components import aiplatform as gcpc_aip
from google_cloud_pipeline_components.types import artifact_types
@dsl.pipeline(name=_PIPELINE_NAME)
def pipeline():
importer_spec = importer(
artifact_uri=f'gs://managed-pipeline-gcpc-e2e-test/automl-tabular/model',
artifact_class=artifact_types.UnmanagedContainerModel,
metadata={
'containerSpec': {
'imageUri':
'us-docker.pkg.dev/vertex-ai/automl-tabular/prediction-server:prod'
}
})
model_upload_with_artifact_op = gcpc_aip.ModelUploadOp(
project=_GCP_PROJECT_ID,
location=_GCP_REGION,
display_name=_MODEL_DISPLAY_NAME,
unmanaged_container_model=importer_spec.outputs['artifact'])
However, when compiling the above pipeline in KFP v2, the following error is encountered:
type name UnmanagedContainerModel is different from expected: google.UnmanagedContainerModel
.
I believe this type mismatch occurs due to the logic of the output of the importer
not matching the logic in component_factory._annotation_to_type_struct
, hence the need for this PR.
I think for the purposes of making sure compilation passes for using a custom artifact type with the importer and an ensuing consumer component, this test would suffice.
While custom artifacts are currently unsupported in the lightweight Python component, the above example can compile and run in Managed Pipelines using KFP v1, as we're using v1 component YAML (instead of lightweight Python components). This PR is simply to make sure that the above example also runs in KFP v2 (as it should).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for clarifying. I do see what you're saying about v1 component YAML and the purpose of the change.
I think that the compilation test is slightly misleading because it isn't executable (while the example in your comment is), but perhaps that's okay since it's not a sample test.
Do you think you could add a comment to the file explaining that the purpose is to test type compatibility at component interfaces, not to show usage of a lightweight component? And that the lightweight component would fail at runtime?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I see your point about the pipeline being compilable but not actually executable.
I've changed the test to use a dummy YAML component to consume the output from importer
(so that it's a pipeline meant to be both compilable and executable), and changed the name of the test to include "gcpc" (this fix works for all custom artifact types that uses TYPE_NAME
, but perhaps it's better not to overstate the scope here).
) and not artifact_class.TYPE_NAME.startswith('system.'): | ||
# For artifact classes not under the `system` namespace, | ||
# use its TYPE_NAME as-is. | ||
type_name = artifact_class.TYPE_NAME |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, I have a proposal to convert the .TYPE_NAME
field to .schema_title
in KFP v2 to align with Vertex artifact attributes. GCPC is the only library that should have a dependency on this.
I think we can proceed with this change for now, but want to confirm that this will not pose a problem for GCPC. Can we align GCPC v2 to this new attribute name once KFP v2 makes this change (assuming proposal is approved)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the original PR that @chensun created to resolve a bug in use of GCPC artifact types: #6702.
This current PR is meant to resolve the issue for use of the importer with GCPC components in KFP v2. The new pipeline tests in this PR confirms that the importer works for the case of a custom artifact type with a TYPE_NAME
attribute (GCPC or otherwise).
Will defer to @chensun on the above-mentioned proposal.
# use its TYPE_NAME as-is. | ||
type_name = artifact_class.TYPE_NAME | ||
else: | ||
type_name = artifact_class.__name__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious what the value of using/permitting artifact_class.__name__
is. It seems like we'd want to allow (and even enforce) artifact class authors to set the schema_title using an attribute rather than the class name, especially for non-system artifacts.
This is thinking ahead a little bit, but I think we want to soon move away from checking if a class is an artifact class by checking if it is a subclass of dsl.Artifact
and instead just checking that it adheres to the KFP artifact class protocol (which includes having a .TYPE_NAME
attribute), that way third-party libraries containing KFP artifacts do not need to depend on KFP. This would require that artifact classes have a TYPE_NAME
attribute.
cc: @chensun
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following up on my own question after more investigation.
Use of artifact_class.__name__
effectively strips the 'system.'
for KFP-native artifacts (since the class name should be equal to .TYPE_NAME
after 'system.'
is stripped. This is likely done this way for compatibility with KFP v1: an artifact is encoded in v1 component YAML as 'Artifact'
, not 'system.Artifact'
.
My preference is to preserve the full type name (with 'system.'
) to reduce any ambiguity in naming and enable backward-compatibility with v1 through logic in v2, not through the compressed artifact name representation.
/retest |
/retest |
2 similar comments
/retest |
/retest |
/lgtm Thanks for this, @chongyouquan! I want to highlight this comment for myself and other future reviewers of this PR (#8172 (comment)). I think we'll change this implementation even more very shortly to add comprehensive support custom artifact types. Specifically, removing the following and making changes to other parts of the load/compile logic for consistency: else:
type_name = artifact_class.__name__ |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: connor-mccarthy The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
* change output type of importer to artifact_class.TYPE_NAME * use the same logic as component_factory._annotation_to_type_struct for determining type_name * format using yapf * fix type_name * format using yapf * add yaml file * changes with regards to PR comments * format pipeline_with_importer_and_gcpc_types.py * remove unused imports * change name of test component * add pipeline_with_importer_and_gcpc_types to config * update yaml file
Description of your changes:
Checklist: