feat(sdk): use custom basemodel and remove pydantic #7639

connor-mccarthy · 2022-04-28T21:16:48Z

Description of your changes:
Implements a custom BaseModel on top of Python dataclasses, allowing us to remove
pydantic as a dependency.

This is the first step toward enabling compilation of components as a pipeline (IR).

Checklist:

The title for your pull request (PR) should follow our title convention. Learn
more about the pull request title convention used in this repository.

google-oss-prow · 2022-04-28T21:16:50Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

connor-mccarthy · 2022-04-28T21:16:55Z

/test all

connor-mccarthy · 2022-04-29T05:29:15Z

/test all

connor-mccarthy · 2022-04-29T05:32:19Z

/test all

connor-mccarthy · 2022-04-29T05:36:16Z

/test all

connor-mccarthy · 2022-04-29T05:41:58Z

/test all

connor-mccarthy · 2022-04-29T14:03:31Z

/hold This PR is ready for review as a standalone unit of work, but there are two things I would like to do before merging:

1. I would like to merge a branch that enables compilation of components to IR into this one.
2. I would like to wait to merge until after the v2 alpha.2 release (targeted: 5/2/22).

chensun · 2022-05-02T17:35:29Z

sdk/python/kfp/components/base_model.py

+BaseModelType = TypeVar('BaseModelType', bound='BaseModel')
+
+
+class BaseModel:


I could be wrong, but my sense is we may not need a BaseModel class at all. Because IR is defined via protobuf, we already get serialization/deserialization from the protobuf library, so any other serialization/deserialization logic seems redundant to me, like reinventing a wheel the protobuf library already provides. Another way to think about it: what would be the use case for the from_json and to_json methods in this base model?

We need some logic to map between our existing internal structures and the protobuf-generated structures. I suspect the conversion logic would be customized for each individual class, so a base class may not add much value to the conversion.

A base class could be useful if our classes and the protobuf-generated classes had exactly the same fields, so the base class could do some setattr/getattr via field names. That said, I'm not sure this is worth it; the downside could be code that is less readable, more error-prone, and harder to debug.
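For illustration, the field-name approach described above could look roughly like the sketch below. This is not code from the PR: `InputSpec`, `FakeInputMessage`, and `copy_matching_fields` are hypothetical names, and `FakeInputMessage` is a plain stand-in class rather than a real protobuf-generated message.

```python
import dataclasses


@dataclasses.dataclass
class InputSpec:
    """Stand-in for an internal structure (illustrative fields only)."""
    type: str
    description: str = ''


class FakeInputMessage:
    """Stand-in for a generated message class; not a real protobuf type."""

    def __init__(self):
        self.type = ''
        self.description = ''


def copy_matching_fields(source, target):
    """Copies each dataclass field of `source` to the same-named attribute of `target`."""
    for field in dataclasses.fields(source):
        if not hasattr(target, field.name):
            # Fail loudly: a silently skipped field is the hard-to-debug case.
            raise AttributeError(
                f'{type(target).__name__} has no field {field.name!r}')
        setattr(target, field.name, getattr(source, field.name))
    return target


message = copy_matching_fields(
    InputSpec(type='String', description='a name'), FakeInputMessage())
```

The sketch only works when both sides share exactly the same field names, which is the precondition the comment points out.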

connor-mccarthy · May 2, 2022

Thanks, @chensun. I definitely agree with the sentiment, though a few sources of complexity arose during development that led me to take this approach. I'm curious what you think (1 and 2 are cases for using a BaseModel; 3 and 4 are other relevant notes):

As you mention, we need somewhere to hold our data structures in memory before converting to a BaseModel. In the simplest form, I think this would probably be an @dataclasses.dataclass. This BaseModel accomplishes this: it turns subclasses into dataclasses, but does so via inheritance (as opposed to decoration), so we can add extra functionality as needed. This functionality currently includes:

- support for custom validate_* methods, for more helpful user error messages, and transform_* methods, for data conversion functionality not supported by pure dataclasses
- an assertion that the types provided for the dataclass fields are supported by our downstream logic
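As a rough sketch of the idea (not the code in base_model.py), a base class can turn its subclasses into dataclasses through `__init_subclass__` and then run per-field hooks from `__post_init__`. The hook naming (`transform_<field>`, `validate_<field>`), the order in which hooks run, and the `InputSpec` example class are assumptions made for illustration.

```python
import dataclasses


class BaseModel:
    """Hypothetical base class: subclasses become dataclasses on definition."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Same effect as decorating each subclass with @dataclasses.dataclass.
        dataclasses.dataclass(cls)

    def __post_init__(self):
        # Assumed order: run transform_<field> hooks, then validate_<field> hooks.
        for field in dataclasses.fields(self):
            transform = getattr(self, f'transform_{field.name}', None)
            if transform is not None:
                setattr(self, field.name, transform(getattr(self, field.name)))
        for field in dataclasses.fields(self):
            validate = getattr(self, f'validate_{field.name}', None)
            if validate is not None:
                validate(getattr(self, field.name))


class InputSpec(BaseModel):
    type: str
    description: str = ''

    def transform_type(self, value):
        # Normalize user input before validation.
        return value.strip()

    def validate_type(self, value):
        if not value:
            raise ValueError('InputSpec.type must be a non-empty string.')


spec = InputSpec(type='  String ')
```

Because the dataclass conversion happens through inheritance rather than decoration, shared behavior such as these hooks lives in one place and subclasses only declare fields and optional hooks.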

To answer your question:

what would be the use case for from_json and to_json methods in this base model?

The main use case (at the end state of all of the V2 YAML component packaging format changes) is backward-compatible reading of both V1 and V2 YAML. I think this requires at least from_json: we want to be able to read old YAML into the internal structures. The custom implementation of from_json (and, under the hood, from_dict) handles our specific type-casting logic and has built-in support for aliasing via the by_alias parameter.
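A minimal sketch of what alias-aware loading could look like, assuming plain dataclasses and a single alias table. The `ALIASES` mapping, the `componentName` key, and the `ComponentSpec`/`ContainerSpec` fields are hypothetical, and the real from_dict also performs type casting that is not shown here.

```python
import dataclasses
import json
from typing import Any, Dict, Type, TypeVar

T = TypeVar('T')

# Hypothetical alias table: internal field name -> key used in serialized form.
ALIASES = {'name': 'componentName'}


def from_dict(cls: Type[T], data: Dict[str, Any], by_alias: bool = False) -> T:
    """Builds a dataclass instance from a dict, optionally keyed by alias."""
    kwargs = {}
    for field in dataclasses.fields(cls):
        key = ALIASES.get(field.name, field.name) if by_alias else field.name
        if key not in data:
            continue  # Fall back to the field default, if there is one.
        value = data[key]
        # Recurse so nested structures are converted as well.
        if dataclasses.is_dataclass(field.type) and isinstance(value, dict):
            value = from_dict(field.type, value, by_alias)
        kwargs[field.name] = value
    return cls(**kwargs)


def from_json(cls: Type[T], text: str, by_alias: bool = False) -> T:
    return from_dict(cls, json.loads(text), by_alias)


@dataclasses.dataclass
class ContainerSpec:
    image: str


@dataclasses.dataclass
class ComponentSpec:
    name: str
    implementation: ContainerSpec


spec = from_json(
    ComponentSpec,
    '{"componentName": "train", "implementation": {"image": "python:3.9"}}',
    by_alias=True)
```

With `by_alias=False` the same function expects the internal field names as keys, which is what makes it possible to read more than one serialized layout into the same structures.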

I began down the route of jointly removing pydantic and reworking our internal-structure-to-proto conversion logic, and found this very error-prone and challenging to debug. This is because pydantic is all or nothing -- there's no way to remove/replace functionality piecewise (structure by structure or method by method) to verify that tests keep passing throughout the (large) development process. Migrating to a base class that we own/control as a discrete step before implementing the IR conversion process alleviates this, reducing the complexity and increasing the observability of the to_proto and from_proto implementations that come next.

Regarding your comment:

I suspect the conversion logic would be customized per each individual class

I completely agree with this and, if it weren't for the other points above, I would also support not using a BaseModel class on this basis.

Also, as a minor point: I'm not yet sure about this, but we might want the BaseModel to own the to_proto logic and the subclasses to own their own encoding handlers (akin to the cls argument in json.dump). This may not be the direction we ultimately go, but it's just a thought.
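For reference, the json analogy mentioned above works as in the sketch below: the caller passes an encoder class through the `cls` argument, and that class decides how otherwise unsupported objects are encoded. `DataclassEncoder` and `OutputSpec` are hypothetical names, and nothing here implies how to_proto is or will be implemented.

```python
import dataclasses
import json


class DataclassEncoder(json.JSONEncoder):
    """Encoding handler that teaches json how to serialize dataclass instances."""

    def default(self, o):
        # default() is only called for objects json cannot serialize natively.
        if dataclasses.is_dataclass(o) and not isinstance(o, type):
            return dataclasses.asdict(o)
        return super().default(o)


@dataclasses.dataclass
class OutputSpec:
    type: str


encoded = json.dumps(OutputSpec(type='Artifact'), cls=DataclassEncoder)
```

The shared entry point (`json.dumps` here) stays generic, while the type-specific encoding lives in the handler class.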

--
I suspect there are several things I'm not thinking about here, so curious to hear what you think and talk through it.

chensun

/lgtm

Thank you, Connor!

chensun · 2022-05-03T23:54:33Z

Thank you for the detailed explanation, Connor!
You might be thinking of a fancier implementation than I was. :) Let's keep your code and see how it may help.

connor-mccarthy · 2022-05-04T01:00:42Z

/unhold

connor-mccarthy · 2022-05-04T01:05:36Z

Thanks, @chensun! I previously mentioned it might make sense to wait until after the 2.0.0-alpha.2 release to merge. I think it may actually make sense to merge now to avoid merge conflicts from future changes. Any objection?

connor-mccarthy · 2022-05-04T18:16:20Z

Talked to @chensun -- no problem with merging.

/approve

google-oss-prow · 2022-05-04T18:16:31Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: connor-mccarthy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~sdk/OWNERS~~ [connor-mccarthy]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

* fix discovered bug
* update tests
* implement custom BaseModel
* use basemodel for structures
* remove pydantic dependency
* assorted code cleanup

connor-mccarthy requested review from chensun and ji-yaqi April 28, 2022 21:16

google-oss-prow bot added the do-not-merge/work-in-progress label Apr 28, 2022

google-oss-prow bot added the size/XXL label Apr 28, 2022

connor-mccarthy added 2 commits April 28, 2022 23:16

fix discovered bug

b725514

update tests

23bbb0d

connor-mccarthy force-pushed the implement-custom-basemodel branch from 5680c6c to 644fb4b Compare April 29, 2022 05:29

connor-mccarthy force-pushed the implement-custom-basemodel branch from 644fb4b to 2894a3f Compare April 29, 2022 05:32

connor-mccarthy force-pushed the implement-custom-basemodel branch from 2894a3f to b74c983 Compare April 29, 2022 05:36

connor-mccarthy added 4 commits April 28, 2022 23:41

implement custom BaseModel

6497ed7

use basemodel for structures

b809a21

remove pydantic dependency

a323f34

assorted code cleanup

b4669dc

connor-mccarthy force-pushed the implement-custom-basemodel branch from b74c983 to b4669dc Compare April 29, 2022 05:41

connor-mccarthy marked this pull request as ready for review April 29, 2022 13:57

google-oss-prow bot removed the do-not-merge/work-in-progress label Apr 29, 2022

google-oss-prow bot added the do-not-merge/hold label Apr 29, 2022

connor-mccarthy mentioned this pull request Apr 29, 2022

feat(sdk): enable compilation of primitive components #7580

Merged

1 task

chensun reviewed May 2, 2022

chensun reviewed May 4, 2022

google-oss-prow bot assigned chensun May 4, 2022

google-oss-prow bot added the lgtm label May 4, 2022

google-oss-prow bot removed the do-not-merge/hold label May 4, 2022

google-oss-prow bot added the approved label May 4, 2022

connor-mccarthy merged commit 5da3826 into kubeflow:master May 4, 2022
