feat(sdk): use custom basemodel and remove pydantic #7639

Merged

Conversation

connor-mccarthy
Member

Description of your changes:
Implements a custom BaseModel on top of Python dataclasses, allowing us to remove
pydantic as a dependency.

This is the first step toward enabling compilation of components as a pipeline (IR).

Checklist:

@google-oss-prow

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@connor-mccarthy
Member Author

/test all

@connor-mccarthy connor-mccarthy force-pushed the implement-custom-basemodel branch from 5680c6c to 644fb4b Compare April 29, 2022 05:29
@connor-mccarthy
Member Author

/test all

@connor-mccarthy connor-mccarthy force-pushed the implement-custom-basemodel branch from 644fb4b to 2894a3f Compare April 29, 2022 05:32
@connor-mccarthy
Member Author

/test all

@connor-mccarthy connor-mccarthy force-pushed the implement-custom-basemodel branch from 2894a3f to b74c983 Compare April 29, 2022 05:36
@connor-mccarthy
Member Author

/test all

@connor-mccarthy
Member Author

/test all

@connor-mccarthy connor-mccarthy force-pushed the implement-custom-basemodel branch from b74c983 to b4669dc Compare April 29, 2022 05:41
@connor-mccarthy connor-mccarthy marked this pull request as ready for review April 29, 2022 13:57
@connor-mccarthy
Member Author

/hold This PR is ready for review as a standalone unit of work, but there are two things I would like to do before merging:

  1. I would like to merge a branch that enables compilation of components to IR into this one.
  2. I would like to wait to merge until after the v2 alpha.2 release (targeted: 5/2/22).

BaseModelType = TypeVar('BaseModelType', bound='BaseModel')


class BaseModel:
Member

I could be wrong, but my sense is we may not need a BaseModel class at all. Because IR is defined via protobuf, we already get serialization/deserialization from the protobuf library, so any other serialization/deserialization logic seems redundant to me, reinventing a wheel the protobuf library already provides. Another way to think about it: what would be the use case for from_json and to_json methods in this base model?

We need some logic to map between our existing internal structures and the protobuf-generated structures. I suspect the conversion logic would be customized per each individual class, so a base class may not add much value to the conversion.

Member

A base class could be useful if our classes and the protobuf-generated classes have the exact same fields, so that the base class could do some setattr/getattr via field names. That being said, I'm not sure this is worth it: the downside is code that could be less readable, more error-prone, and harder to debug.
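
For illustration only, the setattr/getattr idea could look roughly like the sketch below. `InternalSpec`, `FakeProtoSpec`, and `copy_fields_by_name` are hypothetical names invented for this example; `FakeProtoSpec` is a plain-Python stand-in for a protobuf-generated message, not a real generated class.

```python
import dataclasses


@dataclasses.dataclass
class InternalSpec:
    """Hypothetical internal structure (not from the KFP codebase)."""
    name: str = ''
    image: str = ''


class FakeProtoSpec:
    """Stand-in for a protobuf-generated message with identically named fields."""

    def __init__(self) -> None:
        self.name = ''
        self.image = ''


def copy_fields_by_name(source, target):
    """Copies every dataclass field of `source` onto `target` by field name."""
    for field in dataclasses.fields(source):
        if not hasattr(target, field.name):
            # Fail loudly instead of silently creating a new attribute.
            raise AttributeError(
                f'{type(target).__name__} has no field {field.name!r}')
        setattr(target, field.name, getattr(source, field.name))
    return target


proto = copy_fields_by_name(
    InternalSpec(name='train', image='python:3.9'), FakeProtoSpec())
```

The generic loop only works while field names match exactly on both sides, which is the readability and debuggability trade-off mentioned above.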

Member Author

Thanks, @chensun. I definitely agree with the sentiment, though a few sources of complexity arose during my development that led me to take this approach. I'm curious what you think (1 and 2 are cases for using a BaseModel, 3 and 4 are other relevant notes):

  1. As you mention, we need somewhere to hold our data structures in memory before converting to a BaseModel. In the simplest form, I think this would probably be an @dataclasses.dataclass. This BaseModel accomplishes this: it turns subclasses into dataclasses, but does so via inheritance (as opposed to decoration), so we can add extra functionality as needed. This functionality currently includes:
  • support for custom validate_* methods for more helpful user error messages and transform_* methods for data conversion functionality not supported by pure dataclasses
  • assertion that the types provided for the dataclass fields are supported by our downstream logic

  2. To answer your question:

"what would be the use case for from_json and to_json methods in this base model?"

The main use case (at the end state of all of the V2 YAML component packaging format changes) is read backward compatibility for both V1 and V2 YAML. This requires at least from_json, I think. We want to be able to read old YAML into the internal structures. The custom implementation of from_json (and under the hood, from_dict) handles our specific type casting logic, as well as built-in support for aliasing, via the by_alias parameter.

  3. I began down the route of jointly removing pydantic and manipulating our internal-structure-to-proto conversion logic and found this very error-prone and challenging to debug. This is because pydantic is all or nothing: there's no way to remove/replace functionality piecewise (structure by structure or method by method) to verify tests are passing throughout the (large) development process. Migrating to a base class that we own/control as a discrete step before implementing the IR conversion process alleviates this, reducing the complexity and increasing the observability of the to_proto and from_proto implementations that come next.

  4. Regarding your comment:

"I suspect the conversion logic would be customized per each individual class"

I completely agree with this and, if it weren't for the other points above, I would also support not using a BaseModel class on this basis.
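
To make points 1 and 2 concrete, here is a minimal sketch of the general technique, not the code in this PR. It assumes a `__init_subclass__` hook to apply `dataclasses.dataclass` through inheritance, hook methods named `transform_<field>` and `validate_<field>` that take the field value, a hypothetical `_aliases` mapping, and a hypothetical `_SUPPORTED_TYPES` allow-list. `InputSpec` here is an example class made up for the sketch.

```python
import dataclasses
import json
from typing import Any, ClassVar, Dict, Type, TypeVar

BaseModelType = TypeVar('BaseModelType', bound='BaseModel')

# Hypothetical allow-list standing in for "types supported by downstream logic".
_SUPPORTED_TYPES = (str, int, float, bool, list, dict)


class BaseModel:
    # Hypothetical mapping of field name -> alias used in serialized input.
    _aliases: ClassVar[Dict[str, str]] = {}

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        # Inheritance rather than decoration: each subclass becomes a dataclass.
        dataclasses.dataclass(cls)
        for field in dataclasses.fields(cls):
            if field.type not in _SUPPORTED_TYPES:
                raise TypeError(
                    f'Unsupported type for field {field.name!r}: {field.type}')

    def __post_init__(self) -> None:
        # Run optional transform_<field> hooks first, then validate_<field>.
        for field in dataclasses.fields(self):
            transform = getattr(self, f'transform_{field.name}', None)
            if transform is not None:
                setattr(self, field.name, transform(getattr(self, field.name)))
            validate = getattr(self, f'validate_{field.name}', None)
            if validate is not None:
                validate(getattr(self, field.name))

    @classmethod
    def from_dict(cls: Type[BaseModelType], data: Dict[str, Any],
                  by_alias: bool = False) -> BaseModelType:
        kwargs = {}
        for field in dataclasses.fields(cls):
            # With by_alias=True, look the value up under the field's alias.
            key = cls._aliases.get(field.name, field.name) if by_alias else field.name
            if key in data:
                kwargs[field.name] = data[key]
        return cls(**kwargs)

    @classmethod
    def from_json(cls: Type[BaseModelType], text: str,
                  by_alias: bool = False) -> BaseModelType:
        return cls.from_dict(json.loads(text), by_alias=by_alias)


class InputSpec(BaseModel):
    """Hypothetical structure used only for this example."""
    type_name: str
    default: str = ''
    _aliases = {'type_name': 'type'}

    def transform_type_name(self, value: str) -> str:
        return value.strip()

    def validate_type_name(self, value: str) -> None:
        if not value:
            raise ValueError('InputSpec.type_name must be a non-empty string.')


spec = InputSpec.from_json('{"type": " String "}', by_alias=True)
```

In this sketch `InputSpec` is never decorated, yet it gets a generated `__init__`, and a blank `type_name` fails with a targeted `ValueError` instead of a generic dataclass error.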

Also, as a minor point: I'm not yet sure about this, but I think it might also be possible that we'd want the BaseModel to own the to_proto logic, then the subclasses to own their own encoding handlers (akin to the cls argument in json.dump). This may not be the direction we ultimately go, but just a thought.
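
As a rough illustration of that split of responsibilities, the sketch below uses JSON as a stand-in for to_proto, since the analogy is to the `cls` argument of `json.dump`. The `encoder` class attribute, `SetEncoder`, and this `ComponentSpec` are hypothetical names for the example and assume nothing about the eventual to_proto design.

```python
import dataclasses
import json


class BaseModel:
    # Subclasses may swap in their own handler, like `cls=` in json.dump.
    encoder = json.JSONEncoder

    def to_json(self) -> str:
        # The base class owns the serialization entry point.
        return json.dumps(
            dataclasses.asdict(self), cls=self.encoder, sort_keys=True)


class SetEncoder(json.JSONEncoder):
    """Hypothetical handler for a type the default encoder rejects."""

    def default(self, o):
        if isinstance(o, set):
            return sorted(o)
        return super().default(o)


@dataclasses.dataclass
class ComponentSpec(BaseModel):
    """Hypothetical structure used only for this example."""
    name: str
    tags: set
    # Unannotated, so this is a class attribute and not a dataclass field.
    encoder = SetEncoder


encoded = ComponentSpec(name='train', tags={'b', 'a'}).to_json()
```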

--
I suspect there are several things I'm not thinking about here, so curious to hear what you think and talk through it.

Member

Thank you for the detailed explanation, Connor!
You might be thinking of a fancier implementation than I was. :) Let's keep your code and see how it may help.

Member

@chensun chensun left a comment

/lgtm

Thank you, Connor!

@connor-mccarthy
Member Author

/unhold

@connor-mccarthy
Member Author

Thanks, @chensun! I previously mentioned it might make sense to wait until after the 2.0.0-alpha.2 release to merge. I think it may actually make sense to merge now to avoid merge conflicts from future changes. Any objection?

@connor-mccarthy
Member Author

Talked to @chensun: no problem with merging.

/approve

@google-oss-prow

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: connor-mccarthy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@connor-mccarthy connor-mccarthy merged commit 5da3826 into kubeflow:master May 4, 2022
jagadeeshi2i pushed a commit to jagadeeshi2i/pipelines that referenced this pull request May 11, 2022
* fix discovered bug

* update tests

* implement custom BaseModel

* use basemodel for structures

* remove pydantic dependency

* assorted code cleanup
abaland pushed a commit to abaland/pipelines that referenced this pull request May 29, 2022
* fix discovered bug

* update tests

* implement custom BaseModel

* use basemodel for structures

* remove pydantic dependency

* assorted code cleanup
2 participants