feat(sdk): use custom basemodel and remove pydantic #7639
Conversation
/hold

This PR is ready for review as a standalone unit of work, but there are two things I would like to do before merging:
BaseModelType = TypeVar('BaseModelType', bound='BaseModel')

class BaseModel:
I could be wrong, but my sense is that we may not need a `BaseModel` class at all. Because IR is defined via protobuf, we already get serialization/deserialization from the protobuf library; any additional serialization/deserialization logic seems redundant, reinventing the wheel that the protobuf library already provides. Another way to think about it: what would be the use case for `from_json` and `to_json` methods in this base model?

We need some logic to map between our existing internal structures and the protobuf-generated structures, but I suspect the conversion logic would be customized for each individual class, so a base class may not add much value to the conversion.
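For reference, the protobuf runtime already provides this kind of JSON round-tripping out of the box. A minimal illustration, using a well-known proto type as a stand-in for the IR messages (this is not code from this PR):

```python
# JSON (de)serialization that the protobuf library already ships with.
from google.protobuf import json_format, struct_pb2

msg = struct_pb2.Struct()
msg.update({'name': 'my-component', 'description': 'example'})

text = json_format.MessageToJson(msg)                     # proto -> JSON string
roundtrip = json_format.Parse(text, struct_pb2.Struct())  # JSON string -> proto
```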
A base class could be useful if our classes and the protobuf-generated classes have exactly the same fields, in which case the base class could do some `setattr`/`getattr` mapping via field names. That said, I'm not sure it's worth it; the downsides could be code that is less readable, more error-prone, and harder to debug.
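A minimal sketch of that field-name mapping idea, assuming the dataclass fields and the proto fields line up exactly (all names here are hypothetical, not from this PR):

```python
import dataclasses
from typing import Any, Type, TypeVar

T = TypeVar('T', bound='ProtoMappedModel')


@dataclasses.dataclass
class ProtoMappedModel:
    """Hypothetical base: copy values to/from a proto message by field name."""

    def to_proto(self, proto_cls: Type[Any]) -> Any:
        msg = proto_cls()
        for field in dataclasses.fields(self):
            # Direct setattr only works cleanly for scalar proto fields; nested
            # messages need special handling, which is part of why this
            # approach can get error-prone.
            setattr(msg, field.name, getattr(self, field.name))
        return msg

    @classmethod
    def from_proto(cls: Type[T], msg: Any) -> T:
        kwargs = {
            field.name: getattr(msg, field.name)
            for field in dataclasses.fields(cls)
        }
        return cls(**kwargs)


@dataclasses.dataclass
class TaskInfo(ProtoMappedModel):  # hypothetical subclass for illustration
    task_name: str = ''
    retries: int = 0
```

With this shape, `TaskInfo(task_name='t').to_proto(SomeGeneratedMessage)` would copy matching fields across, provided the generated message has `task_name` and `retries` fields.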
Thanks, @chensun. I definitely agree with the sentiment, though a few sources of complexity arose during development that led me to take this approach. I'm curious what you think (1 and 2 are cases for using a `BaseModel`, 3 and 4 are other relevant notes):
1. As you mention, we need somewhere to hold our data structures in memory before converting to proto. In the simplest form, this would probably be an `@dataclasses.dataclass`. This `BaseModel` accomplishes that: it turns subclasses into `dataclass`es, but does so via inheritance (as opposed to decoration), so we can add extra functionality as needed (see the sketch after this list). This functionality currently includes:
   - support for custom `validate_*` methods, for more helpful user error messages, and `transform_*` methods, for data conversion functionality not supported by pure dataclasses
   - assertion that the types provided for the dataclass fields are supported by our downstream logic
2. To answer your question:

   > what would be the use case for `from_json` and `to_json` methods in this base model?

   The main use case (at the end state of all of the V2 YAML component packaging format changes) is backward compatibility: being able to read both V1 and V2 YAML. This requires at least `from_json`, I think, since we want to be able to read old YAML into the internal structures. The custom implementation of `from_json` (and, under the hood, `from_dict`) handles our specific type-casting logic, as well as built-in support for aliasing via the `by_alias` parameter (also shown in the sketch after this list).
3. I began down the route of jointly removing `pydantic` and reworking our internal-structure-to-proto conversion logic, and found this very error-prone and challenging to debug. This is because `pydantic` is all or nothing: there's no way to remove/replace functionality piecewise (structure by structure or method by method) to verify tests are passing throughout the (large) development process. Migrating to a base class that we own/control as a discrete step before implementing the IR conversion process alleviates this, reducing the complexity and increasing the observability of the `to_proto` and `from_proto` implementations that come next.
Regarding your comment:
I suspect the conversion logic would be customized per each individual class
I completely agree with this and, if it weren't for the other points above, I would also support not using a BaseModel
class on this basis.
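To make (1) and (2) above concrete, here is a rough, simplified sketch of the idea. The `InputSpec` subclass, the `_aliases` mechanism, and the method signatures are illustrative assumptions, not the exact implementation in this PR:

```python
import dataclasses
import json
from typing import Any, Dict, Type, TypeVar

T = TypeVar('T', bound='BaseModel')


class BaseModel:
    """Sketch of an inheritance-based dataclass base (not the real implementation)."""

    # Maps serialized (aliased) field names to Python attribute names.
    _aliases: Dict[str, str] = {}

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        # Apply the dataclass machinery via inheritance instead of decoration.
        dataclasses.dataclass(cls)

    def __post_init__(self) -> None:
        # Run transform_<field> hooks first, then validate_<field> hooks.
        for field in dataclasses.fields(self):
            transform = getattr(self, f'transform_{field.name}', None)
            if callable(transform):
                setattr(self, field.name, transform(getattr(self, field.name)))
            validate = getattr(self, f'validate_{field.name}', None)
            if callable(validate):
                validate(getattr(self, field.name))

    @classmethod
    def from_dict(cls: Type[T], data: Dict[str, Any], by_alias: bool = False) -> T:
        if by_alias:
            data = {cls._aliases.get(key, key): value for key, value in data.items()}
        return cls(**data)

    @classmethod
    def from_json(cls: Type[T], text: str, by_alias: bool = False) -> T:
        return cls.from_dict(json.loads(text), by_alias=by_alias)


class InputSpec(BaseModel):
    """Hypothetical structure used only to illustrate the base class."""
    name: str
    type: str = 'String'

    _aliases = {'inputName': 'name'}

    def validate_name(self, value: str) -> None:
        if not value:
            raise ValueError('name must be a non-empty string')
```

With this shape, `InputSpec.from_json('{"inputName": "x"}', by_alias=True)` would map the aliased key to `name`, apply the default for `type`, and run `validate_name` before returning the instance.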
Also, as a minor point: I'm not yet sure about this, but it might also be possible that we'd want the `BaseModel` to own the `to_proto` logic, and the subclasses to own their own encoding handlers (akin to the `cls` argument in `json.dump`). This may not be the direction we ultimately go, but just a thought.
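A rough sketch of what that split might look like, with all names hypothetical (this is not the implementation in this PR):

```python
import dataclasses
from typing import Any


@dataclasses.dataclass
class ProtoEncodable:
    """Hypothetical base that owns the generic to_proto loop."""

    def encode_field(self, name: str, value: Any) -> Any:
        # Subclasses override this hook for fields that need custom encoding,
        # loosely analogous to passing a custom encoder class via json.dump(cls=...).
        return value

    def to_proto(self, proto_cls: Any) -> Any:
        msg = proto_cls()
        for field in dataclasses.fields(self):
            setattr(msg, field.name,
                    self.encode_field(field.name, getattr(self, field.name)))
        return msg


@dataclasses.dataclass
class ExampleOutput(ProtoEncodable):  # hypothetical subclass
    type: str = 'Artifact'

    def encode_field(self, name: str, value: Any) -> Any:
        if name == 'type':
            # e.g. normalize type strings before they reach the proto
            return value.upper()
        return value
```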
--
I suspect there are several things I'm not thinking about here, so curious to hear what you think and talk through it.
Thank you for the detailed explanation, Connor!

You might be thinking of a fancier implementation than I did. :) Let's keep your code and see how it may help.
/lgtm
Thank you, Connor!
/unhold
Thanks, @chensun! I previously mentioned it might make sense to wait until after the 2.0.0-alpha.2 release to merge. I think it may actually make sense to merge now to avoid merge conflicts from future changes. Any objection?
Talked to @chensun -- no problem with merging.

/approve
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: connor-mccarthy
* fix discovered bug
* update tests
* implement custom BaseModel
* use basemodel for structures
* remove pydantic dependency
* assorted code cleanup
Description of your changes:

Implements a custom `BaseModel` on top of Python dataclasses, allowing us to remove `pydantic` as a dependency. This is the first step toward enabling compilation of components as a pipeline (IR).