-
Notifications
You must be signed in to change notification settings - Fork 300
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stop requiring users to import dataclasses_json
or DataClassJSONMixin
for dataclass
#2279
Conversation
flytekit/core/type_engine.py
Outdated
@@ -720,7 +701,7 @@ def _fix_val_int(self, t: typing.Type, val: typing.Any) -> typing.Any: | |||
|
|||
return val | |||
|
|||
def _fix_dataclass_int(self, dc_type: Type[DataClassJsonMixin], dc: DataClassJsonMixin) -> DataClassJsonMixin: | |||
def _fix_dataclass_int(self, dc_type: Type, dc: dataclasses.dataclass) -> dataclasses.dataclass: # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure is here has a better way to write type annotation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can dc_type
be Type[dataclasses.dataclass]
? (I think this is required for dataclasses.fields
to work.)
Given how dynamic the code is, I think we have to go with dc: typing.Any
for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right!
Have updated your advice, thank you.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #2279 +/- ##
==========================================
+ Coverage 83.46% 83.49% +0.03%
==========================================
Files 324 324
Lines 24754 24757 +3
Branches 3521 3519 -2
==========================================
+ Hits 20662 20672 +10
+ Misses 3460 3455 -5
+ Partials 632 630 -2 ☔ View full report in Codecov by Sentry. |
dataclasses_json
or DataClassJSONMixin
for dataclass
cc @thomasjpfan Please take a look, thank you! |
dbdd3eb
to
bd0802e
Compare
dataclasses_json
or DataClassJSONMixin
for dataclassdataclasses_json
or DataClassJSONMixin
for dataclass
b426eea
to
5a02577
Compare
Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
@thomasjpfan , @Fatal1ty @task
def create_dataclasses() -> List[Datum]:
return [Datum(x=1, y="1", z={1: 1}, w=[1,1,1,1])]
@task
def concat_dataclasses(x: List[Datum], y: List[Datum]) -> List[Datum]:
return x + y
@dynamic
def dynamic_wf() -> List[Datum]:
all_dataclasses = [Datum(x=1, y="1", z={1: 1}, w=[1,1,1,1])]
for _ in range(300):
data = create_dataclasses()
all_dataclasses = concat_dataclasses(x=all_dataclasses, y=data)
return all_dataclasses
@workflow
def benchmark_workflow() -> List[Datum]:
return dynamic_wf()
if __name__ == "__main__":
import time
start_time = time.time()
benchmark_workflow()
end_time = time.time()
print(f"Time taken: {end_time - start_time}") |
Signed-off-by: Future-Outlier <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this looks good. I am okay with the current scope. We can update the other parts of the codebase in follow up PRs.
flytekit/core/type_engine.py
Outdated
if not self._decoder.get(expected_python_type): | ||
self._decoder[expected_python_type] = JSONDecoder(expected_python_type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here regular try: except KeyError:
flytekit/core/type_engine.py
Outdated
@@ -720,7 +701,7 @@ def _fix_val_int(self, t: typing.Type, val: typing.Any) -> typing.Any: | |||
|
|||
return val | |||
|
|||
def _fix_dataclass_int(self, dc_type: Type[DataClassJsonMixin], dc: DataClassJsonMixin) -> DataClassJsonMixin: | |||
def _fix_dataclass_int(self, dc_type: Type, dc: dataclasses.dataclass) -> dataclasses.dataclass: # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can dc_type
be Type[dataclasses.dataclass]
? (I think this is required for dataclasses.fields
to work.)
Given how dynamic the code is, I think we have to go with dc: typing.Any
for now.
class DatumDataclass: | ||
x: int | ||
y: Color |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add another test to see what happens with typing.Union
? Specifically:
@dataclass
class DatumDataUnion:
path: typing.Union[str, os.PathLike]
If the path
started out as a pathlib.Path(...)
, does it deserialize into a string or a Path
object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found that Path
object is not serializable, but I will test it with other cases!
@dataclass
class DatumDataUnion(DataClassJSONMixin):
path: typing.Union[str, os.PathLike]
lt = TypeEngine.to_literal_type(DatumDataUnion)
datum_dataunion = DatumDataUnion(Path("/tmp"))
lv = transformer.to_literal(ctx, datum_dataunion, DatumDataUnion, lt)
gt = transformer.guess_python_type(lt)
pv = transformer.to_python_value(ctx, lv, expected_python_type=gt)
assert datum_dataunion.path == pv.path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found that TypeTransformer can't handle support for typing.Union[str, os.PathLike] or typing.Union[str, FlyteFile].
(I thought that os.PathLike
is decided not supported here.
https://docs.flyte.org/en/latest/api/flytekit/generated/flytekit.types.file.FlyteFile.html#flytekit.types.file.FlyteFile.path
Support typing.Union[str, FlyteFile]
in flyte could be an enhancement.
For more details, I need to trace more code to find the core reason why we can support these cases now, but it definitely is not because of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, thank you for checking. When I suggested typing.Union[str, os.PathLike]
, I was thinking about FlyteFile
. But FlyteFile
has it's own type transformer, so it's okay.
In any case, I'm okay with the current scope of this PR.
is this backwards compatible? serialize with an old flytekit release, with old user code. then deserialize with new flytekit and new user code. |
self._serialize_flyte_type(python_val, python_type) | ||
|
||
json_str = python_val.to_json() # type: ignore | ||
if hasattr(python_val, "to_json"): | ||
json_str = python_val.to_json() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wild-endeavor With this check, we are backward compatible. If one used DataClassJsonMixin
or @dataclass_json
, then to_json
is defined and called, which matches master's behavior. XREF: https://github.com/lidatong/dataclasses-json/blob/8512afc0a87053dbde52af0519c74198fa3bb873/dataclasses_json/api.py#L26
@Future-Outlier Can you include a comment here about how this preserves backward compatibility?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @wild-endeavor and @thomasjpfan
In flytekit-python, we use the function to_json
to serialize dataclasses and from_json
to deserialize them. Previously, we required users in older flytekit releases to add thedataclass_json
decorator or inherit from the DataclassJsonMixin
class.
This was because both provided the necessary methods to serialize (convert a dataclass to bytes) and deserialize (convert bytes back to a dataclass).
In this PR, the mashumaro
module introduces two classes, JSONEncoder
and JSONDecoder
, for serializing and deserializing dataclasses.
These new classes eliminate the reliance on the to_json
and from_json
methods.
Initially, we used if hasattr(python_val, "to_json"):
to check for the method's presence.
Therefore, introducing these changes will not cause any breaking changes.
This means that dataclasses inheriting from DataclassJsonMixin
will continue to use to_json
and from_json
for serialization and deserialization when using this version of flytekit.
to_json REF: https://github.com/flyteorg/flytekit/blob/master/flytekit/core/type_engine.py#L498
from_json REF: https://github.com/flyteorg/flytekit/blob/master/flytekit/core/type_engine.py#L750
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify, I wanted to have a short comment in the code that states how the "to_json" check, helps preserves backward compatibility.
Signed-off-by: Future-Outlier <[email protected]> Co-authored-by: Thomas J. Fan <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think not requiring dataclasses_json
or DataClassJSONMixin
for many use cases is already a net improvement.
LGTM
self._serialize_flyte_type(python_val, python_type) | ||
|
||
json_str = python_val.to_json() # type: ignore | ||
if hasattr(python_val, "to_json"): | ||
json_str = python_val.to_json() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify, I wanted to have a short comment in the code that states how the "to_json" check, helps preserves backward compatibility.
class DatumDataclass: | ||
x: int | ||
y: Color |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, thank you for checking. When I suggested typing.Union[str, os.PathLike]
, I was thinking about FlyteFile
. But FlyteFile
has it's own type transformer, so it's okay.
In any case, I'm okay with the current scope of this PR.
Signed-off-by: Future-Outlier <[email protected]>
Added comments, thank you so much |
Signed-off-by: Future-Outlier <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…xin` for dataclass (flyteorg#2279) Signed-off-by: Future-Outlier <[email protected]> Co-authored-by: Thomas J. Fan <[email protected]> Co-authored-by: Eduardo Apolinario <[email protected]>
…xin` for dataclass (#2279) Signed-off-by: Future-Outlier <[email protected]> Co-authored-by: Thomas J. Fan <[email protected]> Co-authored-by: Eduardo Apolinario <[email protected]> Signed-off-by: Jan Fiedler <[email protected]>
Tracking issue
flyteorg/flyte#4486
Why are the changes needed?
For a better user experience.
What changes were proposed in this pull request?
use
mashumaro>=3.11
, so that we can useJSONEncoder
andJSONDecoder
changepython_val.to_json()
toJSONEncoder(python_type).encode(python_val)
changeexpected_python_type.from_json(json_str)
toJSONDecoder(expected_python_type).decode(json_str)
change
return dataclass_json(dataclasses.make_dataclass(schema_name, attribute_list))
toreturn dataclasses.make_dataclass(schema_name, attribute_list)
,since we don't need
to_json
method andfrom_json
method anymore.add tests
fix mypy errors
change type annotations
removeflytekit-dolt
from CI test, since it doesn't work now and needs to be implemented a new version.use
JSONEncoder
andJSONDecoder
to convertdataclass
tojson str
and convertjson str
todataclass
whenthe user didn't use
dataclasses_json
andDataClassJSONMixin
.add an encoder registry and a decoder registry to cache
JSONEncoder
andJSONDecoder
when usingList[dataclass]
add a benchmark test in real case scenario by dynamic workflow and return
List[dataclass]
How was this patch tested?
dataclass
decoratorDataClassJSONMixin
. (for backward compatible)dataclass_json
decorator. (for backward compatible)Note: you can use
futureoutlier/dataclass:0321
this image to test it.Setup process
Screenshots
local execution (with only dataclass decorator)
local execution (with
DataClassJSONMixin
)remote execution (with only dataclass decorator)
remote execution (with
DataClassJSONMixin
)remote execution (with
dataclass_json
)Check all the applicable boxes