-
Notifications
You must be signed in to change notification settings - Fork 300
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Binary IDL With MessagePack #2760
Conversation
Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
…ith-message-pack-bytes-2
Signed-off-by: Future-Outlier <[email protected]>
…ith-message-pack-bytes-3
Signed-off-by: Future-Outlier <[email protected]>
…ith-message-pack-bytes-2 Signed-off-by: Future-Outlier <[email protected]>
…ith-message-pack-bytes-3
Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
…ith-message-pack-bytes-3
Signed-off-by: Future-Outlier <[email protected]>
… With MessagePack Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
assert fh.read() == "Hello FlyteDirectory" | ||
assert inner_dc.o.downloaded | ||
print("Test InnerDC Successfully Passed") | ||
# enum: Status | ||
assert inner_dc.enum_status == Status.PENDING | ||
|
||
def t_test_all_attributes(a: int, b: float, c: str, d: bool, e: List[int], f: List[FlyteFile], g: List[List[int]], | ||
h: List[Dict[int, bool]], i: Dict[int, bool], j: Dict[int, FlyteFile], | ||
k: Dict[int, List[int]], l: Dict[int, Dict[int, int]], m: dict, | ||
n: FlyteFile, o: FlyteDirectory, enum_status: Status): | ||
# Strict type checks for simple types | ||
assert isinstance(a, int), f"a is not int, it's {type(a)}" | ||
assert a == -1 | ||
assert isinstance(b, float), f"b is not float, it's {type(b)}" | ||
assert isinstance(c, str), f"c is not str, it's {type(c)}" | ||
assert isinstance(d, bool), f"d is not bool, it's {type(d)}" | ||
|
||
# Strict type checks for List[int] | ||
assert isinstance(e, list) and all(isinstance(i, int) for i in e), "e is not List[int]" | ||
|
||
# Strict type checks for List[FlyteFile] | ||
assert isinstance(f, list) and all(isinstance(i, FlyteFile) for i in f), "f is not List[FlyteFile]" | ||
|
||
# Strict type checks for List[List[int]] | ||
assert isinstance(g, list) and all( | ||
isinstance(i, list) and all(isinstance(j, int) for j in i) for i in g), "g is not List[List[int]]" | ||
|
||
# Strict type checks for List[Dict[int, bool]] | ||
assert isinstance(h, list) and all( | ||
isinstance(i, dict) and all(isinstance(k, int) and isinstance(v, bool) for k, v in i.items()) for i in h | ||
), "h is not List[Dict[int, bool]]" | ||
|
||
# Strict type checks for Dict[int, bool] | ||
assert isinstance(i, dict) and all( | ||
isinstance(k, int) and isinstance(v, bool) for k, v in i.items()), "i is not Dict[int, bool]" | ||
|
||
# Strict type checks for Dict[int, FlyteFile] | ||
assert isinstance(j, dict) and all( | ||
isinstance(k, int) and isinstance(v, FlyteFile) for k, v in j.items()), "j is not Dict[int, FlyteFile]" | ||
|
||
# Strict type checks for Dict[int, List[int]] | ||
assert isinstance(k, dict) and all( | ||
isinstance(k, int) and isinstance(v, list) and all(isinstance(i, int) for i in v) for k, v in | ||
k.items()), "k is not Dict[int, List[int]]" | ||
|
||
# Strict type checks for Dict[int, Dict[int, int]] | ||
assert isinstance(l, dict) and all( | ||
isinstance(k, int) and isinstance(v, dict) and all( | ||
isinstance(sub_k, int) and isinstance(sub_v, int) for sub_k, sub_v in v.items()) | ||
for k, v in l.items()), "l is not Dict[int, Dict[int, int]]" | ||
|
||
# Strict type check for a generic dict | ||
assert isinstance(m, dict), "m is not dict" | ||
|
||
# Strict type check for FlyteFile | ||
assert isinstance(n, FlyteFile), "n is not FlyteFile" | ||
|
||
# Strict type check for FlyteDirectory | ||
assert isinstance(o, FlyteDirectory), "o is not FlyteDirectory" | ||
|
||
# Strict type check for Enum | ||
assert isinstance(enum_status, Status), "enum_status is not Status" | ||
|
||
print("All attributes passed strict type checks.") | ||
|
||
# This is the old dataclass serialization behavior. | ||
# https://github.com/flyteorg/flytekit/blob/94786cfd4a5c2c3b23ac29dcd6f04d0553fa1beb/flytekit/core/type_engine.py#L702-L728 | ||
dc = DC() | ||
DataclassTransformer()._make_dataclass_serializable(python_val=dc, python_type=DC) | ||
json_str = JSONEncoder(DC).encode(dc) | ||
upstream_output = Literal(scalar=Scalar(generic=_json_format.Parse(json_str, _struct.Struct()))) | ||
|
||
downstream_input = TypeEngine.to_python_value(FlyteContextManager.current_context(), upstream_output, DC) | ||
t_inner(downstream_input.inner_dc) | ||
t_test_all_attributes(a=downstream_input.a, b=downstream_input.b, c=downstream_input.c, | ||
d=downstream_input.d, e=downstream_input.e, f=downstream_input.f, | ||
g=downstream_input.g, h=downstream_input.h, i=downstream_input.i, | ||
j=downstream_input.j, k=downstream_input.k, l=downstream_input.l, | ||
m=downstream_input.m, n=downstream_input.n, o=downstream_input.o, | ||
enum_status=downstream_input.enum_status) | ||
t_test_all_attributes(a=downstream_input.inner_dc.a, b=downstream_input.inner_dc.b, c=downstream_input.inner_dc.c, | ||
d=downstream_input.inner_dc.d, e=downstream_input.inner_dc.e, f=downstream_input.inner_dc.f, | ||
g=downstream_input.inner_dc.g, h=downstream_input.inner_dc.h, i=downstream_input.inner_dc.i, | ||
j=downstream_input.inner_dc.j, k=downstream_input.inner_dc.k, l=downstream_input.inner_dc.l, | ||
m=downstream_input.inner_dc.m, n=downstream_input.inner_dc.n, o=downstream_input.inner_dc.o, | ||
enum_status=downstream_input.inner_dc.enum_status) | ||
|
||
def test_backward_compatible_with_untyped_dict_in_protobuf_struct(): | ||
# This is the old dataclass serialization behavior. | ||
# https://github.com/flyteorg/flytekit/blob/94786cfd4a5c2c3b23ac29dcd6f04d0553fa1beb/flytekit/core/type_engine.py#L1699-L1720 | ||
dict_input = {"a" : 1.0, "b": "str", | ||
"c": False, "d": True, | ||
"e": [1.0, 2.0, -1.0, 0.0], | ||
"f": {"a": {"b": [1.0, -1.0]}}} | ||
|
||
upstream_output = Literal(scalar=Scalar(generic=_json_format.Parse(json.dumps(dict_input), _struct.Struct())), | ||
metadata={"format": "json"}) | ||
|
||
downstream_input = TypeEngine.to_python_value(FlyteContextManager.current_context(), upstream_output, dict) | ||
assert dict_input == downstream_input |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for backward-compatible testing.
Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]> Co-authored-by: pingsutw <[email protected]>
Hi, @wild-endeavor @pingsutw @eapolinario |
Signed-off-by: Future-Outlier <[email protected]> Co-authored-by: pingsutw <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #2760 +/- ##
==========================================
+ Coverage 80.07% 83.85% +3.77%
==========================================
Files 280 3 -277
Lines 23491 161 -23330
Branches 4146 0 -4146
==========================================
- Hits 18811 135 -18676
+ Misses 3989 26 -3963
+ Partials 691 0 -691
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests/flytekit/unit/core/test_type_engine_binary_idl.py might be my favorite set of unit tests in all of flytekit.
@@ -35,6 +35,7 @@ dependencies = [ | |||
"marshmallow-enum", | |||
"marshmallow-jsonschema>=0.12.0", | |||
"mashumaro>=3.11", | |||
"msgpack>=1.1.0", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: do we have to be so strict? 1.1.0 was released on 9/10/24.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No we don't need, but the newer it is, the it can fix more bug in msgpack
def test_backward_compatible_with_untyped_dict_in_protobuf_struct(): | ||
# This is the old dataclass serialization behavior. | ||
# https://github.com/flyteorg/flytekit/blob/94786cfd4a5c2c3b23ac29dcd6f04d0553fa1beb/flytekit/core/type_engine.py#L1699-L1720 | ||
dict_input = {"a" : 1.0, "b": "str", | ||
"c": False, "d": True, | ||
"e": [1.0, 2.0, -1.0, 0.0], | ||
"f": {"a": {"b": [1.0, -1.0]}}} | ||
|
||
upstream_output = Literal(scalar=Scalar(generic=_json_format.Parse(json.dumps(dict_input), _struct.Struct())), | ||
metadata={"format": "json"}) | ||
|
||
downstream_input = TypeEngine.to_python_value(FlyteContextManager.current_context(), upstream_output, dict) | ||
assert dict_input == downstream_input |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
This case can be supported, but we don't have backward compatible issue now, since no users report about this before. # Status is an Enum class
@dataclass
class DC:
grid: Dict[str, List[Optional[Union[int, str, float, bool, Status, InnerDC]]]] = field(default_factory=lambda: {
'all_types': [InnerDC()],
}) Mashumaro Issue: Fatal1ty/mashumaro#252 |
Signed-off-by: Future-Outlier <[email protected]>
* [flytekit][1][Simple Type] Binary IDL With MessagePack Signed-off-by: Future-Outlier <[email protected]> * Add Tests Signed-off-by: Future-Outlier <[email protected]> * remove unused import Signed-off-by: Future-Outlier <[email protected]> * [flytekit][2][untyped dict] Binary IDL With MessagePack Signed-off-by: Future-Outlier <[email protected]> * Fix Tests Signed-off-by: Future-Outlier <[email protected]> * [Flyte][3][Attribute Access] Binary IDL With MessagePack Signed-off-by: Future-Outlier <[email protected]> * fix test_offloaded_literal Signed-off-by: Future-Outlier <[email protected]> * Add more tests Signed-off-by: Future-Outlier <[email protected]> * add tests for more complex cases Signed-off-by: Future-Outlier <[email protected]> * turn {} to dict() Signed-off-by: Future-Outlier <[email protected]> * lint Signed-off-by: Future-Outlier <[email protected]> * [flytekit][4][dataclass, flyte types and attribute access] Binary IDL With MessagePack Signed-off-by: Future-Outlier <[email protected]> * fix all tests, and support flytetypes and union from binary idl Signed-off-by: Future-Outlier <[email protected]> * self._encoder: Dict[Type, JSONEncoder] Signed-off-by: Future-Outlier <[email protected]> * fix lint Signed-off-by: Future-Outlier <[email protected]> * better comments Signed-off-by: Future-Outlier <[email protected]> * support enum transformer Signed-off-by: Future-Outlier <[email protected]> * add test_flytefile_in_dataclass_wf Signed-off-by: Future-Outlier <[email protected]> * add tests Signed-off-by: Future-Outlier <[email protected]> * Test Backward Compatible Signed-off-by: Future-Outlier <[email protected]> * add type transformer failed error Signed-off-by: Future-Outlier <[email protected]> * Update pingsu's review advice Signed-off-by: Future-Outlier <[email protected]> Co-authored-by: pingsutw <[email protected]> * update pingsu's review advice Signed-off-by: Future-Outlier <[email protected]> Co-authored-by: pingsutw <[email protected]> * update dict and list test with dataclass Signed-off-by: Future-Outlier <[email protected]> * ruff Signed-off-by: Future-Outlier <[email protected]> * support Dict[int, int] as input in workflow, including attribute access Signed-off-by: Future-Outlier <[email protected]> * Trigger CI Signed-off-by: Future-Outlier <[email protected]> * Add flytekit.bin.entrypoint to __init__.py for auto copy bug Signed-off-by: Future-Outlier <[email protected]> * revert back Signed-off-by: Future-Outlier <[email protected]> * add tests for union in dataclass, nested case Signed-off-by: Future-Outlier <[email protected]> --------- Signed-off-by: Future-Outlier <[email protected]> Co-authored-by: pingsutw <[email protected]>
* [flytekit][1][Simple Type] Binary IDL With MessagePack Signed-off-by: Future-Outlier <[email protected]> * Add Tests Signed-off-by: Future-Outlier <[email protected]> * remove unused import Signed-off-by: Future-Outlier <[email protected]> * [flytekit][2][untyped dict] Binary IDL With MessagePack Signed-off-by: Future-Outlier <[email protected]> * Fix Tests Signed-off-by: Future-Outlier <[email protected]> * [Flyte][3][Attribute Access] Binary IDL With MessagePack Signed-off-by: Future-Outlier <[email protected]> * fix test_offloaded_literal Signed-off-by: Future-Outlier <[email protected]> * Add more tests Signed-off-by: Future-Outlier <[email protected]> * add tests for more complex cases Signed-off-by: Future-Outlier <[email protected]> * turn {} to dict() Signed-off-by: Future-Outlier <[email protected]> * lint Signed-off-by: Future-Outlier <[email protected]> * [flytekit][4][dataclass, flyte types and attribute access] Binary IDL With MessagePack Signed-off-by: Future-Outlier <[email protected]> * fix all tests, and support flytetypes and union from binary idl Signed-off-by: Future-Outlier <[email protected]> * self._encoder: Dict[Type, JSONEncoder] Signed-off-by: Future-Outlier <[email protected]> * fix lint Signed-off-by: Future-Outlier <[email protected]> * better comments Signed-off-by: Future-Outlier <[email protected]> * support enum transformer Signed-off-by: Future-Outlier <[email protected]> * add test_flytefile_in_dataclass_wf Signed-off-by: Future-Outlier <[email protected]> * add tests Signed-off-by: Future-Outlier <[email protected]> * Test Backward Compatible Signed-off-by: Future-Outlier <[email protected]> * add type transformer failed error Signed-off-by: Future-Outlier <[email protected]> * Update pingsu's review advice Signed-off-by: Future-Outlier <[email protected]> Co-authored-by: pingsutw <[email protected]> * update pingsu's review advice Signed-off-by: Future-Outlier <[email protected]> Co-authored-by: pingsutw <[email protected]> * update dict and list test with dataclass Signed-off-by: Future-Outlier <[email protected]> * ruff Signed-off-by: Future-Outlier <[email protected]> * support Dict[int, int] as input in workflow, including attribute access Signed-off-by: Future-Outlier <[email protected]> * Trigger CI Signed-off-by: Future-Outlier <[email protected]> * Add flytekit.bin.entrypoint to __init__.py for auto copy bug Signed-off-by: Future-Outlier <[email protected]> * revert back Signed-off-by: Future-Outlier <[email protected]> * add tests for union in dataclass, nested case Signed-off-by: Future-Outlier <[email protected]> --------- Signed-off-by: Future-Outlier <[email protected]> Co-authored-by: pingsutw <[email protected]>
Tracking Issue
flyteorg/flyte#5318
Why Are the Changes Needed?
To support dataclass, FlyteTypes, and union type attribute access.
What Changes Are Proposed in This Pull Request?
Mashumaro
1. Why
mashumaro.codecs.msgpack.Decoder
?Summary: It can make our deserialized value 100% type correct.
datetime
is not supported natively in msgpack, but with Mashumaro,datetime
will be converted to a string and serialized as msgpack bytes.Example:
During deserialization, if we use
msgpack.loads
,datetime
will be deserialized as a string. However, withMessagePackDecoder[expected_python_type].decode()
, it will be correctly deserialized todatetime
.Similar benefits apply for
int
andfloat
types:When you input from FlyteConsole, the input value of
a
is treated as anumber
in JavaScript. During deserialization,msgpack.loads
will convert it to either anint
orfloat
depending on whether it has a decimal point. However, usingMessagePackDecoder[expected_python_type].decode()
ensures it converts to the correct type 100% of the time.References:
2. Why
_default_flytekit_decoder
?Example without the
_default_flytekit_decoder
:To fix this, we set
strict_map_key=False
, and it decodes correctly:3. Why
mashumaro.codecs.msgpack.Encoder
?For dataclasses:
Reuse
SerializableType
for FlyteTypes:Both
MessagePackEncoder
/MessagePackDecoder
andJSONEncoder
/JSONDecoder
can useSerializableType
to customize serialization and deserialization behavior.Reference PR: Override Dataclass Serialization/Deserialization Behavior for
FlyteTypes
bymashumaro
#2554No need to convert a dataclass to a
dict
and then to msgpack bytes. With Mashumaro, it's hidden in the API, converting directly to msgpack bytes.4. The Lifecycle of the Dataclass in the Flyte Type System:
Serialization:
Common cases:
dataclass -> msgpack bytes -> Binary IDL Object
Discriminated classes:
dataclass -> json -> dict -> msgpack bytes -> Binary IDL Object
Reference PR: Re-adding support for mashumaro discriminated classes #2613
Deserialization:
Common cases:
Binary IDL Object -> msgpack bytes -> dataclass
Discriminated classes:
Binary IDL Object -> msgpack bytes -> dict -> json -> dataclass
Reference PR: Re-adding support for mashumaro discriminated classes #2613
Convert Binary IDL to Python Value
1. When Will We Need It?
(1) When accessing attributes in a dataclass within a workflow:
Types to handle:
(2) When deserializing a Binary IDL to a Python value using the dataclass transformer:
2. Customize
from_binary_idl
Function:(1) FlyteTypes (e.g.,
FlyteFile
,FlyteDirectory
,FlyteSchema
, andStructuredDataset
):For FlyteTypes in a dataclass, we convert them to a dictionary with necessary data. For example,
FlyteFile(path="s3://...")
will be converted to a dictionary{"path": "s3://..."}
. When converting back to a Python value, we useFlyteFilePathTransformer.to_python_val
to retrieve and convert thepath
.Reference PR: #2554
(2) Dataclass:
When deserializing a Binary IDL Object (generated from a dataclass
to_literal
or dataclass attribute access), we handle common cases and discriminated classes as described above.3. General
from_binary_idl
Function:Other cases will use the
TypeTransformer
'sfrom_binary_idl
function, which can handle all types except for the special cases in points 1 and 2.How was this patch tested?
unit tests, local execution and remote execution.
Setup process
Screenshots
Check all the applicable boxes
Related PRs
Docs link