Binary IDL With MessagePack #2760

Merged: 38 commits, Oct 4, 2024

@Future-Outlier Future-Outlier commented Sep 19, 2024

Tracking Issue

flyteorg/flyte#5318

Why Are the Changes Needed?

To support dataclass, FlyteTypes, and union type attribute access.

What Changes Are Proposed in This Pull Request?

Mashumaro

1. Why mashumaro.codecs.msgpack.MessagePackDecoder?

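Summary: It decodes msgpack bytes directly into the expected Python type, so the deserialized value has the annotated type instead of whatever type msgpack infers from the bytes.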

decoder = MessagePackDecoder(expected_python_type, pre_decoder_func=_default_flytekit_decoder)
python_val = decoder.decode(binary_idl_object.value)  # msgpack bytes as input

msgpack does not serialize datetime out of the box, but with Mashumaro a datetime is converted to a string and then serialized as msgpack bytes.

Example:

@dataclass
class DC:
    a: datetime.datetime

@task
def t_datetime(a: datetime.datetime):
    print(a, type(a))

@workflow
def wf(dc: DC):
    t_datetime(dc.a)

During deserialization, msgpack.loads returns that value as a string. MessagePackDecoder(expected_python_type).decode(), by contrast, deserializes it back to a datetime.
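
The gap between an untyped load and a type-directed decode can be sketched with the standard library alone. This is only an illustration under stated assumptions: json stands in for msgpack, and encode and decode_as are hypothetical helpers, not part of Mashumaro or flytekit.

```python
import datetime
import json
from dataclasses import dataclass, fields


@dataclass
class DC:
    a: datetime.datetime


def encode(dc: DC) -> bytes:
    # Assumption: the datetime travels as an ISO 8601 string on the wire.
    return json.dumps({"a": dc.a.isoformat()}).encode()


def decode_as(cls, payload: bytes):
    # Hypothetical type-directed decode: field annotations decide how to rebuild values.
    raw = json.loads(payload)
    kwargs = {}
    for f in fields(cls):
        value = raw[f.name]
        if f.type is datetime.datetime:
            value = datetime.datetime.fromisoformat(value)
        kwargs[f.name] = value
    return cls(**kwargs)


payload = encode(DC(a=datetime.datetime(2024, 9, 19, 12, 0, 0)))
untyped = json.loads(payload)   # plain load: "a" stays a str
typed = decode_as(DC, payload)  # type-directed decode: "a" is a datetime again
print(type(untyped["a"]).__name__, type(typed.a).__name__)
```

The untyped load has no way to know that the string was once a datetime; only the expected type carries that information.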

Similar benefits apply for int and float types:

@dataclass
class DC:
    a: int

@task
def t_int(a: int):
    print(a, type(a))

@workflow
def wf(dc: DC):
    t_int(dc.a)

When you enter inputs in FlyteConsole, the value of a is a JavaScript number. During deserialization, msgpack.loads turns it into either an int or a float, depending on how the number was encoded (for example, whether it carries a decimal point). MessagePackDecoder(expected_python_type).decode() instead converts it to the annotated type every time.
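
The same idea for numbers, again as a stdlib-only sketch rather than Mashumaro's actual code path: json stands in for msgpack, and coerce_numbers is a hypothetical helper that casts each numeric field to its annotated type.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class DC:
    a: int
    b: float


def coerce_numbers(cls, raw: dict):
    # Hypothetical helper: cast each int/float field to the type its annotation names.
    kwargs = {}
    for f in fields(cls):
        value = raw[f.name]
        if f.type in (int, float):
            value = f.type(value)
        kwargs[f.name] = value
    return cls(**kwargs)


# Assumption for illustration: the client sent 1 as 1.0 and 2.0 as 2.
raw = json.loads('{"a": 1.0, "b": 2}')
print(type(raw["a"]).__name__, type(raw["b"]).__name__)  # untyped load: float int

dc = coerce_numbers(DC, raw)
print(type(dc.a).__name__, type(dc.b).__name__)          # after coercion: int float
```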

2. Why _default_flytekit_decoder?

Example without the _default_flytekit_decoder:

@dataclass
class DC:
    a: dict[int, str]

from mashumaro.codecs.msgpack import MessagePackEncoder, MessagePackDecoder
encoder = MessagePackEncoder(DC)
decoder = MessagePackDecoder(DC)

dc = DC(a={1: "a", 2: "b"})
msgpack_bytes = encoder.encode(dc)  # b'\x81\xa1a\x82\x01\xa1a\x02\xa1b'
dc = decoder.decode(msgpack_bytes)  # ValueError: int is not allowed for map key when strict_map_key=True

To fix this, we set strict_map_key=False, and it decodes correctly:

@dataclass
class DC:
    a: dict[int, str]

from mashumaro.codecs.msgpack import MessagePackEncoder, MessagePackDecoder

def _default_flytekit_decoder(data: bytes) -> typing.Any:
    return msgpack.unpackb(data, raw=False, strict_map_key=False)

encoder = MessagePackEncoder(DC)
decoder = MessagePackDecoder(DC, pre_decoder_func=_default_flytekit_decoder)

dc = DC(a={1: "a", 2: "b"})
msgpack_bytes = encoder.encode(dc)  # b'\x81\xa1a\x82\x01\xa1a\x02\xa1b'
dc = decoder.decode(msgpack_bytes)  # DC(a={1: 'a', 2: 'b'})
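
Reading the encoded bytes by hand shows why the default unpacker objects: the inner map really does use integer keys on the wire. The decoder below is a minimal sketch that covers only the three MessagePack formats present in this payload (fixmap, fixstr, positive fixint). decode_subset is a hypothetical name and is not a replacement for msgpack.unpackb.

```python
from typing import Any, Tuple


def decode_subset(buf: bytes, pos: int = 0) -> Tuple[Any, int]:
    """Decode one value starting at pos; return (value, next position)."""
    b = buf[pos]
    if b <= 0x7F:                      # positive fixint: the byte is the value
        return b, pos + 1
    if 0x80 <= b <= 0x8F:              # fixmap: low 4 bits hold the entry count
        count = b & 0x0F
        pos += 1
        out = {}
        for _ in range(count):
            key, pos = decode_subset(buf, pos)
            val, pos = decode_subset(buf, pos)
            out[key] = val
        return out, pos
    if 0xA0 <= b <= 0xBF:              # fixstr: low 5 bits hold the byte length
        length = b & 0x1F
        start = pos + 1
        return buf[start:start + length].decode("utf-8"), start + length
    raise ValueError(f"format byte {b:#x} is outside this sketch's subset")


value, end = decode_subset(b"\x81\xa1a\x82\x01\xa1a\x02\xa1b")
print(value)  # {'a': {1: 'a', 2: 'b'}}
```

The bytes 0x01 and 0x02 are the integer keys of the inner map, which is exactly what strict_map_key=True rejects.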

3. Why mashumaro.codecs.msgpack.MessagePackEncoder?

For dataclasses:

  1. Reuse SerializableType for FlyteTypes:
    Both MessagePackEncoder/MessagePackDecoder and JSONEncoder/JSONDecoder can use SerializableType to customize serialization and deserialization behavior.

    Reference PR: Override Dataclass Serialization/Deserialization Behavior for FlyteTypes by mashumaro #2554

  2. There is no need to convert a dataclass to a dict ourselves and then pack the dict into msgpack bytes. Mashumaro hides that step inside the encoder, so one call goes from dataclass to msgpack bytes.

4. The Lifecycle of the Dataclass in the Flyte Type System:

Serialization:

Deserialization:


Convert Binary IDL to Python Value

1. When Will We Need It?

(1) When accessing attributes in a dataclass within a workflow:

@workflow
def wf(dc: DC):
    t_primitive(input=dc.a)

Types to handle:

  • Primitive types
  • Enums
  • Untyped dictionaries
  • Lists, dictionaries, and nested types
  • Pure dataclasses and nested dataclasses
  • FlyteTypes
  • Optional/Union types
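
One way to picture attribute access is as a path lookup on the decoded structure, after which the selected sub-value still has to be converted to the type the downstream task expects. The sketch below only illustrates the lookup step. resolve_attr_path and the sample data are hypothetical and do not reflect how Flyte implements it.

```python
from typing import Any, Sequence, Union


def resolve_attr_path(value: Any, path: Sequence[Union[str, int]]) -> Any:
    # Walk the decoded structure one step at a time:
    # any step can index a map, and int steps can index a list.
    for step in path:
        if isinstance(value, dict):
            value = value[step]
        elif isinstance(value, list) and isinstance(step, int):
            value = value[step]
        else:
            raise TypeError(f"cannot apply step {step!r} to {type(value).__name__}")
    return value


decoded = {"a": -1, "e": [0, 1, 2], "inner_dc": {"c": "Hello, Flyte"}}
print(resolve_attr_path(decoded, ["inner_dc", "c"]))  # Hello, Flyte
print(resolve_attr_path(decoded, ["e", 2]))           # 2
```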

(2) When deserializing a Binary IDL to a Python value using the dataclass transformer:

@task
def t_dc(dc: DC):
    pass

@workflow
def wf(dc: DC):
    t_dc(dc=dc)

2. Customize from_binary_idl Function:

(1) FlyteTypes (e.g., FlyteFile, FlyteDirectory, FlyteSchema, and StructuredDataset):
For FlyteTypes in a dataclass, we convert them to a dictionary with necessary data. For example, FlyteFile(path="s3://...") will be converted to a dictionary {"path": "s3://..."}. When converting back to a Python value, we use FlyteFilePathTransformer.to_python_val to retrieve and convert the path.

Reference PR: #2554
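
The dictionary round trip described above can be sketched as follows. FileRef is a hypothetical stand-in for a FlyteFile-like type, and to_dict and from_dict are illustrative names, not flytekit or Mashumaro APIs. The real conversion goes through FlyteFilePathTransformer.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class FileRef:
    # Hypothetical stand-in: only the path is carried through serialization.
    path: str

    def to_dict(self) -> Dict[str, Any]:
        return {"path": self.path}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "FileRef":
        return cls(path=data["path"])


ref = FileRef(path="s3://my-s3-bucket/example.txt")
as_dict = ref.to_dict()
restored = FileRef.from_dict(as_dict)
print(as_dict, restored == ref)
```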

(2) Dataclass:
When deserializing a Binary IDL Object (generated from a dataclass to_literal or dataclass attribute access), we handle common cases and discriminated classes as described above.

3. General from_binary_idl Function:

Other cases will use the TypeTransformer's from_binary_idl function, which can handle all types except for the special cases in points 1 and 2.
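
The split between a general path and a few special cases might be pictured as a default method plus overrides. Everything in this sketch is hypothetical (class names, from_binary, the registry), and json again stands in for msgpack. It shows only the dispatch shape, not flytekit's TypeTransformer.

```python
import json
from typing import Any, Dict, Type


class BaseTransformer:
    # Hypothetical general path: decode the bytes, then cast to the expected type.
    def from_binary(self, payload: bytes, expected_type: Type) -> Any:
        return expected_type(json.loads(payload))


class FileRef(str):
    """Hypothetical file-like type, modelled as a str subclass for brevity."""


class FileRefTransformer(BaseTransformer):
    # Hypothetical special case: the payload is a map and only "path" is needed.
    def from_binary(self, payload: bytes, expected_type: Type) -> Any:
        return expected_type(json.loads(payload)["path"])


REGISTRY: Dict[Type, BaseTransformer] = {FileRef: FileRefTransformer()}
DEFAULT = BaseTransformer()


def to_python(payload: bytes, expected_type: Type) -> Any:
    # Use a specialised transformer when one is registered, else the general one.
    return REGISTRY.get(expected_type, DEFAULT).from_binary(payload, expected_type)


print(to_python(b"1.0", int))
print(repr(to_python(b'{"path": "s3://my-s3-bucket/example.txt"}', FileRef)))
```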

How was this patch tested?

Unit tests, local execution, and remote execution.

import typing
import os
from dataclasses import dataclass, fields, field
from typing import Dict, List
from flytekit.types.file import FlyteFile
from flytekit.types.directory import FlyteDirectory
from flytekit.models.literals import StructuredDataset
from flytekit.types.schema import FlyteSchema

from mashumaro.mixins.json import DataClassJSONMixin
from flytekit import task, workflow, ImageSpec
from dataclasses_json import dataclass_json
import datetime
from enum import Enum



flytekit_hash = "3c652a585032684ebe8144be16f42ad7a7ccc8d7"
flytekit = f"git+https://github.com/flyteorg/flytekit.git@{flytekit_hash}"
image = ImageSpec(
    packages=[flytekit],
    apt_packages=["git"],
    registry="localhost:30000",
)

class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"

@dataclass
class InnerDC(DataClassJSONMixin):
# class InnerDC:
    a: int = -1
    b: float = 2.1
    c: str = "Hello, Flyte"
    d: bool = False
    e: List[int] = field(default_factory=lambda: [0, 1, 2, -1, -2])
    f: List[FlyteFile] = field(default_factory=lambda: [FlyteFile("s3://my-s3-bucket/example.txt"),])
    g: List[List[int]] = field(default_factory=lambda: [[0], [1], [-1]])
    h: List[Dict[int, bool]] = field(default_factory=lambda: [{0: False}, {1: True}, {-1: True}])
    i: Dict[int, bool] = field(default_factory=lambda: {0: False, 1: True, -1: False})
    j: Dict[int, FlyteFile] = field(default_factory=lambda: {0: FlyteFile("s3://my-s3-bucket/example.txt"),
                                                             1: FlyteFile("s3://my-s3-bucket/example.txt"),
                                                             -1: FlyteFile("s3://my-s3-bucket/example.txt")})
    k: Dict[int, List[int]] = field(default_factory=lambda: {0: [0, 1, -1]})
    l: Dict[int, Dict[int, int]] = field(default_factory=lambda: {1: {-1: 0}})
    m: dict = field(default_factory=lambda: {"key": "value"})
    n: FlyteFile = field(default_factory=lambda: FlyteFile("s3://my-s3-bucket/example.txt"))
    o: FlyteDirectory = field(default_factory=lambda: FlyteDirectory("s3://my-s3-bucket/s3_flyte_dir"))
    enum_status: Status = field(default=Status.PENDING)

@dataclass
class DC(DataClassJSONMixin):
# class DC:
    a: int = -1
    b: float = 2.1
    c: str = "Hello, Flyte"
    d: bool = False
    e: List[int] = field(default_factory=lambda: [0, 1, 2, -1, -2])
    f: List[FlyteFile] = field(default_factory=lambda: [FlyteFile("s3://my-s3-bucket/example.txt"), ])
    g: List[List[int]] = field(default_factory=lambda: [[0], [1], [-1]])
    h: List[Dict[int, bool]] = field(default_factory=lambda: [{0: False}, {1: True}, {-1: True}])
    i: Dict[int, bool] = field(default_factory=lambda: {0: False, 1: True, -1: False})
    j: Dict[int, FlyteFile] = field(default_factory=lambda: {0: FlyteFile("s3://my-s3-bucket/example.txt"),
                                                             1: FlyteFile("s3://my-s3-bucket/example.txt"),
                                                             -1: FlyteFile("s3://my-s3-bucket/example.txt")})
    k: Dict[int, List[int]] = field(default_factory=lambda: {0: [0, 1, -1]})
    l: Dict[int, Dict[int, int]] = field(default_factory=lambda: {1: {-1: 0}})
    m: dict = field(default_factory=lambda: {"key": "value"})
    n: FlyteFile = field(default_factory=lambda: FlyteFile("s3://my-s3-bucket/example.txt"))
    o: FlyteDirectory = field(default_factory=lambda: FlyteDirectory("s3://my-s3-bucket/s3_flyte_dir"))
    inner_dc: InnerDC = field(default_factory=lambda: InnerDC())
    enum_status: Status = field(default=Status.PENDING)


@task(container_image=image)
def t_inner(inner_dc: InnerDC):
    # Use isinstance: assert(x, y) asserts a non-empty tuple and is always true.
    assert isinstance(inner_dc, InnerDC)

    expected_file_content = "Default content"

    # f: List[FlyteFile]
    for ff in inner_dc.f:
        assert isinstance(ff, FlyteFile)
        with open(ff, "r") as f:
            assert f.read() == expected_file_content
    # j: Dict[int, FlyteFile]
    for _, ff in inner_dc.j.items():
        assert isinstance(ff, FlyteFile)
        with open(ff, "r") as f:
            assert f.read() == expected_file_content
    # n: FlyteFile
    assert isinstance(inner_dc.n, FlyteFile)
    with open(inner_dc.n, "r") as f:
        assert f.read() == expected_file_content
    # o: FlyteDirectory
    assert isinstance(inner_dc.o, FlyteDirectory)
    assert not inner_dc.o.downloaded
    with open(os.path.join(inner_dc.o, "example.txt"), "r") as fh:
        assert fh.read() == expected_file_content
    assert inner_dc.o.downloaded
    # enum: Status
    assert inner_dc.enum_status == Status.PENDING
    # Report success only after every check has run.
    print("Test InnerDC Successfully Passed")



@task(container_image=image)
def t_test_all_attributes(a: int, b: float, c: str, d: bool, e: List[int], f: List[FlyteFile], g: List[List[int]],
                          h: List[Dict[int, bool]], i: Dict[int, bool], j: Dict[int, FlyteFile],
                          k: Dict[int, List[int]], l: Dict[int, Dict[int, int]], m: dict,
                          n: FlyteFile, o: FlyteDirectory, enum_status: Status):
    # Strict type checks for simple types
    assert isinstance(a, int), f"a is not int, it's {type(a)}"
    assert a == -1
    assert isinstance(b, float), f"b is not float, it's {type(b)}"
    assert isinstance(c, str), f"c is not str, it's {type(c)}"
    assert isinstance(d, bool), f"d is not bool, it's {type(d)}"

    # Strict type checks for List[int]
    assert isinstance(e, list) and all(isinstance(i, int) for i in e), "e is not List[int]"

    # Strict type checks for List[FlyteFile]
    assert isinstance(f, list) and all(isinstance(i, FlyteFile) for i in f), "f is not List[FlyteFile]"

    # Strict type checks for List[List[int]]
    assert isinstance(g, list) and all(isinstance(i, list) and all(isinstance(j, int) for j in i) for i in g), "g is not List[List[int]]"

    # Strict type checks for List[Dict[int, bool]]
    assert isinstance(h, list) and all(
        isinstance(i, dict) and all(isinstance(k, int) and isinstance(v, bool) for k, v in i.items()) for i in h
    ), "h is not List[Dict[int, bool]]"

    # Strict type checks for Dict[int, bool]
    assert isinstance(i, dict) and all(
        isinstance(k, int) and isinstance(v, bool) for k, v in i.items()), "i is not Dict[int, bool]"

    # Strict type checks for Dict[int, FlyteFile]
    assert isinstance(j, dict) and all(
        isinstance(k, int) and isinstance(v, FlyteFile) for k, v in j.items()), "j is not Dict[int, FlyteFile]"

    # Strict type checks for Dict[int, List[int]]
    assert isinstance(k, dict) and all(
        isinstance(k, int) and isinstance(v, list) and all(isinstance(i, int) for i in v) for k, v in k.items()), "k is not Dict[int, List[int]]"

    # Strict type checks for Dict[int, Dict[int, int]]
    assert isinstance(l, dict) and all(
        isinstance(k, int) and isinstance(v, dict) and all(isinstance(sub_k, int) and isinstance(sub_v, int) for sub_k, sub_v in v.items())
        for k, v in l.items()), "l is not Dict[int, Dict[int, int]]"

    # Strict type check for a generic dict
    assert isinstance(m, dict), "m is not dict"

    # Strict type check for FlyteFile
    assert isinstance(n, FlyteFile), "n is not FlyteFile"

    # Strict type check for FlyteDirectory
    assert isinstance(o, FlyteDirectory), "o is not FlyteDirectory"

    # Strict type check for Enum
    assert isinstance(enum_status, Status), "enum_status is not Status"

    print("All attributes passed strict type checks.")

@workflow
def wf(dc: DC):
    t_inner(dc.inner_dc)
    t_test_all_attributes(a=dc.a, b=dc.b, c=dc.c,
                            d=dc.d, e=dc.e, f=dc.f,
                            g=dc.g, h=dc.h, i=dc.i,
                            j=dc.j, k=dc.k, l=dc.l,
                            m=dc.m, n=dc.n, o=dc.o, enum_status=dc.enum_status)

    t_test_all_attributes(a=dc.inner_dc.a, b=dc.inner_dc.b, c=dc.inner_dc.c,
                          d=dc.inner_dc.d, e=dc.inner_dc.e, f=dc.inner_dc.f,
                          g=dc.inner_dc.g, h=dc.inner_dc.h, i=dc.inner_dc.i,
                          j=dc.inner_dc.j, k=dc.inner_dc.k, l=dc.inner_dc.l,
                          m=dc.inner_dc.m, n=dc.inner_dc.n, o=dc.inner_dc.o, enum_status=dc.inner_dc.enum_status)


if __name__ == "__main__":
    from flytekit.clis.sdk_in_container import pyflyte
    from click.testing import CliRunner
    import os

    runner = CliRunner()
    path = os.path.realpath(__file__)
    input_val = '{"a": -1, "b": 3.14}'
    result = runner.invoke(pyflyte.main,
                           ["run", path, "wf", "--dc", input_val])
    print("Local Execution: ", result.output)
    #
    result = runner.invoke(pyflyte.main,
                           ["run", "--remote", path, "wf", "--dc", input_val])
    print("Remote Execution: ", result.output)

Setup process

Screenshots

  • local execution
(dev) future@outlier ~ % python build/PR/JSON/stacked_PRs/dataclass_all.py
Local Execution:  Running Execution on local.
Test InnerDC Successfully Passed
All attributes passed strict type checks.
All attributes passed strict type checks.


WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1727068444.257470 5921980 config.cc:230] gRPC experiments enabled: call_status_override_on_cancellation, event_engine_dns, event_engine_listener, http2_stats_fix, monitoring_experiment, pick_first_new, trace_record_callops, work_serializer_clears_time_cache
I0000 00:00:1727068444.311672 5921980 work_stealing_thread_pool.cc:320] WorkStealingThreadPoolImpl::PrepareFork
Remote Execution:  Running Execution on Remote.
Image localhost:30000/flytekit:HhKHwxYiER_ZLNNGsgaA0A found. Skip building.

[✔] Go to http://localhost:30080/console/projects/flytesnacks/domains/development/executions/acxlvpq5fpnfnkc7scjf to see execution in the console.
  • remote execution

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link

@Future-Outlier Future-Outlier changed the title [flytekit][4][dataclass, flyte types and attribute access] Binary IDL With MessagePack [wip][flytekit][4][dataclass, flyte types and attribute access] Binary IDL With MessagePack Sep 19, 2024
@Future-Outlier Future-Outlier changed the title [wip][flytekit][4][dataclass, flyte types and attribute access] Binary IDL With MessagePack [flytekit][4][dataclass, flyte types and attribute access] Binary IDL With MessagePack Sep 20, 2024
@Future-Outlier Future-Outlier marked this pull request as ready for review September 23, 2024 16:18
Comment on lines 528 to 697
            assert fh.read() == "Hello FlyteDirectory"
        assert inner_dc.o.downloaded
        print("Test InnerDC Successfully Passed")
        # enum: Status
        assert inner_dc.enum_status == Status.PENDING

    def t_test_all_attributes(a: int, b: float, c: str, d: bool, e: List[int], f: List[FlyteFile], g: List[List[int]],
                              h: List[Dict[int, bool]], i: Dict[int, bool], j: Dict[int, FlyteFile],
                              k: Dict[int, List[int]], l: Dict[int, Dict[int, int]], m: dict,
                              n: FlyteFile, o: FlyteDirectory, enum_status: Status):
        # Strict type checks for simple types
        assert isinstance(a, int), f"a is not int, it's {type(a)}"
        assert a == -1
        assert isinstance(b, float), f"b is not float, it's {type(b)}"
        assert isinstance(c, str), f"c is not str, it's {type(c)}"
        assert isinstance(d, bool), f"d is not bool, it's {type(d)}"

        # Strict type checks for List[int]
        assert isinstance(e, list) and all(isinstance(i, int) for i in e), "e is not List[int]"

        # Strict type checks for List[FlyteFile]
        assert isinstance(f, list) and all(isinstance(i, FlyteFile) for i in f), "f is not List[FlyteFile]"

        # Strict type checks for List[List[int]]
        assert isinstance(g, list) and all(
            isinstance(i, list) and all(isinstance(j, int) for j in i) for i in g), "g is not List[List[int]]"

        # Strict type checks for List[Dict[int, bool]]
        assert isinstance(h, list) and all(
            isinstance(i, dict) and all(isinstance(k, int) and isinstance(v, bool) for k, v in i.items()) for i in h
        ), "h is not List[Dict[int, bool]]"

        # Strict type checks for Dict[int, bool]
        assert isinstance(i, dict) and all(
            isinstance(k, int) and isinstance(v, bool) for k, v in i.items()), "i is not Dict[int, bool]"

        # Strict type checks for Dict[int, FlyteFile]
        assert isinstance(j, dict) and all(
            isinstance(k, int) and isinstance(v, FlyteFile) for k, v in j.items()), "j is not Dict[int, FlyteFile]"

        # Strict type checks for Dict[int, List[int]]
        assert isinstance(k, dict) and all(
            isinstance(k, int) and isinstance(v, list) and all(isinstance(i, int) for i in v) for k, v in
            k.items()), "k is not Dict[int, List[int]]"

        # Strict type checks for Dict[int, Dict[int, int]]
        assert isinstance(l, dict) and all(
            isinstance(k, int) and isinstance(v, dict) and all(
                isinstance(sub_k, int) and isinstance(sub_v, int) for sub_k, sub_v in v.items())
            for k, v in l.items()), "l is not Dict[int, Dict[int, int]]"

        # Strict type check for a generic dict
        assert isinstance(m, dict), "m is not dict"

        # Strict type check for FlyteFile
        assert isinstance(n, FlyteFile), "n is not FlyteFile"

        # Strict type check for FlyteDirectory
        assert isinstance(o, FlyteDirectory), "o is not FlyteDirectory"

        # Strict type check for Enum
        assert isinstance(enum_status, Status), "enum_status is not Status"

        print("All attributes passed strict type checks.")

    # This is the old dataclass serialization behavior.
    # https://github.com/flyteorg/flytekit/blob/94786cfd4a5c2c3b23ac29dcd6f04d0553fa1beb/flytekit/core/type_engine.py#L702-L728
    dc = DC()
    DataclassTransformer()._make_dataclass_serializable(python_val=dc, python_type=DC)
    json_str = JSONEncoder(DC).encode(dc)
    upstream_output = Literal(scalar=Scalar(generic=_json_format.Parse(json_str, _struct.Struct())))

    downstream_input = TypeEngine.to_python_value(FlyteContextManager.current_context(), upstream_output, DC)
    t_inner(downstream_input.inner_dc)
    t_test_all_attributes(a=downstream_input.a, b=downstream_input.b, c=downstream_input.c,
                          d=downstream_input.d, e=downstream_input.e, f=downstream_input.f,
                          g=downstream_input.g, h=downstream_input.h, i=downstream_input.i,
                          j=downstream_input.j, k=downstream_input.k, l=downstream_input.l,
                          m=downstream_input.m, n=downstream_input.n, o=downstream_input.o,
                          enum_status=downstream_input.enum_status)
    t_test_all_attributes(a=downstream_input.inner_dc.a, b=downstream_input.inner_dc.b, c=downstream_input.inner_dc.c,
                          d=downstream_input.inner_dc.d, e=downstream_input.inner_dc.e, f=downstream_input.inner_dc.f,
                          g=downstream_input.inner_dc.g, h=downstream_input.inner_dc.h, i=downstream_input.inner_dc.i,
                          j=downstream_input.inner_dc.j, k=downstream_input.inner_dc.k, l=downstream_input.inner_dc.l,
                          m=downstream_input.inner_dc.m, n=downstream_input.inner_dc.n, o=downstream_input.inner_dc.o,
                          enum_status=downstream_input.inner_dc.enum_status)

def test_backward_compatible_with_untyped_dict_in_protobuf_struct():
    # This is the old dataclass serialization behavior.
    # https://github.com/flyteorg/flytekit/blob/94786cfd4a5c2c3b23ac29dcd6f04d0553fa1beb/flytekit/core/type_engine.py#L1699-L1720
    dict_input = {"a" : 1.0, "b": "str",
                  "c": False, "d": True,
                  "e": [1.0, 2.0, -1.0, 0.0],
                  "f": {"a": {"b": [1.0, -1.0]}}}

    upstream_output = Literal(scalar=Scalar(generic=_json_format.Parse(json.dumps(dict_input), _struct.Struct())),
                              metadata={"format": "json"})

    downstream_input = TypeEngine.to_python_value(FlyteContextManager.current_context(), upstream_output, dict)
    assert dict_input == downstream_input

This is for backward-compatible testing.

Future-Outlier and others added 2 commits September 24, 2024 00:25

Future-Outlier commented Sep 24, 2024

Hi @wild-endeavor @pingsutw @eapolinario,
The code change itself is small: more than 70% of the diff is unit tests and integration tests.
Let's all review this PR here. Thank you.

Future-Outlier and others added 4 commits September 25, 2024 09:54
@Future-Outlier Future-Outlier changed the title [flytekit][4][dataclass, flyte types and attribute access] Binary IDL With MessagePack Binary IDL With MessagePack Sep 26, 2024
codecov bot commented Sep 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.85%. Comparing base (cc4d27b) to head (ab36e49).
Report is 12 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2760      +/-   ##
==========================================
+ Coverage   80.07%   83.85%   +3.77%     
==========================================
  Files         280        3     -277     
  Lines       23491      161   -23330     
  Branches     4146        0    -4146     
==========================================
- Hits        18811      135   -18676     
+ Misses       3989       26    -3963     
+ Partials      691        0     -691     

@eapolinario eapolinario left a comment

tests/flytekit/unit/core/test_type_engine_binary_idl.py might be my favorite set of unit tests in all of flytekit.

@@ -35,6 +35,7 @@ dependencies = [
"marshmallow-enum",
"marshmallow-jsonschema>=0.12.0",
"mashumaro>=3.11",
"msgpack>=1.1.0",

nit: do we have to be so strict? 1.1.0 was released on 9/10/24.


No, we don't need to, but the newer the release, the more msgpack bug fixes it includes.

Comment on lines +865 to +877
def test_backward_compatible_with_untyped_dict_in_protobuf_struct():
    # This is the old dataclass serialization behavior.
    # https://github.com/flyteorg/flytekit/blob/94786cfd4a5c2c3b23ac29dcd6f04d0553fa1beb/flytekit/core/type_engine.py#L1699-L1720
    dict_input = {"a" : 1.0, "b": "str",
                  "c": False, "d": True,
                  "e": [1.0, 2.0, -1.0, 0.0],
                  "f": {"a": {"b": [1.0, -1.0]}}}

    upstream_output = Literal(scalar=Scalar(generic=_json_format.Parse(json.dumps(dict_input), _struct.Struct())),
                              metadata={"format": "json"})

    downstream_input = TypeEngine.to_python_value(FlyteContextManager.current_context(), upstream_output, dict)
    assert dict_input == downstream_input

❤️

Future-Outlier commented Oct 3, 2024

This case can be supported, but it is not a backward-compatibility issue right now, since no users have reported it.

# Status is an Enum class
@dataclass
class DC:
    grid: Dict[str, List[Optional[Union[int, str, float, bool, Status, InnerDC]]]] = field(default_factory=lambda: {
        'all_types': [InnerDC()],
    })

Mashumaro Issue: Fatal1ty/mashumaro#252

@eapolinario eapolinario merged commit fdf93da into master Oct 4, 2024
106 checks passed
otarabai pushed a commit to otarabai/flytekit that referenced this pull request Oct 15, 2024
* [flytekit][1][Simple Type] Binary IDL With MessagePack

Signed-off-by: Future-Outlier <[email protected]>

* Add Tests

Signed-off-by: Future-Outlier <[email protected]>

* remove unused import

Signed-off-by: Future-Outlier <[email protected]>

* [flytekit][2][untyped dict] Binary IDL With MessagePack

Signed-off-by: Future-Outlier <[email protected]>

* Fix Tests

Signed-off-by: Future-Outlier <[email protected]>

* [Flyte][3][Attribute Access] Binary IDL With MessagePack

Signed-off-by: Future-Outlier <[email protected]>

* fix test_offloaded_literal

Signed-off-by: Future-Outlier <[email protected]>

* Add more tests

Signed-off-by: Future-Outlier <[email protected]>

* add tests for more complex cases

Signed-off-by: Future-Outlier <[email protected]>

* turn {} to dict()

Signed-off-by: Future-Outlier <[email protected]>

* lint

Signed-off-by: Future-Outlier <[email protected]>

* [flytekit][4][dataclass, flyte types and attribute access] Binary IDL With MessagePack

Signed-off-by: Future-Outlier <[email protected]>

* fix all tests, and support flytetypes and union from binary idl

Signed-off-by: Future-Outlier <[email protected]>

* self._encoder: Dict[Type, JSONEncoder]

Signed-off-by: Future-Outlier <[email protected]>

* fix lint

Signed-off-by: Future-Outlier <[email protected]>

* better comments

Signed-off-by: Future-Outlier <[email protected]>

* support enum transformer

Signed-off-by: Future-Outlier <[email protected]>

* add test_flytefile_in_dataclass_wf

Signed-off-by: Future-Outlier <[email protected]>

* add tests

Signed-off-by: Future-Outlier <[email protected]>

* Test Backward Compatible

Signed-off-by: Future-Outlier <[email protected]>

* add type transformer failed error

Signed-off-by: Future-Outlier <[email protected]>

* Update pingsu's review advice

Signed-off-by: Future-Outlier <[email protected]>
Co-authored-by: pingsutw  <[email protected]>

* update pingsu's review advice

Signed-off-by: Future-Outlier <[email protected]>
Co-authored-by: pingsutw  <[email protected]>

* update dict and list test with dataclass

Signed-off-by: Future-Outlier <[email protected]>

* ruff

Signed-off-by: Future-Outlier <[email protected]>

* support Dict[int, int] as input in workflow, including attribute access

Signed-off-by: Future-Outlier <[email protected]>

* Trigger CI

Signed-off-by: Future-Outlier <[email protected]>

* Add flytekit.bin.entrypoint to __init__.py for auto copy bug

Signed-off-by: Future-Outlier <[email protected]>

* revert back

Signed-off-by: Future-Outlier <[email protected]>

* add tests for union in dataclass, nested case

Signed-off-by: Future-Outlier <[email protected]>

---------

Signed-off-by: Future-Outlier <[email protected]>
Co-authored-by: pingsutw <[email protected]>
kumare3 pushed a commit that referenced this pull request Nov 8, 2024
3 participants