From e13babb75dd9b327037447b8bcc55182c94f6ba0 Mon Sep 17 00:00:00 2001
From: "Han-Ru Chen (Future-Outlier)"
Date: Fri, 22 Nov 2024 01:13:42 +0800
Subject: [PATCH] [Docs] MessagePack IDL, Pydantic Support, and Attribute Access (#6022)

* [Docs] MessagePack IDL, Pydantic Support and Attribute Access

Signed-off-by: Future-Outlier

* support

Signed-off-by: Future-Outlier

* update

Signed-off-by: Future-Outlier

* lint

Signed-off-by: Future-Outlier

* Trigger CI

Signed-off-by: Future-Outlier

* Trigger CI

Signed-off-by: Future-Outlier

* lint

Signed-off-by: Future-Outlier

* Update docs/user_guide/data_types_and_io/dataclass.md

Co-authored-by: David Espejo <82604841+davidmirror-ops@users.noreply.github.com>
Signed-off-by: Han-Ru Chen (Future-Outlier)

* Update docs/user_guide/data_types_and_io/pydantic_basemodel.md

Co-authored-by: David Espejo <82604841+davidmirror-ops@users.noreply.github.com>
Signed-off-by: Han-Ru Chen (Future-Outlier)

* Update docs/user_guide/data_types_and_io/pydantic_basemodel.md

Co-authored-by: David Espejo <82604841+davidmirror-ops@users.noreply.github.com>
Signed-off-by: Han-Ru Chen (Future-Outlier)

* Update docs/user_guide/data_types_and_io/pydantic_basemodel.md

Co-authored-by: David Espejo <82604841+davidmirror-ops@users.noreply.github.com>
Signed-off-by: Han-Ru Chen (Future-Outlier)

* Update docs/user_guide/data_types_and_io/pydantic_basemodel.md

Co-authored-by: David Espejo <82604841+davidmirror-ops@users.noreply.github.com>
Signed-off-by: Han-Ru Chen (Future-Outlier)

* nit

Signed-off-by: Future-Outlier

* nit

Signed-off-by: Future-Outlier

* Update docs/user_guide/data_types_and_io/dataclass.md

Co-authored-by: David Espejo <82604841+davidmirror-ops@users.noreply.github.com>
Signed-off-by: Han-Ru Chen (Future-Outlier)

* Update docs/user_guide/data_types_and_io/dataclass.md

Co-authored-by: David Espejo <82604841+davidmirror-ops@users.noreply.github.com>
Signed-off-by: Han-Ru Chen (Future-Outlier)

* Update docs/user_guide/data_types_and_io/pydantic_basemodel.md

Co-authored-by: David Espejo <82604841+davidmirror-ops@users.noreply.github.com>
Signed-off-by: Han-Ru Chen (Future-Outlier)

* Update docs/user_guide/data_types_and_io/pydantic_basemodel.md

Co-authored-by: David Espejo <82604841+davidmirror-ops@users.noreply.github.com>
Signed-off-by: Han-Ru Chen (Future-Outlier)

* format

Signed-off-by: Future-Outlier

---------

Signed-off-by: Future-Outlier
Signed-off-by: Han-Ru Chen (Future-Outlier)
Co-authored-by: David Espejo <82604841+davidmirror-ops@users.noreply.github.com>
---
 .../data_types_and_io/accessing_attributes.md |  16 ++-
 .../user_guide/data_types_and_io/dataclass.md |  18 ++-
 docs/user_guide/data_types_and_io/index.md    |   3 +-
 .../data_types_and_io/pydantic_basemodel.md   | 103 ++++++++++++++++++
 4 files changed, 132 insertions(+), 8 deletions(-)
 create mode 100644 docs/user_guide/data_types_and_io/pydantic_basemodel.md

diff --git a/docs/user_guide/data_types_and_io/accessing_attributes.md b/docs/user_guide/data_types_and_io/accessing_attributes.md
index f2783afacf..82b2345ad5 100644
--- a/docs/user_guide/data_types_and_io/accessing_attributes.md
+++ b/docs/user_guide/data_types_and_io/accessing_attributes.md
@@ -11,6 +11,10 @@ Note that while this functionality may appear to be the normal behavior of Pytho
 Consequently, accessing attributes in this manner is, in fact, a specially implemented feature.
 This functionality facilitates the direct passing of output attributes within workflows, enhancing the convenience of working with complex data structures.
 
+```{important}
+Flytekit version >= v1.14.0 supports Pydantic BaseModel V2, so attribute access works on Pydantic BaseModel V2 as well.
+```
+
 ```{note}
 To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks].
 ```
@@ -19,7 +23,7 @@ To begin, import the required dependencies and define a common task for subseque
 
 ```{literalinclude} /examples/data_types_and_io/data_types_and_io/attribute_access.py
 :caption: data_types_and_io/attribute_access.py
-:lines: 1-10
+:lines: 1-9
 ```
 
 ## List
@@ -31,7 +35,7 @@ Flyte currently does not support output promise access through list slicing.
 
 ```{literalinclude} /examples/data_types_and_io/data_types_and_io/attribute_access.py
 :caption: data_types_and_io/attribute_access.py
-:lines: 14-23
+:lines: 13-22
 ```
 
 ## Dictionary
@@ -39,7 +43,7 @@ Access the output dictionary by specifying the key.
 
 ```{literalinclude} /examples/data_types_and_io/data_types_and_io/attribute_access.py
 :caption: data_types_and_io/attribute_access.py
-:lines: 27-35
+:lines: 26-34
 ```
 
 ## Data class
@@ -47,7 +51,7 @@ Directly access an attribute of a dataclass.
 
 ```{literalinclude} /examples/data_types_and_io/data_types_and_io/attribute_access.py
 :caption: data_types_and_io/attribute_access.py
-:lines: 39-53
+:lines: 38-51
 ```
 
 ## Complex type
@@ -55,14 +59,14 @@ Combinations of list, dict and dataclass also work effectively.
 
 ```{literalinclude} /examples/data_types_and_io/data_types_and_io/attribute_access.py
 :caption: data_types_and_io/attribute_access.py
-:lines: 57-80
+:lines: 55-78
 ```
 
 You can run all the workflows locally as follows:
 
 ```{literalinclude} /examples/data_types_and_io/data_types_and_io/attribute_access.py
 :caption: data_types_and_io/attribute_access.py
-:lines: 84-88
+:lines: 82-86
 ```
 
 ## Failure scenario
diff --git a/docs/user_guide/data_types_and_io/dataclass.md b/docs/user_guide/data_types_and_io/dataclass.md
index 926c9d35b4..462ba7da3a 100644
--- a/docs/user_guide/data_types_and_io/dataclass.md
+++ b/docs/user_guide/data_types_and_io/dataclass.md
@@ -11,8 +11,24 @@ When you've multiple values that you want to send across Flyte entities, you can
 
 Flytekit uses the [Mashumaro library](https://github.com/Fatal1ty/mashumaro) to serialize and deserialize dataclasses.
 
+With the 1.14 release, `flytekit` adopted `MessagePack` as the
+serialization format for dataclasses, overcoming a major limitation of previous versions, which serialized into a JSON string within a Protobuf `struct` datatype:
+
+to store `int` values, Protobuf's `struct` converts them to `float`, forcing users to write boilerplate code to work around this issue.
+
+:::{important}
+If you're using Flytekit version < v1.11.1, you will need to add `from dataclasses_json import dataclass_json` to your imports and decorate your dataclass with `@dataclass_json`.
+:::
+
 :::{important}
-If you're using Flytekit version below v1.11.1, you will need to add `from dataclasses_json import dataclass_json` to your imports and decorate your dataclass with `@dataclass_json`.
+Flytekit versions < v1.14.0 produce a Protobuf `struct` literal for dataclasses.
+
+Flytekit versions >= v1.14.0 produce a msgpack bytes literal for dataclasses.
+
+If you're using Flytekit version >= v1.14.0 and you want to produce a Protobuf `struct` literal for dataclasses, you can
+set the environment variable `FLYTE_USE_OLD_DC_FORMAT` to `true`.
+
+For more details, refer to the MessagePack IDL RFC: https://github.com/flyteorg/flyte/blob/master/rfc/system/5741-binary-idl-with-message-pack.md
 :::
 
 ```{note}
diff --git a/docs/user_guide/data_types_and_io/index.md b/docs/user_guide/data_types_and_io/index.md
index 3280054696..c554b08acd 100644
--- a/docs/user_guide/data_types_and_io/index.md
+++ b/docs/user_guide/data_types_and_io/index.md
@@ -114,7 +114,7 @@ Here's a breakdown of these mappings:
      - Use ``pyspark.DataFrame`` as a type hint.
    * - ``pydantic.BaseModel``
      - ``Map``
-     - To utilize the type, install the ``flytekitplugins-pydantic`` plugin.
+     - To utilize the type, install the ``pydantic>2`` module.
      - Use ``pydantic.BaseModel`` as a type hint.
    * - ``torch.Tensor`` / ``torch.nn.Module``
      - File
@@ -144,6 +144,7 @@ flytefile
 flytedirectory
 structureddataset
 dataclass
+pydantic_basemodel
 accessing_attributes
 pytorch_type
 enum_type
diff --git a/docs/user_guide/data_types_and_io/pydantic_basemodel.md b/docs/user_guide/data_types_and_io/pydantic_basemodel.md
new file mode 100644
index 0000000000..be40672534
--- /dev/null
+++ b/docs/user_guide/data_types_and_io/pydantic_basemodel.md
@@ -0,0 +1,103 @@
+(pydantic_basemodel)=
+
+# Pydantic BaseModel
+
+```{eval-rst}
+.. tags:: Basic
+```
+
+`flytekit` version >=1.14 natively supports the `JSON` format that Pydantic `BaseModel` produces, enhancing the
+interoperability of Pydantic BaseModels with the Flyte type system.
+
+:::{important}
+Pydantic BaseModel V2 only works when you are using flytekit version >= v1.14.0.
+:::
+
+With the 1.14 release, `flytekit` adopted `MessagePack` as the serialization format for Pydantic `BaseModel`,
+overcoming a major limitation of previous versions, which serialized into a JSON string within a Protobuf `struct` datatype:
+
+to store `int` values, Protobuf's `struct` converts them to `float`, forcing users to write boilerplate code to work around this issue.
+
+:::{important}
+By default, `flytekit >= 1.14` will produce `msgpack` bytes literals when serializing, preserving the types defined in your `BaseModel` class.
+If you're serializing a `BaseModel` using `flytekit` version >= v1.14.0 and you want to produce a Protobuf `struct` literal instead, you can set the environment variable `FLYTE_USE_OLD_DC_FORMAT` to `true`.
+
+For more details, refer to the MessagePack IDL RFC: https://github.com/flyteorg/flyte/blob/master/rfc/system/5741-binary-idl-with-message-pack.md
+:::
+
+```{note}
+You can put dataclasses and Flyte types (FlyteFile, FlyteDirectory, FlyteSchema, and StructuredDataset) in a Pydantic BaseModel.
+```
+
+```{note}
+To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks].
+```
+
+To begin, import the necessary dependencies:
+
+```{literalinclude} /examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py
+:caption: data_types_and_io/pydantic_basemodel.py
+:lines: 1-9
+```
+
+Build your custom image with ImageSpec:
+```{literalinclude} /examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py
+:caption: data_types_and_io/pydantic_basemodel.py
+:lines: 11-14
+```
+
+## Python types
+We define a Pydantic `BaseModel` with `int`, `str` and `dict` as the data types.
+
+```{literalinclude} /examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py
+:caption: data_types_and_io/pydantic_basemodel.py
+:pyobject: Datum
+```
+
+You can send a Pydantic `BaseModel` between tasks written in different languages, and input it through the Flyte console as raw JSON.
+
+:::{note}
+All fields in a Pydantic `BaseModel` should be **annotated with their type**. Failure to do so will result in an error.
+:::
+
+Once declared, a Pydantic `BaseModel` can be returned as an output or accepted as an input.
+
+```{literalinclude} /examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py
+:caption: data_types_and_io/pydantic_basemodel.py
+:lines: 26-41
+```
+
+## Flyte types
+We also define a Pydantic `BaseModel` that accepts {std:ref}`StructuredDataset `,
+{std:ref}`FlyteFile ` and {std:ref}`FlyteDirectory `.
+
+```{literalinclude} /examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py
+:caption: data_types_and_io/pydantic_basemodel.py
+:lines: 45-86
+```
+
+A Pydantic `BaseModel` supports the usage of data associated with Python types, data classes,
+FlyteFile, FlyteDirectory and StructuredDataset.
+
+We define a workflow that calls the tasks created above.
+
+```{literalinclude} /examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py
+:caption: data_types_and_io/pydantic_basemodel.py
+:pyobject: basemodel_wf
+```
+
+You can run the workflow locally as follows:
+
+```{literalinclude} /examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py
+:caption: data_types_and_io/pydantic_basemodel.py
+:lines: 99-100
+```
+
+To run the workflow with `pyflyte run`, pass its inputs as command-line options:
+```
+pyflyte run \
+  https://raw.githubusercontent.com/flyteorg/flytesnacks/b71e01d45037cea883883f33d8d93f258b9a5023/examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py \
+  basemodel_wf --x 1 --y 2
+```
+
+[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/data_types_and_io/
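
As a rough illustration of the `int`-to-`float` limitation that the patch above describes, the sketch below mimics a Protobuf `struct` round trip in plain Python. It does not call Protobuf, MessagePack, or flytekit. `simulate_struct_round_trip` and `Datum` are hypothetical names introduced only for this example, and the conversion is a simulation that assumes only that `struct` stores every number as a double.

```python
from dataclasses import asdict, dataclass


def simulate_struct_round_trip(value):
    """Mimic a Protobuf `struct` round trip (simulation only): ints come back as floats."""
    if isinstance(value, bool):  # bool is a subclass of int, so handle it first
        return value
    if isinstance(value, int):
        return float(value)
    if isinstance(value, dict):
        return {k: simulate_struct_round_trip(v) for k, v in value.items()}
    if isinstance(value, list):
        return [simulate_struct_round_trip(v) for v in value]
    return value


@dataclass
class Datum:  # hypothetical example type, not the one from Flytesnacks
    x: int
    y: str


original = Datum(x=1, y="a")
decoded = simulate_struct_round_trip(asdict(original))
print(type(decoded["x"]).__name__)  # prints "float"

# The kind of boilerplate needed to recover the declared type by hand.
restored = Datum(x=int(decoded["x"]), y=decoded["y"])
```

A format that keeps integers and floats distinct, which is what the patch says MessagePack provides here, would presumably make the manual `int(...)` cast unnecessary.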