[Docs] MessagePack IDL, Pydantic Support, and Attribute Access #6022

Merged
merged 20 commits into from
Nov 21, 2024
Changes from 7 commits
16 changes: 10 additions & 6 deletions docs/user_guide/data_types_and_io/accessing_attributes.md
@@ -11,6 +11,10 @@ Note that while this functionality may appear to be the normal behavior of Pytho
Consequently, accessing attributes in this manner is, in fact, a specially implemented feature.
This functionality facilitates the direct passing of output attributes within workflows, enhancing the convenience of working with complex data structures.

```{important}
Flytekit version >= v1.14.0 supports Pydantic BaseModel V2, so you can also use attribute access on Pydantic BaseModel V2 outputs.
```

```{note}
To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks].
```
@@ -19,7 +23,7 @@ To begin, import the required dependencies and define a common task for subseque

```{literalinclude} /examples/data_types_and_io/data_types_and_io/attribute_access.py
:caption: data_types_and_io/attribute_access.py
:lines: 1-10
:lines: 1-9
```

## List
@@ -31,38 +35,38 @@ Flyte currently does not support output promise access through list slicing.

```{literalinclude} /examples/data_types_and_io/data_types_and_io/attribute_access.py
:caption: data_types_and_io/attribute_access.py
:lines: 14-23
:lines: 13-22
```

## Dictionary
Access the output dictionary by specifying the key.

```{literalinclude} /examples/data_types_and_io/data_types_and_io/attribute_access.py
:caption: data_types_and_io/attribute_access.py
:lines: 27-35
:lines: 26-34
```

## Data class
Directly access an attribute of a dataclass.

```{literalinclude} /examples/data_types_and_io/data_types_and_io/attribute_access.py
:caption: data_types_and_io/attribute_access.py
:lines: 39-53
:lines: 38-51
```

## Complex type
Combinations of list, dict and dataclass also work effectively.

```{literalinclude} /examples/data_types_and_io/data_types_and_io/attribute_access.py
:caption: data_types_and_io/attribute_access.py
:lines: 57-80
:lines: 55-78
```
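As a plain-Python analogue of the combined access pattern (a sketch only, with no Flytekit involved), the snippet below chains attribute, key, and index access on nested data. The class and field names are hypothetical. In a workflow, the same kind of expression is applied to a task's output promise rather than to a concrete value.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Inner:
    # Hypothetical nested dataclass holding a list.
    values: List[str] = field(default_factory=list)


@dataclass
class Outer:
    # Hypothetical outer dataclass holding a dict of nested dataclasses.
    lookup: Dict[str, Inner] = field(default_factory=dict)


outer = Outer(lookup={"a": Inner(values=["apple", "banana"])})

# Attribute access, then dictionary key, then attribute access, then list index.
picked = outer.lookup["a"].values[1]
print(picked)
```

Note that list slicing (for example `values[0:1]`) is deliberately left out, since the page states that slicing is not supported on output promises.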

You can run all the workflows locally as follows:

```{literalinclude} /examples/data_types_and_io/data_types_and_io/attribute_access.py
:caption: data_types_and_io/attribute_access.py
:lines: 84-88
:lines: 82-86
```

## Failure scenario
13 changes: 12 additions & 1 deletion docs/user_guide/data_types_and_io/dataclass.md
@@ -12,7 +12,18 @@ Flytekit uses the [Mashumaro library](https://github.com/Fatal1ty/mashumaro)
to serialize and deserialize dataclasses.

:::{important}
If you're using Flytekit version below v1.11.1, you will need to add `from dataclasses_json import dataclass_json` to your imports and decorate your dataclass with `@dataclass_json`.
If you're using Flytekit version < v1.11.1, you will need to add `from dataclasses_json import dataclass_json` to your imports and decorate your dataclass with `@dataclass_json`.
:::

:::{important}
Flytekit version < v1.14.0 produces a protobuf struct literal for dataclasses.

Flytekit version >= v1.14.0 produces a msgpack bytes literal for dataclasses.

If you're using Flytekit version >= v1.14.0 and you want to produce a protobuf struct literal for dataclasses, you can
It would be good to mention why a user would want to produce a protobuf struct literal instead of msgpack bytes.

Suggested change
Flytekit version < v1.14.0 will produce protobuf struct literal for dataclasses.
Flytekit version >= v1.14.0 will produce msgpack bytes literal for dataclasses.
If you're using Flytekit version >= v1.14.0 and you want to produce protobuf struct literal for dataclasses, you can

set the environment variable `FLYTE_USE_OLD_DC_FORMAT` to `true`.
Suggested change
set environment variable `FLYTE_USE_OLD_DC_FORMAT` to `true`.

This was already mentioned above

Also in the readthedocs build, you can see there are two important blocks nested


For more details, refer to the MessagePack IDL RFC: https://github.com/flyteorg/flyte/blob/master/rfc/system/5741-binary-idl-with-message-pack.md
:::
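One way to set `FLYTE_USE_OLD_DC_FORMAT` from Python is sketched below. This is an illustration under assumptions, not the documented procedure: it assumes the variable has to be present in the process environment before Flytekit serializes anything, and it only affects the current process. For remote executions the variable would presumably need to be set in the task's container environment instead.

```python
import os

# Assumption: set this at the very top of the entry point, before any
# dataclass or BaseModel value is serialized. Exporting it in the shell or
# container environment is an equivalent alternative.
os.environ["FLYTE_USE_OLD_DC_FORMAT"] = "true"

# Illustrative check only: confirms the value visible to this process.
# It does not reproduce how Flytekit itself reads the variable.
use_old_format = os.environ.get("FLYTE_USE_OLD_DC_FORMAT", "").lower() == "true"
print(use_old_format)
```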

```{note}
3 changes: 2 additions & 1 deletion docs/user_guide/data_types_and_io/index.md
@@ -114,7 +114,7 @@ Here's a breakdown of these mappings:
- Use ``pyspark.DataFrame`` as a type hint.
* - ``pydantic.BaseModel``
- ``Map``
- To utilize the type, install the ``flytekitplugins-pydantic`` plugin.
- To utilize the type, install the ``pydantic>2`` module.
- Use ``pydantic.BaseModel`` as a type hint.
* - ``torch.Tensor`` / ``torch.nn.Module``
- File
@@ -144,6 +144,7 @@ flytefile
flytedirectory
structureddataset
dataclass
pydantic_basemodel
accessing_attributes
pytorch_type
enum_type
97 changes: 97 additions & 0 deletions docs/user_guide/data_types_and_io/pydantic_basemodel.md
@@ -0,0 +1,97 @@
(pydantic_basemodel)=

# Pydantic BaseModel

```{eval-rst}
.. tags:: Basic
```

When you have multiple values that you want to send across Flyte entities, you can use a `pydantic.BaseModel`.

:::{important}
Pydantic BaseModel V2 is supported only in Flytekit version >= v1.14.0.
:::

:::{important}
If you're using Flytekit version >= v1.14.0 and you want to produce a protobuf struct literal for Pydantic BaseModels,
you can set the environment variable `FLYTE_USE_OLD_DC_FORMAT` to `true`.

For more details, refer to the MessagePack IDL RFC: https://github.com/flyteorg/flyte/blob/master/rfc/system/5741-binary-idl-with-message-pack.md
:::

```{note}
You can use dataclasses and Flyte types (FlyteFile, FlyteDirectory, FlyteSchema, and StructuredDataset) as fields in a Pydantic BaseModel.
```

```{note}
To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks].
```

To begin, import the necessary dependencies:

```{literalinclude} /examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py
:caption: data_types_and_io/pydantic_basemodel.py
:lines: 1-9
```

Build your custom image with ImageSpec:
```{literalinclude} /examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py
:caption: data_types_and_io/pydantic_basemodel.py
:lines: 11-14
```

## Python types
We define a Pydantic `BaseModel` with `int`, `str` and `dict` as the data types.

```{literalinclude} /examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py
:caption: data_types_and_io/pydantic_basemodel.py
:pyobject: Datum
```

You can pass a Pydantic `BaseModel` between tasks written in different languages, and provide it through the Flyte console as raw JSON.
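To make the raw JSON form concrete, here is a minimal sketch that uses only Pydantic V2 (no Flytekit). The `Datum` model and its field values are hypothetical stand-ins for the example file's model, and `Dict[str, str]` is chosen here purely for simplicity. The snippet serializes a model to a JSON string and rebuilds it from that string.

```python
from typing import Dict

from pydantic import BaseModel  # requires pydantic>2


class Datum(BaseModel):
    # Hypothetical fields for illustration; every field has a type annotation.
    x: int
    y: str
    z: Dict[str, str]


datum = Datum(x=1, y="one", z={"key": "value"})

# Serialize the model to a JSON string, then rebuild it from that string.
raw_json = datum.model_dump_json()
restored = Datum.model_validate_json(raw_json)

print(raw_json)
print(restored == datum)
```

A JSON object with the same keys as the model's fields is the kind of input that could be pasted as raw JSON, though the exact form the Flyte console expects is not shown in this page.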

:::{note}
All fields in a Pydantic `BaseModel` should be **annotated with their type**. Failure to do so will result in an error.
:::

Once declared, a `BaseModel` can be returned as an output or accepted as an input.

```{literalinclude} /examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py
:caption: data_types_and_io/pydantic_basemodel.py
:lines: 26-41
```

## Flyte types
We also define a `BaseModel` that accepts {std:ref}`StructuredDataset <structured_dataset>`,
{std:ref}`FlyteFile <files>` and {std:ref}`FlyteDirectory <folder>`.

```{literalinclude} /examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py
:caption: data_types_and_io/pydantic_basemodel.py
:lines: 45-86
```

A `BaseModel` supports data of Python types, data classes,
Flyte files, Flyte directories and structured datasets.

We define a workflow that calls the tasks created above.

```{literalinclude} /examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py
:caption: data_types_and_io/pydantic_basemodel.py
:pyobject: basemodel_wf
```

You can run the workflow locally as follows:

```{literalinclude} /examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py
:caption: data_types_and_io/pydantic_basemodel.py
:lines: 99-100
```

To trigger the `basemodel_wf` workflow with `pyflyte run`, pass its inputs as command-line options:
```
pyflyte run \
https://raw.githubusercontent.com/flyteorg/flytesnacks/b71e01d45037cea883883f33d8d93f258b9a5023/examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py \
basemodel_wf --x 1 --y 2
```

[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/data_types_and_io/