Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Binary IDL Attribute Access for Map Task #5942

Conversation

Future-Outlier
Copy link
Member

@Future-Outlier Future-Outlier commented Oct 30, 2024

Tracking Issue

Flyte Issue #5318

Why are the changes needed?

This update ensures that the following example workflow functions as expected:

@dataclass
class MyDataClass:
    my_ints: list[int]

@flytekit.task
def print_int(my_int: int):
    print(f"my_int: {my_int}")

@flytekit.workflow
def my_workflow(my_data_class: MyDataClass = MyDataClass(my_ints=[1, 2, 5, 10])):
    flytekit.map_task(print_int)(my_int=my_data_class.my_ints)

What changes are proposed in this pull request?

The primary change is to ensure that a collection literal is created when the resolved value is a list, allowing map tasks to function properly with collections.

Updated Workflow Lifecycle

Non-Map Task (List handling):

  1. Before:
    Python value -> binary IDL -> resolved value -> binary IDL -> list transformer -> Python value
  2. After:
    Python value -> binary IDL -> resolved value -> binary IDL in literal collection -> list transformer -> expected Python type transformer -> Python value

Map Task:

  1. Before:
    The array node handler didn’t receive a collection literal[1], so it couldn’t determine the length of the literal value.
  2. After:
    Python value -> binary IDL -> resolved value -> binary IDL in literal collection -> ArrayNodeMapTask (Flytekit) -> expected Python type transformer -> Python value

References

Testing

This patch has been verified using unit tests and remote execution in Flytekit.

Screenshots

Image 1 Image 2

Checklist

  • Documentation has been updated where necessary.
  • All new and existing tests pass.
  • All commits are signed-off.

case *core.Literal_Scalar:
default:
break
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the right logic, we should stop the iteration if this is not a map, collection, or offloaded literal.

Copy link

codecov bot commented Oct 30, 2024

Codecov Report

Attention: Patch coverage is 82.35294% with 3 lines in your changes missing coverage. Please review.

Project coverage is 36.84%. Comparing base (01c3519) to head (856e606).
Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
...opeller/pkg/controller/nodes/attr_path_resolver.go 82.35% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #5942   +/-   ##
=======================================
  Coverage   36.84%   36.84%           
=======================================
  Files        1309     1309           
  Lines      130979   130994   +15     
=======================================
+ Hits        48256    48268   +12     
- Misses      78534    78536    +2     
- Partials     4189     4190    +1     
Flag Coverage Δ
unittests-datacatalog 51.58% <ø> (ø)
unittests-flyteadmin 54.10% <ø> (ø)
unittests-flytecopilot 11.73% <ø> (ø)
unittests-flytectl 62.40% <ø> (ø)
unittests-flyteidl 6.92% <ø> (ø)
unittests-flyteplugins 53.64% <ø> (ø)
unittests-flytepropeller 43.02% <82.35%> (+0.01%) ⬆️
unittests-flytestdlib 55.41% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Signed-off-by: Future-Outlier <[email protected]>
@Future-Outlier
Copy link
Member Author

Future-Outlier commented Oct 30, 2024

In this PR (flyteorg/flyte#5942), I’m considering whether the propeller might crash if the resolved value is a collection containing over 1000 dataclass instances. The current approach would serialize it as a binary collection and pass it to the Flytekit array node task:

Current Approach

[1000+ dataclass instances] -> serialized as binary collection -> Flytekit array node task

Reference Code


Alternative Approach

To reduce potential memory pressure, we could consider an alternative that serializes the collection directly as binary, then processes it as a Python value in Flytekit:

[1000+ dataclass instances] -> serialize directly as binary -> Flytekit Python value -> process using these steps

Alternative Code flytepropeller
Alternative Code flytekit

This alternative may reduce the load on the propeller by avoiding the need to handle a large nested structure directly, potentially improving stability and performance in cases with large collections.
but we will need to deserialize lots of dataclass (waste resource)

My Thoughts

I prefer the first one, which is this PR, since Byron's attribute access did similar things before.
(the recursion function can be seen as a for loop)

func convertInterfaceToLiteral(nodeID string, obj interface{}) (*core.Literal, error) {
literal := &core.Literal{}
switch obj := obj.(type) {
case map[string]interface{}:
newSt, err := structpb.NewStruct(obj)
if err != nil {
return nil, err
}
literal.Value = &core.Literal_Scalar{
Scalar: &core.Scalar{
Value: &core.Scalar_Generic{
Generic: newSt,
},
},
}
case []interface{}:
literals := []*core.Literal{}
for _, v := range obj {
// recursively convert the interface to literal
literal, err := convertInterfaceToLiteral(nodeID, v)
if err != nil {
return nil, err
}
literals = append(literals, literal)
}
literal.Value = &core.Literal_Collection{
Collection: &core.LiteralCollection{
Literals: literals,
},
}
case interface{}:
scalar, err := convertInterfaceToLiteralScalar(nodeID, obj)
if err != nil {
return nil, err
}
literal.Value = scalar
}
return literal, nil
}

According to the performance testing by the msgpack's benchmark
It looks like it didn't use a lot of CPU resource.
image

Copy link
Contributor

@wild-endeavor wild-endeavor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this is correct. I'm a little worried about unions, but i think we should revisit that after the other issues are sorted if there is an issue there.

@wild-endeavor wild-endeavor merged commit e0da46b into master Oct 30, 2024
51 checks passed
@wild-endeavor wild-endeavor deleted the fix-map-task-with-msgpack-attr-access-v2-by-attr-resolving branch October 30, 2024 17:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants