This repository has been archived by the owner on Oct 9, 2023. It is now read-only.

[flytepropeller] Support attribute access on promises #615

Closed
wants to merge 6 commits into from

@ByronHsu ByronHsu commented Sep 11, 2023

Signed-off-by: byhsu <[email protected]>
@@ -125,6 +125,12 @@ func validateBinding(w c.WorkflowBuilder, nodeID c.NodeID, nodeParam string, bin
}
}

// Skip the validation if the promise has attribute paths
// because we don't know the type of the resolved attribute
if len(val.Promise.AttrPath) > 0 {
Member

We should be able to know the attribute type if the promise type is a list or dict, right?

Contributor Author

What if the list contains a nested union type?

For example, with the type Dict[str, Union[str, Union[str, int]]], we pass x["a"], but we don't know whether "a" maps to str or to Union[str, int].

Contributor

We should be able to use the AreTypesCastable function here, right? It should handle Union as either a source or a destination type. IMO this feature needs type validation; it's something Flyte is very opinionated about, and it seems very sloppy to leave out.
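
To illustrate what a union-aware check could look like for the x["a"] example, here is a minimal Python sketch. It is not the AreTypesCastable implementation: the type model (strings and frozensets), the helper names `variants` and `is_castable`, and the rule itself (every variant the source may hold must be accepted by the target) are all assumptions made for illustration.

```python
from typing import FrozenSet, Union

# Hypothetical, simplified type model (not flyteidl): a simple type is a
# string such as "str", and a union is a frozenset of such strings.
TypeExpr = Union[str, FrozenSet[str]]


def variants(t: TypeExpr) -> FrozenSet[str]:
    """Treat a simple type as a one-variant union."""
    return t if isinstance(t, frozenset) else frozenset([t])


def is_castable(source: TypeExpr, target: TypeExpr) -> bool:
    """Assumed rule: every variant the source may hold must be accepted by the target."""
    return variants(source) <= variants(target)


# Static type of x["a"] in the example above, with the nested union
# flattened to its variants (a simplification of this sketch).
value_type = frozenset({"str", "int"})

print(is_castable(value_type, "str"))                               # False
print(is_castable(value_type, frozenset({"str", "int", "float"})))  # True
print(is_castable("str", value_type))                               # True
```

Under this assumed rule, the check does not need to know which variant "a" actually holds at runtime: binding x["a"] to a plain str input is rejected at compile time, while binding it to a wider union is accepted.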

pkg/controller/nodes/attr_path_resolver.go Outdated (resolved thread)
Signed-off-by: byhsu <[email protected]>
Signed-off-by: byhsu <[email protected]>

@hamersaw hamersaw left a comment

Looks close to me. Great work - I know there are a TON of users interested in this feature!

pkg/controller/nodes/attr_path_resolver.go (resolved thread)
pkg/controller/nodes/attr_path_resolver.go (resolved thread)
pkg/controller/nodes/output_resolver.go (resolved thread)
@@ -125,6 +125,26 @@ func validateBinding(w c.WorkflowBuilder, nodeID c.NodeID, nodeParam string, bin
}
}

// If the type is a struct (e.g. dataclass) and the attribute path is longer than 0,
// We skip the type check and let it fail at runtime because we don't know the type of struct field
if sourceType.GetSimple() == flyte.SimpleType_STRUCT && len(val.Promise.AttrPath) > 0 {
Contributor

@wild-endeavor is this necessarily true? If we have dataclass_json, do we know the type?

Contributor

@ByronHsu would you mind just trying to fill this in a bit? If it's really not doable, or if it has a bunch of edge cases, I think it's okay.

Walking through some examples, take the simple case of:

from typing import List

from flytekit import task, workflow

@task
def t1() -> List[str]:
    ...

@task
def t2(needs_str: str):
    ...

@workflow
def wf():
    res = t1()
    t2(needs_str=res[5])

In this case t1 will have a TypedInterface output of

        value {
          type {
            collection_type {
              simple: STRING
            }
          }
          description: "o0"
        }

and because the element type is a string, it will type-check against the str input of t2. The same applies to nested lists and dictionaries: as long as they are expressed via collection_type and map_value_type, they are relatively easy to type-check. I assume this is what the code below is doing?
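
The list and dictionary case described here can be sketched as a walk over the type along the attribute path. This is a minimal Python illustration, not the propeller code: `LiteralType` below is a hypothetical, simplified stand-in for the flyteidl message, and `resolve_attr_type` is an invented helper name.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union


@dataclass(frozen=True)
class LiteralType:
    """Hypothetical, simplified stand-in for Flyte's LiteralType message."""
    simple: Optional[str] = None                     # e.g. "STRING", "STRUCT"
    collection_type: Optional["LiteralType"] = None  # element type of a list
    map_value_type: Optional["LiteralType"] = None   # value type of a str-keyed dict


def resolve_attr_type(
    t: LiteralType, attr_path: Sequence[Union[int, str]]
) -> Optional[LiteralType]:
    """Return the static type reached by following attr_path, or None if unknown."""
    for step in attr_path:
        if t.collection_type is not None and isinstance(step, int):
            t = t.collection_type   # res[5] on a list yields the element type
        elif t.map_value_type is not None and isinstance(step, str):
            t = t.map_value_type    # x["a"] on a dict yields the value type
        else:
            # e.g. a STRUCT (dataclass): field types are not recorded in the type
            return None
    return t


string = LiteralType(simple="STRING")
list_of_str = LiteralType(collection_type=string)
dict_of_lists = LiteralType(map_value_type=list_of_str)
struct = LiteralType(simple="STRUCT")

print(resolve_attr_type(list_of_str, [5]) == string)         # True
print(resolve_attr_type(dict_of_lists, ["a", 0]) == string)  # True
print(resolve_attr_type(struct, ["region"]))                 # None
```

The resolved type can then be compared against the downstream input type as usual; a None result corresponds to the case where the check has to be skipped or deferred to runtime.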

The dataclass case is more complicated:

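from dataclasses import dataclass
from datetime import datetime

from dataclasses_json import dataclass_json
from flytekit import task, workflow

@dataclass_json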
@dataclass
class MyDC(object):
    snapshotDate: datetime
    region: str

@task
def t1(needs_dt: datetime):
    ...

@task
def t2(needs_str: str):
    ...

@workflow
def wf(a: MyDC):
    t1(needs_dt=a.snapshotDate)
    t2(needs_str=a.region)

It's more complicated because the dataclass field types are completely obscured (especially since flyteidl currently doesn't support multi-variate map types). I assume this is why you're skipping the check in the simple/struct case.

Could you see whether it's possible to capture them, though? Can we:

  • Add a new field to LiteralType that is only relevant for the scalar case. Maybe just in LiteralType or in TypeStructure.
  • Add a nested literal map of the types that's only present in the dataclass case. Just iterate through the fields in dataclass and recursively call the TypeEngine.
  • Add the same checking logic in propeller as the normal map_value_type if it's a simple_struct and this new field is present.

What do you think? It will add to the correctness of this new feature. And it will make Dan happy. And in the end, that's what we're all really about.
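
The second bullet above (iterating through the dataclass fields and recursing) might look roughly like the following on the Python side. This is only a sketch under assumptions: `field_types` is a hypothetical helper, and it records plain Python types, whereas the proposal would call the TypeEngine to produce a LiteralType for each field.

```python
import dataclasses
from datetime import datetime
from typing import Any, Dict, get_type_hints


def field_types(dc: type) -> Dict[str, Any]:
    """Map each dataclass field name to its type, recursing into nested dataclasses."""
    out: Dict[str, Any] = {}
    hints = get_type_hints(dc)  # resolves string annotations to real types
    for f in dataclasses.fields(dc):
        t = hints[f.name]
        # Nested dataclasses become nested maps; everything else is kept as-is.
        out[f.name] = field_types(t) if dataclasses.is_dataclass(t) else t
    return out


@dataclasses.dataclass
class MyDC:
    snapshotDate: datetime
    region: str


@dataclasses.dataclass
class Outer:  # hypothetical wrapper, added only to show the recursion
    inner: MyDC
    count: int


print(field_types(MyDC))
print(field_types(Outer))
```

With such a map attached to the struct type, an attribute path like a.region could be resolved one field at a time, in the same way as the map_value_type case.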

go.mod Outdated
@@ -146,3 +146,4 @@ require (
)

replace github.com/aws/amazon-sagemaker-operator-for-k8s => github.com/aws/amazon-sagemaker-operator-for-k8s v1.0.1-0.20210303003444-0fb33b1fd49d
replace github.com/flyteorg/flyteidl => ../flyteidl
Contributor

@ByronHsu can you update this?

@eapolinario

Hi, we are moving all Flyte development to a monorepo. To help with the transition, we're automatically moving open PRs to the monorepo, and your PR was moved to flyteorg/flyte#4150. Note that any conflicts in the resulting PR are most likely due to the change in the import path of the Flyte components.

@eapolinario eapolinario closed this Oct 3, 2023
5 participants