Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Core Feature] Support for enum types used in dataclasses #1514

Closed
palchicz opened this issue Sep 22, 2021 · 8 comments
Closed

[Core Feature] Support for enum types used in dataclasses #1514

palchicz opened this issue Sep 22, 2021 · 8 comments
Assignees
Labels
enhancement New feature or request

Comments

@palchicz
Copy link

Motivation: Why do you think this is important?
Since dataclasses are a supported type and Enums are a supported type, one would expect that dataclasses that reference a enum should also be supported. Instead, the protobuf fails to form properly and I get the following warning from the command

pyflyte --pkgs <PACKAGE_NAME> serialize --image <IMAGE_NAME> workflows

WARNING:flytekit:failed to extract schema for object <class 'xxx.yyy.zzz'>, (will run schemaless) error: Currently do not support JSON schema for enums loaded by value

Goal: What should the final outcome look like, ideally?
I should be able to define a dataclass and enum like the following:

class Animal(Enum):
    DOG = "dog"
    CAT = "cat"

@dataclass_json
@dataclass
class Person(object):
    pet: Animal = Animal.DOG

Describe alternatives you've considered
I've considered implementing a new custom type.

[Optional] Propose: Link/Inline OR Additional context
This might have its root cause in this dependency https://github.com/fuhrysteve/marshmallow-jsonschema/blob/master/marshmallow_jsonschema/base.py#L228

@palchicz palchicz added enhancement New feature or request untriaged This issues has not yet been looked at by the Maintainers labels Sep 22, 2021
@welcome
Copy link

welcome bot commented Sep 22, 2021

Thank you for opening your first issue here! 🛠

@kumare3
Copy link
Contributor

kumare3 commented Sep 22, 2021

@palchicz this is because of the underlying dataclass-json library. We are actually working on supporting #1362 Pickle types. But, pickle is not native to other languages and wont show up in the UI.
Whats the usecase?

@palchicz
Copy link
Author

Thanks for the quick response! We have a collection of ~10 workflow inputs that are shared among several workflows. These inputs are a collection of closely related parameters, and they always show up together across different workflows, so it makes sense to group them together. Consolidating them into a single dataclass made sense, however, one of these shared inputs is an enum and thus we run into the problem stated above.

For the time being, we can still use the dataclass to group 9/10 of the parameters together so we don't have to repeatedly and explicitly lay them out for each workflow. It's just awkward to have the one odd parameter that we do need to add to the workflow definition explicitly just because it's an enum.

Here are some code example to help clarify what I said above:

class EnumParam(Enum):
    OPTION_1 = "option 1"
    OPTION_2 = "option 2"

# Naive approach
@workflow
def workflow_1(param_1: int, param_2: int, ...., param_n: EnumParam) -> int:
   ...

@workflow
def workflow_2(param_1: int, param_2: int, ...., param_n: EnumParam) -> str:
   ...

# Envisioned Approach
@dataclass_json
@dataclass
class WorkflowParameters(object):
    param_1: int = 0
    param_2: int = 0
    ...
    param_n: EnumParam = EnumParam.OPTION_1

@workflow
def workflow_1(workflow_params: WorkflowParameters) -> int:
   ...

@workflow
def workflow_2(workflow_params: WorkflowParameters) -> str:
   ...

# Workaround 
@dataclass_json
@dataclass
class WorkflowParameters(object):
    param_1: int = 0
    param_2: int = 0
    ...


@workflow
def workflow_1(workflow_params: WorkflowParameters, param_n: EnumParam) -> int:
   ...

@workflow
def workflow_2(workflow_params: WorkflowParameters, param_n: EnumParam) -> str:
   ...

@kumare3
Copy link
Contributor

kumare3 commented Sep 23, 2021

@palchicz appreciate giving more clarification. your request absolutely makes sense. One follow up question, do you use the Enum Form in the UI - as shown below?

Screen Shot 2021-09-23 at 4 32 48 PM

This UI feature will not work with dataclasses with nested enums, that is because dataclass_json does not support enum schema extraction AFAICT. But JSON actually supports that. Let us try that.

In anycase enums should be supported by dataclasses, and I think we just need to enable one more library for that?
marshmallow_enum - https://github.com/fuhrysteve/marshmallow-jsonschema/blob/master/marshmallow_jsonschema/base.py#L22-L27.

Can you try installing the library and help us debug, and please jump into our slack channel - here and we can work on it with you.

@palchicz
Copy link
Author

palchicz commented Sep 25, 2021

One follow up question, do you use the Enum Form in the UI - as shown below?

That would be ideal, but not a deal breaker. At the very least there isn't any urgency, so maybe the dataclass_json people will add that feature before too long 😃 🤞

I can certainly jump in and help. I should have some time next week. I'll join the slack channel like you suggested.

@kumare3
Copy link
Contributor

kumare3 commented Sep 25, 2021

One follow up question, do you use the Enum Form in the UI - as shown below?

That would be ideal, but not a deal breaker. At the very least there isn't any urgency, so maybe the dataclass_json people will add that feature before too long 😃 🤞

I can certainly jump in and help. I should have some time next week. I'll join the slack channel like you suggested.

Awesome, see you soon. Also, if the form is not required- enums should be supported. I think the enum marshmallow library might work

@palchicz
Copy link
Author

palchicz commented Oct 4, 2021

In anycase enums should be supported by dataclasses, and I think we just need to enable one more library for that?
marshmallow_enum - https://github.com/fuhrysteve/marshmallow-jsonschema/blob/master/marshmallow_jsonschema/base.py#L22-L27.

Can you try installing the library and help us debug

I took a stab at debugging this problem and noticed that marshmallow_enum is already installed. The issue is this line here in the mashmallow_jsonschema library which raises an exception when trying to convert the marshmallow enum field into a json schema enum. If you trace the code, it wouldn't get to this point unless marshmallow_enum was installed.

The exact reason it raises an exception is because the marshmallow Enum field has the property field.load_by == LoadDumpOptions.value (which gets set here). The matshmallow_json libraby is probably a little too restrictive in that it just errors out whenever it encounters this case. It should probably allow strings even when the field was loaded by value.

Anyways, that's as far as I got. No solution to propose, but that at least seems to be the issue.

@kumare3
Copy link
Contributor

kumare3 commented Oct 8, 2021

We are working on adding more type support into Dataclass, we will take a look at this as part of the work - cc @pingsutw

@kumare3 kumare3 removed the untriaged This issues has not yet been looked at by the Maintainers label Oct 8, 2021
@kumare3 kumare3 added this to the 0.18.1 milestone Oct 8, 2021
@EngHabu EngHabu modified the milestones: 0.18.1, 0.18.2 Nov 3, 2021
@pingsutw pingsutw self-assigned this Nov 14, 2021
@EngHabu EngHabu modified the milestones: 0.18.2, 0.19.1 - Jan 2021 Jan 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants