[Feature] Allow Nested Complex Data Types for Types.Schema #523
Comments
I totally agree this is a limitation (and possibly an oversight) in the FlyteIdl design. I think it's worth reevaluating the decision to make the supported column types in schemas a strict subset of the supported FlyteIdl literal types. It might be worth departing from the current structure of schemas and introducing a lighter-weight type (e.g. Rows / Records / Columns) that carries metadata about its column names and types, where the types can be any LiteralType supported by FlyteIdl. We've briefly discussed that in the past, but the use cases weren't there to justify the investment. I know we have discussed this internally before, @DobsX, and I thank you for articulating the problem and the request. Is it something you might be able to spare time to work on? I'll be happy to help guide the implementation and provide as much context as I can. As Presto, BigQuery and Hive gain more and more traction on Flyte, I know a lot of people will appreciate the flexibility (without compromising type safety). Please let me know.
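The lighter-weight type proposed above could be modeled recursively: a record maps column names to column types, and a column type is either a scalar or a list/map whose element type is itself a column type. Below is a minimal Python sketch under that assumption. The names `Simple`, `ListOf`, `MapOf`, `RecordType`, `describe` and `describe_record` are hypothetical and are not part of FlyteIdl or flytekit; the string-keyed maps are an assumption meant to loosely mirror Flyte maps.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Dict, Union


class Simple(Enum):
    # Hypothetical scalar column types, loosely modeled on Flyte's simple types.
    INTEGER = "integer"
    FLOAT = "float"
    STRING = "string"
    BOOLEAN = "boolean"
    DATETIME = "datetime"


@dataclass(frozen=True)
class ListOf:
    # Array column; the element type may itself be nested.
    element: ColumnType


@dataclass(frozen=True)
class MapOf:
    # Map column with string keys (assumed) and any, possibly nested, value type.
    value: ColumnType


# A column type is a scalar or a (possibly nested) list/map.
ColumnType = Union[Simple, ListOf, MapOf]


@dataclass
class RecordType:
    # Ordered column name -> column type, like a schema header.
    columns: Dict[str, ColumnType]


def describe(t: ColumnType) -> str:
    """Render a column type as a SQL-like string, recursing into nested types."""
    if isinstance(t, Simple):
        return t.value
    if isinstance(t, ListOf):
        return f"array<{describe(t.element)}>"
    if isinstance(t, MapOf):
        return f"map<string, {describe(t.value)}>"
    raise TypeError(f"unknown column type: {t!r}")


def describe_record(r: RecordType) -> str:
    """Render every column as 'name type', in declaration order."""
    return ", ".join(f"{name} {describe(t)}" for name, t in r.columns.items())


if __name__ == "__main__":
    events = RecordType(columns={
        "user_id": Simple.INTEGER,
        "tags": ListOf(Simple.STRING),
        "scores": MapOf(ListOf(Simple.FLOAT)),
    })
    print(describe_record(events))
    # user_id integer, tags array<string>, scores map<string, array<float>>
```

Because column types nest freely, a map of arrays or an array of maps needs no special case; the same recursion handles any depth.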
@EngHabu I definitely want to work on this, along with some other Flyte-related features. I should have some time in the next couple of months, but FYI, I likely won't be able to start immediately. Regardless, thank you for checking this request out; I'm happy to hear that this is something we want to support.
This is at least partially implemented by flyteorg/flytekit#785.
Closing this issue in favor of structured datasets.
Motivation: Why do you think this is important?
Flyte currently does not support complex or nested data structures in Types.Schema columns. There is no native way to select columns of these types into a supported type, so users have to store them as one of the existing types and handle the conversion on the client side.
One workaround is to read and write with a generic Types.Schema() schema, but it offers no type checking, and type handling is left to the client side.
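The missing type checking is what nested column types would provide. Here is a minimal sketch of the kind of recursive check such a type would enable, using Python `typing` annotations as a stand-in for Flyte literal types. This is an assumption for illustration only: `conforms` and `ScoresColumn` are hypothetical names, not a Flyte or flytekit API.

```python
from typing import Any, Dict, List, get_args, get_origin


def conforms(value: Any, tp: Any) -> bool:
    """Recursively check `value` against a nested annotation.

    Supports int, float, str, bool, List[T] and Dict[K, V], nested to any depth.
    """
    origin = get_origin(tp)
    if origin is list:
        (elem_t,) = get_args(tp)
        return isinstance(value, list) and all(conforms(v, elem_t) for v in value)
    if origin is dict:
        key_t, val_t = get_args(tp)
        return isinstance(value, dict) and all(
            conforms(k, key_t) and conforms(v, val_t) for k, v in value.items()
        )
    if tp in (int, float):
        # bool is a subclass of int in Python, so reject it explicitly.
        # Ints are also not accepted where float is declared (a strictness choice).
        return isinstance(value, tp) and not isinstance(value, bool)
    if tp in (str, bool):
        return isinstance(value, tp)
    raise TypeError(f"unsupported annotation: {tp!r}")


# A hypothetical nested column: map<string, array<array<float>>>.
ScoresColumn = Dict[str, List[List[float]]]

if __name__ == "__main__":
    print(conforms({"a": [[1.0, 2.5], []]}, ScoresColumn))  # True
    print(conforms({"a": [[1.0, "x"]]}, ScoresColumn))      # False
```

With a generic schema, the second value would be accepted and only fail later on the client; a nested column type lets the mismatch be caught where the data is produced.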
Additionally, if you are using Flyte for an ETL workflow, you don’t have a way to write the results back to the DB in the same format.
Here is a related issue: #22
Goal: What should the final outcome look like, ideally?
Basically, to support maps and arrays, including maps/arrays nested inside other maps/arrays. Furthermore, if you have a Types.Schema data object, you can then store it on S3 and use the various functions/plugins to load your data into a DB as its native type.
Describe alternatives you've considered
The current alternative is to serialize nested values to a JSON string and then deserialize them when consuming the data.
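A minimal sketch of that JSON workaround in plain Python, assuming rows are represented as dicts. The helpers `encode_rows` and `decode_rows` are hypothetical and not part of Flyte; they only illustrate the round trip.

```python
import json
from typing import Any, Dict, List

Row = Dict[str, Any]


def encode_rows(rows: List[Row], nested_columns: List[str]) -> List[Row]:
    """Replace each nested column value with a JSON string so the schema stays flat."""
    encoded = []
    for row in rows:
        flat = dict(row)  # shallow copy; the input rows are not mutated
        for col in nested_columns:
            # sort_keys makes the stored string deterministic.
            flat[col] = json.dumps(row[col], sort_keys=True)
        encoded.append(flat)
    return encoded


def decode_rows(rows: List[Row], nested_columns: List[str]) -> List[Row]:
    """Consumer side: the caller must already know which string columns hold JSON."""
    decoded = []
    for row in rows:
        restored = dict(row)
        for col in nested_columns:
            restored[col] = json.loads(row[col])
        decoded.append(restored)
    return decoded


if __name__ == "__main__":
    rows = [{"user_id": 1, "tags": ["a", "b"], "scores": {"x": [1.5, 2.0]}}]
    stored = encode_rows(rows, ["tags", "scores"])
    print(stored[0]["scores"])  # {"x": [1.5, 2.0]}
    assert decode_rows(stored, ["tags", "scores"]) == rows
```

In the stored schema, `tags` and `scores` are plain string columns, so the nested structure is invisible to type checking and to any DB loader; the list of JSON columns has to be agreed on out of band.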
Flyte component
[Optional] Propose: Link/Inline
N/A
Additional context
Hive Types:
Presto Types:
BigQuery Types:
The current list of Flyte types is:
Here is an example in Presto:
Presto SQL:
Here is what the schema would look like based on the SQL:
And here's what it would look like in JSON:
Is this a blocker for you to adopt Flyte
Currently no, but nested structures with arrays and maps are becoming increasingly common.