feat: implement basic struct handling #91

EpsilonPrime · 2024-10-08T04:15:48Z

Adds the capability to create structs one level deep using the Spark struct() data frame API.

To assist with this functionality the field handling data structure has been upgraded from a
string reference into a full type (Field) allowing for tracking of additional names today and
precise type tracking in the future.

Future PRs will add support for arbitrary types and will make getField() work on structs.
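For readers skimming the description, here is a minimal sketch of what moving from a plain string reference to a `Field` type could look like. It is not the code in this PR: the attribute names (`child_names`, `type_name`) and the helper `flatten_names` are hypothetical, and the depth-first flattening assumes the usual Substrait convention of listing a struct's name followed by its member names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Field:
    """Hypothetical stand-in for the Field type described above."""

    name: str
    # Names of the struct members, one level deep (empty for non-struct fields).
    child_names: list[str] = field(default_factory=list)
    # Placeholder for the precise type tracking planned for later (assumption).
    type_name: Optional[str] = None

    def is_struct(self) -> bool:
        """A field is treated as a struct when it carries member names."""
        return bool(self.child_names)

    def output_names(self) -> list[str]:
        """Return this field's name followed by its member names."""
        return [self.name, *self.child_names]


def flatten_names(fields: list[Field]) -> list[str]:
    """Flatten a list of fields into a single depth-first list of names."""
    names: list[str] = []
    for f in fields:
        names.extend(f.output_names())
    return names
```

With a bare string per field there is nowhere to hang the member names of a struct; a small record type like this gives them a home and leaves room for a type later.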

github-actions · 2024-10-08T07:08:36Z

ACTION NEEDED

Substrait follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

EpsilonPrime · 2024-10-08T07:50:09Z

@mbrobbel Could you take a look at this PR please?

src/backends/arrow_tools.py

src/backends/tests/arrow_tools_test.py

mbrobbel · 2024-10-08T14:38:57Z

src/gateway/converter/spark_to_substrait.py

+        if not current_symbol:
+            raise InternalError(
+                f'Could not find plan id {self._current_plan_id} constructed earlier.')


Should we move this lookup code to a separate method? Because it's used in many places.

I tried a few approaches here. All of this was added just to make pyright happy, so I could remove all of the checks and the code would still run fine. I also considered moving the check into the code that does the lookup, but there are a few cases where None is appropriate. I can remove these for now and add back a better solution when a pyright GitHub action is added.

In that case I would be in favor of keeping them until we can improve.
Edit: looks like you've already removed them, maybe add a task to track adding these checks again?

I did add an issue for adding pyright; that would require fixing all of the issues (about 100), mostly involving type|None mismatches.
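One possible shape for the separate lookup method discussed in this thread, sketched under assumptions: `SymbolTable`, `PlanMetadata`, `get_symbol`, and `require_symbol` here are illustrative stand-ins, not the actual classes or signatures in `symbol_table.py`, and `InternalError` is redefined locally so the snippet is self-contained. The idea is to keep an Optional-returning lookup for the cases where None is appropriate and add a second method that raises, which also narrows the type for pyright.

```python
from dataclasses import dataclass, field
from typing import Optional


class InternalError(Exception):
    """Local stand-in for the converter's internal error type (assumption)."""


@dataclass
class PlanMetadata:
    """Hypothetical per-plan record kept in the symbol table."""

    plan_id: int
    output_fields: list[str] = field(default_factory=list)


class SymbolTable:
    """Hypothetical symbol table keyed by plan id."""

    def __init__(self) -> None:
        self._symbols: dict[int, PlanMetadata] = {}

    def add_symbol(self, plan_id: int) -> PlanMetadata:
        symbol = PlanMetadata(plan_id)
        self._symbols[plan_id] = symbol
        return symbol

    def get_symbol(self, plan_id: int) -> Optional[PlanMetadata]:
        """Lookup for callers where a missing symbol is acceptable."""
        return self._symbols.get(plan_id)

    def require_symbol(self, plan_id: int) -> PlanMetadata:
        """Lookup for callers where a missing symbol is a bug."""
        symbol = self.get_symbol(plan_id)
        if symbol is None:
            raise InternalError(
                f'Could not find plan id {plan_id} constructed earlier.')
        return symbol
```

Call sites that currently repeat the `if not current_symbol: raise ...` check could call `require_symbol` instead, while the few places that tolerate None keep using `get_symbol`.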

src/gateway/converter/spark_to_substrait.py

src/gateway/converter/symbol_table.py

EpsilonPrime · 2024-10-09T14:27:28Z

Thanks for the review @mbrobbel ! Could you merge it please? Thanks!

EpsilonPrime added 11 commits September 20, 2024 15:56

added test for struct/getfield

b2e1ce2

start implementing struct

f27d98d

undo server changes

2494a49

support numeric info

025004c

investigating struct behavior

388b61d

progress

74aa645

update installation instructions

e94b107

working on arrow tools package

abbd6d8

Now properly renames structs.

c4defda

remove the root names hack

80e9124

massive type refactor in progress

593638e

EpsilonPrime marked this pull request as draft October 8, 2024 04:15

EpsilonPrime added 5 commits October 7, 2024 22:44

another bug fix

790c677

fixed inline SQL output fields

f866422

disabled getfield tests as we need to track names with types

c915d83

ruff

9932c06

remove self typing

ce186ee

EpsilonPrime marked this pull request as ready for review October 8, 2024 07:11

EpsilonPrime mentioned this pull request Oct 8, 2024

Support Spark's "struct" function #67

Open

mbrobbel reviewed Oct 8, 2024

EpsilonPrime added 2 commits October 8, 2024 21:07

changes from review

bb49c19

removed checks that made pyright less unhappy

92652f7

EpsilonPrime requested a review from mbrobbel October 9, 2024 06:20

mbrobbel approved these changes Oct 9, 2024

mbrobbel merged commit 23e2fd7 into voltrondata:main Oct 9, 2024
18 checks passed
