Make an end to end reproducer for zero column batch issues #5713

Closed
kazuyukitanimura opened this issue Mar 24, 2023 · 3 comments · Fixed by #11624
Labels
enhancement New feature or request good first issue Good for newcomers

Comments


kazuyukitanimura commented Mar 24, 2023

Is your feature request related to a problem or challenge?

Based on the review in #5709 (review), it would be ideal to make an end-to-end reproducer for zero-column batch issues (either with SQL or with the DataFrame API). It would also be nice to ensure there aren't other issues with creating zero-column batches.

We have found two execs so far that return an Err ArrowError(InvalidArgumentError("must either specify a row count or at least one column")) when a batch has no columns: #4911 and #5701.

I suspect other execs may have the same issue.

Describe the solution you'd like

We need to find a good method to project an empty-column relation (with a non-empty row count).

Describe alternatives you've considered

No response

Additional context

No response

kazuyukitanimura added the enhancement label Mar 24, 2023
@Jefffrey

I think excluding columns is a valid way to get zero column batches with non-empty row count: #6510 (comment)

#7945 added a test to sqllogictests which could be considered end to end; however, it is very simple, involving only a projection.

Is there a list of things to test for zero-column batches? I see limit and projection are the only ones mentioned in the original issue text.


alamb commented Oct 29, 2023

One classic list of things to test is Filter, Projection, Grouping, Join, Window, and Limit. I am not sure how far we should go with this.

BTW here is one potential "end to end" test for zero column batches and LIMIT (maybe someone could just put this into an slt file, following the model of @Jefffrey in #7945).

Adding some simple variations with GROUP BY and JOIN could also help.

❯ create table t as values (1, 2), (3, 4);
0 rows in set. Query took 0.002 seconds.

❯ select * except (column1, column2) from t ;
++
++
++
2 rows in set. Query took 0.001 seconds.

❯ select * except (column1, column2) from t limit 1;
++
++
++
1 row in set. Query took 0.000 seconds.
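
The aggregate and join variations mentioned above might look something like the following. This is only a sketch that reuses the table `t` from the session above. The subquery aliases `z`, `a`, and `b` are made up for illustration, and the queries have not been run against any particular DataFusion version, so no expected output is shown.

```sql
-- Filter: the predicate still reads column1, but the projection keeps no columns
select * except (column1, column2) from t where column1 > 1;

-- Aggregate: count the rows of a zero-column input
select count(*) from (select * except (column1, column2) from t) as z;

-- Join: cross join two zero-column inputs, then count the resulting rows
select count(*)
from (select * except (column1, column2) from t) as a
cross join (select * except (column1, column2) from t) as b;
```

Wrapping each zero-column query in `count(*)` gives the test a visible value to compare against, which may be easier to assert in an slt file than an empty result row.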

alamb added the good first issue label Oct 29, 2023

alamb commented Oct 29, 2023

I think this is a good first issue for someone with some familiarity with SQL and who wants to learn about how the sqllogictests work -- see https://github.com/apache/arrow-datafusion/tree/main/datafusion/sqllogictest#readme
