Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Executing LocalLimitExec with no column should not return an Err #5709

Merged
merged 3 commits into from
Mar 24, 2023

Conversation

kazuyukitanimura
Copy link
Contributor

@kazuyukitanimura kazuyukitanimura commented Mar 23, 2023

Which issue does this PR close?

Closes #5701

Rationale for this change

Bug fix

What changes are included in this PR?

Use options to explicitly set row_count, closely related / similar solution to #4912

Are these changes tested?

Yes

Are there any user-facing changes?

No

@github-actions github-actions bot added the core Core DataFusion crate label Mar 23, 2023
@@ -462,7 +462,8 @@ impl LimitStream {
.iter()
.map(|col| col.slice(0, col.len().min(batch_rows)))
.collect();
Some(RecordBatch::try_new(batch.schema(), limited_columns).unwrap())
let options = RecordBatchOptions::new().with_row_count(Option::from(batch_rows));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kazuyukitanimura
extending call RecordBatch to be called with options gives more flexibility

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it is nice

@kazuyukitanimura
Copy link
Contributor Author

cc @viirya @sunchao

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the contribution @kazuyukitanimura (and the assist @comphead )

Would it be possible to make an end to end reproducer (either with SQL or with the DataFrame API) to show the problem? It would be nice to ensure there aren't other issues with creating zero column batches.

I tried to reproduce the error and I was not able to:

DataFusion CLI v20.0.0
❯ create table foo(x int);
0 rows in set. Query took 0.004 seconds.
❯ insert into foo values(1);
0 rows in set. Query took 0.003 seconds.
❯ insert into foo values(2);
0 rows in set. Query took 0.002 seconds.
❯ explain select 1 from foo limit 1;
+---------------+---------------------------------------------------+
| plan_type     | plan                                              |
+---------------+---------------------------------------------------+
| logical_plan  | Projection: Int64(1)                              |
|               |   Limit: skip=0, fetch=1                          |
|               |     TableScan: foo projection=[x], fetch=1        |
| physical_plan | ProjectionExec: expr=[1 as Int64(1)]              |
|               |   GlobalLimitExec: skip=0, fetch=1                |
|               |     MemoryExec: partitions=1, partition_sizes=[2] |
|               |                                                   |
+---------------+---------------------------------------------------+
2 rows in set. Query took 0.004 seconds.
❯ select 1 from foo limit 1;
+----------+
| Int64(1) |
+----------+
| 1        |
+----------+

@@ -462,7 +462,8 @@ impl LimitStream {
.iter()
.map(|col| col.slice(0, col.len().min(batch_rows)))
.collect();
Some(RecordBatch::try_new(batch.schema(), limited_columns).unwrap())
let options = RecordBatchOptions::new().with_row_count(Option::from(batch_rows));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it is nice

@kazuyukitanimura
Copy link
Contributor Author

Thank you for the contribution @kazuyukitanimura (and the assist @comphead )

Would it be possible to make an end to end reproducer (either with SQL or with the DataFrame API) to show the problem? It would be nice to ensure there aren't other issues with creating zero column batches.

Thanks @alamb that will have to be a separate work from this PR. I have some examples, but I need to think through how to remove some proprietary code from that. Additionally I suspect there will be other execs that have the same issue, so I would like make the reproducer generic. Would you mind approving this PR first given this is pretty much the same as #4912
I will file a ticket for the end. to end reproducer.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the real value in an end to end reproducer is so that we don't accidentally break the usecase in some other later refactor

@viirya
Copy link
Member

viirya commented Mar 23, 2023

To have end to end reproducer, you need to be able to project empty-column relation (with non-empty row), not sure if DataFusion allows such thing in SQL/DataFrame API.

@kazuyukitanimura
Copy link
Contributor Author

kazuyukitanimura commented Mar 24, 2023

Thank you @alamb @comphead @sunchao @viirya

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Executing LocalLimitExec with no column should not return an Err
5 participants