ListingTable cannot handle partition evolution #13270

Open
adriangb opened this issue Nov 6, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@adriangb
Contributor

adriangb commented Nov 6, 2024

Describe the bug

With CSV:

printf 'a,b\n1,2\n' > data1.csv
mkdir a=2
printf 'b\n3\n' > a=2/data2.csv
datafusion-cli
> SELECT * FROM '**/*.csv';
Arrow error: Csv error: incorrect number of fields for line 1, expected 2 got 1

With Parquet:

import os
import polars as pl

pl.DataFrame({'a': [1], 'b': [2]}).write_parquet('data1.parquet')
os.mkdir('a=2')
pl.DataFrame({'b': [3]}).write_parquet('a=2/data2.parquet')
datafusion-cli
> SELECT * FROM '**/*.parquet';
+---+---+
| b | a |
+---+---+
| 2 | 1 |
| 3 |   |
+---+---+
2 row(s) fetched.
Elapsed 0.055 seconds.

To Reproduce

No response

Expected behavior

Partition evolution is handled, and both cases return:

+---+---+
| b | a |
+---+---+
| 2 | 1 |
| 3 | 2 |
+---+---+

Additional context

Having played around quite a bit with ParquetExec and the SchemaAdapter machinery, I think what should happen is:

  • Partition values are tracked on a per-file basis, on each PartitionedFile rather than on the FileScanConfig
  • Partition values are passed into the SchemaAdapter machinery, which decides for each file whether it needs to add a column generated from partition values
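
To illustrate the per-file decision described above, here is a minimal sketch in plain Python. It is not DataFusion code: `partition_values` and `adapt_row` are hypothetical helper names, rows are modeled as dicts of strings, and type inference for partition values is left out. The assumption is that a column found in the file wins, and a column missing from the file is filled from the Hive-style `key=value` segments of that file's path.

```python
from pathlib import PurePosixPath


def partition_values(path: str) -> dict[str, str]:
    """Parse Hive-style `key=value` directory segments from a file path."""
    values = {}
    for segment in PurePosixPath(path).parts[:-1]:  # skip the file name
        key, sep, value = segment.partition("=")
        if sep and key:
            values[key] = value
    return values


def adapt_row(row: dict, path: str, table_columns: list[str]) -> dict:
    """Map one file row onto the table schema (hypothetical helper).

    A column present in the file is read from the file; a column missing
    from the file is filled from the path's partition values, else None.
    """
    from_path = partition_values(path)
    return {
        col: row[col] if col in row else from_path.get(col)
        for col in table_columns
    }


# The two files from the reproduction, modeled as lists of rows.
files = {
    "data1.csv": [{"a": "1", "b": "2"}],
    "a=2/data2.csv": [{"b": "3"}],
}
table_columns = ["b", "a"]

result = [
    adapt_row(row, path, table_columns)
    for path, rows in files.items()
    for row in rows
]
print(result)
```

Under these assumptions, the row from `a=2/data2.csv` picks up `a` from its path, while the row from `data1.csv` keeps the value stored in the file, which matches the expected output above.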
@adriangb adriangb added the bug Something isn't working label Nov 6, 2024
@adriangb
Contributor Author

adriangb commented Nov 6, 2024

cc @alamb I had promised you this a long time ago but only got around to it now

@alamb
Contributor

alamb commented Nov 6, 2024

Thanks @adriangb
