Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use file_name instead of path when checking for expected file extension #8265

Closed
wants to merge 1 commit into from

Conversation

andygrove
Copy link
Member

@andygrove andygrove commented Nov 18, 2023

Which issue does this PR close?

Closes #8264

Rationale for this change

Fix check for expected file extension to use file_name instead of path, which could include a trailing /.

Before this change

My Ballista benchmark fails with this confusing error:

File 'supplier.parquet' does not match the expected extension '.parquet'

After this change

No errors.

What changes are included in this PR?

Are these changes tested?

We have existing tests.

Are there any user-facing changes?

@github-actions github-actions bot added the core Core DataFusion crate label Nov 18, 2023
@andygrove andygrove changed the title Check file_name instead of path when checking for expected file exten… Use file_name instead of path when checking for expected file extension Nov 18, 2023
@andygrove andygrove marked this pull request as ready for review November 18, 2023 17:09
@andygrove
Copy link
Member Author

@Dandandan After applying this fix I was able to run queries in Ballista from your df33 branch.

@@ -859,7 +859,7 @@ impl SessionContext {
// check if the file extension matches the expected extension
for path in &table_paths {
let file_name = path.prefix().filename().unwrap_or_default();
if !path.as_str().ends_with(&option_extension) && file_name.contains('.') {
if !file_name.ends_with(&option_extension) && file_name.contains('.') {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this check is just ill-formed, if the ListingTableUrl is to a directory containing a . this check will reject it. I'm not sure what this check is here for tbh...

@@ -859,7 +859,7 @@ impl SessionContext {
// check if the file extension matches the expected extension
for path in &table_paths {
let file_name = path.prefix().filename().unwrap_or_default();
if !path.as_str().ends_with(&option_extension) && file_name.contains('.') {
if !file_name.ends_with(&option_extension) && file_name.contains('.') {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if !file_name.ends_with(&option_extension) && file_name.contains('.') {
if !file_name.ends_with(&option_extension) && !path.is_collection() {

Perhaps?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't compile for me. Perhaps this method isn't available in the version of arrow-rs that we are currently using?

no method named `is_collection` found for reference `&ListingTableUrl` in the current scope

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ListingTableUrl is a DataFusion concept, perhaps your checkout is behind main?

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably regression after #7972
We should add a test with dot in the folder name

@Weijun-H

@Weijun-H
Copy link
Member

This probably regression after #7972 We should add a test with dot in the folder name

@Weijun-H

I added the test for the directory with . in #8267

@andygrove
Copy link
Member Author

Closing this in favor of #8267

@andygrove andygrove closed this Nov 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

File 'supplier.parquet' does not match the expected extension '.parquet'
4 participants