Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: ListingSchemaProvider directory paths (related: #4204) #4788

Merged
merged 1 commit into from
Jan 3, 2023

Conversation

cfraz89
Copy link
Contributor

@cfraz89 cfraz89 commented Jan 1, 2023

  • Append trailing slash to table paths if they are directories

Which issue does this PR close?

Is similar to #4204 - inability to use an object store listing for a table schema.

However this PR addresses tables generated from ListingSchemaProvider.

Rationale for this change

I would like to set up an object store (s3) where each directory maps to a single table/schema, with the contents being made up of all files (parquet) inside the directory. By registering the schema provider like:

Arc::new(ListingSchemaProvider::new(
        String::from(format!("s3://{bucket_name}")),
        "".into(),
        Arc::new(ListingTableFactory::default()),
        s3.clone(),
        String::from("PARQUET"),
        false,
    ));

Then if there is a folder in the bucket, such as userdata, attempting to query against userdata table causes the s3 client to 404, as the provider creates ListingTables with paths set to the raw directory names, eg userdata, and in datafusion/core/src/datasource/listing/url.rs:149, we have:

        // If the prefix is a file, use a head request, otherwise list
        let is_dir = self.url.as_str().ends_with('/');

Since the paths don't end with /, it treats the directories as files and attempts to perform head on them instead of listing them.

This PR remedies this scenario, allowing the query to succeed.

What changes are included in this PR?

ListingSchemaProvider is altered to track whether the table paths it has listed are directories or files. If they are directories, it creates the ListingTables with a '/' appended to the stringified table path, allowing the ListingTable to successfully list its contents.

Are these changes tested?

Some light unit tests added.

Are there any user-facing changes?

No

- Append trailing slash to table paths  if they are directories
@github-actions github-actions bot added the core Core DataFusion crate label Jan 1, 2023
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me -- thank you @cfraz89
cc @tustvold @rdettai

@alamb alamb merged commit 34a8b86 into apache:master Jan 3, 2023
@alamb
Copy link
Contributor

alamb commented Jan 3, 2023

Thanks again @cfraz89

@ursabot
Copy link

ursabot commented Jan 3, 2023

Benchmark runs are scheduled for baseline = 6c0e5d9 and contender = 34a8b86. 34a8b86 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@cfraz89
Copy link
Contributor Author

cfraz89 commented Jan 4, 2023

No problems, thanks the great library!

@tustvold
Copy link
Contributor

tustvold commented Jan 4, 2023

FYI this was merged with failing clippy lints, fix in #4817

@alamb
Copy link
Contributor

alamb commented Jan 4, 2023

FYI this was merged with failing clippy lints, fix in #4817

Sorry that was my bad 😞 Thank you for fixing it @tustvold

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants