Cannot register table to represent multiple parquet files in S3 bucket #4204

Open
andygrove opened this issue Nov 14, 2022 · 3 comments

@andygrove
Member

Describe the bug
This works:

CREATE EXTERNAL TABLE yellow_2019_01 STORED AS PARQUET LOCATION "s3://ossb-nyctaxi/yellow/2019/yellow_tripdata_2019-01.parquet";

This does not work:

CREATE EXTERNAL TABLE yellow_2019 STORED AS PARQUET LOCATION "s3://ossb-nyctaxi/yellow/2019";

Fails with:

ObjectStore(NotFound { path: "yellow/2019", source: Error { retries: 0, message: "No Body", source: Some(reqwest::Error { kind: Status(404), url: Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("s3.us-east-2.amazonaws.com")), port: None, path: "/ossb-nyctaxi/yellow/2019", query: None, fragment: None } }) } }

To Reproduce
As described.

Expected behavior
Should be able to register a directory containing parquet files.

Additional context
Possibly related to #1736

@andygrove andygrove added the bug Something isn't working label Nov 14, 2022
@andygrove andygrove added this to the 15.0.0 milestone Nov 14, 2022
@andygrove
Member Author

I found that adding a trailing / to the URL fixes this (that is, LOCATION "s3://ossb-nyctaxi/yellow/2019/" works), so I will file a PR to add documentation.
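
For anyone building these statements programmatically, here is a minimal sketch of the workaround. Both `ensure_trailing_slash` and `create_table_sql` are hypothetical helpers written for this comment, not part of DataFusion; they only normalize the location string before it goes into the SQL from the report above.

```rust
/// Hypothetical helper (not a DataFusion API): make sure a location that is
/// meant to be a directory ends with a trailing slash.
fn ensure_trailing_slash(location: &str) -> String {
    if location.ends_with('/') {
        // Already marked as a directory; leave it untouched.
        location.to_string()
    } else {
        format!("{location}/")
    }
}

/// Hypothetical helper: build the CREATE EXTERNAL TABLE statement from this
/// issue for a directory of parquet files.
fn create_table_sql(table: &str, dir_location: &str) -> String {
    format!(
        "CREATE EXTERNAL TABLE {table} STORED AS PARQUET LOCATION \"{}\";",
        ensure_trailing_slash(dir_location)
    )
}
```

Calling `create_table_sql("yellow_2019", "s3://ossb-nyctaxi/yellow/2019")` produces the failing statement from the description with the trailing slash added. The caller still has to know that the location is a directory; the helper does not check the bucket.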

@cfraz89
Contributor

cfraz89 commented Jan 1, 2023

I encountered a similar issue in direct API usage, with ListingSchemaProvider. When it is pointed at a store with a folder for each schema, it also creates the schemas with paths that lack a trailing slash, which causes the same error. I made a PR to fix this.

@kylebrooks-8451

kylebrooks-8451 commented Apr 27, 2023

> I found that adding a trailing / to the URL fixes this, so I will file a PR to add documentation

In addition to documenting this, could we change the is_dir logic in this code to detect whether the path is a folder?

I'm thinking that we could try to list the path and treat a failure as a sign that it is not a directory.

We might also be able to rely on the metadata size of a folder being 0, which would be a better option than listing the path, if that is guaranteed.
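
To make the listing idea concrete, here is a rough sketch using only the standard library. It is not the object_store API: the store is modeled as a flat slice of object keys, and `list_with_prefix` and `is_dir_by_listing` are hypothetical names. It also assumes that listing an unknown prefix returns an empty result rather than an error, so the check is "non-empty listing" instead of "catch a failure".

```rust
/// Toy stand-in for an object store: a flat slice of object keys.
/// "Folders" are modeled only as key prefixes (an assumption of this sketch).
fn list_with_prefix<'a>(keys: &'a [&'a str], prefix: &str) -> Vec<&'a str> {
    keys.iter()
        .copied()
        .filter(|k| k.starts_with(prefix))
        .collect()
}

/// Treat `path` as a directory if at least one key lives under `path/`.
fn is_dir_by_listing(keys: &[&str], path: &str) -> bool {
    // Normalize so "yellow/2019" and "yellow/2019/" behave the same, and so
    // that "yellow/201" does not match keys under "yellow/2019/".
    let prefix = format!("{}/", path.trim_end_matches('/'));
    !list_with_prefix(keys, &prefix).is_empty()
}
```

With a key such as `yellow/2019/yellow_tripdata_2019-01.parquet` in the slice, `is_dir_by_listing(&keys, "yellow/2019")` is true while the full file path is not treated as a directory. As for the size-0 idea, the 404 in the original error suggests there is no object at `yellow/2019` to read metadata from, so that check may only work for stores that create placeholder objects for folders.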
