
URL registered with ObjectStore registry is different from URL in DeltaScan #1018

Closed
Blajda opened this issue Dec 14, 2022 · 4 comments
Labels
bug Something isn't working

Comments

@Blajda
Collaborator

Blajda commented Dec 14, 2022

Environment

Delta-rs version: latest

Binding: rust


Bug

What happened:
I loaded a Delta table backed by S3 storage and then registered it with DataFusion. I then ran a SELECT query, which failed because ObjectStore was unable to fetch data from the backend.

The error returned by ObjectStore showed that it tried to fetch from "https://amazon.com/path", whereas I had configured ObjectStore to use "http://localhost/".

What you expected to happen:
When a table is registered with DataFusion, queries against it should use the same underlying storage configuration that the registered table had.

How to reproduce it:

More details:
Further investigation showed that the URL passed to the ObjectStore registry is different from the URL later used to look up the ObjectStore. Because the lookup does not use the URL the store was registered under, a new DeltaTable instance is created with default settings.

See rust/src/delta_datafusion.rs:404 and rust/src/delta_datafusion.rs:309
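To illustrate the failure mode described above, here is a minimal, self-contained sketch. It does not use the real `object_store` or delta-rs types: `StoreConfig`, `Registry`, `get_or_default`, `DEFAULT_ENDPOINT`, and the `delta-rs://` key scheme are all hypothetical stand-ins. It only shows how registering a store under one URL and looking it up under another can silently fall back to a default configuration.

```rust
use std::collections::HashMap;

// Hypothetical fallback used when no store is registered for a URL.
const DEFAULT_ENDPOINT: &str = "<default endpoint>";

// Hypothetical stand-in for an object store configuration.
#[derive(Clone, Debug, PartialEq)]
struct StoreConfig {
    endpoint: String,
}

// Hypothetical registry keyed by the exact URL string.
struct Registry {
    stores: HashMap<String, StoreConfig>,
}

impl Registry {
    fn new() -> Self {
        Registry { stores: HashMap::new() }
    }

    fn register(&mut self, url: &str, config: StoreConfig) {
        self.stores.insert(url.to_string(), config);
    }

    // Returns the registered config, or a default one on a miss.
    // The miss is silent, which is what hides the URL mismatch.
    fn get_or_default(&self, url: &str) -> StoreConfig {
        self.stores.get(url).cloned().unwrap_or_else(|| StoreConfig {
            endpoint: DEFAULT_ENDPOINT.to_string(),
        })
    }
}

fn main() {
    let mut registry = Registry::new();
    let custom = StoreConfig { endpoint: "http://localhost/".to_string() };

    // The table registers its store under one (hypothetical) key...
    registry.register("delta-rs://bucket-table", custom.clone());

    // ...but the scan looks it up under a different URL and gets the default.
    let used = registry.get_or_default("s3://bucket/table");
    assert_eq!(used.endpoint, DEFAULT_ENDPOINT);

    // Looking up with the same key that was registered returns the custom config.
    assert_eq!(registry.get_or_default("delta-rs://bucket-table"), custom);
}
```

The takeaway from the sketch is that the registration key and the lookup key have to be derived the same way; any difference degrades to defaults instead of failing loudly.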

@Blajda Blajda added the bug Something isn't working label Dec 14, 2022
@roeap
Collaborator

roeap commented Dec 16, 2022

Thanks for the report! We'll have to look a bit deeper here. The URL passed to the object store registry is actually just an invention of delta-rs: we require our object store to be rooted at the table root, and we want to avoid collisions with other stores that may be registered under the "raw" object store URL.

However, somewhere the URLs get mixed up ... :). Do you by any chance have a quick repro example we can use for debugging?

We run some integration tests with DataFusion and S3, so maybe we can spot some differences.

@Blajda
Collaborator Author

Blajda commented Dec 16, 2022

This defect slips through the integration tests because the object store configuration is exported to the environment. When the "incorrect" URL is obtained from the registry, the store is rebuilt from that new URL plus the environment, which happens to contain the right values.

In my use case I configured the underlying storage by passing in a HashMap with all the configuration values, so the S3 environment variables are not set and the defaults are used instead.

I can help by modifying the integration tests to use a prefix such as DELTA_RS_INT_ for their variables and then read the configuration into a HashMap that is passed to the appropriate builders. This would also help detect any other defects caused by implicit reliance on the environment.
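A minimal sketch of the proposed approach, assuming the integration tests use a `DELTA_RS_INT_` prefix. The helper names `options_from_env` and `options_from_vars` are hypothetical, as are the option keys in the example; the resulting map would then be passed explicitly to whichever builder the test uses, instead of relying on the process environment.

```rust
use std::collections::HashMap;
use std::env;

// Hypothetical helper: collect every variable whose name starts with `prefix`,
// stripping the prefix from the key. Variables without the prefix are ignored.
fn options_from_vars<I>(vars: I, prefix: &str) -> HashMap<String, String>
where
    I: IntoIterator<Item = (String, String)>,
{
    vars.into_iter()
        .filter_map(|(k, v)| k.strip_prefix(prefix).map(|rest| (rest.to_string(), v)))
        .collect()
}

// Same as above, but reading from the real process environment.
fn options_from_env(prefix: &str) -> HashMap<String, String> {
    options_from_vars(env::vars(), prefix)
}

fn main() {
    // Example input; the key names are assumptions for illustration only.
    let vars = vec![
        ("DELTA_RS_INT_AWS_ENDPOINT_URL".to_string(), "http://localhost/".to_string()),
        ("AWS_REGION".to_string(), "us-east-1".to_string()),
    ];
    let opts = options_from_vars(vars, "DELTA_RS_INT_");

    // Only the prefixed variable is picked up, with the prefix removed.
    assert_eq!(opts.get("AWS_ENDPOINT_URL").map(String::as_str), Some("http://localhost/"));
    assert!(!opts.contains_key("AWS_REGION"));

    // In a real test run, the map would come from the environment instead.
    let _from_env = options_from_env("DELTA_RS_INT_");
}
```

Because the unprefixed `AWS_*` variables are never set in this scheme, any code path that silently reads the environment would fall back to defaults and fail the test, which is exactly the kind of defect this issue describes.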

If you have other approaches in mind, I'd like to hear them.

@roeap
Collaborator

roeap commented Jan 17, 2023

@Blajda - sorry for the late reply. With the latest commits on main there are some updates to make more consistent use of the options map to create object stores in various places, which may already cover this bug. Would it be possible for you to confirm this?

Of course having integration tests cover this scenario would be great!

While we worked on making configuration more consistent, one piece is still missing: configuration for the S3 lock client, which right now also updates the environment. However, as long as the configuration in the map is complete, we should in most places no longer take any values from the environment.
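One way to read the precedence described here (the options map first, the environment only as a fallback) is sketched below. `resolve_option` is a hypothetical helper, not delta-rs API; the environment lookup is passed in as a closure so the behaviour can be checked without touching real environment variables.

```rust
use std::collections::HashMap;
use std::env;

// Hypothetical helper: prefer the explicit options map, and only consult
// the environment (via `env_lookup`) when the key is missing from the map.
fn resolve_option<F>(options: &HashMap<String, String>, key: &str, env_lookup: F) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    options.get(key).cloned().or_else(|| env_lookup(key))
}

fn main() {
    let mut options = HashMap::new();
    options.insert("AWS_ENDPOINT_URL".to_string(), "http://localhost/".to_string());

    // A fake environment that disagrees with the map (key names are illustrative).
    let fake_env = |k: &str| {
        if k == "AWS_ENDPOINT_URL" || k == "AWS_REGION" {
            Some("from-env".to_string())
        } else {
            None
        }
    };

    // A value present in the map wins over the environment.
    assert_eq!(
        resolve_option(&options, "AWS_ENDPOINT_URL", fake_env),
        Some("http://localhost/".to_string())
    );
    // A key missing from the map falls back to the environment.
    assert_eq!(resolve_option(&options, "AWS_REGION", fake_env), Some("from-env".to_string()));
    // Missing everywhere: no value.
    assert_eq!(resolve_option(&options, "UNSET_KEY", fake_env), None);

    // With the real process environment the lookup would be `env::var(k).ok()`.
    let _ = resolve_option(&options, "AWS_REGION", |k| env::var(k).ok());
}
```

If the map is complete, the fallback closure is never called, which matches the point that a fully specified options map should not pick up values from the environment.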

@Blajda
Collaborator Author

Blajda commented Jan 18, 2023

Hi @roeap
Yes, it seems this issue is resolved. I was able to run queries with DataFusion using main.
While checking this fix I encountered another issue, so I'll open a new ticket.

Thanks for all the work.

@Blajda Blajda closed this as completed Jan 18, 2023