-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow configuring endpoint for Azure FS #23071
Conversation
lib/trino-filesystem-azure/src/main/java/io/trino/filesystem/azure/AzureFileSystemConfig.java
Show resolved
Hide resolved
687c30c
to
c463301
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me now.
lib/trino-filesystem-azure/src/main/java/io/trino/filesystem/azure/AzureLocation.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM % one question
lib/trino-filesystem-azure/src/main/java/io/trino/filesystem/azure/AzureFileSystemConfig.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would you think about making the entire endpoint URL configurable, so instead of hardcoding https://<account name>
+ .dfs.
+ <endpoint>
, a user could configure the endpoint to be "https://<accountname>.dfs.<endpoint>"
explicitly?
That would open things up to test with use other implementations like the Azurite emulator in the future, where you're likely to need something like http://<ip or container name or localhost>:some port
, with no .dfs.
c463301
to
efeb100
Compare
@ms1111 That's a good idea, which I considered, but it's complicated due to the ABFS URI syntax containing the storage account. @dain tried using Azurite when he wrote the Azure FS, but ran into various issues and incompatibilities with the real Azure storage, which is why we don't use it for testing. We can extend the config in the future when there is a real use case or viable alternate implementation. Let us know if you have a specific use case in mind. The goal of the current change is to support all of the Azure clouds while being easy for users to configure. |
@electrum Yeah, I was following some of the Azurite issues in Azure/Azurite#553 and I figure if they're solved in the future, then it would require changing the Trino configuration again to allow specifying a local hostname, port, and scheme (http). I was just thinking that might be a hard sell later, because you'd end up with two different historical ways to override the endpoint - the one being added here, and a new way to allow overriding the other pieces. Being able to use Azurite is handy for local & CI testing if you have a software project that uses Trino and other components that all need to talk to a data lake. You can write integration tests that hit a few different containers and they all can touch the same tables. (Sometimes you can do this in with file:// URIs instead of emulating abfs:// or s3a://, but not always - there's always some weird limitation, like for example CircleCI's remote docker executor can't do bind mounts, so you can't share files between the primary and secondary containers.) Because of the hardcoded endpoint URL in Trino, we're trying with S3/MinIO instead, but given we're an Azure shop, it'd be nice to be able to do it all with the Azure filesystem eventually. |
@ms1111 We can extend the config in the future to accept a URL. This won't conflict with the existing configuration. However, I don't think that Azurite integration would be as simple as allowing to configure a URL endpoint, as the way storage accounts are handled when constructing URLs is different. We need to take the storage account from the ABFS location and generate an endpoint URL, which has a different format for real Azure vs Azurite. So if Azurite ever actually supports ADLS, we will likely need to add specific support for it. |
We test the S3 file system against MinIO, S3Mock and LocalStack. I agree the local storage tests are very useful and we will certainly support Azurite if they ever implement ADLS support and make it actually compatible with Azure. |
Lets ship this @electrum |
I am merging after chatting with @anusudarsan |
Additional context and related issues
Fixes #21307
Release notes
(x) Release notes are required, with the following suggested text: