GH-39968: [Python][FS][Azure] Minimal Python bindings for AzureFileSystem
#40021
We may want to update ci/scripts/python_*.sh and .github/workflows/python.yml too for PYARROW_WITH_AZURE in this PR. Or we can do it in a separate PR to keep this PR minimal.
I updated
The MATLAB builds seem to be having issues. I don't think they are related to my changes.
Yes, the MATLAB-related failures are unrelated. Could you open an issue for them so that we can ignore those failures in this PR?
@github-actions crossbow submit -g cpp -g wheel
Created an issue: #40034
There are 2 CI failures; I think both are unrelated to this PR.
Co-authored-by: Joris Van den Bossche <[email protected]>
I've just rebased after #40455. @jorisvandenbossche, when you get a chance, could you please re-review? Sorry to be impatient, but I really want to start using this and I'm hoping it will be merged in time for the 16.0.0 release.
No need to apologize for the ping! ;) The conclusion from the last discussion just above about extra test builds to enable this (#40021 (comment)) is that this can wait until later? Do we want to create a follow-up issue to add this to some additional non-conda builds?
Thanks for reviewing and merging. I created an issue for the MinIO and Azurite follow-up: #40509.
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 9f6dc1f. There were 7 benchmark results indicating a performance regression:
The full Conbench report has more details. It also includes information about 2 possible false positives for unstable benchmarks that are known to sometimes produce them.
blob_storage_scheme : str, default None
    Either `http` or `https`. Defaults to `https`. Useful for connecting to a local
    emulator, like Azurite.
dfs_storage_scheme : str, default None
    Either `http` or `https`. Defaults to `https`. Useful for connecting to a local
    emulator, like Azurite.
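The two scheme options above are what make a local emulator reachable over plain HTTP. A minimal sketch of connecting to Azurite, assuming its well-known development account (devstoreaccount1 with the published emulator key) and its default blob endpoint at 127.0.0.1:10000. It is also an assumption that blob_storage_authority and dfs_storage_authority are the matching host:port options, and both are pointed at the same endpoint here. azurite_options and connect_to_azurite are hypothetical helpers, not part of pyarrow.

```python
# Assumed: Azurite's well-known development account and key.
AZURITE_ACCOUNT_NAME = "devstoreaccount1"
AZURITE_ACCOUNT_KEY = (
    "Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/"
    "K1SZFOxOaC8T8s8yMvw=="
)


def azurite_options(host="127.0.0.1", port=10000):
    """Hypothetical helper: AzureFileSystem keyword arguments for Azurite."""
    authority = f"{host}:{port}"
    return {
        "account_name": AZURITE_ACCOUNT_NAME,
        "account_key": AZURITE_ACCOUNT_KEY,
        # Assumed option names for the emulator's host:port.
        "blob_storage_authority": authority,
        "dfs_storage_authority": authority,
        # Azurite speaks plain HTTP, so override the `https` default.
        "blob_storage_scheme": "http",
        "dfs_storage_scheme": "http",
    }


def connect_to_azurite(**kwargs):
    """Hypothetical helper: build an AzureFileSystem for a running Azurite."""
    # Deferred import: needs a pyarrow build with Azure support.
    from pyarrow.fs import AzureFileSystem

    return AzureFileSystem(**azurite_options(**kwargs))
```

Keeping the options in a plain dict makes it easy to reuse them in test fixtures without importing pyarrow until a filesystem is actually needed.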
@kou Should this also change to enable_tls, like you did in the URI parsing? cc @Tom-Newton
Ah, I think so. We may want to use AzureOptions::FromUri() instead of re-implementing the same logic.
@Tom-Newton Could you follow up on this?
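The enable_tls question above amounts to collapsing the two scheme strings into a single boolean. A minimal sketch of that mapping in Python; schemes_from_enable_tls is a hypothetical helper, not part of pyarrow, and whether the bindings should instead delegate to AzureOptions::FromUri() is left open in the thread.

```python
def schemes_from_enable_tls(enable_tls=True):
    """Hypothetical helper: derive both scheme options from one TLS flag."""
    # TLS on -> https (the current default); TLS off -> http (e.g. Azurite).
    scheme = "https" if enable_tls else "http"
    return {"blob_storage_scheme": scheme, "dfs_storage_scheme": scheme}
```

Passing **schemes_from_enable_tls(False) alongside the other keyword arguments would select plain HTTP for a local emulator, and a single flag rules out mismatched blob and DFS schemes.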
I created an issue for completing the Python bindings and referenced this conversation in it: #40572. There is a good chance that I will work on it, but I can't say when.
I'm not really sure what to make of this. These benchmarks do seem potentially relevant, but this PR only adds a feature without modifying existing code, so I don't see how it could have caused a regression.
You can ignore all of them. I checked earlier today, and all are spurious spikes in the timings.
Rationale for this change
We want to use the new AzureFileSystem in pyarrow.
What changes are included in this PR?
Minimal Python bindings for AzureFileSystem. This includes just enough to run the Python tests against Azurite, plus default credential auth to enable real use once this PR merges.
Set ARROW_AZURE=OFF explicitly rather than relying on defaults. The defaults differ between builds and tests, which caused the tests to be enabled while Azure was disabled during the build.
Are these changes tested?
Enabled the Python filesystem tests for the new filesystem. I had to skip Azure in a couple of the tests, though, because they do not work on the C++ side yet. I created GitHub issues to resolve these (#40025 and #40026) and added TODO comments referencing them where relevant.
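The build/test mismatch described under the changes (tests enabled while the build had Azure disabled) can also be guarded against at runtime. A minimal sketch, assuming that importing AzureFileSystem from pyarrow.fs raises ImportError when pyarrow was built without Azure support; has_azure_support is a hypothetical helper, not part of pyarrow.

```python
def has_azure_support():
    """Hypothetical helper: True if the installed pyarrow exposes AzureFileSystem.

    ImportError covers both a pyarrow built without Azure support and
    pyarrow not being installed at all (ModuleNotFoundError subclasses it).
    """
    try:
        from pyarrow.fs import AzureFileSystem  # noqa: F401
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    print("Azure filesystem available:", has_azure_support())
```

A test module could then apply pytest.mark.skipif(not has_azure_support(), reason=...) to Azure-only tests, so the decision follows what was actually built rather than build defaults.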
Are there any user-facing changes?
pyarrow users can now use the native AzureFileSystem to get much better reliability and performance compared to adlfs-based options.
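For end users, a minimal sketch of reading a Parquet file through the new filesystem. It assumes that omitting account_key makes AzureFileSystem fall back to the default credential auth mentioned in the PR description, and that paths take the form container/path/to/object. The function name, account name and path are hypothetical.

```python
def read_parquet_from_azure(account_name, path):
    """Hypothetical helper: read a Parquet file from Azure storage into a Table.

    Imports are deferred so that defining this function does not require
    pyarrow to be installed.
    """
    import pyarrow.parquet as pq
    from pyarrow.fs import AzureFileSystem

    # No account_key: assumed to fall back to default credential auth.
    fs = AzureFileSystem(account_name=account_name)
    # Assumed path layout: "<container>/<path inside the container>".
    return pq.read_table(path, filesystem=fs)


# Example call (needs network access and valid credentials):
# table = read_parquet_from_azure("myaccount", "mycontainer/data/part-0.parquet")
```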