Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++][FS][Azure] Include user set metadata in ObjectInputFile::ReadMetadata #40026

Closed
Tom-Newton opened this issue Feb 10, 2024 · 2 comments
Closed

Comments

@Tom-Newton
Copy link
Contributor

Describe the enhancement requested

Child of #18014

The current implementation only returns system metadata on the blobs not user set metadata. That means the AzureFileSystem can't round trip the metadata.

As part of this task TEST_F(TestAzuriteFileSystem, WriteMetadata) should probably be updated so that it uses AzureFileSystem to assert that metadata is being written correctly. Currently this tests uses the Azure APIs to fetch the metadata to make this assertion.

Component(s)

C++

@Tom-Newton
Copy link
Contributor Author

Assuming #40021 is completed first, then this ticket should include enabling the python side tests for move. I will add TODO(GH-40026) comments where relevant.

jorisvandenbossche added a commit that referenced this issue Mar 13, 2024
…ystem` (#40021)

### Rationale for this change
We want to use the new `AzureFileSystem` in `pyarrow`. 

### What changes are included in this PR?
- Add minimal python bindings for `AzureFileSystem`. This includes just enough to run the python tests against azurite plus default credential auth to enable real use of this once this PR merges. 
- Adding additional configuration options and remaining authentication options can be done as a follow up. 
- I tried to copy the existing pybinds for GCS and S3
- Explicitly set `ARROW_AZURE=OFF` rather than relying on defaults. The defaults are different for builds vs tests so this was causing tests to be enabled while Azure was disabled during the build.

### Are these changes tested?
Enabled the the python filesystem tests for the new filesystem. I had to skip azure in a couple of the tests though because they are not yet working on the C++ side. I created Github issues to resolve these #40025 and #40026 and added TODO comments where relevant, that reference these Github issues. 

### Are there any user-facing changes?
`pyarrow` users can now use the native `AzureFileSystem` to get much better reliability and performance compared to `adlfs` based options. 

* Closes: #39968
* GitHub Issue: #39968

Lead-authored-by: Thomas Newton <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Joris Van den Bossche <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
kou added a commit that referenced this issue Mar 22, 2024
…ata (#40671)

### Rationale for this change

Users may want to set user defined metadata and/or properties such as `Content-Type`.

### What changes are included in this PR?

* Write user defined metadata.
* But the following metadata are written as properties:
  * `Content-Type`
  * `Content-Encoding`
  * `Content-Language`
  * `Content-Disposition`
  * `Cache-Control`
* The following metadata is ignored because corresponding properties are auto generated:
  * `Content-Hash`

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.
* GitHub Issue: #40026

Authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
@kou
Copy link
Member

kou commented Mar 22, 2024

Issue resolved by pull request 40671
#40671

@kou kou added this to the 16.0.0 milestone Mar 22, 2024
@kou kou closed this as completed Mar 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants