[Go][Parquet] pqarrow.FileWriter allow adding parquet metadata after writing rowgroups #35775
Comments
I added a quick change to allow this. See commits above. I can add some tests and send a PR if there's any interest. It turns out that although you can update the key value metadata on
thanks @xxgreg!! There's definitely interest in this and thanks for fixing the bug with persisting the metadata. Please add some tests and send a PR, it should automatically add me as a codeowner to review it.
Apologies for the slow reply. And thanks for the super quick replies! I've got a couple of other priorities to sort out first. I'm planning to pick this up again in 2 weeks.
I'm also interested in being able to set the file metadata after writing row groups. The commit proposed by @xxgreg changes the signature of the
It would also be useful to see an example of how the I only see The I'm unclear on why the
@tschaub I think it would be fine to create a backwards incompatible change here (as long as we mark the PR appropriately). While a non-breaking change would always be preferred (such as creating a new function instead of changing the existing one) it's not a deal breaker on this.
You are exactly correct, they were meant to be settable before the writer is closed as
I've put together a proposed set of changes in #37786.
…fter writing row groups (#37786)

### Rationale for this change

The key value file metadata may include information generated while writing row groups. Currently, it is not possible to set the key value file metadata after creating a writer. With the changes in this branch, key value pairs may be added any time before closing the writer.

### What changes are included in this PR?

This branch adds a `writer.AppendKeyValueMetadata(key, value)` method to the parquet `file.Writer` and to the `pqarrow.FileWriter`.

### Are these changes tested?

Tests are added for the new functionality.

### Are there any user-facing changes?

The `KeyValueMetadata` field on the parquet `file.Writer` has been renamed to `initialKeyValueMetadata`. This is a breaking change. Although the field was exported, setting it did not result in new key value metadata being written. Instead, it represented the initial key value metadata if the writer was passed the `WithWriteMetadata` write option. The `WithWriteMetadata` option can still be used to provide the initial key value metadata values. In addition, the `AppendKeyValueMetadata` method can be called to add key value pairs after creating a writer.

The `FileMetadata` field on the parquet `file.Writer` has been removed. Previously, setting this field value had no effect.

**This PR includes breaking changes to public APIs.**

The `KeyValueMetadata` field is no longer exported from the parquet `file.Writer` struct. Use the `WithWriteMetadata` writer option to set key value metadata when creating a writer or use the `AppendKeyValueMetadata` method to add key value metadata after creating a writer.

The `FileMetadata` field on the parquet `file.Writer` has been removed.

* Closes: #35775

Authored-by: Tim Schaub <[email protected]>
Signed-off-by: Matt Topol <[email protected]>
Describe the enhancement requested
I'd like to add a key/value metadata field to the Parquet metadata. The value of the field is not known until after the row groups have been written.
It looks like it is possible to do this when using `parquet/file.Writer` but it isn't possible when using `pqarrow.FileWriter`.
Would it make sense to add some API to allow for this?
Or perhaps it's already there, and I can't see it.
Component(s)
Go