feat: Add compression option ZSTD. #1890

Merged
merged 13 commits into from
Apr 11, 2024


chalmerlowe (Collaborator)

Based on a PR submitted by EthanSteinberg.

One of BigQuery's neat features is its support for ZSTD compression when exporting data.
See: https://cloud.google.com/bigquery/docs/exporting-data#parquet_export_details
This commit simply adds ZSTD to the enum of allowed compression types.

I added a test to confirm that the enum is correctly populated with only the allowed options.
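A minimal sketch of the change and of the kind of allowed-values test described above, assuming a string-valued enum. The `Compression` class here is an illustrative stand-in, not the library's source; python-bigquery may define it differently.

```python
import enum


class Compression(str, enum.Enum):
    """Illustrative stand-in for the library's compression constants.

    Per the BigQuery discovery document, NONE is the default,
    DEFLATE is Avro-only, and ZSTD is Parquet-only.
    """

    DEFLATE = "DEFLATE"
    GZIP = "GZIP"
    NONE = "NONE"
    SNAPPY = "SNAPPY"
    ZSTD = "ZSTD"  # the value this PR adds


def test_compression_enums():
    # Compare against the exact set of allowed options, so any addition
    # or removal fails the test (which is what makes it a change detector).
    expected = {"DEFLATE", "GZIP", "NONE", "SNAPPY", "ZSTD"}
    assert {member.value for member in Compression} == expected
```

Because the members subclass `str`, `Compression.ZSTD == "ZSTD"` holds, so a member can be passed wherever the raw API string is expected.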

For future me: I'm including this link directly to the list of current export formats and their allowable compression types.

Closing Ethan's PR.

@chalmerlowe requested review from a team as code owners April 8, 2024 14:32
@chalmerlowe requested a review from Linchin April 8, 2024 14:32
@product-auto-label bot added the size: s and api: bigquery labels Apr 8, 2024
@chalmerlowe self-assigned this Apr 8, 2024
@chalmerlowe added the do not merge label Apr 8, 2024
chalmerlowe (Collaborator, Author)

Before we merge, there are a couple of items I want to investigate.
More to come.

@chalmerlowe requested a review from tswast April 8, 2024 16:04
Review comment on tests/unit/test_enums.py (outdated, resolved)
# limitations under the License.


def test_compression_enums():
Contributor

I wonder how useful this test is. It seems an awful lot like a change-detector test to me. I'd be fine adding the constant without adding the test.

Alternatively, maybe there's a system test we could write to make sure this stays in sync with the BigQuery discovery document? But even then, compression isn't a true enum. From what I can tell, the allowed values are only listed in the description string.

Collaborator Author

@tswast

I will remove this test.

As for ensuring that this matches the discovery doc: as you note, the allowed values appear only in the description, so we would need some mechanism to extract them from it. That feels fragile (i.e., extract every ALL-CAPS word and deduplicate). Thoughts?

"JobConfigurationExtract": {
      ...
      "properties": {
        "compression": {
          "description": "Optional. The compression type to use for exported files.
Possible values include DEFLATE, GZIP, NONE, SNAPPY, and ZSTD. The
default value is NONE. Not all compression formats are support for all
file formats. DEFLATE is only supported for Avro. ZSTD is only supported
for Parquet. Not applicable when extracting models.",
          "type": "string"
        },
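To illustrate the fragility, here is a rough sketch of the ALL-CAPS approach applied to a trimmed copy of the `compression` property quoted above. The helper `allowed_values` is hypothetical; nothing like it exists in python-bigquery. It prefers a structured `"enum"` list and only falls back to scraping the description.

```python
import re

# Trimmed copy of the discovery-document property quoted above.
COMPRESSION_PROPERTY = {
    "description": (
        "Optional. The compression type to use for exported files. "
        "Possible values include DEFLATE, GZIP, NONE, SNAPPY, and ZSTD. "
        "The default value is NONE. DEFLATE is only supported for Avro. "
        "ZSTD is only supported for Parquet. "
        "Not applicable when extracting models."
    ),
    "type": "string",  # note: no "enum" key, so the fallback is used
}


def allowed_values(prop):
    """Hypothetical helper: allowed values for a discovery-doc property."""
    if "enum" in prop:
        # A structured list of values, if the property had one.
        return sorted(prop["enum"])
    # Fallback: every word of two or more capital letters, deduplicated
    # (NONE, DEFLATE, and ZSTD each appear twice in the description).
    words = re.findall(r"\b[A-Z]{2,}\b", prop["description"])
    return sorted(set(words))
```

This happens to return the five expected values today, but the heuristic breaks as soon as the description mentions another acronym: appending a sentence such as "Not supported for CSV." would add `CSV` to the result.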

Contributor

Agreed. Without a structured representation of the allowed values, it's too fragile.

Review comment on tests/unit/job/test_extract.py (outdated, resolved)
@chalmerlowe chalmerlowe added automerge Merge the pull request once unit tests and other checks pass. and removed do not merge Indicates a pull request not ready for merge, due to either quality or timing. labels Apr 11, 2024
@chalmerlowe chalmerlowe merged commit 5ed9cce into main Apr 11, 2024
21 checks passed
@chalmerlowe chalmerlowe deleted the patch-1 branch April 11, 2024 19:33
@gcf-merge-on-green gcf-merge-on-green bot removed the automerge Merge the pull request once unit tests and other checks pass. label Apr 11, 2024
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. size: s Pull request size is small.
3 participants