[BEAM-876] Support schemaUpdateOption in BigQueryIO #9524
Thanks for the contribution. R: @pabloem, will you be able to review?
Yes, I'll be glad to review.
Thanks for the contribution! BigQuery adds features often, and it's great to support them.
I think this looks fine to me, except for the one comment. Also, have you tried running this transform?
It may be good to have an integration test, just to be sure that it'll work fine. Do you know how to add one? I wouldn't say it's a must, but it would be amazing to have one.
.../io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteTables.java
Also, sorry about the delay. Don't hesitate to ping me on the PR or by email.
Should I review once more?
I haven't had a chance to write the integration test yet, but I was hoping to take a shot at it in the next week or so.
Thanks!
@pabloem I found some time to write up an integration test. I'm seeing a failing check (
Run Java PreCommit
Thanks for writing the test! That's going above and beyond :) The changes LGTM. I'll try to run the test to check.
R: @chamikaramj, would you like to do an extra pass?
Okay, bringing it in now. Thanks, @ziel!
[BEAM-876] Support schemaUpdateOption in BigQueryIO
This adds schemaUpdateOptions to BigQueryIO.Write so that one can specify these for BigQuery when writing in batch mode. Usage example:
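For reference, in released Beam versions the option is set on the write transform with a call of the form `.withSchemaUpdateOptions(EnumSet.of(BigQueryIO.Write.SchemaUpdateOption.ALLOW_FIELD_ADDITION))` (check the Javadoc for your Beam version). Since that needs Beam on the classpath, the sketch below is instead a stdlib-only model of what the two options let a load job change: ALLOW_FIELD_ADDITION permits new NULLABLE or REPEATED columns, and ALLOW_FIELD_RELAXATION permits a REQUIRED column to become NULLABLE. The enums, the `SchemaUpdateCheck` class, and its `isAllowed` method are hypothetical names for illustration; the model deliberately ignores column types and nested RECORD fields.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical, stdlib-only model of BigQuery's two schema update options.
// This is not Beam or BigQuery client code.
enum SchemaUpdateOption { ALLOW_FIELD_ADDITION, ALLOW_FIELD_RELAXATION }

// Column modes, using BigQuery's names.
enum Mode { REQUIRED, NULLABLE, REPEATED }

final class SchemaUpdateCheck {
  private SchemaUpdateCheck() {}

  /**
   * Returns true if changing a table from oldSchema to newSchema (column name to mode)
   * is permitted by the given options. Column types and nested fields are ignored.
   */
  static boolean isAllowed(
      Map<String, Mode> oldSchema, Map<String, Mode> newSchema, Set<SchemaUpdateOption> options) {
    // Removing an existing column is not something either option permits.
    if (!newSchema.keySet().containsAll(oldSchema.keySet())) {
      return false;
    }
    for (Map.Entry<String, Mode> column : newSchema.entrySet()) {
      Mode before = oldSchema.get(column.getKey());
      Mode after = column.getValue();
      if (before == null) {
        // New column: requires ALLOW_FIELD_ADDITION and must not be REQUIRED.
        if (!options.contains(SchemaUpdateOption.ALLOW_FIELD_ADDITION) || after == Mode.REQUIRED) {
          return false;
        }
      } else if (before != after) {
        // The only mode change either option covers is REQUIRED -> NULLABLE.
        boolean relaxation = before == Mode.REQUIRED && after == Mode.NULLABLE;
        if (!relaxation || !options.contains(SchemaUpdateOption.ALLOW_FIELD_RELAXATION)) {
          return false;
        }
      }
    }
    return true;
  }
}
```

The two options are independent, so a write that both adds a column and relaxes an existing one needs both of them in the set.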
Implementation Notes
Hi, hello:
Hi all. I haven't contributed to this code base before, and am not super familiar with it. Style advice and such is super welcome.
Load vs Query Jobs in BigQuery:
BigQuery supports schema update side effects for load and query jobs. This implements support for load jobs only. In the context of Apache Beam, it doesn't seem like there's much utility to supporting queries which write to a table directly (as opposed to reading via a query and writing later in the pipeline). (Open to discussion/correction of course).
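For load jobs, BigQuery's documentation also limits when schema update options take effect: with the WRITE_APPEND write disposition, or with WRITE_TRUNCATE when the destination is a single partition named with a partition decorator (a plain WRITE_TRUNCATE on a whole table always overwrites the schema). The sketch below encodes that documented rule as a stdlib-only predicate. `WriteDisposition` mirrors BigQuery's disposition names, while `LoadJobRule` and `schemaUpdatesApply` are hypothetical and are not part of Beam or the BigQuery client library.

```java
// Hypothetical, stdlib-only sketch of the rule BigQuery documents for when a
// load job's schemaUpdateOptions apply. Not Beam or BigQuery client code.
enum WriteDisposition { WRITE_EMPTY, WRITE_APPEND, WRITE_TRUNCATE }

final class LoadJobRule {
  private LoadJobRule() {}

  /**
   * True if schema update options apply: always for WRITE_APPEND, and for
   * WRITE_TRUNCATE only when tableSpec names a single partition via a
   * partition decorator, for example "my_dataset.events$20190101".
   */
  static boolean schemaUpdatesApply(WriteDisposition disposition, String tableSpec) {
    switch (disposition) {
      case WRITE_APPEND:
        return true;
      case WRITE_TRUNCATE:
        // Truncating a whole table replaces its schema outright, so only a
        // partition-decorated destination counts.
        return tableSpec.indexOf('$') >= 0;
      default:
        // WRITE_EMPTY is not one of the two documented cases.
        return false;
    }
  }
}
```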
Avoiding Changing WriteTables Constructor:
I've added this to WriteTables with a withSchemaUpdateOptions method so as not to perturb the public constructor with an optional parameter. There's some awkwardness there, but it seemed preferable to breaking the API or adding a secondary constructor.
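A minimal sketch of the general pattern described above, assuming a fluent method that sets an optional field after construction: the public constructor keeps its original signature, and the optional setting defaults to "no schema updates". `TableWriter` and `UpdateOption` are hypothetical stand-ins, not Beam's WriteTables or its real option type, and this is not the PR's actual code.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical options enum for this sketch only.
enum UpdateOption { ALLOW_FIELD_ADDITION, ALLOW_FIELD_RELAXATION }

// Hypothetical stand-in for a class like WriteTables; not Beam's code.
final class TableWriter {
  private final String destination;
  // Defaults to no schema updates, i.e. the behavior before the option existed.
  private Set<UpdateOption> schemaUpdateOptions = EnumSet.noneOf(UpdateOption.class);

  // The public constructor keeps its original signature, so existing callers are unaffected.
  public TableWriter(String destination) {
    this.destination = destination;
  }

  // Optional setting supplied after construction; returns this so calls can be chained.
  public TableWriter withSchemaUpdateOptions(Set<UpdateOption> options) {
    EnumSet<UpdateOption> copy = EnumSet.noneOf(UpdateOption.class);
    copy.addAll(options); // defensive copy: later edits to the caller's set do not leak in
    this.schemaUpdateOptions = copy;
    return this;
  }

  public Set<UpdateOption> getSchemaUpdateOptions() {
    return Collections.unmodifiableSet(schemaUpdateOptions);
  }

  public String getDestination() {
    return destination;
  }
}
```

The cost of this shape is that the object is mutated after construction, which is one reading of the awkwardness mentioned above; the benefit is that no caller of the existing constructor has to change.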