
[BEAM-4417] Support BigQuery's NUMERIC type using Java #5755

Merged 1 commit into apache:master on Jun 30, 2018
Conversation

ElliottBrossard

For comparison, this is the PR that added support in google-cloud-java: googleapis/google-cloud-java#3110.

I will add Python support separately, unless it's preferable to include that in this PR as well.

@kennknowles
Member

This LGTM as far as it goes. Should I hope for a test that does some end-to-end thing and gets the numeric out?

@kennknowles
Member

The test failure is a flake. I'll kick it.

@kennknowles
Member

run java precommit

@kennknowles
Member

run java postcommit

@ElliottBrossard
Author

RE end-to-end testing, that's what I'm not sure about. Is there an integration test that I can update or some way of using this code to read from a BigQuery table with a NUMERIC column to verify that it works? I'd hate to push something through that looks fine in unit tests but falls apart in practice.

@kennknowles
Member

Good question. Pinging @chamikaramj @lgajowy perhaps?

@chamikaramj
Contributor

Looks like we do have some tests here: https://builds.apache.org/view/A-D/view/Beam/job/beam_PostCommit_Java_GradleBuild/941/testReport/org.apache.beam.sdk.io.gcp.bigquery/

I think these are relatively small tests that use DirectRunner. For DataflowRunner+BQ I think we currently rely on internal Google testing.

@lgajowy
Contributor

lgajowy commented Jun 27, 2018

Technically, I think you could try to write a BigQueryIOIT in the fashion described here and then run it to see if everything works as expected. Note that the test itself does not require any Kubernetes scripts (as described in the docs) and could be run against an existing BigQuery instance.

I can help with this if you choose this way. :)

@kennknowles
Member

We have turned on autoformatting of the codebase, which causes small conflicts across the board. You can probably safely rebase and just keep your changes. Like this:

$ git rebase
... see some conflicts
$ git diff
... confirmed that the conflicts are just autoformatting
... so we can just keep our changes and do our own autoformat
$ git checkout --theirs --
$ git add -u
$ git rebase --continue
$ ./gradlew spotlessJavaApply

Please ping me if you run into any difficulty.

@ElliottBrossard
Author

I think (hope) that I fixed the conflicts. I haven't been able to figure out how to run the integration test, though. Are there specific instructions about how to run org.apache.beam.examples.cookbook.BigQueryTornadoesIT? I've read https://beam.apache.org/documentation/io/testing/#i-o-transform-integration-tests, but I don't understand the association between Perfkit, Gradle, and Beam. I installed Gradle, but my naive attempt at running the test fails:

~/beam/examples/java$ ~/gradle-4.8.1/bin/gradle test --tests=org.apache.beam.examples.cookbook.BigQueryTornadoesIT
...
Execution failed for task ':beam-examples-java:test'.
No tests found for given includes: [**/Test*.class, **/*Test.class, **/*Tests.class, **/*TestCase.class](include rules) [org.apache.beam.examples.cookbook.BigQueryTornadoesIT](--tests filter)

The example indicates using the performanceTest task with gradlew, but I can't tell where that is defined. Any pointers you can provide would be great.

@kennknowles
Member

The instructions are nonexistent, I think. Since you left the "Allow edits by maintainers" box checked, I've taken the liberty of pushing an autoformat commit to get that to pass. I will now say the magic words to run the appropriate postcommit tests on this PR.

@kennknowles
Member

run java postcommit

@kennknowles
Member

Someone broke HEAD for an unfortunate minute.

@kennknowles
Member

retest this please

@kennknowles
Member

run java postcommit

@kennknowles
Member

OK, it passed before and after. Since you will no longer need your local copy to be in sync with this PR, I am going to fix up the commit history and then merge this.

@kennknowles kennknowles merged commit c6e9740 into apache:master Jun 30, 2018
@lgajowy
Contributor

lgajowy commented Jul 2, 2018

@ElliottBrossard I think you've found an issue in the Gradle code. The performanceTest task is defined in BeamModulePlugin, so every module that uses it has to enable it by adding

provideIntegrationTestingDependencies()
enableJavaPerformanceTesting()

to its build.gradle file. For some reason, those calls are missing from the examples' build.gradle, which is why you are unable to run the test. We'll have to fix it. Thanks for reporting this.

JIRA for this: https://issues.apache.org/jira/browse/BEAM-4706

@ElliottBrossard
Author

Thanks, Kenneth! And thanks, Łukasz, for filing an issue.

@KumarKishan

KumarKishan commented Jul 3, 2018

Hi @ElliottBrossard, can we test the NUMERIC data type after building 2.6.0-SNAPSHOT from master?

@ElliottBrossard
Author

@KumarKishan It looks like you deleted your original post, but I'm guessing the issue is that this new code expects the NUMERIC column to be represented as a STRING in the Avro file, whereas it is actually BYTES with a DECIMAL logical type. This was the error that you reported:

java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException: com.google.common.base.VerifyException: Expected Avro schema type STRING, not BYTES, for BigQuery NUMERIC field

The problem is that AvroCoder writes BigDecimal values as STRINGs, so I was under the false impression that this is what BigQueryAvroUtils should expect to see :/
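
For illustration, here is a minimal sketch of how a value exported as Avro BYTES with a DECIMAL logical type can be turned back into a BigDecimal; the class and method names below are made up for this example and this is not the actual BigQueryAvroUtils code:

import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.ByteBuffer;
import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;

class NumericDecodeSketch {
  // BigQuery exports NUMERIC as BYTES annotated with decimal(38, 9).
  static BigDecimal decodeNumeric(Schema fieldSchema, ByteBuffer value) {
    LogicalTypes.Decimal decimal = (LogicalTypes.Decimal) fieldSchema.getLogicalType();
    byte[] unscaled = new byte[value.remaining()];
    value.get(unscaled);
    // The bytes hold the two's-complement unscaled value; the schema carries the scale.
    return new BigDecimal(new BigInteger(unscaled), decimal.getScale());
  }
}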

@KumarKishan

Yes @ElliottBrossard, that is the exception I got.

@lgajowy
Contributor

lgajowy commented Jul 4, 2018

@ElliottBrossard we fixed the JIRA mentioned above (link). FWIW, we ran the test successfully on the Dataflow and direct runners (see the comment in the JIRA issue) after your change was merged.

@ElliottBrossard
Author

I was able to reproduce the problem using a modified BigQueryTornadoes :D For the sake of posterity, this was the command I used:

./gradlew -i integrationTest -p examples/java/ -DintegrationTestPipelineOptions='["--tempLocation=$gcs_path","--project=$project_id"]' --tests org.apache.beam.examples.cookbook.BigQueryTornadoesIT -DintegrationTestRunner=direct

I'm working on a PR to fix the error from my comment above.

@lgajowy
Contributor

lgajowy commented Jul 13, 2018

Great! :)

@pabloem
Member

pabloem commented Jul 20, 2018

Hi all! Is this now fixed? Does the Tornadoes test verify it?

@ElliottBrossard
Author

Yes, this should be fixed now. The current version of the Tornadoes test doesn't exercise NUMERIC, but the unit tests now use the proper Avro encoding for NUMERIC (BYTES with a DECIMAL logical type) to verify that BigQueryIO can read those values.
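
For illustration, a minimal sketch of the Avro schema shape this corresponds to, i.e. BYTES annotated with decimal(38, 9) to match BigQuery NUMERIC; the class and field names are made up for this example and not taken from the Beam test code:

import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;

class NumericSchemaSketch {
  // NUMERIC maps to Avro BYTES with a decimal logical type of precision 38, scale 9.
  static final Schema NUMERIC_AVRO_SCHEMA =
      LogicalTypes.decimal(38, 9).addToSchema(Schema.create(Schema.Type.BYTES));
}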

@pabloem
Member

pabloem commented Nov 27, 2018

FWIW I'll note that I've implemented a test that verifies this functionality and runs internally at Google.
