[BEAM-1542] SpannerIO sink updates #3221

Closed
wants to merge 26 commits into from

Contributor

@mairbek commented May 24, 2017

Unit and integration test.
Logical mutation size estimation.

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

  • Make sure the PR title is formatted like:
    [BEAM-<Jira issue #>] Description of pull request
  • Make sure tests pass via mvn clean verify.
  • Replace <Jira issue #> in the title with the actual Jira issue
    number, if there is one.
  • If this contribution is large, please file an Apache
    Individual Contributor License Agreement.

gamolina and others added 25 commits May 4, 2017 11:28
Also minor cleanup alphabetization in root pom.xml
Also minor cleanup alphabetization in root pom.xml
This is not appropriate for examples. SpannerIO should be well-javadoced
and integration tested.
And remove outdated Bigtable comment
* Rename to Write to match the rest of the SDK.
* Convert to AutoValue, delete toString.
* Drop .writeTo(), instead use .write() as default constructor.
* Temporarily drop withBatchSize, as its existence is not clearly
  justified.
… mergespanner

# Conflicts:
#	sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java
#	sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/GcpApiSurfaceTest.java
* Rename to Write to match the rest of the SDK.
* Convert to AutoValue, delete toString.
* Drop .writeTo(), instead use .write() as default constructor.
* Temporarily drop withBatchSize, as its existence is not clearly
  justified.
Better formatting
…nner

# Conflicts:
#	pom.xml
#	sdks/java/io/google-cloud-platform/pom.xml
#	sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java
…nner

# Conflicts:
#	pom.xml
#	sdks/java/io/google-cloud-platform/pom.xml
#	sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java
@coveralls

Coverage Status

Coverage increased (+0.02%) to 70.723% when pulling 18c1ab6 on mairbek:mergespanner into c0d19f9 on apache:master.

Contributor

@dhalperi left a comment

Looking good. I'll run the postcommit to make sure the integration test actually passes.

*/
@Experimental(Experimental.Kind.SOURCE_SINK)
public class SpannerIO {

@VisibleForTesting
static final int SPANNER_MUTATIONS_PER_COMMIT_LIMIT = 20000;
private static final long DEFAULT_BATCH_SIZE = 1024 * 1024; // 1 MB
Contributor

please add _BYTES here and to the relevant builder.

@@ -72,21 +75,32 @@
* mutations.apply(
* "Write", SpannerIO.write().withInstanceId("instance").withDatabaseId("database"));
* }</pre>
*
* <p>The default size of the batch is set to 1MB, to override this use {@link
* Write#withBatchSize(long)}. Setting batch size to a small value or zero practically disables
Contributor

withBatchSizeBytes

Contributor Author

Done
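
For readers following this thread, here is a minimal sketch of what a byte-based batch limit means in practice. It is not the SpannerIO code: `SizeBoundedBatcher`, its methods, and the use of `String` items with caller-supplied size estimates are all hypothetical stand-ins. It only illustrates why a limit of zero (or a very small value) effectively disables batching, as the Javadoc above says.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of size-bounded batching; not the SpannerIO implementation. */
final class SizeBoundedBatcher {
  private final long maxBatchSizeBytes;
  private final List<String> pending = new ArrayList<>();
  private final List<List<String>> flushed = new ArrayList<>();
  private long pendingBytes = 0;

  SizeBoundedBatcher(long maxBatchSizeBytes) {
    this.maxBatchSizeBytes = maxBatchSizeBytes;
  }

  /** Buffers an item and flushes once the estimated running total reaches the limit. */
  void add(String item, long estimatedSizeBytes) {
    pending.add(item);
    pendingBytes += estimatedSizeBytes;
    if (pendingBytes >= maxBatchSizeBytes) {
      flush();
    }
  }

  /** Emits the buffered items as one batch; does nothing when the buffer is empty. */
  void flush() {
    if (pending.isEmpty()) {
      return;
    }
    flushed.add(new ArrayList<>(pending));
    pending.clear();
    pendingBytes = 0;
  }

  /** Batches emitted so far, in order. */
  List<List<String>> flushedBatches() {
    return flushed;
  }
}
```

With a limit of 0, every `add` call flushes immediately, so each batch holds a single item.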

@@ -217,20 +282,19 @@ public void finishBundle() throws Exception {
@Teardown
public void teardown() throws Exception {
if (spanner == null) {
return;
}
spanner.closeAsync().get();
Contributor

set spanner to null here?

Contributor Author

done
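
A minimal sketch of the pattern being requested, assuming a hypothetical `Client` interface and `ClientOwner` class in place of the real Spanner client and DoFn. Clearing the field after closing makes a second teardown call a no-op instead of closing the same client twice.

```java
/** Hypothetical stand-in for a closable service client. */
interface Client {
  void close();
}

/** Hypothetical owner of a client; mirrors the setup/teardown shape only. */
final class ClientOwner {
  private Client client;

  void setup(Client newClient) {
    this.client = newClient;
  }

  void teardown() {
    if (client == null) {
      return; // nothing to release, or already torn down
    }
    client.close();
    client = null; // repeated teardown calls become no-ops
  }

  boolean hasClient() {
    return client != null;
  }
}
```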

.addIfNotNull(DisplayData.item("databaseId", databaseId)
.withLabel("Database"));
.addIfNotNull(DisplayData.item("instanceId", spec.getInstanceId()).withLabel("Instance"))
.addIfNotNull(DisplayData.item("databaseId", spec.getDatabaseId()).withLabel("Database"));
Contributor

Add the batch size here if it is not the default.

Should the service factory, or its class, be added as well?

Contributor Author

Done

@Description("Instance ID to write to in Spanner")
@Default.String("beam-test")
String getInstanceId();

Contributor

please delete empty lines between getter and setter in options.

+ " ("
+ " Key INT64,"
+ " Value STRING(MAX),"
+ ") PRIMARY KEY (Key)"));
Contributor

is there a way in the spanner client to defend against injection attacks?

Contributor Author

Not with DDL...
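
Since DDL statements cannot be parameterized, one common mitigation is to validate any name before concatenating it into the statement. The sketch below is an assumption-laden illustration, not part of this PR: `DdlNames`, its methods, and the identifier pattern are hypothetical, and the pattern is deliberately conservative rather than a statement of Spanner's actual naming rules.

```java
import java.util.regex.Pattern;

/** Hypothetical helper that rejects suspicious names before they reach a DDL string. */
final class DdlNames {
  // Assumed conservative pattern (letter, then letters, digits, or underscores);
  // check the real naming rules of the target database before relying on it.
  private static final Pattern SAFE_IDENTIFIER = Pattern.compile("[A-Za-z][A-Za-z0-9_]{0,127}");

  private DdlNames() {}

  /** Returns the name unchanged if it matches the pattern, otherwise throws. */
  static String requireSafeIdentifier(String name) {
    if (name == null || !SAFE_IDENTIFIER.matcher(name).matches()) {
      throw new IllegalArgumentException("Unsafe identifier for DDL: " + name);
    }
    return name;
  }

  /** Builds a CREATE TABLE statement shaped like the one in the test above. */
  static String createTableDdl(String tableName) {
    return "CREATE TABLE "
        + requireSafeIdentifier(tableName)
        + " (Key INT64, Value STRING(MAX)) PRIMARY KEY (Key)";
  }
}
```

In an integration test the table name usually comes from pipeline options, so a check like this mainly guards against typos and accidental misuse.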

@dhalperi
Contributor

https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_MavenInstall/3916/ is a run of the postcommit against this PR

@aaltay
Member

aaltay commented May 25, 2017

R: @dhalperi

flushBatch();
}
}

private String projectId() {
return spec.getProjectId() == null
? ServiceOptions.getDefaultProjectId()
Contributor

@lukecwik you know a lot about default projects -- does this do the right thing generally? Should we switch to it at a larger scale?

Member

I wanted to switch to ServiceOptions.getDefaultProjectId() a while ago but have not been able to perform that migration for GcpOptions. If we can include it here, great. This would remove all of our hand-coded default-project detection code. Note that by doing this we would remove support for gcloud users who have old gcloud SDK installations.

I requested the original change in gcloud-java-core to have this exposed:
googleapis/google-cloud-java#1380

spanner = SpannerOptions.newBuilder().setProjectId(options.getProjectId()).build().getService();

databaseAdminClient = spanner.getDatabaseAdminClient();
Operation<Database, CreateDatabaseMetadata> op =
Contributor

Imagining failure modes, let's say for some reason the tearDown step does not run. We should try dropping the database here.

Contributor Author

Makes sense, added a line to drop the DB.
If the DB is not found, it's a no-op.
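
The drop-before-create idea can be sketched with an in-memory stand-in. `FakeDatabaseAdmin` and `IntegrationTestSetup` are hypothetical and do not mirror the Spanner admin client API; the point is only that setup succeeds even when an earlier run never reached its teardown step.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical in-memory stand-in for a database admin client. */
final class FakeDatabaseAdmin {
  private final Set<String> databases = new HashSet<>();

  /** Dropping a missing database is a no-op. */
  void dropDatabase(String databaseId) {
    databases.remove(databaseId);
  }

  /** Creating a database that already exists fails, as a real service typically would. */
  void createDatabase(String databaseId) {
    if (!databases.add(databaseId)) {
      throw new IllegalStateException("Database already exists: " + databaseId);
    }
  }

  boolean exists(String databaseId) {
    return databases.contains(databaseId);
  }
}

/** Hypothetical test fixture showing the defensive setup order. */
final class IntegrationTestSetup {
  private IntegrationTestSetup() {}

  static void setUp(FakeDatabaseAdmin admin, String databaseId) {
    admin.dropDatabase(databaseId); // clears leftovers from a run whose teardown was skipped
    admin.createDatabase(databaseId);
  }
}
```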

@dhalperi
Contributor

https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_MavenInstall/3919/ is a run of the ITs against the current HEAD

@dhalperi
Contributor

retest this please

@dhalperi
Contributor

Can you rebase and squash to a single commit? I'm still not sure why the build is failing

@mairbek mairbek force-pushed the mergespanner branch 2 times, most recently from 7af3a76 to bd3912b Compare May 26, 2017 18:32
@coveralls

Coverage Status

Changes Unknown when pulling bd3912b8086eaf100abe971d83a2b9172a0c05f8 on mairbek:mergespanner into apache:master.

@dhalperi
Copy link
Contributor

https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_MavenInstall/3935/ is a run of postcommit against current branch

@coveralls

Coverage Status

Coverage decreased (-0.003%) to 70.794% when pulling d20d6da on mairbek:mergespanner into c687887 on apache:master.

@coveralls

Coverage Status

Coverage remained the same at 70.797% when pulling d20d6da on mairbek:mergespanner into c687887 on apache:master.

DisplayData.item("instanceId", getInstanceId()).withLabel("Output Instance"))
.addIfNotNull(
DisplayData.item("databaseId", getDatabaseId()).withLabel("Output Database"))
.addIfNotNull(
Contributor

  1. this can't be null, it's a long.
  2. Use addIfNotDefault so it doesn't show up when left to default settings. https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/display/DisplayData.java#L212

Contributor Author

I think it is valuable to keep it, so I switched to add.
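
The trade-off in this exchange can be illustrated with a small stand-in. `DisplaySketch` is hypothetical and is not Beam's `DisplayData.Builder`; it only mimics the two behaviours under discussion: always recording a value versus recording it only when it differs from the default.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in that mimics two display-data registration styles. */
final class DisplaySketch {
  private final Map<String, Object> items = new LinkedHashMap<>();

  /** Always records the value, even when it equals the default. */
  DisplaySketch add(String key, Object value) {
    items.put(key, value);
    return this;
  }

  /** Records the value only when it differs from the given default. */
  DisplaySketch addIfNotDefault(String key, Object value, Object defaultValue) {
    if (!Objects.equals(value, defaultValue)) {
      items.put(key, value);
    }
    return this;
  }

  Map<String, Object> items() {
    return items;
  }
}
```

With a 1 MB default, `addIfNotDefault` leaves the batch size out of the display data for users who never changed it, while `add` always shows the effective value.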

@dhalperi
Contributor

The IT passed, so LGTM. One minor recommendation left.

@coveralls

Coverage Status

Coverage decreased (-0.01%) to 70.787% when pulling c07a32f on mairbek:mergespanner into c687887 on apache:master.

@mairbek mairbek closed this May 26, 2017
@coveralls

Coverage Status

Coverage decreased (-0.01%) to 70.783% when pulling c07a32f on mairbek:mergespanner into c687887 on apache:master.

6 participants