Cloud Spanner DML & PDML Release (#3781)

* Launch Cloud Spanner DML & PartitionedDML support

  Revert "Revert "Revert "Revert "Cloud Spanner DML & PartitionedDML support (#3703)" (#3741)" (#3755)" (#3761)"

  This reverts commit 68f38e1.

* Cloud Spanner - Bug fix for seqNo, fix javadoc errors
googleapis · Oct 5, 2018 · 1e539fa
1 parent d197058

Showing 10 changed files with 555 additions and 64 deletions.
diff --git a/...d-clients/google-cloud-spanner/src/main/java/com/google/cloud/spanner/DatabaseClient.java b/...d-clients/google-cloud-spanner/src/main/java/com/google/cloud/spanner/DatabaseClient.java
@@ -135,7 +135,7 @@ public interface DatabaseClient {
   ReadOnlyTransaction singleUseReadOnlyTransaction();
 
   /**
-   * Returns a read-only transaction context in which a single read or query can be performed at the
+   * Returns a read-only transaction context in which a single read or query can be performed at
    * given timestamp bound. This method differs from {@link #singleUse(TimestampBound)} in that the
    * read timestamp used may be inspected after the read has returned data or finished successfully.
    *
@@ -269,4 +269,53 @@ public interface DatabaseClient {
    *
    */
   TransactionManager transactionManager();
+
+  /**
+   * Returns a lower bound on the number of rows modified by this DML statement.
+   *
+   * <p>The method will block until the update is complete. Running a DML statement with this method
+   * does not offer exactly-once semantics, and therefore the DML statement should be idempotent. The
+   * DML statement must be fully-partitionable. Specifically, the statement must be expressible as
+   * the union of many statements which each access only a single row of the table. This is a
+   * Partitioned DML transaction in which a single Partitioned DML statement is executed.
+   * Partitioned DML partitions the key space and runs the DML statement over each partition in
+   * parallel using separate, internal transactions that commit independently. Partitioned DML
+   * transactions do not need to be committed.
+   *
+   * <p>Partitioned DML updates are used to execute a single DML statement with a different
+   * execution strategy that provides different, and often better, scalability properties for large,
+   * table-wide operations than DML in a {@link #readWriteTransaction()} transaction. Smaller scoped
+   * statements, such as an OLTP workload, should prefer using {@link
+   * TransactionContext#executeUpdate(Statement)} with {@link #readWriteTransaction()}.
+   *
+   * <p>That said, Partitioned DML is not a drop-in replacement for standard DML used in {@link
+   * #readWriteTransaction()}.</p>
+   *
+   * <ul>
+   *   <li>The DML statement must be fully-partitionable. Specifically, the statement must be
+   *       expressible as the union of many statements which each access only a single row of the
+   *       table.
+   *   <li>The statement is not applied atomically to all rows of the table. Rather, the statement
+   *       is applied atomically to partitions of the table, in independent internal transactions.
+   *       Secondary index rows are updated atomically with the base table rows.
+   *   <li>Partitioned DML does not guarantee exactly-once execution semantics against a partition.
+   *       The statement will be applied at least once to each partition. It is strongly recommended
+   *       that the DML statement be idempotent to avoid unexpected results. For instance, it is
+   *       potentially dangerous to run a statement such as {@code UPDATE table SET column =
+   *       column + 1}, as it could be applied multiple times to some rows.
+   *   <li>The partitions are committed automatically; there is no support for Commit or Rollback.
+   *       If the call returns an error, or if the client issuing the DML statement dies, it is
+   *       possible that some rows had the statement executed on them successfully. It is also
+   *       possible that the statement was never executed against other rows.
+   *   <li>If any error is encountered during the execution of the partitioned DML operation (for
+   *       instance, a UNIQUE INDEX violation, division by zero, or a value that cannot be stored
+   *       due to schema constraints), then the operation is stopped at that point and an error is
+   *       returned. It is possible that at this point, some partitions have been committed (or even
+   *       committed multiple times), and other partitions have not been run at all.
+   * </ul>
+   *
+   * <p>Given the above, Partitioned DML is a good fit for large, database-wide operations that
+   * are idempotent, such as deleting old rows from a very large table.
+   */
+  long executePartitionedUpdate(Statement stmt);
 }
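The at-least-once caveat in the javadoc list above is easiest to see with a toy model. The following is a minimal, self-contained simulation, not the Cloud Spanner client: the class `PartitionedDmlRetryDemo` and its methods are hypothetical, a "table" is just an `int[]` split into fixed-size partitions, and a retried partition is modeled by applying the update to it a second time. An idempotent update (like `SET column = 1`) gives the same result either way, while `column = column + 1` double-counts the retried rows.

```java
import java.util.Arrays;
import java.util.function.IntUnaryOperator;

/**
 * Hypothetical simulation (not the Cloud Spanner client): each partition of an int "table" is
 * updated once, and the partitions listed in {@code retried} are updated a second time, mimicking
 * at-least-once execution of Partitioned DML.
 */
public class PartitionedDmlRetryDemo {

  /** Returns a copy of {@code table} with {@code update} applied at least once to every row. */
  static int[] applyAtLeastOnce(
      int[] table, int partitionSize, IntUnaryOperator update, int... retried) {
    int[] rows = Arrays.copyOf(table, table.length); // leave the caller's array untouched
    int partitions = (rows.length + partitionSize - 1) / partitionSize;
    for (int p = 0; p < partitions; p++) {
      applyPartition(rows, p, partitionSize, update);
    }
    // Simulate partitions whose work was executed again after it had already committed.
    for (int p : retried) {
      applyPartition(rows, p, partitionSize, update);
    }
    return rows;
  }

  private static void applyPartition(int[] rows, int p, int size, IntUnaryOperator update) {
    int end = Math.min(rows.length, (p + 1) * size);
    for (int i = p * size; i < end; i++) {
      rows[i] = update.applyAsInt(rows[i]);
    }
  }

  public static void main(String[] args) {
    int[] table = {0, 0, 0, 0};
    // Idempotent, like SET column = 1: retrying partition 0 changes nothing.
    System.out.println(Arrays.toString(applyAtLeastOnce(table, 2, c -> 1, 0))); // [1, 1, 1, 1]
    // Not idempotent, like SET column = column + 1: rows 0 and 1 are incremented twice.
    System.out.println(Arrays.toString(applyAtLeastOnce(table, 2, c -> c + 1, 0))); // [2, 2, 1, 1]
  }
}
```

With two partitions of two rows each and partition 0 retried, only the non-idempotent update produces a different result from a single clean pass.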
diff --git a/...ients/google-cloud-spanner/src/main/java/com/google/cloud/spanner/DatabaseClientImpl.java b/...ients/google-cloud-spanner/src/main/java/com/google/cloud/spanner/DatabaseClientImpl.java
@@ -28,10 +28,11 @@
 class DatabaseClientImpl implements DatabaseClient {
   private static final String READ_WRITE_TRANSACTION = "CloudSpanner.ReadWriteTransaction";
   private static final String READ_ONLY_TRANSACTION = "CloudSpanner.ReadOnlyTransaction";
+  private static final String PARTITION_DML_TRANSACTION = "CloudSpanner.PartitionDMLTransaction";
   private static final Tracer tracer = Tracing.getTracer();
 
   static {
-    TraceUtil.exportSpans(READ_WRITE_TRANSACTION, READ_ONLY_TRANSACTION);
+    TraceUtil.exportSpans(READ_WRITE_TRANSACTION, READ_ONLY_TRANSACTION, PARTITION_DML_TRANSACTION);
   }
 
   private final SessionPool pool;
@@ -155,6 +156,17 @@ public TransactionManager transactionManager() {
     }
   }
 
+  @Override
+  public long executePartitionedUpdate(Statement stmt) {
+    Span span = tracer.spanBuilder(PARTITION_DML_TRANSACTION).startSpan();
+    try (Scope s = tracer.withSpan(span)) {
+      return pool.getReadWriteSession().executePartitionedUpdate(stmt);
+    } catch (RuntimeException e) {
+      TraceUtil.endSpanWithFailure(span, e);
+      throw e;
+    }
+  }
+
   ListenableFuture<Void> closeAsync() {
     return pool.closeAsync();
   }
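In the hunk above, the only span-ending call in `executePartitionedUpdate` is `TraceUtil.endSpanWithFailure` inside the `catch` block; the shown lines contain no `finally` that ends the span when the update succeeds. The sketch below shows a try/catch/finally shape that ends the span on both paths. `FakeSpan` and `traced` are hypothetical stand-ins written for illustration, not the OpenCensus `Span`/`Tracer` API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/**
 * Hypothetical sketch (not OpenCensus): a span is ended on both the success and the failure path
 * of a traced call.
 */
public class SpanLifecycleSketch {
  /** Minimal stand-in for a tracing span. */
  static final class FakeSpan {
    final String name;
    boolean ended;
    String failure; // message of the exception recorded on the span, if any

    FakeSpan(String name) {
      this.name = name;
    }

    void endWithFailure(RuntimeException e) {
      failure = e.getMessage();
      ended = true;
    }

    void end() {
      ended = true;
    }
  }

  /** Every span started by traced(), kept so callers can inspect them afterwards. */
  static final List<FakeSpan> spans = new ArrayList<>();

  static long traced(String name, Supplier<Long> call) {
    FakeSpan span = new FakeSpan(name);
    spans.add(span);
    try {
      return call.get();
    } catch (RuntimeException e) {
      span.endWithFailure(e); // failure path: record the error, then rethrow it
      throw e;
    } finally {
      if (!span.ended) {
        span.end(); // success path: without this the span would never be ended
      }
    }
  }

  public static void main(String[] args) {
    long rows = traced("CloudSpanner.PartitionDMLTransaction", () -> 42L);
    FakeSpan last = spans.get(spans.size() - 1);
    System.out.println(rows + " " + last.ended); // 42 true
  }
}
```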

diff --git a/...-cloud-clients/google-cloud-spanner/src/main/java/com/google/cloud/spanner/ResultSet.java b/...-cloud-clients/google-cloud-spanner/src/main/java/com/google/cloud/spanner/ResultSet.java
@@ -16,7 +16,9 @@
 
 package com.google.cloud.spanner;
 
+import com.google.cloud.spanner.Options.QueryOption;
 import com.google.spanner.v1.ResultSetStats;
+import javax.annotation.Nullable;
 
 /**
  * Provides access to the data returned by a Cloud Spanner read or query. {@code ResultSet} allows a
@@ -59,13 +61,17 @@ public interface ResultSet extends AutoCloseable, StructReader {
   @Override
   void close();
 
+
   /**
    * Returns the {@link ResultSetStats} for the query only if the query was executed in either the
    * {@code PLAN} or the {@code PROFILE} mode via the {@link ReadContext#analyzeQuery(Statement,
-   * com.google.cloud.spanner.ReadContext.QueryAnalyzeMode)} method. Attempts to call this method on
-   * a {@code ResultSet} not obtained from {@code analyzeQuery} result in an {@code
-   * UnsupportedOperationException}. This method must be called after {@link #next()} has
-   * returned @{code false}. Calling it before that will result in an {@code IllegalStateException}.
+   * com.google.cloud.spanner.ReadContext.QueryAnalyzeMode)} method or for DML statements in
+   * {@link ReadContext#executeQuery(Statement, QueryOption...)}. Calling this method on a
+   * {@code ResultSet} not obtained from {@code analyzeQuery} or {@code executeQuery} returns
+   * {@code null}. This method must be called after {@link #next()} has returned {@code false};
+   * calling it before that point also returns {@code null}.
    */
+  @Nullable
   ResultSetStats getStats();
 }
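With `getStats()` now annotated `@Nullable`, callers can no longer assume a non-null result. A minimal sketch of defensive handling follows; `Stats` and `StatsSource` are hypothetical stand-ins loosely modeled on `ResultSetStats` and `ResultSet`, not the real types, and wrapping the result in `Optional.ofNullable` is one convention among several (a plain null check works equally well).

```java
import java.util.Optional;

/**
 * Hypothetical stand-ins (not the real com.google.spanner.v1 types) showing how a caller can
 * treat a @Nullable stats accessor defensively.
 */
public class NullableStatsSketch {
  /** Stand-in for ResultSetStats, carrying only an exact row count. */
  static final class Stats {
    final long rowCountExact;

    Stats(long rowCountExact) {
      this.rowCountExact = rowCountExact;
    }
  }

  /** Stand-in for ResultSet#getStats(), which may return null. */
  interface StatsSource {
    Stats getStats();
  }

  /** Returns the row count when stats are present, or empty when getStats() returned null. */
  static Optional<Long> rowCount(StatsSource rs) {
    return Optional.ofNullable(rs.getStats()).map(s -> s.rowCountExact);
  }

  public static void main(String[] args) {
    System.out.println(rowCount(() -> new Stats(3))); // Optional[3]
    System.out.println(rowCount(() -> null)); // Optional.empty
  }
}
```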
diff --git a/...loud-clients/google-cloud-spanner/src/main/java/com/google/cloud/spanner/SessionPool.java b/...loud-clients/google-cloud-spanner/src/main/java/com/google/cloud/spanner/SessionPool.java
@@ -312,6 +312,18 @@ public Timestamp write(Iterable<Mutation> mutations) throws SpannerException {
       }
     }
 
+    @Override
+    public long executePartitionedUpdate(Statement stmt) throws SpannerException {
+      try {
+        markUsed();
+        return delegate.executePartitionedUpdate(stmt);
+      } catch (SpannerException e) {
+        throw lastException = e;
+      } finally {
+        close();
+      }
+    }
+
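The new `executePartitionedUpdate` wrapper in `SessionPool` marks the session as used (`markUsed()`), delegates the call, records any failure in `lastException`, and releases the session in `finally` via `close()`. A toy version of that control flow is sketched below; `Pool`, `Session`, and `PooledSession` here are hypothetical and much simpler than the real pool, and the sketch catches `RuntimeException` where the hunk catches `SpannerException`.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical session pool (not the Cloud Spanner SessionPool): the wrapper marks the session
 * used, delegates, remembers the last failure, and always hands the session back.
 */
public class PooledSessionSketch {
  /** Stand-in for the underlying session that talks to the backend. */
  interface Session {
    long executePartitionedUpdate(String sql);
  }

  /** Minimal pool: idle sessions wait in a deque until checked out. */
  static final class Pool {
    final Deque<Session> idle = new ArrayDeque<>();

    Pool(Session... sessions) {
      for (Session s : sessions) {
        idle.push(s);
      }
    }

    PooledSession checkOut() {
      return new PooledSession(this, idle.pop());
    }
  }

  /** Wrapper handed to callers; follows the try/catch/finally shape of the hunk above. */
  static final class PooledSession {
    private final Pool pool;
    private final Session delegate;
    long lastUseNanos;
    RuntimeException lastException;

    PooledSession(Pool pool, Session delegate) {
      this.pool = pool;
      this.delegate = delegate;
    }

    long executePartitionedUpdate(String sql) {
      try {
        lastUseNanos = System.nanoTime(); // analogous to markUsed()
        return delegate.executePartitionedUpdate(sql);
      } catch (RuntimeException e) {
        throw lastException = e; // remember the failure, then rethrow it unchanged
      } finally {
        pool.idle.push(delegate); // analogous to close(): the session returns to the pool
      }
    }
  }

  public static void main(String[] args) {
    Pool pool = new Pool(sql -> 5L); // fake backend that reports 5 modified rows
    // "Events" is a hypothetical table name used only for illustration.
    long rows = pool.checkOut().executePartitionedUpdate("DELETE FROM Events WHERE Expired = TRUE");
    System.out.println(rows + " rows, idle sessions: " + pool.idle.size()); // 5 rows, idle sessions: 1
  }
}
```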
     @Override
     public Timestamp writeAtLeastOnce(Iterable<Mutation> mutations) throws SpannerException {
       try {