fix: fix one potential root cause of deadlock in connection worker #1955

Merged
merged 71 commits into from
Jan 25, 2023
Commits
5a63d95
feat: Split writer into connection worker and wrapper, this is a
GaoleMeng Sep 9, 2022
5a13302
feat: add connection worker pool skeleton, used for multiplexing client
GaoleMeng Sep 13, 2022
0297204
Merge branch 'main' into main
GaoleMeng Sep 14, 2022
8a81ad3
feat: add Load api for connection worker for multiplexing client
GaoleMeng Sep 14, 2022
68fd040
Merge remote-tracking branch 'upstream/main'
GaoleMeng Sep 14, 2022
3106dae
Merge remote-tracking branch 'upstream/main'
GaoleMeng Sep 15, 2022
5bf04e5
Merge branch 'googleapis:main' into main
GaoleMeng Sep 15, 2022
2fc7551
Merge branch 'main' of https://github.com/GaoleMeng/java-bigquerystorage
GaoleMeng Sep 15, 2022
7a6d919
feat: add multiplexing support to connection worker. We will treat every
GaoleMeng Sep 15, 2022
3ba7659
🦉 Updates from OwlBot post-processor
gcf-owl-bot[bot] Sep 16, 2022
f379a78
Updates from OwlBot post-processor
gcf-owl-bot[bot] Sep 16, 2022
9307776
Merge branch 'main' of https://github.com/GaoleMeng/java-bigquerystorage
GaoleMeng Sep 16, 2022
de73013
🦉 Updates from OwlBot post-processor
gcf-owl-bot[bot] Sep 16, 2022
19005a1
feat: port the multiplexing client core algorithm and basic tests
GaoleMeng Sep 19, 2022
c5d14ba
Merge branch 'googleapis:main' into main
GaoleMeng Sep 19, 2022
644360a
🦉 Updates from OwlBot post-processor
gcf-owl-bot[bot] Sep 20, 2022
3099d82
Merge branch 'googleapis:main' into main
GaoleMeng Sep 20, 2022
e707dd6
Merge branch 'googleapis:main' into main
GaoleMeng Sep 20, 2022
9e7a8fa
Merge branch 'main' of https://github.com/GaoleMeng/java-bigquerystorage
GaoleMeng Sep 20, 2022
31f1755
Merge branch 'googleapis:main' into main
GaoleMeng Sep 20, 2022
44c36fc
feat: wire multiplexing connection pool to stream writer
GaoleMeng Sep 20, 2022
87a4036
feat: some fixes for multiplexing client
GaoleMeng Sep 23, 2022
c92ea1b
Merge remote-tracking branch 'upstream/main'
GaoleMeng Sep 23, 2022
019520c
Merge branch 'googleapis:main' into main
GaoleMeng Sep 26, 2022
47893df
feat: fix some todos, and reject the mixed behavior of passed in clie…
GaoleMeng Sep 27, 2022
8bd4e6a
Merge remote-tracking branch 'upstream/main'
GaoleMeng Sep 27, 2022
83409b0
Merge remote-tracking branch 'upstream/main'
GaoleMeng Sep 27, 2022
f7dd72d
Merge branch 'googleapis:main' into main
GaoleMeng Sep 27, 2022
a48399f
Merge branch 'googleapis:main' into main
GaoleMeng Sep 29, 2022
6789bc9
feat: fix the bug that we may peek into the write_stream field but it's
GaoleMeng Sep 29, 2022
46b4e6c
🦉 Updates from OwlBot post-processor
gcf-owl-bot[bot] Sep 29, 2022
dfd4dd9
Merge branch 'googleapis:main' into main
GaoleMeng Sep 29, 2022
d68ae70
feat: fix the bug that we may peek into the write_stream field but it's
GaoleMeng Sep 29, 2022
2983fe9
Merge branch 'main' of https://github.com/GaoleMeng/java-bigquerystorage
GaoleMeng Sep 29, 2022
d406256
Merge branch 'googleapis:main' into main
GaoleMeng Oct 13, 2022
22e9e07
feat: add getInflightWaitSeconds implementation
GaoleMeng Oct 13, 2022
fdb4e1c
Merge branch 'googleapis:main' into main
GaoleMeng Oct 21, 2022
0469474
Merge branch 'googleapis:main' into main
GaoleMeng Nov 2, 2022
d1b7740
feat: Add schema comparision in connection loop to ensure schema upda…
GaoleMeng Nov 3, 2022
e4cd529
🦉 Updates from OwlBot post-processor
gcf-owl-bot[bot] Nov 4, 2022
74ff1c4
Merge branch 'googleapis:main' into main
GaoleMeng Nov 4, 2022
762f49e
feat: add schema update support to multiplexing
GaoleMeng Nov 5, 2022
de456c2
Merge branch 'googleapis:main' into main
GaoleMeng Nov 11, 2022
c2f6edc
Merge branch 'googleapis:main' into main
GaoleMeng Nov 15, 2022
2487227
fix: fix windows build bug: windows Instant resolution is different with
GaoleMeng Nov 15, 2022
084d6d1
fix: fix another failing tests for windows build
GaoleMeng Nov 16, 2022
89c9701
Merge branch 'main' of https://github.com/GaoleMeng/java-bigquerystorage
GaoleMeng Nov 16, 2022
8441518
fix: fix another test failure for Windows build
GaoleMeng Nov 16, 2022
d249add
Merge branch 'googleapis:main' into main
GaoleMeng Nov 30, 2022
83aa7ff
feat: Change new thread for each retry to be a thread pool to avoid
GaoleMeng Nov 30, 2022
92a9c36
🦉 Updates from OwlBot post-processor
gcf-owl-bot[bot] Nov 30, 2022
a713a52
Merge branch 'googleapis:main' into main
GaoleMeng Nov 30, 2022
a042d5c
fix: add back the background executor provider that's accidentally
GaoleMeng Nov 30, 2022
53f4ec8
feat: throw error when use connection pool for explicit stream
GaoleMeng Dec 2, 2022
c494d8b
Merge branch 'googleapis:main' into main
GaoleMeng Dec 20, 2022
14b0c12
fix: Add precision truncation to the passed in value from JSON float and
GaoleMeng Jan 17, 2023
0da0e4b
🦉 Updates from OwlBot post-processor
gcf-owl-bot[bot] Jan 17, 2023
33d23ac
Merge branch 'googleapis:main' into main
GaoleMeng Jan 17, 2023
d2ee46e
🦉 Updates from OwlBot post-processor
gcf-owl-bot[bot] Jan 17, 2023
be6646e
modify the bom version
GaoleMeng Jan 17, 2023
62d8c41
🦉 Updates from OwlBot post-processor
gcf-owl-bot[bot] Jan 17, 2023
adf5f3f
fix deadlockissue in ConnectionWorkerPool
GaoleMeng Jan 18, 2023
c1970ff
Merge branch 'googleapis:main' into main
GaoleMeng Jan 18, 2023
3488df8
fix: fix deadlock issue during close + append for multiplexing
GaoleMeng Jan 19, 2023
6a512e8
🦉 Updates from OwlBot post-processor
gcf-owl-bot[bot] Jan 20, 2023
05edc2f
Merge branch 'googleapis:main' into main
GaoleMeng Jan 20, 2023
7d3da74
Merge branch 'googleapis:main' into main
GaoleMeng Jan 20, 2023
ecf6807
Merge branch 'googleapis:main' into main
GaoleMeng Jan 20, 2023
057dab9
Merge branch 'googleapis:main' into main
GaoleMeng Jan 23, 2023
5db46a2
fix: fix one potential root cause of deadlock issue for non-multiplexing
GaoleMeng Jan 23, 2023
32e9d33
🦉 Updates from OwlBot post-processor
gcf-owl-bot[bot] Jan 25, 2023
4 changes: 2 additions & 2 deletions README.md
@@ -56,13 +56,13 @@ implementation 'com.google.cloud:google-cloud-bigquerystorage'
If you are using Gradle without BOM, add this to your dependencies:

```Groovy
-implementation 'com.google.cloud:google-cloud-bigquerystorage:2.28.2'
+implementation 'com.google.cloud:google-cloud-bigquerystorage:2.28.3'
```

If you are using SBT, add this to your dependencies:

```Scala
-libraryDependencies += "com.google.cloud" % "google-cloud-bigquerystorage" % "2.28.2"
+libraryDependencies += "com.google.cloud" % "google-cloud-bigquerystorage" % "2.28.3"
```

## Authentication
@@ -40,6 +40,8 @@
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.Condition;
@@ -63,6 +65,7 @@ public class ConnectionWorker implements AutoCloseable {
private Condition hasMessageInWaitingQueue;
private Condition inflightReduced;
private static Duration maxRetryDuration = Duration.ofMinutes(5);
+private ExecutorService threadPool = Executors.newFixedThreadPool(1);

/*
* The identifier of the current stream to write to. This stream name can change during
@@ -288,7 +291,7 @@ private ApiFuture<AppendRowsResponse> appendInternal(AppendRowsRequest message)
requestWrapper.appendResult.setException(
new Exceptions.StreamWriterClosedException(
Status.fromCode(Status.Code.FAILED_PRECONDITION)
-.withDescription("Connection is already closed"),
+.withDescription("Connection is already closed during append"),
streamName,
writerId));
return requestWrapper.appendResult;
@@ -382,6 +385,18 @@ public void close() {
this.client.awaitTermination(150, TimeUnit.SECONDS);
} catch (InterruptedException ignored) {
}

+try {
+threadPool.shutdown();
+threadPool.awaitTermination(3, TimeUnit.MINUTES);
+} catch (InterruptedException e) {
+// Unexpected. Just swallow the exception with logging.
+log.warning(
+"Close on thread pool for "
++ streamName
++ " is interrupted with exception: "
++ e.toString());
+}
}

/*
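The `close()` hunk above shuts the new pool down and waits up to three minutes for queued callbacks to finish, logging and swallowing any `InterruptedException`. For comparison, here is a minimal sketch of a common drain idiom that also restores the interrupt flag and falls back to `shutdownNow()` on timeout. The class and method names (`ExecutorShutdownSketch`, `drain`, `runAndDrain`) are hypothetical and not part of the PR; only `java.util.concurrent` calls are used.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; this is not the ConnectionWorker code.
class ExecutorShutdownSketch {

  /** Drains the pool; returns true if every queued task finished within the timeout. */
  static boolean drain(ExecutorService pool, long timeout, TimeUnit unit) {
    pool.shutdown(); // Stop accepting new tasks; already queued tasks still run.
    try {
      if (pool.awaitTermination(timeout, unit)) {
        return true;
      }
      pool.shutdownNow(); // Timed out: interrupt whatever is still running.
      return false;
    } catch (InterruptedException e) {
      pool.shutdownNow();
      Thread.currentThread().interrupt(); // Keep the interrupt visible to the caller.
      return false;
    }
  }

  /** Queues tasks on a single-thread pool, drains it, and returns how many tasks ran. */
  static int runAndDrain(int tasks) {
    ExecutorService pool = Executors.newFixedThreadPool(1);
    AtomicInteger completed = new AtomicInteger();
    for (int i = 0; i < tasks; i++) {
      pool.execute(completed::incrementAndGet);
    }
    drain(pool, 10, TimeUnit.SECONDS);
    return completed.get();
  }
}
```

Because `shutdown()` lets already queued tasks run, callbacks submitted before `close()` still complete as long as they finish within the timeout.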
@@ -639,35 +654,44 @@ private void requestCallback(AppendRowsResponse response) {
} finally {
this.lock.unlock();
}
-if (response.hasError()) {
-Exceptions.StorageException storageException =
-Exceptions.toStorageException(response.getError(), null);
-log.fine(String.format("Got error message: %s", response.toString()));
-if (storageException != null) {
-requestWrapper.appendResult.setException(storageException);
-} else if (response.getRowErrorsCount() > 0) {
-Map<Integer, String> rowIndexToErrorMessage = new HashMap<>();
-for (int i = 0; i < response.getRowErrorsCount(); i++) {
-RowError rowError = response.getRowErrors(i);
-rowIndexToErrorMessage.put(Math.toIntExact(rowError.getIndex()), rowError.getMessage());
-}
-AppendSerializtionError exception =
-new AppendSerializtionError(
-response.getError().getCode(),
-response.getError().getMessage(),
-streamName,
-rowIndexToErrorMessage);
-requestWrapper.appendResult.setException(exception);
-} else {
-StatusRuntimeException exception =
-new StatusRuntimeException(
-Status.fromCodeValue(response.getError().getCode())
-.withDescription(response.getError().getMessage()));
-requestWrapper.appendResult.setException(exception);
-}
-} else {
-requestWrapper.appendResult.set(response);
-}
+
+// We need a separate thread pool to unblock the next request callback.
+// Otherwise the user may call append inside the request callback, which may block waiting
+// on in-flight quota, causing a deadlock because requests can't be popped off the queue
+// until the current request callback finishes.
+threadPool.submit(
+() -> {
+if (response.hasError()) {
+Exceptions.StorageException storageException =
+Exceptions.toStorageException(response.getError(), null);
+log.fine(String.format("Got error message: %s", response.toString()));
+if (storageException != null) {
+requestWrapper.appendResult.setException(storageException);
+} else if (response.getRowErrorsCount() > 0) {
+Map<Integer, String> rowIndexToErrorMessage = new HashMap<>();
+for (int i = 0; i < response.getRowErrorsCount(); i++) {
+RowError rowError = response.getRowErrors(i);
+rowIndexToErrorMessage.put(
+Math.toIntExact(rowError.getIndex()), rowError.getMessage());
+}
+AppendSerializtionError exception =
+new AppendSerializtionError(
+response.getError().getCode(),
+response.getError().getMessage(),
+streamName,
+rowIndexToErrorMessage);
+requestWrapper.appendResult.setException(exception);
+} else {
+StatusRuntimeException exception =
+new StatusRuntimeException(
+Status.fromCodeValue(response.getError().getCode())
+.withDescription(response.getError().getMessage()));
+requestWrapper.appendResult.setException(exception);
+}
+} else {
+requestWrapper.appendResult.set(response);
+}
+});
}

private boolean isRetriableError(Throwable t) {
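The deadlock described in the `requestCallback` comment can be reproduced in a simplified model. The sketch below is an illustration under stated assumptions, not the library's code: a `Semaphore` stands in for the in-flight quota, a single-thread executor stands in for the thread that processes responses, and the names `InflightQuotaSketch`, `QUOTA`, and `appendsAccepted` are hypothetical. A real append would block indefinitely; the sketch uses a timeout so that the stalled case terminates.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified model of the deadlock; every name here is hypothetical.
class InflightQuotaSketch {
  static final int QUOTA = 2; // Stand-in for the max in-flight request limit.
  static final int APPENDS_IN_CALLBACK = 4; // The user callback appends more than QUOTA.

  /** Returns how many appends issued from the user callback obtained quota. */
  static int appendsAccepted(boolean handOff) {
    Semaphore inflight = new Semaphore(QUOTA);
    ExecutorService responseThread = Executors.newSingleThreadExecutor();
    ExecutorService callbackPool = Executors.newSingleThreadExecutor();
    AtomicInteger accepted = new AtomicInteger();
    CountDownLatch callbackDone = new CountDownLatch(1);

    Runnable userCallback =
        () -> {
          try {
            for (int i = 0; i < APPENDS_IN_CALLBACK; i++) {
              // A real append would wait here forever; the timeout keeps the demo finite.
              if (!inflight.tryAcquire(500, TimeUnit.MILLISECONDS)) {
                break;
              }
              accepted.incrementAndGet();
              // The response to this append is processed on the response thread,
              // which is the only place where quota is handed back.
              responseThread.execute(inflight::release);
            }
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          } finally {
            callbackDone.countDown();
          }
        };

    try {
      // The first (failed) response arrives on the response thread.
      responseThread.execute(
          () -> {
            if (handOff) {
              callbackPool.execute(userCallback); // Mirrors the idea of the fix.
            } else {
              userCallback.run(); // Callback occupies the response thread.
            }
          });
      callbackDone.await(30, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } finally {
      responseThread.shutdown();
      callbackPool.shutdown();
    }
    return accepted.get();
  }
}
```

In this model `appendsAccepted(false)` stalls after `QUOTA` appends because the release tasks are queued behind the callback that is waiting for them, while `appendsAccepted(true)` lets all four appends through.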
@@ -20,7 +20,10 @@
import static org.junit.Assert.assertThrows;
import static org.junit.Assert.assertTrue;

+import com.google.api.client.util.Sleeper;
import com.google.api.core.ApiFuture;
+import com.google.api.core.ApiFutureCallback;
+import com.google.api.core.ApiFutures;
import com.google.api.gax.batching.FlowController;
import com.google.api.gax.core.NoCredentialsProvider;
import com.google.api.gax.grpc.testing.MockGrpcService;
@@ -34,6 +37,7 @@
import com.google.cloud.bigquery.storage.v1.StorageError.StorageErrorCode;
import com.google.cloud.bigquery.storage.v1.StreamWriter.SingleConnectionOrConnectionPool.Kind;
import com.google.common.base.Strings;
+import com.google.common.util.concurrent.MoreExecutors;
import com.google.protobuf.Any;
import com.google.protobuf.DescriptorProtos;
import com.google.protobuf.Descriptors;
@@ -282,6 +286,64 @@ public void testAppendSuccess() throws Exception {
writer.close();
}

+@Test
+public void testAppendSuccess_RetryDirectlyInCallback() throws Exception {
+// Set a relatively small in-flight request count.
+StreamWriter writer =
+StreamWriter.newBuilder(TEST_STREAM_1, client)
+.setWriterSchema(createProtoSchema())
+.setTraceId(TEST_TRACE_ID)
+.setMaxRetryDuration(java.time.Duration.ofSeconds(5))
+.setMaxInflightRequests(5)
+.build();
+
+// Fail the first request. In the request callback of that request we insert another
+// 10 requests. Those requests can't be processed until the previous request callback
+// has finished.
+long appendCount = 20;
+for (int i = 0; i < appendCount; i++) {
+if (i == 0) {
+testBigQueryWrite.addResponse(
+createAppendResponseWithError(Status.INVALID_ARGUMENT.getCode(), "test message"));
+}
+testBigQueryWrite.addResponse(createAppendResponse(i));
+}
+
+// We will trigger 10 more requests in the request callback of the following request.
+ProtoRows protoRows = createProtoRows(new String[] {String.valueOf(-1)});
+ApiFuture<AppendRowsResponse> future = writer.append(protoRows, -1);
+ApiFutures.addCallback(
+future, new AppendCompleteCallback(writer, protoRows), MoreExecutors.directExecutor());
+
+StatusRuntimeException actualError =
+assertFutureException(StatusRuntimeException.class, future);
+
+Sleeper.DEFAULT.sleep(1000);
+writer.close();
+}
+
+static class AppendCompleteCallback implements ApiFutureCallback<AppendRowsResponse> {
+
+private final StreamWriter mainStreamWriter;
+private final ProtoRows protoRows;
+private int retryCount = 0;
+
+public AppendCompleteCallback(StreamWriter mainStreamWriter, ProtoRows protoRows) {
+this.mainStreamWriter = mainStreamWriter;
+this.protoRows = protoRows;
+}
+
+public void onSuccess(AppendRowsResponse response) {
+// Do nothing.
+}
+
+public void onFailure(Throwable throwable) {
+for (int i = 0; i < 10; i++) {
+this.mainStreamWriter.append(protoRows);
+}
+}
+}
+
@Test
public void testUpdatedSchemaFetch_multiplexing() throws Exception {
testUpdatedSchemaFetch(/*enableMultiplexing=*/ true);
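The new test registers its callback with `MoreExecutors.directExecutor()`, so the callback runs on whichever thread completes the future. A minimal sketch of that behavior follows, using the JDK's `CompletableFuture` as a stand-in for `ApiFuture` (an assumption made to keep the example dependency-free). The class name `DirectCallbackSketch` and the thread names are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Stand-in model using JDK futures; not the StreamWriter or ApiFuture API.
class DirectCallbackSketch {

  /** Returns the name of the thread that ran the user callback. */
  static String callbackThread(boolean handOff) {
    ExecutorService responseThread =
        Executors.newSingleThreadExecutor(r -> new Thread(r, "response-thread"));
    ExecutorService callbackPool =
        Executors.newSingleThreadExecutor(r -> new Thread(r, "callback-pool"));
    try {
      CompletableFuture<String> appendResult = new CompletableFuture<>();
      // Registered before completion and without an executor, so the function
      // runs on the completing thread, like a direct-executor listener.
      CompletableFuture<String> ranOn =
          appendResult.thenApply(ignored -> Thread.currentThread().getName());

      responseThread.execute(
          () -> {
            if (handOff) {
              // Complete the future from a separate pool, as the fix does.
              callbackPool.execute(() -> appendResult.complete("response"));
            } else {
              // Complete the future directly on the response thread.
              appendResult.complete("response");
            }
          });
      return ranOn.get(10, TimeUnit.SECONDS);
    } catch (Exception e) {
      throw new IllegalStateException(e);
    } finally {
      responseThread.shutdown();
      callbackPool.shutdown();
    }
  }
}
```

Without the hand-off, user code in `onFailure` runs on the response thread, so any blocking call it makes stops further responses from being handled.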
@@ -39,8 +39,6 @@
import io.grpc.Status.Code;
import java.io.IOException;
import java.util.Map;
-import java.util.concurrent.ExecutorService;
-import java.util.concurrent.Executors;
import java.util.concurrent.Phaser;
import javax.annotation.concurrent.GuardedBy;
import org.json.JSONArray;
@@ -188,16 +186,14 @@ static class AppendCompleteCallback implements ApiFutureCallback<AppendRowsRespo

private final DataWriter parent;
private final AppendContext appendContext;
-// Prepare a thread pool
-static ExecutorService pool = Executors.newFixedThreadPool(50);

public AppendCompleteCallback(DataWriter parent, AppendContext appendContext) {
this.parent = parent;
this.appendContext = appendContext;
}

public void onSuccess(AppendRowsResponse response) {
-System.out.format("Append success%n");
+System.out.format("Append success\n");
done();
}

@@ -209,22 +205,17 @@ public void onFailure(Throwable throwable) {
if (appendContext.retryCount < MAX_RETRY_COUNT
&& RETRIABLE_ERROR_CODES.contains(status.getCode())) {
appendContext.retryCount++;
-// Use a separate thread to avoid potentially blocking while we are in a callback.
-pool.submit(
-() -> {
-try {
-// Since default stream appends are not ordered, we can simply retry the
-// appends.
-// Retrying with exclusive streams requires more careful consideration.
-this.parent.append(appendContext);
-} catch (Exception e) {
-// Fall through to return error.
-System.out.format("Failed to retry append: %s%n", e);
-}
-});
-// Mark the existing attempt as done since it's being retried.
-done();
-return;
+try {
+// Since default stream appends are not ordered, we can simply retry the appends.
+// Retrying with exclusive streams requires more careful consideration.
+this.parent.append(appendContext);
+// Mark the existing attempt as done since it's being retried.
+done();
+return;
+} catch (Exception e) {
+// Fall through to return error.
+System.out.format("Failed to retry append: %s\n", e);
+}
}

if (throwable instanceof AppendSerializtionError) {
@@ -241,21 +232,19 @@ public void onFailure(Throwable throwable) {
}
}

-// Mark the existing attempt as done since we got a response for it
-done();
-
// Retry the remaining valid rows, but using a separate thread to
// avoid potentially blocking while we are in a callback.
if (dataNew.length() > 0) {
-pool.submit(
-() -> {
-try {
-this.parent.append(new AppendContext(dataNew, 0));
-} catch (Exception e2) {
-System.out.format("Failed to retry append with filtered rows: %s%n", e2);
-}
-});
+try {
+this.parent.append(new AppendContext(dataNew, 0));
+} catch (DescriptorValidationException e) {
+throw new RuntimeException(e);
+} catch (IOException e) {
+throw new RuntimeException(e);
+}
}
+// Mark the existing attempt as done since we got a response for it
+done();
return;
}
}
@@ -267,7 +256,6 @@ public void onFailure(Throwable throwable) {
(storageException != null) ? storageException : new RuntimeException(throwable);
}
}
-System.out.format("Error that arrived: %s%n", throwable);
done();
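The sample retries "the remaining valid rows" after an `AppendSerializtionError`, but the step that builds `dataNew` falls outside the displayed hunk. One way that filtering could look is sketched below, assuming rows are dropped when their index appears as a key in the row-index-to-error-message map. It uses a plain `List<String>` instead of the sample's `JSONArray`, and the names `RowFilterSketch` and `validRows` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper; the sample's actual filtering code is not shown in this diff.
class RowFilterSketch {

  /** Returns the rows whose index is not reported as failed, preserving their order. */
  static List<String> validRows(List<String> rows, Map<Integer, String> rowIndexToErrorMessage) {
    List<String> kept = new ArrayList<>();
    for (int i = 0; i < rows.size(); i++) {
      if (!rowIndexToErrorMessage.containsKey(i)) {
        kept.add(rows.get(i));
      }
    }
    return kept;
  }
}
```

If every row is reported as failed, the result is empty and there is nothing to retry, which corresponds to the `dataNew.length() > 0` guard in the sample.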
}

Expand Down