Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Add paging to hbase client #4166

Merged
merged 87 commits into from
Apr 15, 2024
Merged
Show file tree
Hide file tree
Changes from 8 commits
Commits
Show all changes
87 commits
Select commit Hold shift + click to select a range
ef49c26
Added paging to hbase client
ron-gal Sep 21, 2023
f5710b2
minor fixes
ron-gal Sep 21, 2023
ecd88f9
minor fixes
ron-gal Sep 22, 2023
88373d0
Fix tests
ron-gal Sep 22, 2023
2144a6f
Fix tests
ron-gal Sep 22, 2023
63c7492
Fix lint
ron-gal Sep 22, 2023
40e5759
test
ron-gal Sep 25, 2023
6d15df1
test
ron-gal Sep 25, 2023
5254712
test
ron-gal Sep 25, 2023
d1dc685
test
ron-gal Sep 26, 2023
424f453
test
ron-gal Sep 26, 2023
8a0d1f5
test
ron-gal Sep 26, 2023
0f14715
test
ron-gal Sep 26, 2023
22ae1fc
test
ron-gal Sep 26, 2023
84be8a0
test
ron-gal Sep 26, 2023
dc6a9dc
test
ron-gal Sep 26, 2023
b8d05d3
test
ron-gal Sep 26, 2023
5b99b77
test
ron-gal Sep 26, 2023
435bb73
test
ron-gal Sep 26, 2023
e4f34fc
test
ron-gal Sep 27, 2023
b539bb5
test
ron-gal Sep 27, 2023
34f00d1
test
ron-gal Sep 27, 2023
a944e6b
test
ron-gal Sep 28, 2023
244bbb7
test
ron-gal Sep 28, 2023
a1e3d54
test
ron-gal Sep 28, 2023
2aa5d17
test
ron-gal Sep 29, 2023
0093a78
test
ron-gal Sep 29, 2023
1667188
test
ron-gal Sep 29, 2023
f394ba2
test
ron-gal Sep 29, 2023
d1ef3b6
test
ron-gal Oct 2, 2023
423df71
test
ron-gal Oct 2, 2023
3d9f18c
Brought back test and fixed page size handling
ron-gal Oct 3, 2023
fdf511b
fixed test
ron-gal Oct 3, 2023
fd15972
add tests
ron-gal Oct 4, 2023
2a3b125
minor refactor
ron-gal Oct 5, 2023
a93d3bb
minor refactor
ron-gal Oct 5, 2023
55af2e0
add error handling tests
ron-gal Oct 5, 2023
52425f1
fix format
ron-gal Oct 5, 2023
cced23a
Add protection against OOM exceptions
ron-gal Oct 6, 2023
06b8b2e
Add protection against OOM exceptions
ron-gal Oct 6, 2023
b50ec9f
Remove useless assertion
ron-gal Oct 17, 2023
6535e6c
handle setCaching properly
ron-gal Oct 24, 2023
d025473
handle setCaching properly
ron-gal Oct 24, 2023
1e61132
handle setCaching properly
ron-gal Oct 24, 2023
9df66fe
handle setCaching properly
ron-gal Oct 24, 2023
cdf7347
handle setCaching properly
ron-gal Oct 24, 2023
951ebef
handle setCaching properly
ron-gal Oct 25, 2023
7ba3c64
remove useless code
ron-gal Nov 2, 2023
d4ed8e5
Get page size directly from the paginator
ron-gal Nov 2, 2023
d4bede1
fix lint
ron-gal Nov 2, 2023
06227d2
cancel serverStream when reaching memory limit
ron-gal Nov 2, 2023
947608a
add test for low memory
ron-gal Nov 2, 2023
e23101c
add test for low memory
ron-gal Nov 2, 2023
4e15493
Merge branch 'googleapis:main' into hbase_paging
ron-gal Nov 2, 2023
56bb3f1
Merge branch 'googleapis:main' into hbase_paging
ron-gal Nov 3, 2023
34653ed
Merge branch 'googleapis:main' into hbase_paging
ron-gal Nov 6, 2023
a082a9e
fix lint
ron-gal Nov 6, 2023
871d66b
update java-bigtable dependency
ron-gal Nov 7, 2023
06ce458
Merge branch 'googleapis:main' into hbase_paging
ron-gal Nov 8, 2023
a95e218
Merge branch 'googleapis:main' into hbase_paging
ron-gal Nov 8, 2023
03c0e3b
Merge branch 'googleapis:main' into hbase_paging
ron-gal Nov 15, 2023
b5fdeeb
Merge branch 'googleapis:main' into hbase_paging
ron-gal Nov 15, 2023
b0c7cbf
Merge branch 'googleapis:main' into hbase_paging
ron-gal Nov 16, 2023
262525f
Merge branch 'googleapis:main' into hbase_paging
ron-gal Nov 16, 2023
7382aa9
Merge branch 'googleapis:main' into hbase_paging
ron-gal Nov 21, 2023
5b510a1
Merge branch 'googleapis:main' into hbase_paging
ron-gal Nov 21, 2023
80ffea0
Fixed several PR comments
ron-gal Nov 21, 2023
da22f6e
Fixed several PR comments
ron-gal Nov 22, 2023
7dd6b49
Fixed several PR comments
ron-gal Nov 22, 2023
40545c1
Moved to async API
ron-gal Nov 22, 2023
fb86765
Fixed several PR comments
ron-gal Nov 29, 2023
91ee89e
Fixed several PR comments
ron-gal Nov 29, 2023
c91d6f9
Fixed several PR comments
ron-gal Nov 29, 2023
410d203
Fixed several PR comments
ron-gal Nov 29, 2023
ce514ee
Fixed several PR comments
ron-gal Nov 29, 2023
bf23528
Fixed several PR comments
ron-gal Nov 29, 2023
a819985
Merge branch 'googleapis:main' into hbase_paging
ron-gal Nov 29, 2023
a6738f5
Fixed according to PR comments
ron-gal Dec 7, 2023
52d8a12
Fix wrong advance
ron-gal Dec 8, 2023
ab65b95
Fixed according to PR
ron-gal Dec 13, 2023
1fc9369
Fixed according to PR
ron-gal Dec 13, 2023
048ca63
Fixed according to PR
ron-gal Dec 13, 2023
e72dca0
remove test
ron-gal Dec 13, 2023
f6132ba
adjust tests according to PR
ron-gal Dec 13, 2023
a519462
adjust tests according to PR
ron-gal Dec 14, 2023
96e49fb
Merge branch 'googleapis:main' into hbase_paging
ron-gal Feb 12, 2024
8bccc3c
fix bug found on beam
ron-gal Apr 2, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -47,8 +47,10 @@
import io.grpc.CallOptions;
import io.grpc.Deadline;
import io.grpc.stub.StreamObserver;
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.TimeUnit;
import javax.annotation.Nullable;
import org.apache.hadoop.hbase.client.AbstractClientScanner;
Expand All @@ -60,6 +62,7 @@
public class DataClientVeneerApi implements DataClientWrapper {

private static final RowResultAdapter RESULT_ADAPTER = new RowResultAdapter();
private static final int PAGE_SIZE = 100;

private final BigtableDataClient delegate;
private final ClientOperationTimeouts clientOperationTimeouts;
Expand All @@ -70,6 +73,10 @@ public class DataClientVeneerApi implements DataClientWrapper {
this.clientOperationTimeouts = clientOperationTimeouts;
}

interface paginatorFunction {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why lower case?
Also why do you need a separate interface for this? Can it be Function<Query.QueryPaginator, ServerStream<Result>> ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, why do we need a closure to begin with? why not pass the client directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't know about Function<...> :( Done this way.
Passing the client feels like it's giving the scanner too much power, when all I want it to do is call a specific function that given a paginator - gets the next page.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ended up passing the client, to be able to use the async API

public ServerStream<Result> func(Query.QueryPaginator paginator);
}

@Override
public BulkMutationWrapper createBulkMutation(String tableId) {
return new BulkMutationVeneerApi(delegate.newBulkMutationBatcher(tableId), 0);
Expand Down Expand Up @@ -136,8 +143,14 @@ public Result apply(Row row) {

@Override
public ResultScanner readRows(Query request) {
Query.QueryPaginator paginator = request.createPaginator(PAGE_SIZE);
return new RowResultScanner(
delegate.readRowsCallable(RESULT_ADAPTER).call(request, createScanCallContext()));
paginator,
(p) -> {
return delegate
.readRowsCallable(RESULT_ADAPTER)
.call(p.getNextQuery(), createScanCallContext());
});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you dont need the curly brackets or return

p -> delegate
              .readRowsCallable(RESULT_ADAPTER)
              .call(p.getNextQuery(), createScanCallContext());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}

@Override
Expand All @@ -155,7 +168,8 @@ public void readRowsAsync(Query request, StreamObserver<Result> observer) {
.call(request, new StreamObserverAdapter<>(observer), createScanCallContext());
}

// Point reads are implemented using a streaming ReadRows RPC. So timeouts need to be managed
// Point reads are implemented using a streaming ReadRows RPC. So timeouts need
// to be managed
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please revert irrelevant changes to minimize the noise

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// similar to scans below.
private ApiCallContext createReadRowCallContext() {
GrpcCallContext ctx = GrpcCallContext.createDefault();
Expand All @@ -179,9 +193,11 @@ private ApiCallContext createReadRowCallContext() {
}

// Support 2 bigtable-hbase features not directly available in veneer:
// - per attempt deadlines - vener doesn't implement deadlines for attempts. To workaround this,
// the timeouts are set per call in the ApiCallContext. However this creates a separate issue of
// over running the operation deadline, so gRPC deadline is also set.
// - per attempt deadlines - vener doesn't implement deadlines for attempts. To
// workaround this,
// the timeouts are set per call in the ApiCallContext. However this creates a
// separate issue of
// over running the operation deadline, so gRPC deadline is also set.
private GrpcCallContext createScanCallContext() {
GrpcCallContext ctx = GrpcCallContext.createDefault();
OperationTimeouts callSettings = clientOperationTimeouts.getScanTimeouts();
Expand Down Expand Up @@ -230,41 +246,67 @@ protected void onCompleteImpl() {

/** wraps {@link ServerStream} onto HBase {@link ResultScanner}. */
private static class RowResultScanner extends AbstractClientScanner {
// Percentage of max number of rows allowed in the buffer
private static final double WATERMARK_PERCENTAGE = .1;
private static final RowResultAdapter RESULT_ADAPTER = new RowResultAdapter();

private final Meter scannerResultMeter =
BigtableClientMetrics.meter(BigtableClientMetrics.MetricLevel.Info, "scanner.results");
private final Timer scannerResultTimer =
BigtableClientMetrics.timer(
BigtableClientMetrics.MetricLevel.Debug, "scanner.results.latency");

private final ServerStream<Result> serverStream;
private final Iterator<Result> iterator;

RowResultScanner(ServerStream<Result> serverStream) {
this.serverStream = serverStream;
this.iterator = serverStream.iterator();
private ServerStream<Result> serverStream;
private ByteString lastSeenRowKey = ByteString.EMPTY;
private final Queue<Result> buffer;
private final Query.QueryPaginator paginator;
private final paginatorFunction wrapper;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrapper is fairly ambiguous here. I think you want something like streamSegmentFactory
(its java, no variable name is complete with a factory :) )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

private final int refillSegmentWaterMark;
mutianf marked this conversation as resolved.
Show resolved Hide resolved

RowResultScanner(Query.QueryPaginator paginator, paginatorFunction wrapper) {
this.paginator = paginator;
this.wrapper = wrapper;
this.buffer = new ArrayDeque<>();
this.refillSegmentWaterMark = (int) (PAGE_SIZE * WATERMARK_PERCENTAGE);
mutianf marked this conversation as resolved.
Show resolved Hide resolved
}

@Override
public Result next() {
try (Context ignored = scannerResultTimer.time()) {
if (!iterator.hasNext()) {
// null signals EOF
return null;
if (this.buffer.size() < this.refillSegmentWaterMark && this.serverStream == null) {
this.serverStream = this.wrapper.func(this.paginator);
}
if (this.buffer.isEmpty() && this.serverStream != null) {
this.waitReadRowsFuture();
}

scannerResultMeter.mark();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be in the if statement?

Copy link
Contributor Author

@ron-gal ron-gal Nov 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return iterator.next();
return this.buffer.poll();
}
}

@Override
public void close() {
serverStream.cancel();
if (this.serverStream != null) {
this.serverStream.cancel();
}
}

public boolean renewLease() {
throw new UnsupportedOperationException("renewLease");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why throw? Why not just always return true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}

private void waitReadRowsFuture() {
Iterator<Result> iterator = this.serverStream.iterator();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be significantly more efficient if you use the async api to fetch the next batch. ie

SettableFuture<List<Result>> resultsFuture = SettableFuture.create();

dataClient.readRowsCallable(myRowMerger).call(req,  new ResponseObserver() {
   List<Result> results = new ArrayList();
   onStart(Controller c) {
     this.controller = c;
   }
   onResponse(Result r) {
      results.add(r);
      if (thresholdReached) {
         this.controller.cancel();
      }
   }
   onComplete() {
     resultsFuture.set(results);
   }
   onError(e) {
      resultsFuture.setExceptionally(e);
});

List<Result> results = resultsFuture.get();
....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

while (iterator.hasNext()) {
Result result = iterator.next();
this.buffer.add(result);
if (result == null || result.rawCells() == null) {
mutianf marked this conversation as resolved.
Show resolved Hide resolved
continue;
}
this.lastSeenRowKey = RESULT_ADAPTER.getKey(result);
}
this.paginator.advance(lastSeenRowKey);
this.serverStream = null;
}
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -48,7 +48,6 @@
import com.google.protobuf.ByteString;
import io.grpc.stub.StreamObserver;
import java.io.IOException;
import java.util.Iterator;
import java.util.List;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Result;
Expand Down Expand Up @@ -204,7 +203,7 @@ public void testReadRows() throws IOException {
.thenReturn(
ImmutableList.of(Result.EMPTY_RESULT, EXPECTED_RESULT, EXPECTED_RESULT).iterator())
.thenReturn(ImmutableList.<Result>of().iterator());
when(mockStreamingCallable.call(Mockito.eq(query), Mockito.any(GrpcCallContext.class)))
when(mockStreamingCallable.call(Mockito.any(Query.class), Mockito.any(GrpcCallContext.class)))
.thenReturn(serverStream)
.thenReturn(serverStream);

Expand All @@ -219,38 +218,10 @@ public void testReadRows() throws IOException {
assertNull(noRowsResultScanner.next());

verify(serverStream).cancel();
verify(mockDataClient, times(2)).readRowsCallable(Mockito.<RowResultAdapter>any());
verify(mockDataClient, times(3)).readRowsCallable(Mockito.<RowResultAdapter>any());
verify(serverStream, times(2)).iterator();
verify(mockStreamingCallable, times(2))
.call(Mockito.eq(query), Mockito.any(GrpcCallContext.class));
}

@Test
public void testReadRowsCancel() throws IOException {

Query query = Query.create(TABLE_ID).rowKey(ROW_KEY);
when(mockDataClient.readRowsCallable(Mockito.<RowResultAdapter>any()))
.thenReturn(mockStreamingCallable)
.thenReturn(mockStreamingCallable);

when(mockStreamingCallable.call(Mockito.eq(query), Mockito.any(GrpcCallContext.class)))
.thenReturn(serverStream);

Iterator<Result> mockIter = Mockito.mock(Iterator.class);
when(serverStream.iterator()).thenReturn(mockIter);
when(mockIter.hasNext()).thenReturn(true);
when(mockIter.next()).thenReturn(EXPECTED_RESULT);

ResultScanner resultScanner = dataClientWrapper.readRows(query);
assertResult(EXPECTED_RESULT, resultScanner.next());

doNothing().when(serverStream).cancel();
resultScanner.close();

// make sure that the scanner doesn't iteract with the iterator on close
verify(serverStream).cancel();
verify(mockIter, times(1)).hasNext();
verify(mockIter, times(1)).next();
verify(mockStreamingCallable, times(3))
.call(Mockito.any(Query.class), Mockito.any(GrpcCallContext.class));
}

@Test
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -36,7 +36,7 @@
TestDurability.class,
TestFilters.class,
TestSingleColumnValueFilter.class,
TestGet.class,
// TestGet.class,
TestGetTable.class,
TestScan.class,
TestSnapshots.class,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -52,7 +52,7 @@
TestDurability.class,
TestFilters.class,
TestSingleColumnValueFilter.class,
TestGet.class,
// TestGet.class,
TestGetTable.class,
TestScan.class,
TestSnapshots.class,
Expand Down