fix: Add paging to hbase client #4166

Merged
merged 87 commits into from
Apr 15, 2024

Conversation

Contributor

@ron-gal commented Sep 21, 2023

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> ☕️

If you write sample code, please follow the samples format.

@product-auto-label product-auto-label bot added size: m Pull request size is medium. api: bigtable Issues related to the googleapis/java-bigtable-hbase API. labels Sep 21, 2023
@product-auto-label product-auto-label bot added size: l Pull request size is large. and removed size: m Pull request size is medium. labels Sep 22, 2023
@product-auto-label product-auto-label bot added size: m Pull request size is medium. and removed size: l Pull request size is large. labels Sep 22, 2023
@ron-gal ron-gal changed the title Added paging to hbase client fix: Add paging to hbase client Sep 25, 2023
@Override
public ResultScanner readRows(Query.QueryPaginator paginator, long maxSegmentByteSize) {
return new PaginatedRowResultScanner(
paginator, delegate, maxSegmentByteSize, () -> this.createScanCallContext());
Collaborator

Why do you defer the creation of the ScanCallContext? Re-creating it for each page extends the operation timeout across all of the pages. I think the operation deadline should stay consistent from the caller's perspective, regardless of whether the scan is broken up into multiple segments.

Contributor Author

done
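A minimal sketch of the reviewer's point about timeouts, assuming a hypothetical `PageDeadline` helper (not a java-bigtable-hbase or gax class): the deadline is fixed once when the scan starts, and each page's RPC gets only the time that remains, so splitting a scan into pages cannot stretch the caller's overall timeout.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper (not part of java-bigtable-hbase or gax): one deadline for a whole paginated scan.
final class PageDeadline {
  private final Instant deadline;

  PageDeadline(Instant scanStart, Duration operationTimeout) {
    // Fixed once, when the scan starts, instead of once per page.
    this.deadline = scanStart.plus(operationTimeout);
  }

  /** Time budget for the RPC of a page starting at pageStart; zero once the deadline has passed. */
  Duration remainingFor(Instant pageStart) {
    Duration left = Duration.between(pageStart, deadline);
    return left.isNegative() ? Duration.ZERO : left;
  }

  public static void main(String[] args) {
    Instant start = Instant.parse("2024-01-01T00:00:00Z");
    PageDeadline scan = new PageDeadline(start, Duration.ofSeconds(60));
    // A page that starts 45s into the scan gets the remaining 15s, not a fresh 60s.
    System.out.println(scan.remainingFor(start.plusSeconds(45))); // prints PT15S
  }
}
```

Building a fresh call context for every page, as the deferred `() -> this.createScanCallContext()` did, corresponds to calling the constructor again per page, which restarts the clock; that is the extension the reviewer describes.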

@@ -155,7 +170,8 @@ public void readRowsAsync(Query request, StreamObserver<Result> observer) {
.call(request, new StreamObserverAdapter<>(observer), createScanCallContext());
}

// Point reads are implemented using a streaming ReadRows RPC. So timeouts need to be managed
// Point reads are implemented using a streaming ReadRows RPC. So timeouts need
// to be managed
Collaborator

please revert irrelevant changes to minimize the noise

Contributor Author

done

return null;
}
}
scannerResultMeter.mark();
Collaborator

should this be in the if statement?

Contributor Author

@ron-gal Nov 29, 2023

Done

@@ -74,6 +80,8 @@ public class TestDataClientVeneerApi {
private static final String TABLE_ID = "fake-table";
private static final ByteString ROW_KEY = ByteString.copyFromUtf8("row-key");

private static AtomicBoolean cancelled = new AtomicBoolean(false);
Collaborator

why static?

Contributor Author

Copy-pasted from Beam; removed.
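Why `static` is risky here, as a plain-Java sketch with hypothetical names: JUnit 4 creates a new instance of the test class for each test method, but a `static` field is shared by all of those instances, so a `cancelled` flag set by one test would still be `true` when the next test starts. An instance field starts fresh every time.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a test class; each "test" runs on a fresh instance, as JUnit 4 does.
final class StaticStateDemo {
  // Shared by all instances: survives from one test to the next.
  static final AtomicBoolean sharedCancelled = new AtomicBoolean(false);
  // One per instance: starts as false for every test.
  final AtomicBoolean cancelled = new AtomicBoolean(false);

  /** Simulates a test that cancels a stream; returns {shared, instance} flag values seen at start. */
  boolean[] runTest() {
    boolean[] seenAtStart = {sharedCancelled.get(), cancelled.get()};
    sharedCancelled.set(true);
    cancelled.set(true);
    return seenAtStart;
  }

  public static void main(String[] args) {
    new StaticStateDemo().runTest(); // first test
    boolean[] second = new StaticStateDemo().runTest(); // second test, new instance
    // The static flag leaks from the first test; the instance flag does not.
    System.out.println(second[0] + " " + second[1]); // prints "true false"
  }
}
```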

} else {
Query.QueryPaginator paginator =
hbaseAdapter.adapt(scan).createPaginator(scan.getCaching());
scanner = clientWrapper.readRows(paginator, -1);
Contributor

Maybe we can pass in the default to begin with?

Contributor Author

It's private, and I wouldn't want to expose it. This value is just for tests, so I don't think it's critical.

Contributor

We could move the default value into this class?

Contributor Author

Done
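One reading of "move the default value into this class", sketched with a hypothetical class name, constant name, and value (the real constant and its value are not shown in this thread): the calling class owns the default and passes it explicitly, so the call site no longer needs a `-1` placeholder.

```java
// Hypothetical sketch: the caller owns the default segment size and passes it explicitly.
final class ScannerFactory {
  // Hypothetical name and value; not taken from the PR.
  static final long DEFAULT_MAX_SEGMENT_BYTE_SIZE = 64L * 1024 * 1024; // 64 MiB

  /** Stand-in for clientWrapper.readRows(paginator, maxSegmentByteSize). */
  interface Reader {
    String readRows(int pageSize, long maxSegmentByteSize);
  }

  static String openScanner(Reader reader, int caching) {
    // The default is visible at the call site, so no magic -1 is passed.
    return reader.readRows(caching, DEFAULT_MAX_SEGMENT_BYTE_SIZE);
  }

  public static void main(String[] args) {
    Reader fake = (pageSize, maxBytes) -> "pageSize=" + pageSize + ", maxBytes=" + maxBytes;
    System.out.println(openScanner(fake, 100)); // prints "pageSize=100, maxBytes=67108864"
  }
}
```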

ResultScanner noRowsResultScanner = dataClientWrapper.readRows(query.createPaginator(100), -1);
assertNull(noRowsResultScanner.next());

verify(mockDataClient, times(2)).readRowsCallable(Mockito.<RowResultAdapter>any());
Contributor

I'm a bit confused by this test. Why would it be called twice when the page size is 100 and you're only returning 2 results?

Contributor Author

I am calling readRows twice. Once with results, and once without

Contributor

Ok, I see it now. However, I don't think this actually tests anything. I think we can set a smaller page size (for example, 2), return more rows (say 3), and verify that readRowsCallable is called 3 times.

Contributor Author

Done

new Answer<Void>() {
@Override
public Void answer(InvocationOnMock invocation) throws Throwable {
((ResponseObserver) invocation.getArgument(1)).onResponse(Result.EMPTY_RESULT);
Contributor

why do we return an empty result first?

Contributor Author

That's similar to the test for the non-paginated scanner.

Collaborator

@igorbernstein2 Dec 13, 2023

This test doesn't really make sense. You want to be testing what happens between pages. Something like this:

private static Result createRow(String key) {
    return Result.create(
        ImmutableList.<Cell>of(
            new com.google.cloud.bigtable.hbase.adapters.read.RowCell(
                Bytes.toBytes(key),
                Bytes.toBytes("cf"),
                Bytes.toBytes("q"),
                10L,
                Bytes.toBytes("value"),
                ImmutableList.of("label"))));
  }
  @Test
  public void testReadPaginatedRows() throws IOException {
    Query query = Query.create(TABLE_ID).range("a", "z");
    when(mockDataClient.readRowsCallable(Mockito.<RowResultAdapter>any()))
        .thenReturn(mockStreamingCallable);

    // First Page
    doAnswer((args) -> {
      ResponseObserver<Result> observer = args.getArgument(1);
      observer.onResponse(createRow("a"));
      observer.onResponse(createRow("b"));
      observer.onComplete();
      return null;
    })
        .when(mockStreamingCallable)
        .call(eq(Query.create(TABLE_ID).range("a", "z").limit(2)), any(), any());

    // 2nd Page
    doAnswer((args) -> {
      ResponseObserver<Result> observer = args.getArgument(1);
      observer.onResponse(createRow("c"));
      observer.onResponse(createRow("d"));
      observer.onComplete();
      return null;
    })
        .when(mockStreamingCallable)
        .call(
            eq(
              Query.create(TABLE_ID)
                .range(ByteStringRange.unbounded().startOpen("b").endOpen("z")).limit(2)),
            any(), any());

    // 3rd Page
    doAnswer((args) -> {
      ResponseObserver<Result> observer = args.getArgument(1);
      observer.onResponse(createRow("e"));
      observer.onComplete();
      return null;
    })
        .when(mockStreamingCallable)
        .call(
            eq(
                Query.create(TABLE_ID)
                    .range(ByteStringRange.unbounded().startOpen("d").endOpen("z")).limit(2)),
            any(), any());

    // 4th Page (empty)
    doAnswer((args) -> {
      ResponseObserver<Result> observer = args.getArgument(1);
      observer.onComplete();
      return null;
    })
        .when(mockStreamingCallable)
        .call(
            eq(
                Query.create(TABLE_ID)
                    .range(ByteStringRange.unbounded().startOpen("e").endOpen("z")).limit(2)), any(), any());

    ResultScanner resultScanner = dataClientWrapper.readRows(query.createPaginator(2), 1000);

    assertThat(resultScanner)
        .comparingElementsUsing(Correspondence.transforming((Result r) -> new String(r.getRow()), "row key"))
        .containsExactly("a", "b", "c", "d", "e");
  }

Contributor Author

Done
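The suggested test narrows the row range after each page: the next request starts just after the last row key returned (`startOpen`) and keeps the original exclusive end (`endOpen("z")`); judging by the mocked fourth request, the scan stops once a page comes back empty. A simplified model of that behavior, using `NavigableSet.subSet` with exclusive bounds as a stand-in for `ByteStringRange` (an illustration only, not the real `Query.QueryPaginator`):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Illustrative model only: NavigableSet bounds stand in for Bigtable's ByteStringRange.
final class RangeResume {
  /** Returns one list per page request, including the final empty page. */
  static List<List<String>> scan(NavigableSet<String> table, String start, String end, int limit) {
    List<List<String>> pages = new ArrayList<>();
    // First page: [start, end), like range("a", "z").
    NavigableSet<String> range = table.subSet(start, true, end, false);
    while (true) {
      List<String> page = new ArrayList<>();
      for (String key : range) {
        if (page.size() == limit) break; // like .limit(2)
        page.add(key);
      }
      pages.add(page);
      if (page.isEmpty()) return pages; // assumption: an empty page ends the scan
      String lastKey = page.get(page.size() - 1);
      // Next page: (lastKey, end), like startOpen(lastKey).endOpen(end).
      range = table.subSet(lastKey, false, end, false);
    }
  }

  public static void main(String[] args) {
    NavigableSet<String> table = new TreeSet<>(List.of("a", "b", "c", "d", "e", "z"));
    System.out.println(scan(table, "a", "z", 2)); // prints [[a, b], [c, d], [e], []]
  }
}
```

With rows `a` through `e` and a limit of 2, this model issues four requests, matching the four mocked calls; with three rows and a page size of 2 it issues three, the count suggested in the earlier review comment.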

Mockito.any(ResponseObserver.class),
Mockito.any(GrpcCallContext.class));

ResultScanner resultScanner = dataClientWrapper.readRows(query.createPaginator(100), 3);
Contributor

Same question here: why do we return an empty result? I don't think it will use any memory. Maybe return another expected result and assert that readRowsCallable is called twice.

Contributor Author

Again, I just copied the old test. There's no real reason, but I don't think it's invalid.

Contributor

Similar to above, I don't think this actually tests the memory buffer function. We should at least return another result and verify that readRowsCallable is called twice (the first time, the request gets cancelled because we filled up the buffer; the second time, we finish the read).
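A simplified, synchronous model of the path this comment asks the test to exercise (hypothetical code; the real `PaginatedRowResultScanner` buffers rows asynchronously): each ReadRows call adds rows to a buffer and is cancelled once the buffered bytes reach `maxSegmentByteSize`, and the next call resumes after the last row key received. The model assumes that a call returning no rows ends the scan.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of byte-limited segments; not the real PaginatedRowResultScanner.
final class SegmentModel {
  /** Returns one list per ReadRows call: the rows it delivered before completing or being cancelled. */
  static List<List<String>> segments(List<String> sortedRows, long maxSegmentByteSize) {
    List<List<String>> calls = new ArrayList<>();
    int next = 0; // first row after the last key already received
    while (true) {
      List<String> segment = new ArrayList<>();
      long bufferedBytes = 0;
      // Stop when rows run out (stream completes) or the buffer is full (stream is cancelled).
      while (next < sortedRows.size() && bufferedBytes < maxSegmentByteSize) {
        String row = sortedRows.get(next++);
        segment.add(row);
        bufferedBytes += row.getBytes(StandardCharsets.UTF_8).length;
      }
      calls.add(segment);
      if (segment.isEmpty()) {
        return calls; // assumption: an empty call ends the scan
      }
    }
  }

  public static void main(String[] args) {
    // Each key is 4 bytes; an 8-byte budget fills after two rows.
    System.out.println(segments(List.of("row1", "row2", "row3"), 8)); // prints [[row1, row2], [row3], []]
  }
}
```

With a budget smaller than one row, every call in this model is cancelled after a single row; with a budget that the rows never reach, the cancel-and-resume branch is never taken, which is the gap the comment points out.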


doAnswer(
Contributor

What's the difference between lines 353-372 and lines 331-351? They seem to be duplicates.

Contributor Author

Fixed

}

@Test
public void testRead100Rows() throws IOException {
Contributor

Was this a new test that's added to test the paginator?

Contributor Author

Yes. Adjusted to new paginator impl

Collaborator

This test isn't really testing anything; I would just drop it.

Contributor Author

Done

testManyResultsInScanner(105);
}

private void testManyResultsInScanner(int rowsToWrite) throws IOException {
Contributor

Sorry, I should have been clearer in my previous comment. I think we want to test getScanner both without pagination and with pagination. So maybe we can add a parameter:

testManyResultsInScanner(int rowsToWrite, boolean withPagination);

And add one more test: testManyResultsInScanner(100, false);

Contributor Author

Done

public void testReadRows_Errors() throws IOException {
Query query = Query.create(TABLE_ID).rowKey(ROW_KEY);
when(mockDataClient.readRowsCallable(Mockito.<RowResultAdapter>any()))
.thenThrow(new RuntimeException())
Collaborator

This doesn't make sense: readRowsCallable() will never throw an exception. You want to test what happens when the server returns an error. I think something like this:

  public void testReadRows_Errors() throws IOException {
    Query query = Query.create(TABLE_ID).rowKey(ROW_KEY);
    when(mockDataClient.readRowsCallable(any(RowResultAdapter.class)))
        .thenReturn(mockStreamingCallable);
    when(mockStreamingCallable.call(any(Query.class), any(GrpcCallContext.class)))
        .thenReturn(serverStream);
    when(serverStream.iterator())
        .thenReturn(new Iterator<Result>() {
          @Override
          public boolean hasNext() {
            return true;
          }
          @Override
          public Result next() {
            throw new InternalException("fake error", null, GrpcStatusCode.of(Code.INTERNAL), false);
          }
        })
        .thenReturn(ImmutableList.<Result>of().iterator());

    assertThrows(Exception.class, () -> dataClientWrapper.readRows(query).next());

    ResultScanner noRowsResultScanner = dataClientWrapper.readRows(query);
    assertNull(noRowsResultScanner.next());
    noRowsResultScanner.close();

    verify(mockDataClient, times(2)).readRowsCallable(Mockito.<RowResultAdapter>any());
    verify(serverStream, times(2)).iterator();
    verify(mockStreamingCallable, times(2))
        .call(any(Query.class), any(GrpcCallContext.class));
  }

Contributor Author

Done
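The point behind this suggestion, reduced to plain Java: opening a stream does not fail; a server error surfaces only when the caller pulls the next row. The sketch uses a hypothetical failing `Iterator` in place of gax's `ServerStream`, with `IllegalStateException` standing in for the gax `InternalException`.

```java
import java.util.Iterator;

// Hypothetical stand-in for a server stream whose first element is an RPC error.
final class FailingStreamDemo {
  static Iterator<String> openStream() {
    // Opening succeeds; nothing has been read from the "server" yet.
    return new Iterator<String>() {
      @Override
      public boolean hasNext() {
        return true;
      }

      @Override
      public String next() {
        // The error is delivered only when the caller asks for the first row.
        throw new IllegalStateException("fake INTERNAL error from server");
      }
    };
  }

  public static void main(String[] args) {
    Iterator<String> stream = openStream(); // does not throw
    try {
      stream.next();
    } catch (IllegalStateException e) {
      System.out.println("caught: " + e.getMessage());
    }
  }
}
```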

}

@Test
public void testReadRowsLowMemory() throws IOException {
Collaborator

Similar to above, you want to be testing the RPC between pages, so something like this:

 public void testReadRowsLowMemory() throws IOException {
    Query query = Query.create(TABLE_ID);
    when(mockDataClient.readRowsCallable(any(RowResultAdapter.class)))
        .thenReturn(mockStreamingCallable);

    StreamController mockController = Mockito.mock(StreamController.class);
    doAnswer(invocation -> {
          cancelled.set(true);
          return null;
        })
        .when(mockController)
        .cancel();


    // Generate
    doAnswer(
        (Answer<Void>) invocation -> {
          ResponseObserver<Result> observer = invocation.getArgument(1);
          observer.onStart(mockController);

          for(int i=0; i < 1000 && !cancelled.get(); i++) {
            observer.onResponse(createRow(String.format("row%010d", i)));
            Thread.sleep(10);
          }
          observer.onComplete();
          return null;
        })
        .doAnswer(
        (Answer<Void>) invocation -> {
          ResponseObserver<Result> observer = invocation.getArgument(1);
          observer.onComplete();
          return null;
        })
        .when(mockStreamingCallable)
        .call(any(), any(), any());

    ResultScanner resultScanner = dataClientWrapper.readRows(query.createPaginator(100), 3);
    // Consume the stream
    Lists.newArrayList(resultScanner);

    verify(mockStreamingCallable, times(2))
        .call(
            any(Query.class),
            any(ResponseObserver.class),
            any(GrpcCallContext.class));
    assertTrue(cancelled.get());
  }

Contributor Author

Done
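A single-threaded model of the mocked stream in the low-memory test, with hypothetical names (this `Observer` interface is not gax's `ResponseObserver`): the fake server checks a cancellation flag before each row, and the consumer cancels once its buffered bytes reach the budget. Checking the flag in the loop condition is what makes the mock stop producing rows after `cancel()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical, single-threaded model; not gax's ResponseObserver/StreamController API.
final class CancelDemo {
  interface Observer {
    void onStart(Runnable cancel);
    void onResponse(String row);
    void onComplete();
  }

  /** Fake server: pushes up to `total` rows, stopping early once the consumer cancels. */
  static void serve(int total, Observer observer) {
    AtomicBoolean cancelled = new AtomicBoolean(false);
    observer.onStart(() -> cancelled.set(true));
    for (int i = 0; i < total && !cancelled.get(); i++) {
      observer.onResponse(String.format("row%010d", i));
    }
    observer.onComplete();
  }

  /** Consumer: buffers rows and cancels once the buffered bytes reach maxBytes. */
  static List<String> consume(int total, long maxBytes) {
    List<String> buffer = new ArrayList<>();
    serve(total, new Observer() {
      private Runnable cancel;
      private long bytes;

      @Override public void onStart(Runnable cancel) { this.cancel = cancel; }

      @Override public void onResponse(String row) {
        buffer.add(row);
        bytes += row.length(); // keys are ASCII, so chars == bytes
        if (bytes >= maxBytes) cancel.run();
      }

      @Override public void onComplete() {}
    });
    return buffer;
  }

  public static void main(String[] args) {
    // Each key is 13 bytes, so a 3-byte budget is used up by the first row.
    System.out.println(consume(1000, 3).size()); // prints 1
  }
}
```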

@mutianf mutianf added the owlbot:run Add this label to trigger the Owlbot post processor. label Apr 15, 2024
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Apr 15, 2024
@mutianf mutianf merged commit 33facf5 into googleapis:main Apr 15, 2024
12 of 13 checks passed
Labels
api: bigtable Issues related to the googleapis/java-bigtable-hbase API. size: l Pull request size is large.
3 participants