Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HBASE-26545 Implement tracing of scan #4106

Merged
merged 1 commit into from
Mar 28, 2022

Conversation

ndimiduk
Copy link
Member

  • on AsyncTable, both scan and scanAll methods should result in SCAN table operations.
  • the span of the SCAN table operation should have children representing all the RPC calls
    involved in servicing the scan.
  • when a user provides custom implementation of AdvancedScanResultConsumer, any spans emitted
    from the callback methods should also be tied to the span that represents the SCAN table
    operation. This is easily done because these callbacks are executed on the RPC thread.
  • when a user provides a custom implementation of ScanResultConsumer, any spans emitted from the
    callback methods are tied to the context of user application code. We could link them to the
    SCAN table operation, except that we have no means of passing that span along.

@Apache-HBase

This comment was marked as outdated.

@Apache-HBase

This comment was marked as outdated.

@Apache-HBase

This comment was marked as outdated.

@Apache-HBase

This comment was marked as outdated.

@Apache-HBase

This comment was marked as outdated.

@Apache-HBase

This comment was marked as outdated.

@Apache9
Copy link
Contributor

Apache9 commented Feb 12, 2022

So for async table, we will create a span when starting the scanner, and once we actually execute a piece of code, we will restore that span? So when will we end the span?

@ndimiduk
Copy link
Member Author

So for async table, we will create a span when starting the scanner, and once we actually execute a piece of code, we will restore that span? So when will we end the span?

For async table, we create the span in AsyncClientScanner#start. That span wraps the runnable that starts the async callback chain that handles populating scanner results behind the scenes for the client application. Each callback handles making sure that span is restored to its current context. In the test, there are asserts that confirm this span has ended and with StatusCode.OK. Perhaps you are correct that this implementation does not handle error cases. I don't see test code that covers an error while scanning. While the test code asserts the parent-child relationships of spans, it does not cover explicitly the start/end times of those children as relates to their parent. I will look closer here.

@ndimiduk ndimiduk marked this pull request as draft February 17, 2022 14:24
@ndimiduk
Copy link
Member Author

I've pushed an update. Rebased onto master. Reworked how the span is threaded through AsyncClientScanner based on @Apache9 's comments. Likewise, thread the span through the AsyncTableResultScanner for the same purpose. I also enabled span lifetime auditing for unit tests.

Marked the PR as a draft because I think there's more work to do -- at the very least, I plan to introduce test coverage via new, dedicated test class(es) instead of piggy-backing on the existing scan tests.

Duo -- I look forward to your further thoughts/observations.

@@ -1884,12 +1884,13 @@
-Djava.security.egd=file:/dev/./urandom -Djava.net.preferIPv4Stack=true
-Djava.awt.headless=true -Djdk.net.URLClassPath.disableClassPathURLCheck=true
-Dorg.apache.hbase.thirdparty.io.netty.leakDetection.level=advanced
-Dio.netty.eventLoopThreads=3
-Dio.netty.eventLoopThreads=3 -Dio.opentelemetry.context.enableStrictContext=true
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Apache-HBase

This comment was marked as outdated.

@Apache-HBase

This comment was marked as outdated.

@Apache-HBase

This comment was marked as outdated.

@ndimiduk ndimiduk force-pushed the 26545-trace-scans branch 3 times, most recently from 3b4b77f to 46d0c9d Compare February 21, 2022 16:28
@ndimiduk
Copy link
Member Author

I broke off HBASE-26764 and HBASE-26765 as #4120 and #4121 respectively.

@Apache-HBase

This comment was marked as outdated.

@Apache-HBase

This comment was marked as outdated.

@Apache-HBase

This comment was marked as outdated.

@Apache-HBase

This comment was marked as outdated.

@ndimiduk ndimiduk force-pushed the 26545-trace-scans branch from 4cc2aa4 to 667c6d3 Compare March 7, 2022 13:51
@Apache-HBase

This comment was marked as outdated.

@ndimiduk
Copy link
Member Author

Ping @joshelser @taklwu @Apache9

@ndimiduk ndimiduk requested review from apurtell and busbey March 11, 2022 14:11
Copy link
Contributor

@apurtell apurtell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test changes that assert continuity are fairly extensive. I started skimming there after realizing the pattern of the refactor.

.operationTimeout(scanTimeoutNs, TimeUnit.NANOSECONDS).pause(pauseNs, TimeUnit.NANOSECONDS)
.pauseForCQTBE(pauseForCQTBENs, TimeUnit.NANOSECONDS).maxAttempts(maxAttempts)
.startLogErrorsCnt(startLogErrorsCnt).action(this::callOpenScanner).call();
try (Scope ignored = span.makeCurrent()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to also maintain your volatile usage discipline here and copy 'span' to a 'localSpan' first, as you have done at other call sites?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't do so here because the member field is accessed only once in the method lifetime, while the other call sites made multiple references to the field. I can use the same pattern here, sure, it won't hurt anything.

break;
Span span = null;
try (AsyncTableResultScanner scanner = rawTable.getScanner(scan)) {
span = scanner.getSpan();
Copy link
Contributor

@apurtell apurtell Mar 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be null? I guess not as long as someone is aware of getter/setter for span in AsyncTableResultScanner and the expected convention. I suppose the resulting NPE would clear enough if not.
And ditto other call sites. I might have thrown something with an explicit message about failure to maintain the code discipline but it seems fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In an earlier version of this patch, I hade null-protection sprinkled throughout this method. IntelliJ static analysis annotated it all as unnecessary. If you're also wondering... I'll trust two devs over the tool. Let me bring it back.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to throw because a disruption in tracing should not hinder application execution.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'm simplifying this according to the hints provided by the IDE. I cannot make the field final because the compiler errors in the catch block that the field hasn't been initialized, but static analysis agrees with @Apache9 that the field can never be null.

@@ -452,6 +456,53 @@ public void testScanAll() {
assertTrace("SCAN");
}

@Test
public void testScan() throws Throwable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name: testScanTracing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The class has "Tracing" in the name, so I didn't think it necessary.

}

@Test
public void testGetScanner() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name: testGetScannerTracing?

protected static final ConnectionRule connectionRule =
new ConnectionRule(miniClusterRule::createConnection);

private static final class Setup extends ExternalResource {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially I was wondering about this but I see it is just a junit convention refactor

Copy link
Member Author

@ndimiduk ndimiduk Mar 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not just convention. JUnit makes no promises re: execution order of @Before and @Rule, of @BeforeClass and @ClassRule. I want to use fields that are initialized by the @ClassRule annotation, so I need to participate in that annotation. The ChainRule allows me to explicitly order the execution, so that otel is initialize first, then the miniCluster, then the connection to the miniCluster, and finally my code that makes use of the previous.

I much much much prefer the Rules over test class inheritance because mixing in functionality is so much easier to reason about.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can upgrading to junit5 solve this problem?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the RuleChain stuff as a "problem", it's just the way this API works. I believe that JUnit 5 is a complete overhaul of the testing harness, I haven't looked into what all is involved, but I think it is involved.

@ndimiduk ndimiduk force-pushed the 26545-trace-scans branch from e00d40c to 46d6e8f Compare March 14, 2022 10:01
});
}

public void start() {
openScanner();
final Span localSpan = new TableOperationSpanBuilder(conn)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only place where we create this scanner and call start on it is in RawAsyncTableImpl.scan method, where both these two operations are in the same thread, so I think here we could make the Span field final and create it in the constructor, so we do not need to care about concurrency access any more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yes, I suppose so.

@@ -573,7 +574,8 @@ private void call() {
resetController(controller, callTimeoutNs, priority);
ScanRequest req = RequestConverter.buildScanRequest(scannerId, scan.getCaching(), false,
nextCallSeq, scan.isScanMetricsEnabled(), false, scan.getLimit());
stub.scan(controller, req, resp -> onComplete(controller, resp));
Context context = Context.current();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose here? Wrap the callback in the span in the current context?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's to wrap the onComplete callback invocation. The line following would be less awkward if Context offered a method <T> Consumer<T> wrap(Consumer<T> consumer).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, this API has been added upstream, open-telemetry/opentelemetry-java@cffbd32

@Apache9
Copy link
Contributor

Apache9 commented Mar 14, 2022

Haven't finished the review yet. Will take a look on the second half tonight.

@Apache-HBase

This comment was marked as outdated.

@Apache-HBase

This comment was marked as outdated.

@Apache-HBase

This comment was marked as outdated.

Copy link
Contributor

@Apache9 Apache9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No big concerns on the second half, most changes are tests, which I think is good.

protected static final ConnectionRule connectionRule =
new ConnectionRule(miniClusterRule::createConnection);

private static final class Setup extends ExternalResource {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can upgrading to junit5 solve this problem?

@ndimiduk ndimiduk force-pushed the 26545-trace-scans branch from 46d6e8f to 2b7bc6d Compare March 21, 2022 16:09
@ndimiduk
Copy link
Member Author

Rebased onto master and I believe I've addressed all outstanding reviewer feedback.

@Apache-HBase

This comment was marked as outdated.

@Apache-HBase

This comment was marked as outdated.

@Apache-HBase

This comment was marked as outdated.

@ndimiduk
Copy link
Member Author

Awkwardly, the tests of interest have the xml read error. They pass locally. Kicking the PR build again.

@Apache9 @apurtell do you have any further comments?

@Apache-HBase

This comment was marked as outdated.

Copy link
Contributor

@Apache9 Apache9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM. The only thing is still about when to end the span. I think since we have a ScanResultConsumer, we could just follow the old logic, to add span.end at the place where we call consumer.onComplate and consumer.onError.

@ndimiduk ndimiduk force-pushed the 26545-trace-scans branch from 2b7bc6d to 92fbdc0 Compare March 28, 2022 08:01
@Apache9
Copy link
Contributor

Apache9 commented Mar 28, 2022

Approved.

I think this is last piece of tracing? Thanks @ndimiduk for the hard!

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 11s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+0 🆗 mvndep 0m 21s Maven dependency ordering for branch
+1 💚 mvninstall 2m 18s master passed
+1 💚 compile 6m 14s master passed
+1 💚 checkstyle 1m 9s master passed
+1 💚 spotbugs 9m 10s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 15s Maven dependency ordering for patch
+1 💚 mvninstall 2m 15s the patch passed
+1 💚 compile 6m 12s the patch passed
+1 💚 javac 6m 12s the patch passed
-0 ⚠️ checkstyle 1m 9s root: The patch generated 1 new + 9 unchanged - 2 fixed = 10 total (was 11)
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 xml 0m 1s The patch has no ill-formed XML file.
+1 💚 hadoopcheck 11m 46s Patch does not cause any errors with Hadoop 3.1.2 3.2.2 3.3.1.
+1 💚 spotbugs 9m 38s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 29s The patch does not generate ASF License warnings.
57m 34s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4106/11/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #4106
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti checkstyle compile xml
uname Linux ae3354419463 5.4.0-90-generic #101-Ubuntu SMP Fri Oct 15 20:00:55 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 4f491fd
Default Java AdoptOpenJDK-1.8.0_282-b08
checkstyle https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4106/11/artifact/yetus-general-check/output/diff-checkstyle-root.txt
Max. process+thread count 138 (vs. ulimit of 30000)
modules C: hbase-client hbase-server . U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4106/11/console
versions git=2.17.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

* on `AsyncTable`, both `scan` and `scanAll` methods should result in `SCAN` table operations.
* the span of the `SCAN` table operation should have children representing all the RPC calls
  involved in servicing the scan.
* when a user provides custom implementation of `AdvancedScanResultConsumer`, any spans emitted
  from the callback methods should also be tied to the span that represents the `SCAN` table
  operation. This is easily done because these callbacks are executed on the RPC thread.
* when a user provides a custom implementation of `ScanResultConsumer`, any spans emitted from the
  callback methods should be also be tied to the span that represents the `SCAN` table
  operation. This accomplished by carefully passing the span instance around after it is created.

Signed-off-by: Andrew Purtell <[email protected]>
Signed-off-by: Duo Zhang <[email protected]>
@ndimiduk ndimiduk force-pushed the 26545-trace-scans branch from 92fbdc0 to 2c439d2 Compare March 28, 2022 10:46
@ndimiduk ndimiduk merged commit 620e5c6 into apache:master Mar 28, 2022
@ndimiduk ndimiduk deleted the 26545-trace-scans branch March 28, 2022 10:47
@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 43s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 🆗 mvndep 0m 43s Maven dependency ordering for branch
+1 💚 mvninstall 2m 56s master passed
+1 💚 compile 1m 47s master passed
+1 💚 shadedjars 4m 4s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 2m 29s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 13s Maven dependency ordering for patch
+1 💚 mvninstall 2m 39s the patch passed
+1 💚 compile 1m 51s the patch passed
+1 💚 javac 1m 51s the patch passed
+1 💚 shadedjars 4m 2s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 2m 28s the patch passed
_ Other Tests _
+1 💚 unit 237m 26s root in the patch passed.
265m 20s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4106/11/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #4106
Optional Tests javac javadoc unit shadedjars compile
uname Linux e1c57335233d 5.4.0-1025-aws #25~18.04.1-Ubuntu SMP Fri Sep 11 12:03:04 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 4f491fd
Default Java AdoptOpenJDK-11.0.10+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4106/11/testReport/
Max. process+thread count 4684 (vs. ulimit of 30000)
modules C: hbase-client hbase-server . U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4106/11/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 13s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 🆗 mvndep 0m 25s Maven dependency ordering for branch
+1 💚 mvninstall 3m 54s master passed
+1 💚 compile 1m 44s master passed
+1 💚 shadedjars 4m 23s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 2m 17s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 15s Maven dependency ordering for patch
+1 💚 mvninstall 2m 24s the patch passed
+1 💚 compile 1m 41s the patch passed
+1 💚 javac 1m 41s the patch passed
+1 💚 shadedjars 4m 19s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 2m 18s the patch passed
_ Other Tests _
-1 ❌ unit 355m 59s root in the patch failed.
382m 50s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4106/11/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #4106
Optional Tests javac javadoc unit shadedjars compile
uname Linux e6b16a6a42e1 5.4.0-1025-aws #25~18.04.1-Ubuntu SMP Fri Sep 11 12:03:04 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 4f491fd
Default Java AdoptOpenJDK-1.8.0_282-b08
unit https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4106/11/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-root.txt
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4106/11/testReport/
Max. process+thread count 2394 (vs. ulimit of 30000)
modules C: hbase-client hbase-server . U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4106/11/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants