-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-26545 Implement tracing of scan #4106
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
hbase-server/src/test/java/org/apache/hadoop/hbase/trace/OpenTelemetryTestRule.java
Outdated
Show resolved
Hide resolved
So for async table, we will create a span when starting the scanner, and once we actually execute a piece of code, we will restore that span? So when will we end the span? |
For async table, we create the span in |
hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncClientScanner.java
Outdated
Show resolved
Hide resolved
ad27bcb
to
12a5621
Compare
I've pushed an update. Rebased onto master. Reworked how the span is threaded through Marked the PR as a draft because I think there's more work to do -- at the very least, I plan to introduce test coverage via new, dedicated test class(es) instead of piggy-backing on the existing scan tests. Duo -- I look forward to your further thoughts/observations. |
hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncClientScanner.java
Show resolved
Hide resolved
hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncConnectionImpl.java
Outdated
Show resolved
Hide resolved
hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncTableImpl.java
Outdated
Show resolved
Hide resolved
hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncTableImpl.java
Show resolved
Hide resolved
hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncTableResultScanner.java
Show resolved
Hide resolved
hbase-client/src/main/java/org/apache/hadoop/hbase/client/RawAsyncTableImpl.java
Show resolved
Hide resolved
hbase-client/src/test/java/org/apache/hadoop/hbase/client/TestAsyncTableTracing.java
Show resolved
Hide resolved
hbase-client/src/test/java/org/apache/hadoop/hbase/client/TestAsyncTableTracing.java
Show resolved
Hide resolved
hbase-client/src/test/java/org/apache/hadoop/hbase/client/trace/StringTraceRenderer.java
Outdated
Show resolved
Hide resolved
@@ -1884,12 +1884,13 @@ | |||
-Djava.security.egd=file:/dev/./urandom -Djava.net.preferIPv4Stack=true | |||
-Djava.awt.headless=true -Djdk.net.URLClassPath.disableClassPathURLCheck=true | |||
-Dorg.apache.hbase.thirdparty.io.netty.leakDetection.level=advanced | |||
-Dio.netty.eventLoopThreads=3 | |||
-Dio.netty.eventLoopThreads=3 -Dio.opentelemetry.context.enableStrictContext=true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is supposed to enable strict checking that scopes are closed properly. I'm not convinced it works in practice.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
3b4b77f
to
46d0c9d
Compare
I broke off HBASE-26764 and HBASE-26765 as #4120 and #4121 respectively. |
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
46d0c9d
to
4cc2aa4
Compare
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
4cc2aa4
to
667c6d3
Compare
This comment was marked as outdated.
This comment was marked as outdated.
Ping @joshelser @taklwu @Apache9 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test changes that assert continuity are fairly extensive. I started skimming there after realizing the pattern of the refactor.
.operationTimeout(scanTimeoutNs, TimeUnit.NANOSECONDS).pause(pauseNs, TimeUnit.NANOSECONDS) | ||
.pauseForCQTBE(pauseForCQTBENs, TimeUnit.NANOSECONDS).maxAttempts(maxAttempts) | ||
.startLogErrorsCnt(startLogErrorsCnt).action(this::callOpenScanner).call(); | ||
try (Scope ignored = span.makeCurrent()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to also maintain your volatile usage discipline here and copy 'span' to a 'localSpan' first, as you have done at other call sites?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't do so here because the member field is accessed only once in the method lifetime, while the other call sites made multiple references to the field. I can use the same pattern here, sure, it won't hurt anything.
break; | ||
Span span = null; | ||
try (AsyncTableResultScanner scanner = rawTable.getScanner(scan)) { | ||
span = scanner.getSpan(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be null? I guess not as long as someone is aware of getter/setter for span in AsyncTableResultScanner
and the expected convention. I suppose the resulting NPE would clear enough if not.
And ditto other call sites. I might have thrown something with an explicit message about failure to maintain the code discipline but it seems fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In an earlier version of this patch, I hade null-protection sprinkled throughout this method. IntelliJ static analysis annotated it all as unnecessary. If you're also wondering... I'll trust two devs over the tool. Let me bring it back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to throw because a disruption in tracing should not hinder application execution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I'm simplifying this according to the hints provided by the IDE. I cannot make the field final
because the compiler errors in the catch block that the field hasn't been initialized, but static analysis agrees with @Apache9 that the field can never be null.
@@ -452,6 +456,53 @@ public void testScanAll() { | |||
assertTrace("SCAN"); | |||
} | |||
|
|||
@Test | |||
public void testScan() throws Throwable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Name: testScanTracing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The class has "Tracing" in the name, so I didn't think it necessary.
} | ||
|
||
@Test | ||
public void testGetScanner() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Name: testGetScannerTracing?
hbase-client/src/test/java/org/apache/hadoop/hbase/client/trace/StringTraceRenderer.java
Outdated
Show resolved
Hide resolved
protected static final ConnectionRule connectionRule = | ||
new ConnectionRule(miniClusterRule::createConnection); | ||
|
||
private static final class Setup extends ExternalResource { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially I was wondering about this but I see it is just a junit convention refactor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not just convention. JUnit makes no promises re: execution order of @Before
and @Rule
, of @BeforeClass
and @ClassRule
. I want to use fields that are initialized by the @ClassRule
annotation, so I need to participate in that annotation. The ChainRule
allows me to explicitly order the execution, so that otel is initialize first, then the miniCluster, then the connection to the miniCluster, and finally my code that makes use of the previous.
I much much much prefer the Rules over test class inheritance because mixing in functionality is so much easier to reason about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can upgrading to junit5 solve this problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the RuleChain stuff as a "problem", it's just the way this API works. I believe that JUnit 5 is a complete overhaul of the testing harness, I haven't looked into what all is involved, but I think it is involved.
e00d40c
to
46d6e8f
Compare
}); | ||
} | ||
|
||
public void start() { | ||
openScanner(); | ||
final Span localSpan = new TableOperationSpanBuilder(conn) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only place where we create this scanner and call start on it is in RawAsyncTableImpl.scan method, where both these two operations are in the same thread, so I think here we could make the Span field final and create it in the constructor, so we do not need to care about concurrency access any more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm yes, I suppose so.
@@ -573,7 +574,8 @@ private void call() { | |||
resetController(controller, callTimeoutNs, priority); | |||
ScanRequest req = RequestConverter.buildScanRequest(scannerId, scan.getCaching(), false, | |||
nextCallSeq, scan.isScanMetricsEnabled(), false, scan.getLimit()); | |||
stub.scan(controller, req, resp -> onComplete(controller, resp)); | |||
Context context = Context.current(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose here? Wrap the callback in the span in the current context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's to wrap the onComplete
callback invocation. The line following would be less awkward if Context
offered a method <T> Consumer<T> wrap(Consumer<T> consumer)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, this API has been added upstream, open-telemetry/opentelemetry-java@cffbd32
hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncTableImpl.java
Outdated
Show resolved
Hide resolved
hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncTableImpl.java
Outdated
Show resolved
Hide resolved
Haven't finished the review yet. Will take a look on the second half tonight. |
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No big concerns on the second half, most changes are tests, which I think is good.
protected static final ConnectionRule connectionRule = | ||
new ConnectionRule(miniClusterRule::createConnection); | ||
|
||
private static final class Setup extends ExternalResource { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can upgrading to junit5 solve this problem?
46d6e8f
to
2b7bc6d
Compare
Rebased onto master and I believe I've addressed all outstanding reviewer feedback. |
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM. The only thing is still about when to end the span. I think since we have a ScanResultConsumer, we could just follow the old logic, to add span.end at the place where we call consumer.onComplate and consumer.onError.
hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncClientScanner.java
Show resolved
Hide resolved
hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncClientScanner.java
Outdated
Show resolved
Hide resolved
2b7bc6d
to
92fbdc0
Compare
Approved. I think this is last piece of tracing? Thanks @ndimiduk for the hard! |
🎊 +1 overall
This message was automatically generated. |
* on `AsyncTable`, both `scan` and `scanAll` methods should result in `SCAN` table operations. * the span of the `SCAN` table operation should have children representing all the RPC calls involved in servicing the scan. * when a user provides custom implementation of `AdvancedScanResultConsumer`, any spans emitted from the callback methods should also be tied to the span that represents the `SCAN` table operation. This is easily done because these callbacks are executed on the RPC thread. * when a user provides a custom implementation of `ScanResultConsumer`, any spans emitted from the callback methods should be also be tied to the span that represents the `SCAN` table operation. This accomplished by carefully passing the span instance around after it is created. Signed-off-by: Andrew Purtell <[email protected]> Signed-off-by: Duo Zhang <[email protected]>
92fbdc0
to
2c439d2
Compare
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
AsyncTable
, bothscan
andscanAll
methods should result inSCAN
table operations.SCAN
table operation should have children representing all the RPC callsinvolved in servicing the scan.
AdvancedScanResultConsumer
, any spans emittedfrom the callback methods should also be tied to the span that represents the
SCAN
tableoperation. This is easily done because these callbacks are executed on the RPC thread.
ScanResultConsumer
, any spans emitted from thecallback methods are tied to the context of user application code. We could
link
them to theSCAN
table operation, except that we have no means of passing that span along.