-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-20301][FLAKY-TEST] Fix Hadoop Shell.runCommand flakiness in Structured Streaming tests #17613
Conversation
true | ||
} | ||
// Report trigger as finished and construct progress object. | ||
finishTrigger(dataAvailable) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did you move this out of the reportTimeTaken { ... }
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I moved it out. Is the diff and whitespace confusing?
@@ -277,6 +277,11 @@ trait StreamTest extends QueryTest with SharedSQLContext with Timeouts { | |||
|
|||
def threadState = | |||
if (currentStream != null && currentStream.microBatchThread.isAlive) "alive" else "dead" | |||
def threadStackTrace = if (currentStream != null && currentStream.microBatchThread.isAlive) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on keeping this.
Test build #75720 has finished for PR 17613 at commit
|
Test build #75726 has started for PR 17613 at commit |
retest this please |
Test build #75733 has finished for PR 17613 at commit
|
LGTM. Merging to master |
…ructured Streaming tests ## What changes were proposed in this pull request? Some Structured Streaming tests show flakiness such as: ``` [info] - prune results by current_date, complete mode - 696 *** FAILED *** (10 seconds, 937 milliseconds) [info] Timed out while stopping and waiting for microbatchthread to terminate.: The code passed to failAfter did not complete within 10 seconds. ``` This happens when we wait for the stream to stop, but it doesn't. The reason it doesn't stop is that we interrupt the microBatchThread, but Hadoop's `Shell.runCommand` swallows the interrupt exception, and the exception is not propagated upstream to the microBatchThread. Then this thread continues to run, only to start blocking on the `streamManualClock`. ## How was this patch tested? Thousand retries locally and [Jenkins](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75720/testReport) of the flaky tests Author: Burak Yavuz <[email protected]> Closes apache#17613 from brkyvz/flaky-stream-agg.
What changes were proposed in this pull request?
Some Structured Streaming tests show flakiness such as:
This happens when we wait for the stream to stop, but it doesn't. The reason it doesn't stop is that we interrupt the microBatchThread, but Hadoop's
Shell.runCommand
swallows the interrupt exception, and the exception is not propagated upstream to the microBatchThread. Then this thread continues to run, only to start blocking on thestreamManualClock
.How was this patch tested?
Thousand retries locally and Jenkins of the flaky tests