[SQL][WIP] Refined Thrift server test suite #2214
test this please
QA tests have started for PR 2214 at commit
QA tests have finished for PR 2214 at commit
983d030 to 94a83ba
ok to test
test this please
QA tests have started for PR 2214 at commit
The test failures only occur when running on Jenkins; I couldn't reproduce them either locally or on the Jenkins server node. I may push some debugging commits later to investigate this issue.
QA tests have finished for PR 2214 at commit
retest this please
ok to test
QA tests have started for PR 2214 at commit
QA tests have finished for PR 2214 at commit
Hmm, a summary about the failure pattern of
QA tests have started for PR 2214 at commit
QA tests have finished for PR 2214 at commit
1185d79 to c537e37
QA tests have started for PR 2214 at commit
add to whitelist
ok to test
Tests timed out after a configured wait of
QA tests have started for PR 2214 at commit
Tests timed out after a configured wait of
test this please
@liancheng Are we still debugging issues here, or just waiting for it to pass?
QA tests have started for PR 2214 at commit
@marmbrus Not ready yet. I haven't been able to debug it because Jenkins has been quite crazy these days. I'll remove the WIP tag once it's ready.
QA tests have finished for PR 2214 at commit
a1ad308 to 23d96f1
QA tests have started for PR 2214 at commit
QA tests have started for PR 2214 at commit
QA tests have finished for PR 2214 at commit
QA tests have finished for PR 2214 at commit
QA tests have started for PR 2214 at commit
QA tests have finished for PR 2214 at commit
#2675 supersedes this one, closing.
As scwf pointed out, `HiveThriftServer2Suite` is no longer effective now that the Thrift server runs as a daemon. Separately, these test suites were known to be flaky; PR #2214 tried to fix them but failed because of an unknown Jenkins build error. This PR fixes both sets of issues.

In this PR, instead of watching the output of `start-thriftserver.sh`, the test code starts a `tail` process to watch the log file. A `Thread.sleep` has to be introduced because the `kill` command used in `stop-thriftserver.sh` is not synchronous.

As for the root cause of the mysterious Jenkins build failure, please refer to [this comment](#2675 (comment)) below for details.

----

(Copied from PR description of #2214)

This PR fixes two issues of `HiveThriftServer2Suite` and brings 1 enhancement:

1. Although the metastore, warehouse directories and listening port are randomly chosen, all test cases share the same configuration. Due to parallel test execution, one of the two test cases is doomed to fail.
2. We caught any exceptions thrown from a test case and printed diagnostic information, but forgot to re-throw the exception...
3. When the forked server process ends prematurely (e.g., fails to start), the `serverRunning` promise is completed with a failure, so the test code no longer keeps waiting until the timeout.

So, embarrassingly, this test suite was failing continuously for several days but no one had ever noticed it... Fortunately no bugs in the production code were covered under the hood.
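The log-watching approach and item 3 above can be sketched together. The following is a minimal Python illustration, not the Scala code from this PR: it follows a log file the way `tail -f` would and completes a future playing the role of the `serverRunning` promise, failing it as soon as the server process is seen to have exited. The function name `watch_server`, the `READY` marker and the log file name are hypothetical, and the future is created directly only to mimic a promise.

```python
import os
import subprocess
import sys
import tempfile
import threading
import time
from concurrent.futures import Future


def watch_server(proc, log_path, ready_marker, poll_interval=0.05):
    """Follow log_path (like `tail -f`) until ready_marker appears.

    Returns a Future that succeeds with the matching log line, or fails as
    soon as `proc` has exited without logging the marker. `proc` only needs
    a poll() method that returns None while the process is alive.
    """
    server_running = Future()  # stand-in for the `serverRunning` promise

    def find_marker(text):
        for line in text.splitlines():
            if ready_marker in line:
                return line.strip()
        return None

    def follow():
        pending = ""  # holds a last line that has not been fully written yet
        with open(log_path, "r") as log:
            while True:
                chunk = log.readline()
                if chunk:
                    pending += chunk
                    if pending.endswith("\n"):
                        hit = find_marker(pending)
                        pending = ""
                        if hit is not None:
                            server_running.set_result(hit)
                            return
                    continue
                exit_code = proc.poll()
                if exit_code is not None:
                    # Process is gone: read whatever it wrote last, then decide.
                    hit = find_marker(pending + log.read())
                    if hit is not None:
                        server_running.set_result(hit)
                    else:
                        server_running.set_exception(RuntimeError(
                            f"server exited with code {exit_code} "
                            f"before logging {ready_marker!r}"))
                    return
                time.sleep(poll_interval)

    threading.Thread(target=follow, daemon=True).start()
    return server_running


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "server.log")  # hypothetical log file
        open(log_path, "w").close()
        # Stand-in for a server that fails to start: it exits with code 3.
        proc = subprocess.Popen([sys.executable, "-c", "import sys; sys.exit(3)"])
        try:
            watch_server(proc, log_path, "READY").result(timeout=10)
        except RuntimeError as err:
            print(f"failed fast: {err}")
        finally:
            proc.wait()
```

The caller still passes a timeout to `result`, but with the exit check in place that timeout only applies to a server that hangs, not to one that dies during startup.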
Author: Cheng Lian <[email protected]>
Author: wangfei <[email protected]>

Closes #2675 from liancheng/fix-thriftserver-tests and squashes the following commits:

1c384b7 [Cheng Lian] Minor code cleanup, restore the logging level hack in TestHive.scala
7805c33 [wangfei] reset SPARK_TESTING to avoid loading Log4J configurations in testing class paths
af2b5a9 [Cheng Lian] Removes log level hacks from TestHiveContext
d116405 [wangfei] make sure that log4j level is INFO
ee92a82 [Cheng Lian] Relaxes timeout
7fd6757 [Cheng Lian] Fixes test suites in hive-thriftserver
Author: Cheng Lian <[email protected]>
Author: wangfei <[email protected]>

Closes apache#2675 from liancheng/fix-thriftserver-tests and squashes the following commits:

1c384b7 [Cheng Lian] Minor code cleanup, restore the logging level hack in TestHive.scala
7805c33 [wangfei] reset SPARK_TESTING to avoid loading Log4J configurations in testing class paths
af2b5a9 [Cheng Lian] Removes log level hacks from TestHiveContext
d116405 [wangfei] make sure that log4j level is INFO
ee92a82 [Cheng Lian] Relaxes timeout
7fd6757 [Cheng Lian] Fixes test suites in hive-thriftserver

Conflicts:
sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala