Surface ES startup errors to the Rally CLI console #1476
I believe the build was interrupted after 80 minutes, which explains the failure. @dliappis Do you know how we can bump the timeout?
I've been looking at this for the last couple of hours and am very consistently hitting the timeout on this branch. I think the regression is legit.
Would you still be interested in a review? I've been waiting for the investigation of that regression.
LGTM, this is nice!
The first time I checked out the PR branch and executed a test, the command seemed to hang just after displaying STDERR.
(.venv) grape:rally (pr/1476) $ esrally race --track geonames --distribution-version 8.3.0 --runtime-jdk bundled --kill-running-processes --test-mode --car-params heap_size:1b
____ ____
/ __ \____ _/ / /_ __
/ /_/ / __ `/ / / / / /
/ _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
/____/
[INFO] Race id is [1b31ab29-cff1-431a-992e-d52eea732ee0]
[INFO] Preparing for race ...
[ERROR] Daemon startup failed with exit code [1]. STDERR:
Exception in thread "main" java.lang.RuntimeException: starting java failed with [1]
output:
error:
Invalid initial heap size: -Xms1b
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
at org.elasticsearch.server.cli.JvmOption.flagsFinal(JvmOption.java:113)
at org.elasticsearch.server.cli.JvmOption.findFinalOptions(JvmOption.java:80)
at org.elasticsearch.server.cli.MachineDependentHeap.determineHeapSettings(MachineDependentHeap.java:59)
at org.elasticsearch.server.cli.JvmOptionsParser.jvmOptions(JvmOptionsParser.java:132)
at org.elasticsearch.server.cli.JvmOptionsParser.determineJvmOptions(JvmOptionsParser.java:90)
at org.elasticsearch.server.cli.ServerProcess.createProcess(ServerProcess.java:211)
at org.elasticsearch.server.cli.ServerProcess.start(ServerProcess.java:106)
at org.elasticsearch.server.cli.ServerProcess.start(ServerProcess.java:89)
at org.elasticsearch.server.cli.ServerCli.startServer(ServerCli.java:213)
at org.elasticsearch.server.cli.ServerCli.execute(ServerCli.java:90)
at org.elasticsearch.common.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:54)
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:85)
at org.elasticsearch.cli.Command.main(Command.java:50)
at org.elasticsearch.launcher.CliToolLauncher.main(CliToolLauncher.java:64)
<- First execution hung here for about a minute before I exited manually
[ERROR] Cannot race. Command '['./bin/elasticsearch', '-d', '-p', './pid']' returned non-zero exit status 1.
Traceback (most recent call last):
File "/Users/jbryan/dev/src/rally/esrally/mechanic/mechanic.py", line 590, in receiveMsg_StartNodes
self.mechanic.start_engine()
File "/Users/jbryan/dev/src/rally/esrally/mechanic/mechanic.py", line 707, in start_engine
self.nodes = self.launcher.start(self.node_configs)
File "/Users/jbryan/dev/src/rally/esrally/mechanic/launcher.py", line 134, in start
return [self._start_node(node_configuration, node_count_on_host) for node_configuration in node_configurations]
File "/Users/jbryan/dev/src/rally/esrally/mechanic/launcher.py", line 134, in <listcomp>
return [self._start_node(node_configuration, node_count_on_host) for node_configuration in node_configurations]
File "/Users/jbryan/dev/src/rally/esrally/mechanic/launcher.py", line 164, in _start_node
node_pid = self._start_process(binary_path, env)
File "/Users/jbryan/dev/src/rally/esrally/mechanic/launcher.py", line 219, in _start_process
raise e
File "/Users/jbryan/dev/src/rally/esrally/mechanic/launcher.py", line 216, in _start_process
ProcessLauncher._run_subprocess(command_line=" ".join(cmd), env=env)
File "/Users/jbryan/dev/src/rally/esrally/mechanic/launcher.py", line 203, in _run_subprocess
subprocess.run(
File "/Users/jbryan/.pyenv/versions/3.10.4/lib/python3.10/subprocess.py", line 524, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['./bin/elasticsearch', '-d', '-p', './pid']' returned non-zero exit status 1.
Getting further help:
*********************
* Check the log files in /Users/jbryan/.rally/logs for errors.
* Read the documentation at https://esrally.readthedocs.io/en/latest/.
* Ask a question on the forum at https://discuss.elastic.co/tags/c/elastic-stack/elasticsearch/rally.
* Raise an issue at https://github.com/elastic/rally/issues and include the log files in /Users/jbryan/.rally/logs.
--------------------------------
[INFO] FAILURE (took 12 seconds)
--------------------------------
Subsequent executions exited as expected.
Nice improvement, LGTM!
This reverts commit d4c49f2.
It can be frustrating to troubleshoot why a Rally-managed Elasticsearch instance is not starting. Currently, the console shows only the subprocess's return code, with no further information. If you are intimately familiar with where installations go and how to preserve them, you can troubleshoot this manually, but it seems better to log the problem the first time it occurs than to require a round of reproduction and manual investigation.
A straightforward reproduction of this issue is the following invocation, with a heap size that is too small:
This change uses the subprocess.run function introduced in Python 3.5 to check the subprocess and print its output to the console on failure. There is some background on why we do not do this currently in #879, but I believe the output is worthwhile. At the time, I think 3.5 had only very recently become our minimum Python version, so this option may have been overlooked... or there may be a more technical reason I am missing.
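As a rough illustration of the approach described above, here is a minimal sketch of checking a subprocess with subprocess.run and printing its stderr on failure. The function name run_and_report and the exact message format are assumptions made for this example; this is not the code in the PR.

```python
import subprocess
import sys


def run_and_report(cmd, env=None):
    """Run cmd and return its exit code.

    Hypothetical helper for illustration only, not Rally's implementation.
    On a non-zero exit status, the captured stderr is printed before the
    exception is re-raised, so the failure reason is visible immediately.
    """
    try:
        completed = subprocess.run(
            cmd,
            env=env,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,  # decode output as text (works on Python 3.5+)
            check=True,  # raise CalledProcessError on a non-zero exit status
        )
    except subprocess.CalledProcessError as e:
        # e.stderr holds the captured error output of the failed process
        print(
            "Startup failed with exit code [{}]. STDERR:\n{}".format(e.returncode, e.stderr),
            file=sys.stderr,
        )
        raise
    return completed.returncode
```

With check=True, the caller still sees the CalledProcessError, so existing error handling keeps working; the only difference is that the captured stderr is printed first.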