[SPARK-8982][Core] Worker hostnames not showing in Master web ui when launched with start-slaves.sh #7345
Conversation
ok to test
Hm, I only sort of understand the entirety of the logic, but
I was just surprised when I started up a standalone-mode Spark 1.4.0 cluster for the first time (with a configuration similar to what I'd used with Spark 1.0 - 1.3.1) and got IP addresses but no hostnames in the web UI. I suppose this is somewhat cosmetic, but it is useful to have hostnames identified correctly when working with a cluster and debugging jobs, and it seems desirable for this to work automatically when using standalone mode and start-slaves.sh, as it did in previous versions. Also, using SPARK_LOCAL_HOSTNAME to configure hostnames would require different spark-env.sh configurations on the different workers, which wasn't previously required.
Test build #37054 has finished for PR 7345 at commit
In previous releases, workers automatically found their fully qualified hostnames when started with start-slaves.sh / start-slave.sh.
Test build #37142 has finished for PR 7345 at commit
I tried this out locally and was able to see the difference. I do remember that we used to have hostnames there instead of IP addresses. I think this is a good change, but I'm concerned about whether it's always safe to bind to the
What about an optional command line argument to start-slaves.sh / start-slave.sh that would enable it? I guess SPARK-6440 was an intentional change to default to using IP addresses instead of hostnames, but especially when using start-slaves.sh / start-slave.sh it would be nice to have the hostname functionality back.
A flag to set the flag? No, callers can always specify
You definitely specify
Yes, but you can, right? It passes through all the script's args. I thought you were suggesting adding a flag to the slave script, when it can basically already accept one.
start-slave.sh and slaves.sh pass through all of their args, but start-slaves.sh does not. The last line of start-slaves.sh is:

```shell
"$sbin/slaves.sh" cd "$SPARK_HOME" \; "$sbin/start-slave.sh" "spark://$SPARK_MASTER_IP:$SPARK_MASTER_PORT"
```

But even if start-slaves.sh did pass additional arguments to the slaves.sh call, I'm not sure there is a good way to use slaves.sh to invoke a command that results in each node's unique hostname being passed to start-slave.sh (slaves.sh escapes spaces in its arguments, making it difficult to pass a quoted, backticked command with arguments as an argument). That's why I suggested an optional flag for start-slave.sh.
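For illustration only, here is a sketch of the kind of optional flag being discussed. The flag name `--use-hostname` and the helper function are hypothetical, not part of the actual Spark scripts; the point is that the hostname would be resolved locally on each worker node, so slaves.sh never has to forward a per-node value.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: an optional flag for start-slave.sh that is expanded
# locally on each node into an explicit --host argument. The flag name
# --use-hostname and the helper build_worker_args are invented for
# illustration only.
build_worker_args() {
  local args=()
  for arg in "$@"; do
    if [ "$arg" = "--use-hostname" ]; then
      # Resolved on the worker node itself, sidestepping slaves.sh escaping.
      args+=(--host "$(hostname)")
    else
      args+=("$arg")
    fi
  done
  printf '%s\n' "${args[@]}"
}

build_worker_args "spark://master:7077" --use-hostname
```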
I take your point yeah, nevermind me. |
What if we try to do some refactoring?
I was going to make a PR in a month, but I still don't know how to test it. Maybe @bdzimmer can write some tests for this case with a hostname instead of an IP, and add IPv6 cases?
Test build #39074 has finished for PR 7345 at commit
*Let me remind you about this discussion, and about IPv6 support in general.
Can one of the admins verify this patch? |
To add my two cents, I think that calling this change "cosmetic" is strictly true but underrates its value. In our case we have additional monitoring systems that report based on local hostname. When we want to investigate an issue we see in monitoring in the Spark UI, not having the hostnames in the UI makes it more difficult than it was in Spark 1.3. Also, because we're in AWS and use Amazon's DNS for reverse lookup, we can't do a reverse lookup on the IP to see the hostname. (I know that's our problem—just illustrating the impact from our perspective.) With regard to a fix, I'm also in favor of a simple patch to start-slave.sh. Thanks.
@andrewor14 We've put this in production. Everything looks good. Hostnames show up in the UI as expected. No broken links. |
@mallman I'm glad this was useful to someone! I didn't have the time (and probably won't in the future) to look into the refactoring that @nyaapa suggested back in July, and I've since moved on to using custom scripts to start standalone clusters in my own work. But I think this patch works nicely for those who want to use start-slaves.sh. |
Thanks for the pull request. I'm going through a list of pull requests to cut them down since the sheer number is breaking some of the tooling we have. Due to lack of activity on this pull request, I'm going to push a commit to close it. Feel free to reopen it or create a new one. |
If a "--host" argument is not provided to Worker, WorkerArguments uses Utils.localHostName to find the host name. SPARK-6440 changed the functionality of Utils.localHostName to retrieve the local IP address instead of host name.
Since start-slave.sh does not provide the "--host" argument, clusters started with start-slaves.sh now show IP addresses instead of hostnames in the Master web UI. This is inconvenient when starting and debugging small clusters.
A simple fix would be to find the local machine's hostname in start-slave.sh and pass it as the "--host" argument. I'm not sure if this fix is appropriate for general use, so I'm interested in feedback.
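As a sketch of what that simple fix could look like: the exact spark-daemon.sh invocation varies between Spark versions, so only the argument construction is shown, and `hostname -f` is an assumption about how the fully qualified hostname would be obtained.

```shell
#!/usr/bin/env bash
# Sketch of the proposed fix: have start-slave.sh pass the machine's own
# hostname as --host, so WorkerArguments never falls back to
# Utils.localHostName (which returns an IP address since SPARK-6440).
# "hostname -f" is an assumption; some platforms only support plain hostname.
MASTER_URL="spark://${SPARK_MASTER_IP:-localhost}:${SPARK_MASTER_PORT:-7077}"
WORKER_HOST="$(hostname -f 2>/dev/null || hostname)"
WORKER_ARGS=(--host "$WORKER_HOST" "$MASTER_URL")

# In the real script these arguments would be handed to spark-daemon.sh to
# launch org.apache.spark.deploy.worker.Worker; here we just echo them.
echo "${WORKER_ARGS[@]}"
```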
This contribution is my original work and I license the work to the project under the project's open source license.