
[SPARK-8982][Core] Worker hostnames not showing in Master web ui when launched with start-slaves.sh #7345

Closed · wants to merge 3 commits

Conversation

bdzimmer

If a "--host" argument is not provided to Worker, WorkerArguments uses Utils.localHostName to find the host name. SPARK-6440 changed the functionality of Utils.localHostName to retrieve the local IP address instead of host name.

Since start-slave.sh does not provide the "--host" argument, clusters started with start-slaves.sh now show IP addresses instead of hostnames in the Master web UI. This is inconvenient when starting and debugging small clusters.

A simple fix would be to find the local machine's hostname in start-slave.sh and pass it as the "--host" argument. I'm not sure if this fix is appropriate for general use, so I'm interested in feedback.
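The fix described here could be sketched roughly as follows. This is an illustrative snippet, not the actual patch: the SPARK_WORKER_HOST variable name and the launch command shown in the comment are assumptions about how start-slave.sh might wire the hostname through.

```shell
# Hypothetical sketch for sbin/start-slave.sh: if no --host was given,
# default the worker host to this machine's fully qualified hostname.
if [ -z "$SPARK_WORKER_HOST" ]; then
  SPARK_WORKER_HOST="$(hostname -f 2>/dev/null || hostname)"
fi

# The worker would then be launched with an explicit --host, e.g.:
#   "$SPARK_HOME/bin/spark-class" org.apache.spark.deploy.worker.Worker \
#     --host "$SPARK_WORKER_HOST" "spark://$SPARK_MASTER_IP:$SPARK_MASTER_PORT"
```

Because the hostname is resolved on each worker machine at launch time, every node would report its own name without any per-node configuration.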

This contribution is my original work and I license the work to the project under the project's open source license.

@andrewor14
Contributor

ok to test

@srowen
Member

srowen commented Jul 10, 2015

Hm, I only sort of understand the entirety of the logic, but Utils.localHostName is what a lot of code consults to know what the local host is, and the logic deliberately returns an IP address to be less ambiguous. That's overridable with SPARK_LOCAL_HOSTNAME. Yes, --host also overrides what Worker uses for specific purposes like the RPC endpoint. Are you just worried about the cosmetic effect? Because this change does have functional implications.
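For reference, the override mentioned here is an environment variable consulted by Utils.localHostName: setting it before launching a daemon pins the name Spark reports. A minimal illustration (the commented-out start command is indicative only):

```shell
# Pin the name Utils.localHostName will return on this machine.
export SPARK_LOCAL_HOSTNAME="$(hostname -f 2>/dev/null || hostname)"

# Any Spark daemon started in this environment, e.g.:
#   "$SPARK_HOME/sbin/start-slave.sh" "spark://master:7077"
# would then identify itself by that name instead of a resolved IP.
```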

@bdzimmer
Author

I was just surprised when I started up a standalone mode Spark 1.4.0 cluster for the first time (with a configuration similar to what I've used in the past with Spark 1.0 - 1.3.1) and got IP addresses but no hostnames in the web UI. I suppose this is somewhat cosmetic, but it is useful to have hostnames identified correctly when working with a cluster and debugging jobs, and it seems desirable for this to work automatically when using standalone mode and start-slaves.sh.

It used to work in previous versions. And it seems like using SPARK_LOCAL_HOSTNAME to configure hostnames would require different spark-env.sh configurations between the different workers, which also wasn't previously required.

@SparkQA

SparkQA commented Jul 10, 2015

Test build #37054 has finished for PR 7345 at commit 9c4127a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

In previous releases, workers automatically found their fully qualified hostnames when started with start-slaves.sh / start-slave.sh.
@SparkQA

SparkQA commented Jul 13, 2015

Test build #37142 has finished for PR 7345 at commit 0cac1d7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Contributor

I tried this out locally and was able to see the difference. I do remember that we used to have host names there instead of IP addresses. I think this is a good change, but I'm concerned about whether it's safe to always bind to the hostname -f result instead of the IP.

@bdzimmer
Author

What about an optional command line argument to start-slaves.sh / start-slave.sh that would enable using hostname -f? Or an environment variable? That way IP addresses could stay the default behavior.

I guess SPARK-6440 was an intentional change to default to using IP addresses instead of hostnames, but especially when using start-slaves.sh / start-slave.sh it would be nice to have the hostnames functionality back.

@srowen
Member

srowen commented Jul 17, 2015

A flag to set the flag? No, callers can always specify --host directly. Yeah, the problem is we are weighing a cosmetic, but legitimate, improvement against something that might have functional impact, but we aren't sure what. @nyaapa, you did some of the work we're talking about here; any opinions?

@bdzimmer
Author

You can definitely specify --host if you are calling start-slave.sh directly. But start-slaves.sh invokes the same command for each slave.

@srowen
Member

srowen commented Jul 17, 2015

Yes, but you can, right? It passes through all the script's args. I thought you were suggesting adding a flag to the slave script, when it can basically already accept one.

@bdzimmer
Author

start-slave.sh and slaves.sh pass through all of their args, but start-slaves.sh does not. The last line of start-slaves.sh is:

"$sbin/slaves.sh" cd "$SPARK_HOME" \; "$sbin/start-slave.sh" "spark://$SPARK_MASTER_IP:$SPARK_MASTER_PORT"

But even if start-slaves.sh did pass additional arguments through to slaves.sh, I'm not sure there is a good way to use slaves.sh to invoke a command that results in each node's unique hostname being passed to start-slave.sh (slaves.sh escapes spaces in its arguments, which makes it difficult to pass a quoted, backticked command with arguments as a single argument). That's why I suggested an optional flag for start-slave.sh.
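The escaping issue can be seen with a simplified stand-in for what slaves.sh does to its arguments before handing them to ssh. The escape_args helper below is illustrative, not the actual script code:

```shell
# Simplified version of the space-escaping slaves.sh applies to each
# argument before the command line is sent over ssh (bash syntax).
escape_args() {
  local out="" a
  for a in "$@"; do
    out="$out ${a// /\\ }"   # replace each space with backslash-space
  done
  printf '%s' "${out# }"
}

# A per-node command such as `hostname -f` must travel as one argument;
# the escaping turns its internal space into a literal "\ ", so the
# remote shell receives a mangled token rather than a command to run.
escape_args --host '`hostname -f`'
# emits: --host `hostname\ -f`
```

This is why there is no clean way to make each remote node substitute its own hostname through the slaves.sh invocation, and why a flag on start-slave.sh itself is the simpler route.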

@srowen
Member

srowen commented Jul 17, 2015

I take your point, yeah; never mind me.

@nyaapa
Contributor

nyaapa commented Jul 17, 2015

What if we try to do some refactoring?

  1. We need to clean up some old code with ipv4:port hardcoded: http://pastebin.com/eJvR1sA4
  2. After that we can remove the localHostNameForURI function. I think it was a mistake, because it didn't actually help, and Spark 1.4.1 without the previous fix can't work as an IPv6 standalone cluster: http://pastebin.com/c0n4NKZ8
  3. For https://github.com/apache/spark/pull/5424/files#diff-d239aee594001f8391676e1047a0381eR710, at line 816 in findLocalInetAddress() we need to preserve the hostname:

     val strippedAddress = InetAddress.getByAddress(addr.getAddress)
     // becomes
     val strippedAddress = InetAddress.getByAddress(addr.getHostName, addr.getAddress)

  4. At line 849, localHostName can be rewritten as:

     def localHostName(): String = {
       customHostname.getOrElse({
         val name = localIpAddress.getHostName
         if (name.indexOf(":") == -1) name
         else InetAddresses.toUriString(localIpAddress)
       })
     }

  5. We can rename localIpAddress to localAddress.

I was going to make a PR in a month, but still don't know how to test it. Maybe @bdzimmer can write some tests for this case with hostname instead of ip, and add ipv6 cases?

@SparkQA

SparkQA commented Jul 30, 2015

Test build #39074 has finished for PR 7345 at commit 71c5573.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@nyaapa
Contributor

nyaapa commented Sep 25, 2015

Let me remind you about this discussion, and about IPv6 support in general.

@AmplabJenkins

Can one of the admins verify this patch?

@andrewor14
Contributor

@srowen @bdzimmer What's the resolution on this? If possible, I would prefer a small change that doesn't require refactoring or introducing a flag. What are the side effects of this patch in its current state?

@mallman
Contributor

mallman commented Dec 17, 2015

To add my two cents, I think that to call this change "cosmetic" is strictly true but underrates its value. In our case we have additional monitoring systems that report based on local hostname. When we want to investigate an issue we see in monitoring in the Spark UI, not having the hostnames in the UI makes it more difficult than it was in Spark 1.3. Also, because we're in AWS and use amazon's DNS for reverse lookup, we can't do a reverse lookup on the IP to see the hostname. (I know that's our problem—just illustrating the impact from our perspective.)

With regard to a fix, I'm also in favor of a simple patch to start-slave.sh. In fact, it's something we're going to try ourselves. I'll try to get it done today and will report here on our experience.

Thanks.

@mallman
Contributor

mallman commented Dec 17, 2015

@andrewor14 We've put this in production. Everything looks good. Hostnames show up in the UI as expected. No broken links.

@bdzimmer
Author

@mallman I'm glad this was useful to someone! I didn't have the time (and probably won't in the future) to look into the refactoring that @nyaapa suggested back in July, and I've since moved on to using custom scripts to start standalone clusters in my own work. But I think this patch works nicely for those who want to use start-slaves.sh.

@rxin
Contributor

rxin commented Jun 15, 2016

Thanks for the pull request. I'm going through a list of pull requests to cut them down since the sheer number is breaking some of the tooling we have. Due to lack of activity on this pull request, I'm going to push a commit to close it. Feel free to reopen it or create a new one.
