
[SPARK-20338][CORE]Spaces in spark.eventLog.dir are not correctly handled #17638

Closed
wants to merge 1 commit into from


@zuotingbing zuotingbing commented Apr 14, 2017

What changes were proposed in this pull request?

Spaces in spark.eventLog.dir are not correctly handled.

  1. Support space characters in “spark.eventLog.dir”.
  2. As usual, if the runtime classpath includes the hdfs-site.xml and core-site.xml files, a supplied path without a scheme (e.g. "/testdir") should be treated as an HDFS path rather than a local path, since the path parameter is a Hadoop directory.

How was this patch tested?

Existing tests and manual tests.

@AmplabJenkins

Can one of the admins verify this patch?

@zuotingbing
Author

cc @srowen

@@ -405,9 +405,7 @@ class SparkContext(config: SparkConf) extends Logging {

    _eventLogDir =
      if (isEventLogEnabled) {
        val unresolvedDir = conf.get("spark.eventLog.dir", EventLoggingListener.DEFAULT_LOG_DIR)
          .stripSuffix("/")
        Some(Utils.resolveURI(unresolvedDir))
Contributor

Shouldn't you URI-encode unresolvedDir here? I think encoding the space as "%20" should be enough. I'm not sure why you need to change so much code.

scala> val a = new URI("/tmp/aa%20nn")
a: java.net.URI = /tmp/aa%20nn

scala> val path = new Path(a)
path: org.apache.hadoop.fs.Path = /tmp/aa nn
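
For comparison, here is a minimal sketch of the same behavior using only java.net.URI, so it runs without Hadoop on the classpath. The value names rawSpace and encoded are illustrative only. It shows why the encoding matters: the single-argument constructor rejects a raw space, while the "%20" form parses and is decoded again by getPath.

```scala
import java.net.URI
import scala.util.Try

// A raw space is illegal for the single-argument URI constructor,
// so this attempt fails with a URISyntaxException.
val rawSpace: Try[URI] = Try(new URI("/tmp/aa nn"))

// The percent-encoded form parses without error.
val encoded = new URI("/tmp/aa%20nn")

println(rawSpace.isFailure)  // true
println(encoded.getRawPath)  // /tmp/aa%20nn  (still encoded)
println(encoded.getPath)     // /tmp/aa nn    (decoded)
```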

Author

If the directory contains both a space and a literal %20 (e.g. "hdfs://nn:9000/a b%20c"), it seems to me that encoding does not work well.
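
A minimal sketch of this concern, assuming the encoding step replaces only spaces. The helper encodeSpacesOnly is hypothetical and not part of Spark. After the round trip, the literal "%20" in the directory name can no longer be told apart from the encoded space.

```scala
import java.net.URI

// Hypothetical helper: encode spaces only, leave everything else untouched.
def encodeSpacesOnly(dir: String): String = dir.replace(" ", "%20")

val original = "hdfs://nn:9000/a b%20c"
val uri = new URI(encodeSpacesOnly(original))

println(uri.getRawPath)  // /a%20b%20c
println(uri.getPath)     // /a b c  (the literal "%20" was decoded as well)
```

The decoded path "/a b c" differs from the intended directory "/a b%20c".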

Contributor

I think in your case we need to percent-encode unresolvedDir before calling resolveURI.
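
One possible way to percent-encode such a path, shown as a sketch rather than as what this thread settled on: the multi-argument java.net.URI constructors quote illegal characters and always quote "%" itself. This assumes the scheme, authority, and path are already available as separate strings, which the thread does not specify.

```scala
import java.net.URI

// The five-argument constructor (scheme, authority, path, query, fragment)
// quotes the space as %20 and the literal '%' as %25.
val uri = new URI("hdfs", "nn:9000", "/a b%20c", null, null)

println(uri.toString)    // hdfs://nn:9000/a%20b%2520c
println(uri.getRawPath)  // /a%20b%2520c
println(uri.getPath)     // /a b%20c  (original directory name preserved)
```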

Author

I suggest using new Path(path).toUri() instead of new URI(path), since new URI(path) does not support spaces in the path.
Encoding is not necessary if we use new Path(path).toUri().

    this(appId, appAttemptId, logBaseDir, sparkConf,
      SparkHadoopUtil.get.newConfiguration(sparkConf))

  private val shouldCompress = sparkConf.getBoolean("spark.eventLog.compress", false)
  private val shouldOverwrite = sparkConf.getBoolean("spark.eventLog.overwrite", false)
  private val testing = sparkConf.getBoolean("spark.eventLog.testing", false)
  private val outputBufferSize = sparkConf.getInt("spark.eventLog.buffer.kb", 100) * 1024
  private val fileSystem = Utils.getHadoopFileSystem(logBaseDir, hadoopConf)
Contributor

@jerryshao jerryshao Apr 20, 2017

This is probably by design: Spark picks the matching FS only when a scheme is specified; otherwise it uses the local FS.

Author

What about the URI "hdfs://nn:9000/a b/c"? Even though it has the right FS scheme, the local FS is used instead.

Contributor

Are you sure? Let me investigate a bit.

Contributor

That's because resolveURI gets a URISyntaxException when parsing hdfs://nn:9000/a b/c and falls back to treating it as a local file instead. Please see the implementation of resolveURI.
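
The fallback described here can be illustrated with a simplified stand-in. The function resolveUriSketch is hypothetical and is not Spark's actual resolveURI; it only mirrors the behavior discussed in this thread: keep the URI when it parses and has a scheme, otherwise treat the string as a local file path.

```scala
import java.io.File
import java.net.{URI, URISyntaxException}

// Hypothetical, simplified stand-in for the behavior discussed above.
def resolveUriSketch(path: String): URI = {
  // A path containing a raw space fails to parse and yields None.
  val parsed =
    try Some(new URI(path))
    catch { case _: URISyntaxException => None }

  // Keep the parsed URI only if it carries a scheme;
  // otherwise fall back to an absolute local file URI.
  parsed
    .filter(_.getScheme != null)
    .getOrElse(new File(path).getAbsoluteFile.toURI)
}

println(resolveUriSketch("hdfs://nn:9000/a/c").getScheme)    // hdfs
println(resolveUriSketch("hdfs://nn:9000/a b/c").getScheme)  // file
```

With a space in the path, the hdfs scheme is silently lost and the result points at the local file system.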

Author

Yes, I have tested it. In the resolveURI function, if the path contains a space, new URI(path) throws an exception and the path is then treated as a local FS path.
Thanks, Shao.

Author

So I think we should not use new URI(path), since it does not support spaces in the path.
I suggest using new Path(path).toUri() instead of new URI(path).

Contributor

I don't agree with you. The String and URI representations should be equivalent; switching to the String representation is not a proper fix for the issue.

I think in your case we need to fix resolveURI to handle the space case.

Author

OK, I will try to fix resolveURI to handle the space case. Thanks.
What do you think about using val uri = new Path(path).toUri instead of val uri = new URI(path) in resolveURI? Then we would not need any encoding, right?

Contributor

Yes, it works for your case, but I'm not sure whether it can handle all the cases in the unit tests.

@HyukjinKwon
Member

Hi all, where are we on this?

@asfgit asfgit closed this in b771fed Jun 8, 2017
@zuotingbing zuotingbing deleted the spark-eventlogdir branch June 22, 2017 03:24
4 participants