[SPARK-20338][CORE]Spaces in spark.eventLog.dir are not correctly handled #17638
Can one of the admins verify this patch?
cc @srowen
@@ -405,9 +405,7 @@ class SparkContext(config: SparkConf) extends Logging {

    _eventLogDir =
      if (isEventLogEnabled) {
        val unresolvedDir = conf.get("spark.eventLog.dir", EventLoggingListener.DEFAULT_LOG_DIR)
          .stripSuffix("/")
        Some(Utils.resolveURI(unresolvedDir))
Shouldn't you URI-encode unresolvedDir here? I think encoding the space as "%20" should be enough. Not sure why you need to change so much code.
scala> val a = new URI("/tmp/aa%20nn")
a: java.net.URI = /tmp/aa%20nn
scala> val path = new Path(a)
path: org.apache.hadoop.fs.Path = /tmp/aa nn
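The same decoding can be checked with java.net.URI alone, without Hadoop on the classpath. This is only a sketch; the object name UriDecodeDemo is made up for illustration and is not part of the patch.

```scala
import java.net.{URI, URISyntaxException}

// Illustrative only: UriDecodeDemo is a hypothetical name, not Spark code.
object UriDecodeDemo {
  // A percent-encoded space is accepted by the single-argument constructor.
  val encoded = new URI("/tmp/aa%20nn")

  // getRawPath keeps the escape sequence; getPath decodes it.
  val rawPath: String = encoded.getRawPath
  val decodedPath: String = encoded.getPath

  // A literal space is rejected by the same constructor.
  val literalSpaceFails: Boolean =
    try { new URI("/tmp/aa nn"); false }
    catch { case _: URISyntaxException => true }
}
```

So an already encoded string round-trips, while the raw string with a space never gets past the parser.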
Also, it looks like a space is acceptable for resolveURI (https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala#L479).
If the directory contains both a space and a literal %20 (e.g. "hdfs://nn:9000/a b%20c"), it seems to me that encoding does not work well.
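To make the mixed case concrete, here is a small sketch using only java.net.URI. The object name MixedEncodingDemo and the choice to compare a plain string replace against the five-argument URI constructor are assumptions for illustration; neither is taken from the patch.

```scala
import java.net.URI

// Illustrative only: MixedEncodingDemo is a hypothetical name, not Spark code.
object MixedEncodingDemo {
  // A directory whose name contains a space and the literal characters "%20".
  val dir = "/a b%20c"

  // Naive approach: replace spaces only. After this, the literal "%20"
  // cannot be told apart from an encoded space.
  val naive = new URI("hdfs://nn:9000" + dir.replace(" ", "%20"))
  val naiveDecoded: String = naive.getPath

  // The multi-argument constructor quotes illegal characters and always
  // quotes '%', so the literal "%20" survives a round trip.
  val quoted = new URI("hdfs", "nn:9000", dir, null, null)
  val quotedRaw: String = quoted.getRawPath
  val quotedDecoded: String = quoted.getPath
}
```

With the naive replace the decoded path becomes "/a b c", which is a different directory; the multi-argument constructor decodes back to the original "/a b%20c".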
I think in your case we need to percent-encode unresolvedDir before calling resolveURI.
I suggest using new Path(path).toUri instead of new URI(path), since new URI(path) does not support spaces in the path.
Encoding is not necessary if we use new Path(path).toUri.
    this(appId, appAttemptId, logBaseDir, sparkConf,
      SparkHadoopUtil.get.newConfiguration(sparkConf))

  private val shouldCompress = sparkConf.getBoolean("spark.eventLog.compress", false)
  private val shouldOverwrite = sparkConf.getBoolean("spark.eventLog.overwrite", false)
  private val testing = sparkConf.getBoolean("spark.eventLog.testing", false)
  private val outputBufferSize = sparkConf.getInt("spark.eventLog.buffer.kb", 100) * 1024
  private val fileSystem = Utils.getHadoopFileSystem(logBaseDir, hadoopConf)
This is probably by design: Spark picks the right FS only when a scheme is defined; otherwise it uses the local FS.
What about the URI "hdfs://nn:9000/a b/c"? Even though it has the right FS scheme, the local FS is used instead.
Are you sure? Let me investigate a bit.
That's because resolveURI gets a URISyntaxException when resolving hdfs://nn:9000/a b/c and falls back to a local File instead. Please see the implementation of resolveURI.
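The fallback can be reproduced outside Spark with a simplified stand-in. The function resolveSketch below is hypothetical and only mirrors the behaviour described in this thread (parse as a URI, otherwise treat the string as a local file); it is not the actual Utils.resolveURI code.

```scala
import java.io.File
import java.net.{URI, URISyntaxException}

// Illustrative only: a simplified stand-in, not Spark's Utils.resolveURI.
object ResolveFallbackDemo {
  def resolveSketch(path: String): URI = {
    // Try the strict single-argument parser first.
    val parsed: Option[URI] =
      try Some(new URI(path))
      catch { case _: URISyntaxException => None }

    // Keep the parsed URI only if it carries a scheme; otherwise
    // (including the parse-failure case) fall back to a local file URI.
    parsed.filter(_.getScheme != null)
      .getOrElse(new File(path).getAbsoluteFile.toURI)
  }

  val withoutSpace: URI = resolveSketch("hdfs://nn:9000/a/c")
  val withSpace: URI = resolveSketch("hdfs://nn:9000/a b/c")
}
```

Under these assumptions the path without a space keeps its hdfs scheme, while the one with a space silently comes back as a file: URI, which matches the symptom reported here.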
Yes, I have tested it. In resolveURI, if the path contains a space, new URI(path) throws an exception and the path is then treated as a local FS path.
Thanks, shao.
So I think we should not use new URI(path), since it does not support spaces in the path.
I suggest using new Path(path).toUri instead of new URI(path).
I don't agree with you. The String and URI representations should be equivalent; switching to the String representation only works around the issue.
I think in your case we need to fix resolveURI to handle the space case.
OK, I will try to fix resolveURI to handle the space case. Thanks.
What is your opinion on using val uri = new Path(path).toUri instead of val uri = new URI(path) in resolveURI? Then we would not need to encode, right?
Yes, it works for your case, but I'm not sure whether it handles all the cases in the unit tests.
Hi all, where are we on this?
What changes were proposed in this pull request?
Spaces in spark.eventLog.dir are not correctly handled.
How was this patch tested?
Existing tests and manual tests.