[SPARK-18646][REPL] Set parent classloader as null for ExecutorClassLoader

## What changes were proposed in this pull request?

`ClassLoader` preferentially loads classes from its `parent`. Only when `parent` is null, or the parent fails to load the class, does it call the overridden `findClass` function. To avoid potential issues caused by loading classes with an inappropriate class loader, we should set the `parent` of the `ClassLoader` to null, so that we fully control which class loader is used.
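The delegation behavior described above can be sketched as follows. This is a minimal illustration of the JVM's parent-first model, not Spark's actual `ExecutorClassLoader`; the `ChildFirstLoader` name is hypothetical:

```scala
// Minimal sketch of JVM class-loader delegation (hypothetical loader, not
// Spark's ExecutorClassLoader). java.lang.ClassLoader.loadClass delegates
// to the parent first and calls the overridden findClass only when the
// parent cannot resolve the name. Constructing the loader with a null
// parent means only the bootstrap loader is consulted first, so every
// application-level lookup reaches findClass and the child has full
// control over resolution.
class ChildFirstLoader(parent: ClassLoader) extends ClassLoader(parent) {
  override def findClass(name: String): Class[_] =
    // Reached only after the parent (or bootstrap loader) fails.
    throw new ClassNotFoundException(name)
}

// Bootstrap classes such as java.lang.String still resolve with a null
// parent, because the bootstrap loader sits above every ClassLoader.
val loader = new ChildFirstLoader(null)
assert(Class.forName("java.lang.String", false, loader) == classOf[String])
```

Note that a null parent does not cut off bootstrap classes; it only removes the application class loader from the delegation chain, which is exactly what this patch relies on.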

This takes over #17074; the primary author of this PR is taroplus.

#17074 should be closed after this PR is merged.

## How was this patch tested?

Add test case in `ExecutorClassLoaderSuite`.

Author: Kohki Nishio <[email protected]>
Author: Xingbo Jiang <[email protected]>

Closes #18614 from jiangxb1987/executor_classloader.
taroplus authored and cloud-fan committed Jul 13, 2017
1 parent 780586a commit e08d06b
Showing 2 changed files with 57 additions and 6 deletions.
@@ -33,18 +33,23 @@ import org.apache.spark.internal.Logging
import org.apache.spark.util.{ParentClassLoader, Utils}

/**
* A ClassLoader that reads classes from a Hadoop FileSystem or HTTP URI,
* used to load classes defined by the interpreter when the REPL is used.
* Allows the user to specify if user class path should be first.
* This class loader delegates getting/finding resources to parent loader,
* which makes sense until REPL never provide resource dynamically.
* A ClassLoader that reads classes from a Hadoop FileSystem or HTTP URI, used to load classes
* defined by the interpreter when the REPL is used. Allows the user to specify if user class path
* should be first. This class loader delegates getting/finding resources to the parent loader,
* which makes sense because the REPL never provides resources dynamically.
*
* Note: [[ClassLoader]] preferentially loads classes from the parent. Only when the parent is
* null, or the load fails, does it call the overridden `findClass` function. To avoid potential
* issues caused by loading classes with an inappropriate class loader, we should set the parent
* of the ClassLoader to null, so that we fully control which class loader is used. For detailed
* discussion, see SPARK-18646.
*/
class ExecutorClassLoader(
conf: SparkConf,
env: SparkEnv,
classUri: String,
parent: ClassLoader,
userClassPathFirst: Boolean) extends ClassLoader with Logging {
userClassPathFirst: Boolean) extends ClassLoader(null) with Logging {
val uri = new URI(classUri)
val directory = uri.getPath

@@ -23,6 +23,8 @@ import java.nio.channels.{FileChannel, ReadableByteChannel}
import java.nio.charset.StandardCharsets
import java.nio.file.{Paths, StandardOpenOption}
import java.util
import java.util.Collections
import javax.tools.{JavaFileObject, SimpleJavaFileObject, ToolProvider}

import scala.io.Source
import scala.language.implicitConversions
@@ -77,6 +79,50 @@ class ExecutorClassLoaderSuite
}
}

test("child over system classloader") {
// JavaFileObject for scala.Option class
val scalaOptionFile = new SimpleJavaFileObject(
URI.create(s"string:///scala/Option.java"),
JavaFileObject.Kind.SOURCE) {

override def getCharContent(ignoreEncodingErrors: Boolean): CharSequence = {
"package scala; class Option {}"
}
}
// compile fake scala.Option class
ToolProvider
.getSystemJavaCompiler
.getTask(null, null, null, null, null, Collections.singletonList(scalaOptionFile)).call()

// create 'scala' dir in tempDir1
val scalaDir = new File(tempDir1, "scala")
assert(scalaDir.mkdir(), s"Failed to create 'scala' directory in $tempDir1")

// move the generated class into scala dir
val filename = "Option.class"
val result = new File(filename)
assert(result.exists(), "Compiled file not found: " + result.getAbsolutePath)

val out = new File(scalaDir, filename)
Files.move(result, out)
assert(out.exists(), "Destination file not moved: " + out.getAbsolutePath)

// construct class loader tree
val parentLoader = new URLClassLoader(urls2, null)
val classLoader = new ExecutorClassLoader(
new SparkConf(), null, url1, parentLoader, true)

// load 'scala.Option', using Class.forName to replicate the exact behavior
// of JavaDeserializationStream

// scalastyle:off classforname
val optionClass = Class.forName("scala.Option", false, classLoader)
// scalastyle:on classforname

assert(optionClass.getClassLoader == classLoader,
"scala.Option didn't come from ExecutorClassLoader")
}

test("child first") {
val parentLoader = new URLClassLoader(urls2, null)
val classLoader = new ExecutorClassLoader(new SparkConf(), null, url1, parentLoader, true)
