[SPARK-13599] [BUILD] remove transitive groovy dependencies from Hive #11449
Test build #52249 has finished for PR 11449 at commit
|
Mind adding a Maven Enforcer rule so that the build will fail in case this is ever reintroduced? |
Enforcer rules are also a nice chance to document why something is being excluded, etc. |
Test build #52250 has finished for PR 11449 at commit
|
...let me work out how to do enforcer rules |
OK by me. I'm still trying to work out whether it belongs in 1.6.2 or not -- thoughts? |
+1 for 1.6.x. W.r.t. 1.6.2, it'll keep the tar smaller and maybe even load faster, and it eliminates the risk of a CVE. If you set spark.authenticate=true, untrusted callers can't submit malicious object streams via Kryo, so there's less vulnerability. |
Test build #52320 has finished for PR 11449 at commit
|
Added PR #11473 to cover only the |
Why would we want to backport this into branch-1.6? We rarely update dependencies, unless there is a security problem. |
Yeah, I'm on the fence; it seems low risk if it's really not used, but that also makes it low priority. Steve, you're saying there's a potential security risk in Groovy. Is it purely a potential one, or do you have reason to believe there's an actual risk? You mentioned you definitely wanted to backport, so I suspected there was a bit more motive in there. |
The risk is deserialization; Groovy CVE-2015-3253 shows how groovy < 2.4.4 makes it straightforward to use a class in Groovy to run arbitrary shell commands on the destination. This has been shown on Java ObjectStream and XStream, so assume Kryo is vulnerable too. |
I should note that as well as the org.codehaus.groovy package, there are various shaded things in groovy/ and an unshaded copy of antlr. This may create versioning problems with the antlr.jar pulled in by spark-catalyst, though it'd depend on which version of the antlr classes got pulled in and on antlr's compatibility story. |
FWIW, we're backporting it in-house. Without it, downstream applications which try to add a secure groovy to their jobs won't know which version is picked up. |
Groovy and XStream attack. Assume that you can do the same in Kryo; it just takes someone to sit down and do the work. |
OK thanks for the information. |
Merging this in master. |
thx |
…-hive and spark-hiveserver (branch 1.6)

## What changes were proposed in this pull request?

This is just the patch of #11449 cherry-picked to branch-1.6; the enforcer and dep/ diffs are cut.

Modifies the dependency declarations of all the hive artifacts to explicitly exclude the groovy-all JAR. This stops the groovy classes *and everything else in that uber-JAR* from getting into the spark-assembly JAR.

## How was this patch tested?

1. Pre-patch build was made: `mvn clean install -Pyarn,hive,hive-thriftserver`
2. spark-assembly expanded, observed to have the org.codehaus.groovy packages and JARs
3. A maven dependency tree was created: `mvn dependency:tree -Pyarn,hive,hive-thriftserver -Dverbose > target/dependencies.txt`
4. This text file examined to confirm that groovy was being imported as a dependency of `org.spark-project.hive`
5. Patch applied
6. Repeated step 1: clean build of project with `-Pyarn,hive,hive-thriftserver` set
7. Examined created spark-assembly, verified no org.codehaus packages
8. Verified that the maven dependency tree no longer references groovy

The `master` version updates the dependency files and adds an enforcer rule to keep groovy out; this patch strips those out.

Author: Steve Loughran <[email protected]>

Closes #11473 from steveloughran/fixes/SPARK-13599-groovy+branch-1.6.
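The "expand the assembly and look for org.codehaus.groovy" steps in the commit message above could be scripted roughly as follows. This is only a sketch: `count_groovy_entries` is a hypothetical helper, not part of the Spark build, and the listing in the demo is made up.

```shell
#!/bin/sh
# Sketch: count Groovy class entries in a JAR listing read from stdin.
# count_groovy_entries is a hypothetical helper, not part of the Spark build.
count_groovy_entries() {
  # grep -c prints the number of matching lines. It exits 1 when that number
  # is 0, so "|| true" keeps the function usable under "set -e".
  grep -c '^org/codehaus/groovy/' || true
}

# Demo on a made-up listing; a real run would pipe in the output of `jar tf`.
printf '%s\n' \
  'org/apache/spark/SparkContext.class' \
  'org/codehaus/groovy/runtime/MethodClosure.class' \
  | count_groovy_entries
```

Against a real build this would be run as `jar tf <assembly JAR> | count_groovy_entries`, expecting a non-zero count before the patch and `0` after it.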
## What changes were proposed in this pull request?

Modifies the dependency declarations of all the hive artifacts to explicitly exclude the groovy-all JAR. This stops the groovy classes *and everything else in that uber-JAR* from getting into the spark-assembly JAR.

## How was this patch tested?

1. Pre-patch build was made: `mvn clean install -Pyarn,hive,hive-thriftserver`
2. spark-assembly expanded, observed to have the org.codehaus.groovy packages and JARs
3. A maven dependency tree was created: `mvn dependency:tree -Pyarn,hive,hive-thriftserver -Dverbose > target/dependencies.txt`
4. This text file examined to confirm that groovy was being imported as a dependency of `org.spark-project.hive`
5. Patch applied
6. Repeated step 1: clean build of project with `-Pyarn,hive,hive-thriftserver` set
7. Examined created spark-assembly, verified no org.codehaus packages
8. Verified that the maven dependency tree no longer references groovy

Note also that the size of the assembly JAR was 181628646 bytes before this patch and 166318515 after, about 15 MB smaller. That's a good metric of things being excluded.

Author: Steve Loughran <[email protected]>

Closes apache#11449 from steveloughran/fixes/SPARK-13599-groovy-dependency.
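The dependency-tree checks in the commit message above lend themselves to a small script. A minimal sketch, assuming the report was saved with the `mvn dependency:tree ... > target/dependencies.txt` command quoted there; `assert_no_groovy` is a hypothetical helper and the sample report line in the demo is invented.

```shell
#!/bin/sh
# Sketch: fail if a saved `mvn dependency:tree` report still mentions Groovy.
# assert_no_groovy is a hypothetical helper, not part of the Spark build.
assert_no_groovy() {
  report=$1
  if [ ! -r "$report" ]; then
    echo "cannot read $report" >&2
    return 2
  fi
  # Print any offending lines with their line numbers, then fail.
  if grep -n 'org\.codehaus\.groovy' "$report"; then
    echo "groovy is still on the dependency tree" >&2
    return 1
  fi
  echo "no groovy dependencies found in $report"
}

# Demo on an invented one-line report.
tmp=$(mktemp)
printf '%s\n' '[INFO] +- org.example:demo:jar:1.0:compile' > "$tmp"
assert_no_groovy "$tmp"
rm -f "$tmp"
```

On a real checkout the call would be `assert_no_groovy target/dependencies.txt`, which should fail before the patch and pass after it.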
What changes were proposed in this pull request?
Modifies the dependency declarations of all the hive artifacts to explicitly exclude the groovy-all JAR.
This stops the groovy classes and everything else in that uber-JAR from getting into spark-assembly JAR.
How was this patch tested?
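1. Pre-patch build was made: `mvn clean install -Pyarn,hive,hive-thriftserver`
2. spark-assembly expanded, observed to have the org.codehaus.groovy packages and JARs
3. A maven dependency tree was created: `mvn dependency:tree -Pyarn,hive,hive-thriftserver -Dverbose > target/dependencies.txt`
4. This text file examined to confirm that groovy was being imported as a dependency of `org.spark-project.hive`
5. Patch applied
6. Repeated step 1: clean build of project with `-Pyarn,hive,hive-thriftserver` set
7. Examined created spark-assembly, verified no org.codehaus packages
8. Verified that the maven dependency tree no longer references groovy

Note also that the size of the assembly JAR was 181628646 bytes before this patch and 166318515 after, about 15 MB smaller. That's a good metric of things being excluded.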
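For reference, the size difference quoted above works out as follows. The two byte counts are the ones given in the description; everything else is plain integer arithmetic.

```shell
#!/bin/sh
before=181628646   # assembly JAR size before the patch, from the description
after=166318515    # assembly JAR size after the patch, from the description
saved=$((before - after))
echo "saved ${saved} bytes"
# Integer division, so both figures are rounded down.
echo "about $((saved / 1000000)) MB, or $((saved * 100 / before))% of the original"
```

This prints `saved 15310131 bytes` and `about 15 MB, or 8% of the original`. To take the measurement on a real build, `wc -c < <assembly JAR>` gives the byte count portably.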