-
Notifications
You must be signed in to change notification settings - Fork 359
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CELEBORN-1031] SBT correct the LICENSE and NOTICE for shaded client jars #1967
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1967 +/- ##
==========================================
+ Coverage 46.71% 46.84% +0.14%
==========================================
Files 165 165
Lines 10668 10668
Branches 972 972
==========================================
+ Hits 4982 4996 +14
+ Misses 5362 5350 -12
+ Partials 324 322 -2 see 2 files with indirect coverage changes 📣 Codecov offers a browser extension for seamless coverage viewing on GitHub. Try it in Chrome or Firefox today! |
project/CelebornBuild.scala
Outdated
case m if m == "META-INF/LICENSE.txt" => MergeStrategy.discard | ||
case m if m == "META-INF/NOTICE.txt" => MergeStrategy.discard | ||
case m if m == "LICENSE.txt" => MergeStrategy.discard | ||
case m if m == "NOTICE.txt" => MergeStrategy.discard |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible to know which jar the file comes from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the ideal goal here is to drop all files **/LICENSE.*
and **/NOTICE.*
that come from the third-party jars and only keep the one provided by ourselves. I don't find an elegant approach in Maven, hope we can make it happen in SBT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maven-shade-plugin
provides some license transformers.
https://maven.apache.org/plugins/maven-shade-plugin/examples/resource-transformers.html
<transformer implementation="org.apache.maven.plugins.shade.resource.ApacheLicenseResourceTransformer"/>
<transformer implementation="org.apache.maven.plugins.shade.resource.ApacheNoticeResourceTransformer"/>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented a simple ResourceTransformer
that can output LICENSE.txt
and NOTICE.txt
source jar.
import org.apache.maven.plugins.shade.resource.ResourceTransformer;
public class PrintResourceTransformer implements ResourceTransformer {
@Override
public boolean canTransformResource(String s) {
boolean hasFlag = false;
if (s.contains("LICENSE.txt")) {
hasFlag = true;
} else if (s.contains("NOTICE.txt")) {
hasFlag = true;
}
if (hasFlag) {
URL resource = this.getClass().getResource("/" + s);
System.err.printf("PrintResourceTransformer %s,%s,%s%n", s, hasFlag, resource);
}
return hasFlag;
}
PrintResourceTransformer META-INF/LICENSE.txt,true,jar:file:/Users/csy/.m2/repository/org/jdom/jdom2/2.0.6.1/jdom2-2.0.6.1.jar!/META-INF/LICENSE.txt
PrintResourceTransformer META-INF/NOTICE.txt,true,jar:file:/Users/csy/.m2/repository/commons-io/commons-io/2.6/commons-io-2.6.jar!/META-INF/NOTICE.txt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unfortunately, the built-in apache license/notice transformer does not work well during my previous test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unfortunately, the built-in apache license/notice transformer does not work well during my previous test
ApacheLicenseResourceTransformer
just ignores META-INF/LICENSE
, META-INF/LICENSE.txt
, META-INF/LICENSE.md
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cxzl25 can we implement it in sbt? the previous discussion aims to remove maven and only keep sbt eventually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the ideal goal here is to drop all files
**/LICENSE.*
and**/NOTICE.*
that come from the third-party jars and only keep the one provided by ourselves. I don't find an elegant approach in Maven, hope we can make it happen in SBT
we can refer to pekko, and achieve this goal with MergeStrategy.concat
assembly / assemblyMergeStrategy := {
case n if n.endsWith(".class") => MergeStrategy.first
case n if n.endsWith(".txt") => MergeStrategy.concat
case n if n.endsWith("NOTICE") => MergeStrategy.concat
case n if n.endsWith("LICENSE") => MergeStrategy.concat
case n => (assembly / assemblyMergeStrategy).value.apply(n)
},
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using contains? I tested it locally and verified that it can be excluded.
I wrote a script and found that SBT's LICENSE and NOTICE are consistent with Maven's. #!/bin/bash
# Specify the paths for two directories
dir1="sbt"
dir2="maven"
# Use find command to list all files in the first directory
find "$dir1" -type f | while read file1; do
# Get the relative path
rel_path="${file1#$dir1}"
# Construct the corresponding file path in the second directory
file2="$dir2$rel_path"
# Check if a file with the same path exists in the second directory
if [ -e "$file2" ]; then
# Use md5sum command to compare contents of the two files
hash1=$(md5sum "$file1" | awk '{print $1}')
hash2=$(md5sum "$file2" | awk '{print $1}')
# Compare MD5 hash values
if [[ "$file1" != *.class && "$hash1" != "$hash2" ]]; then
echo "Files differ: $file1 and $file2"
fi
else
# Print file path if it doesn't exist in second directory
echo "File does not exist in second directory: $file1"
fi
done
# Check if there are files in the second directory not in the first
find "$dir2" -type f | while read file2; do
# Get the relative path
rel_path="${file2#$dir2}"
# Construct the corresponding file path in the first directory
file1="$dir1$rel_path"
# Print file path if it doesn't exist in first directory
if [ ! -e "$file1" ]; then
echo "File does not exist in first directory: $file2"
fi
done
|
Thanks for the effort, actually, as mentioned above, I don't think the current implementation in Maven is perfect. The goals of this part are:
|
project/CelebornBuild.scala
Outdated
@@ -679,6 +691,8 @@ trait SparkClientProjects { | |||
|
|||
(assembly / assemblyMergeStrategy) := { | |||
case m if m.toLowerCase(Locale.ROOT).endsWith("manifest.mf") => MergeStrategy.discard | |||
case m if m.startsWith("META-INF/license/") => MergeStrategy.discard | |||
case PathList(ps@_*) if Assembly.isLicenseFile(ps.last) => licenseFileOnlyProjectMergeStrategy(ps.mkString(java.io.File.separator)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sbtassembly.Assembly#isLicenseFile
/**
* Returns a flag if the given file is a license file
*
* @param fileName the given file to check
* @return flag representing if the file is a license file
*/
def isLicenseFile(fileName: String): Boolean = {
val LicenseFile = """(license|licence|notice|copying)([.]\w+)?$""".r
fileName.toLowerCase match {
case LicenseFile(_, ext) if ext != ".class" => true // DISLIKE
case _ => false
}
}
project/CelebornBuild.scala
Outdated
@@ -205,6 +205,18 @@ object CelebornCommonSettings { | |||
Dependencies.junit % "test", | |||
// https://www.scala-sbt.org/1.x/docs/Testing.html | |||
Dependencies.junitInterface % "test") | |||
|
|||
def licenseFileOnlyProjectMergeStrategy(path: String) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
./build/make-distribution.sh --sbt-enabled
META-INF/LICENSE from Project dependency: Project name = celeborn-client-spark-3-shaded_2.12-0.4.0-SNAPSHOT, target = META-INF/LICENSE
[Discard] META-INF/NOTICE.txt Not from Project dependency: Jar name = commons-lang3-3.12.0.jar, jar org = org.apache.commons, entry target = META-INF/NOTICE.txt
[Discard] META-INF/LICENSE.txt Not from Project dependency: Jar name = commons-lang3-3.12.0.jar, jar org = org.apache.commons, entry target = META-INF/LICENSE.txt
META-INF/NOTICE from Project dependency: Project name = celeborn-client-spark-3-shaded_2.12-0.4.0-SNAPSHOT, target = META-INF/NOTICE
project/CelebornBuild.scala
Outdated
println(s"$path from Project dependency: $dep") | ||
Some(JarEntry(dep.target, dep.stream)) | ||
case dep => | ||
println(s"[Discard] $path Not from Project dependency: $dep") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I roughly remember the sbt has a built-in slf4j logger
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about
assembly - including file: celeborn-client-spark-3-shaded_2.12:META-INF/LICENSE
assembly - excluding file: commons-lang3-3.12.0.jar:META-INF/LICENSE.txt
?
project/CelebornBuild.scala
Outdated
@@ -205,6 +205,18 @@ object CelebornCommonSettings { | |||
Dependencies.junit % "test", | |||
// https://www.scala-sbt.org/1.x/docs/Testing.html | |||
Dependencies.junitInterface % "test") | |||
|
|||
def licenseFileOnlyProjectMergeStrategy(log: Logger) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[info] assembly - excluding file: Jar name = commons-lang3-3.12.0.jar, jar org = org.apache.commons, entry target = META-INF/LICENSE.txt
[info] assembly - excluding file: Jar name = commons-lang3-3.12.0.jar, jar org = org.apache.commons, entry target = META-INF/NOTICE.txt
[info] assembly - including file: META-INF/NOTICE
[info] assembly - including file: META-INF/LICENSE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the built-in strategy MergeStrategy.preferProject
can achieve this goal
MergeStrategy.preferProject
will choose the first project file over library files if present. Otherwise, it works likeMergeStrategy.first
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe not? If our project does not have the same file
MergeStrategy.preferProject
sbtassembly.MergeStrategy#preferProject
/**
* Picks the dependency that is the first project/internal dependency, otherwise picks the first library/external dependency
*/
val preferProject: MergeStrategy = MergeStrategy("PreferProject") { conflicts =>
val entry = conflicts.find(_.isProjectDependency).getOrElse(conflicts.head)
Right(Vector(JarEntry(entry.target, entry.stream)))
}
use CustomMergeStrategy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it feasible to combine using MergeStrategy.preferProject
and MergeStrategy.discard
?
case "LICENSE" | "NOTICE" => MergeStrategy.preferProject
case m if Assembly.isLicenseFile(m) => MergeStrategy.discard
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
project/CelebornBuild.scala
Outdated
@@ -870,6 +885,8 @@ trait FlinkClientProjects { | |||
|
|||
(assembly / assemblyMergeStrategy) := { | |||
case m if m.toLowerCase(Locale.ROOT).endsWith("manifest.mf") => MergeStrategy.discard | |||
case m if m.startsWith("META-INF/license/") => MergeStrategy.discard |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need to discard the files in folder META-INF/license/
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The MR module depends on the netty 3x version, so there is a META-INF/license/
directory
Check that Spark and Flink do not depend on netty 3x. This configuration is now removed.
https://repo1.maven.org/maven2/io/netty/netty/3.10.6.Final/netty-3.10.6.Final.jar
$ ./build/sbt -Pspark-3.3 celeborn-client-spark-3-shaded/assembly
$ jar tf client-spark/spark-3-shaded/target/scala-2.12/celeborn-client-spark-3-shaded_2.12-0.4.0-SNAPSHOT.jar | grep -i 'license\|notice'
META-INF/LICENSE
META-INF/NOTICE
META-INF/licenses/
META-INF/licenses/LICENSE-protobuf.txt |
project/CelebornBuild.scala
Outdated
case m if m == "META-INF/NOTICE.txt" => MergeStrategy.discard | ||
case m if m == "LICENSE.txt" => MergeStrategy.discard | ||
case m if m == "NOTICE.txt" => MergeStrategy.discard | ||
case PathList(ps@_*) if ps.last == "LICENSE" || ps.last == "NOTICE" => MergeStrategy.preferProject |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
case PathList(ps@_*) if ps.last == "LICENSE" || ps.last == "NOTICE" => MergeStrategy.preferProject | |
case "LICENSE" | "NOTICE" => MergeStrategy.discard | |
case PathList(ps@_*) if ps.last == "LICENSE" || ps.last == "NOTICE" => MergeStrategy.preferProject |
we should discard the NOTICE/LICENSE files from the scala-library
for mr-shade client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$ ./build/sbt -Pmr celeborn-client-mr-shaded/assembly
$ jar tf client-mr/mr-shaded/target/scala-2.12/celeborn-client-mr-shaded_2.12-0.4.0-SNAPSHOT.jar | grep -i 'license\|notice'
LICENSE
META-INF/LICENSE
META-INF/NOTICE
META-INF/licenses/
META-INF/licenses/LICENSE-protobuf.txt
NOTICE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for finding this problem.
I have now changed to use preferProject
only for META-INF/LICENSE
and META-INF/NOTICE
.
thanks, merging to main(v0.4.0). |
What changes were proposed in this pull request?
Flink/Spark jars packaged with SBT use the correct LICENSE and NOTICE.
Why are the changes needed?
#1930 (comment)
Does this PR introduce any user-facing change?
How was this patch tested?