-
-
Notifications
You must be signed in to change notification settings - Fork 359
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix #233 add append and exclude rules to assembly #309
Conversation
Special thanks to @alexarchambault . Code mostly based on assembly code from coursier |
@mlerner Want to join the review? |
main/src/mill/modules/Jvm.scala
Outdated
|
||
object Assembly { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it right place to have this rules?
For people who want to define their own rules it will be:
import mill.modules.Assembly._
val myRules = Seq(Rule.Append("foo.conf"), Rule.Exclude("bar.txt"))
I found on use-case that won't work in the current implementation. Since we make assembly in two steps: |
310cf99
to
2ead219
Compare
2dfd893
to
ab00a06
Compare
@@ -108,6 +108,8 @@ trait JavaModule extends mill.Module with TaskModule { outer => | |||
} | |||
} | |||
|
|||
def assemblyRules: T[Seq[Assembly.Rule]] = T { Assembly.defaultRules } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to keep it as Target, or we can go with plain def here? Cause if we can go with def, we can remove serialization code for Assembly.Rule
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have any real preference. If we go with def, we could also include rules that are not serializable, e.g. containing lambdas. I leave that up to you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we make it def, instead of T
, and the users change them while working on their build, will it reevaluate for sure? I tend to think that it will, because we invalidate caches on build change
PR is ready for review. I will add section to docs if the API for assembly rules is fine. |
object core extends HelloWorldModuleWithMain | ||
} | ||
|
||
val akkaHttpDeps = Agg(ivy"com.typesafe.akka::akka-http:10.0.13") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We really need to break up the tests in this file :/ Maybe not in this PR, but if you could do that in a subsequent one that would be great
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with you
main/src/mill/modules/Jvm.scala
Outdated
case Some(_:Assembly.Rule.Append) => | ||
append() | ||
case Some(_:Assembly.Rule.Exclude) => | ||
ctx.log.info(s"Excluding entry: ${mapping}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can skip the logging here; I find the logging from SBT's assembly task more annoying than useful
@@ -257,20 +261,52 @@ object Jvm { | |||
manifest.write(manifestOut) | |||
manifestOut.close() | |||
|
|||
def isSignatureFile(mapping: String): Boolean = | |||
Set("sf", "rsa", "dsa").exists(ext => mapping.toLowerCase.endsWith(s".$ext")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we maintain this behavior in the new default assemblyRules
? IIRC we added this because these files were causing problems for someone (there's an issue)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I added it in default rules actually, right here: https://github.com/lihaoyi/mill/pull/309/files#diff-8987430aa3e4d7f9373fa4b3c3d5f2d0R13
main/src/mill/modules/Jvm.scala
Outdated
@@ -295,25 +331,45 @@ object Jvm { | |||
PathRef(output) | |||
} | |||
|
|||
sealed trait AssemblyEntry { | |||
def mapping: String | |||
def is: InputStream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's call this inputStream
rather than .is
, for better greppability =)
main/src/mill/modules/Jvm.scala
Outdated
} | ||
|
||
concatEntries.foreach { case (mapping, entries) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to unify the Some(_: Assembly.Rule.Append)
logic with the None
logic, rather than having them go down a totally different code path? Apart from complexity, I think this would also re-arrange the order of files in the jar, which while harmless is unnecessary and makes debugging harder (e.g. when you jar tf
two jars to compare contents)
Perhaps we could make the None
case go through concatEntries
as well, just over-writing a key rather than appending to it. If we then make concatEntries
a LinkedHashMap
, that should let us preserve the original ordering of keys
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing is - we process concatEntries
outside the for
comprehension that goes through all the classpath entries. If we process both "write once" and append outside for
that means we have to keep all the files in memory, before we write them. I guess we want to avoid this, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rockjam aren't we're only keeping the list of files in concatEntries
, and not the file contents themselves, since each AssemblyEntry
is just a thin wrapper that allows you to open an InputStream
on-demand? As long as we're not keeping their byte contents, and not holding open all their file-handles at the same time, I think keeping the wrappers in memory should be acceptable. e.g. Ammonite only has 16710
file within it's assembly, and even an order of magnitude more
Or we could just leave this as is; I guess a bit of reshuffling isn't the end of the world either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sure. I forgot that. It was before this PR - we opened input streams in a for
loop, but now we don't have to. Yes, that sounds reasonable.
@rockjam left a review |
main/src/mill/modules/Jvm.scala
Outdated
case None => | ||
val path = zipFs.getPath(mapping) | ||
if(!excludePatterns.exists(_(mapping))) { | ||
if(appendPatterns.exists(_(mapping))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this logic is somewhat incorrect; we appear to be checking both for AssemblyEntry#mapping == Entry#mapping
, and also for AssemblyEntry#mapping.asPredicate().test(Entry#mapping)
. We should treat AssemblyEntry#mapping
properly as a regex and only do one test. Otherwise files that don't match the regex, but do match the literal path of the regex, will be unexpected matched (e.g. for a regex hello [world]
which should match hello w
or hello d
but not the entire string hello [world]
)
object core extends HelloWorldModuleWithMain { | ||
def moduleDeps = Seq(model) | ||
|
||
def assemblyRules = T { Seq(Assembly.Rule.Exclude("reference.conf")) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we have one unit test for the behavior of anchored regexes? e.g. making sure^reference.conf
matches only reference.conf
and not foo/reference.conf
, and that reference.conf$
matches only foo/reference.conf
but not foo/reference.confetti
Whatever we end up doing, it's probably worth running some basic benchmarks using |
@lihaoyi Updated PR according to your review. Also moved all that append and exclude logic from Jvm to Assembly object - seems to simplify |
@rockjam looks good to me |
…ther than T[Seq[_]]
@lihaoyi By the way, made some primitive benchmarking. Did three runs for each version. Before append rules:
With append rules PR:
So, there is some slowdown introduced by this PR. Do we keep it as is, or it's better to find a way to reduce this slowdown? |
@rockjam those benchmark numbers look fine; a 10%-ish slowdown is still several times faster than SBT due to our layered jar-building. Travis is probably just being flaky; feel free to merge |
This PR introduces ability to define rules for duplicated files in assembly. Currently, there are two rules:
Append
andExclude
TODO: