-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make packages reproducible #543
base: master
Are you sure you want to change the base?
Conversation
Nice!! A comment in the code linking to that URL may be nice. Will do it later when I am at a computer or if you want to be quicker ;). |
Will do! Since this is something that can easily be forgotten, I was also thinking of adding a test to CI to make sure that packages are reproducible. |
Sure. Nice!
…On July 3, 2017 6:57:58 PM EDT, Jakob Odersky ***@***.***> wrote:
Will do! Since this is something that can easily be forgotten, I was
also thinking of adding a test to CI to make sure that packages are
reproducible.
--
You are receiving this because you commented.
Reply to this email directly or view it on GitHub:
#543 (comment)
|
Don't include time information in jars in order to make them reproducible. This allows builds to be verified with hash sums. Also see https://reproducible-builds.org/
6c39aa1
to
234b261
Compare
Added a comment about the website and a test |
override def run = { | ||
val pass1 = super.`package`.map(f => hash(f.toPath)) | ||
lib.clean(cleanFiles, true, false, false, false) | ||
transientCache.clear() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This cache invalidation is required here, or else the second pass will fail due to missing files. Should this be moved to the clean helper method in lib?
Also, I'm not sure if I have the same concept of what a clean
operation should do, but IMO there should not be the need to prompt the user for a confirmation since we're only deleting generated files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would probably be safer if we don't try to clear caches from within a single cbt run.
What if in test.scala we call cbt package
, then read the info into memory. Call cbt clean
, then cbt package
, check that the file modified date is different but the jar meta data is still the same.
clean
requiring confirmation is less about not trusting the user having called the right command, more about not trusting cbt as a fast moving code base to not silently break behavior here at some point. Once we get to the point where we have solid, in-memory filesystem test for this, we can remove the check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cvogt, should the checksum be generated and checked in the master test file then?
Linking to #502 as it is related |
Have you tested this does not affect incremental compilation @jodersky ? |
Sorry for not addressing your feedback earlier, I'll try to get some time this weekend. Regarding the incremental compilation, I haven't tested this, but I don't think that the compiler would change timestamps within packaged jars. If I install a pre-packaged jar in a read-only location (/usr/share/...), would that mean that incremental compilation of depending sources would break? |
@jodersky To the best of my understanding, the JDK does indeed cache jars using that timestamp information. What I don't actually know is which information it uses. It might be the metadata of the whole jar, or of every jar entry. |
For what operations? Do you have a link?
…On July 28, 2017 4:08:01 PM EDT, Jorge ***@***.***> wrote:
@jodersky To the best of my understanding, the JDK does indeed cache
jars using that timestamp information. What I don't actually know is
which information it uses. It might be the metadata of the whole jar,
or of every jar entry.
--
You are receiving this because you commented.
Reply to this email directly or view it on GitHub:
#543 (comment)
|
Classloading in general. Not saying it is an issue. It may be. scala/bug#10295 |
Hmm, I'm not quite sure how incremental compilation works here. Does it mean that to check if a source needs recompilation, it checks the last modification date of its dependencies and compares it with a local cache? In case it doesn't use any checksum it should be fairly easy to spoof; I'll give it a shot. The changes proposed here do similar things as this maven plugin https://zlika.github.io/reproducible-build-maven-plugin/ (and also what happens when building a java debian package) |
Jorge, I think that's just scalac. I think the JDK just loads classes once if not loaded, no timestamp involved. But could be wrong.
…On July 28, 2017 4:42:56 PM EDT, Jorge ***@***.***> wrote:
Classloading in general. Not saying it is an issue. It may be.
scala/bug#10295
--
You are receiving this because you commented.
Reply to this email directly or view it on GitHub:
#543 (comment)
|
That's my understanding too, at least from briefly going over the linked issue. In that case, how big of deal would this be? Usually, local dependencies use classfiles directly rather than the ones packaged in a jar (this is the |
It doesn't look like something related to Scalac only:
This is the JDK caching jars that are classloaded. In any case, this could be an intricate use case that may not affect this particular change. We also wanted to experiment with this: sbt/io#58. So please let me know if you find any issue in Zinc. I'll fix it. |
This doesn't happen for jars. Jars are sha1'd and then compared. See https://github.com/sbt/zinc/blob/d4d29e8ffeecfa7c6a8e834b4ee54c4bfda9af63/zinc/src/main/scala/sbt/internal/inc/MixedAnalyzingCompiler.scala#L184-L186. But please, do double-check in case I'm wrong. |
thanks for looking into this @jvican. I assume we can move forward with this change then? |
Don't include time information in jars in order to make them
reproducible.
This allows builds to be verified with hash sums.
Also see https://reproducible-builds.org/