-
-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Toolchain: Build pretty. Use xz, not gz. Use sha256, not md5 #10449
Conversation
One or more of the commits in this PR do not match the code submission policy, please check the |
6b4834a
to
6b0f482
Compare
One thing to point out about the use of md5. It's not for any security purpose, just a sanity check. Also sorry about the close/re-open, keyboard slipped on mobile (my bad!) |
In my opinion this could/should rather be split into three commits, since build output doesn't have much to do with the compression algorithm, and the compression algorithm doesn't have much to do with the hash algorithm. It would also make it easier to review and bisect down the line. |
Thanks for the review. These checksums serve two purposes:
For the second purpose, that relies on the checksum algorithm being used not being easily deliberately confused. MD5 collisions are cheap. Bad actors could takeover any server or MITM you, and serve a file with an equal checksum that has a malicious payload. Same could happen with SHA1 since a couple of years ago. SHA256 avoids this for the foreseeable future. There are some good links in that NixOS issue I linked at top (where they migrated from SHA1 to SHA256 for verifying downloaded packages just like we are doing here, but for their whole giant Linux package repository). |
@timschumi Yes, agreed. I have done that work now. Thanks! |
Is there a particular reason for replacing all the |
Yes:
|
In addition to the MD5 vs. SHA-256 discussion: (most) people are blindly trusting the scripts to download, compile and execute the resulting binaries on their systems, so making sure the attack plane for a malicious actor is as small as possible is really worthwhile here. |
The only places where I can see formatting being used is here, while all the other |
Yes, exactly. It seems in other places in Toolchain we are already using SHA256 checks (e.g., BuildPython ) so really BuildIt is the straggler anyway. This just makes it all consistent. Re: gz vs xz. LZMA (xz) seems like a pretty universal win. XZ itself is a tiny package to require (and indeed we already need it for the Clang build) and the file size savings are significant. Many packagers prefer and default to XZ for good reason. It looks like a good portion of the packages in Ports also use xz, so it's good to get it into our build dependencies for that reason as well. Another good thing to do might be to go in and patch all the packages in Ports which are currently using gz and change them to pull xz files where available.
If you are curious about compression algorithms as they relate to packages, this is a good read: Linux OS data compression options: Comparing behavior A quick summary of that: You should prefer XZ for high compression ratio, and ZSTD for quickness (great for things like on-the-fly compression, but not so relevant for archives). zlib/gzip is antiquated and almost universally under-performs on all dimensions for software packaging purposes. |
While I'm thinking about this: Another patch for Ports might be to not assume we have access to GNU tar's compression detection here: Ports/.port_include.sh Relevant: https://unix.stackexchange.com/questions/101561/what-are-the-differences-between-bsdtar-and-gnu-tar |
I converted the rest of the printf variable inclusions. |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! |
This pull request has been closed because it has not had recent activity. Feel free to re-open if you wish to still contribute these changes. Thank you for your contributions! |
We were already using lzma in the clang build, but not specifying it as a dependency. I fixed the dependency specs and switched the downloads in BuildIt.sh to xz instead of gz. The files are almost half the size. Also added a note about potential slowness of the GNU servers and an alternative (I started this edit because my initial download took nearly an hour off the GNU server).
It is safer to use sha256 than md5. Even sha1 is broken now: NixOS/nixpkgs#77238
I fully integrated the buildstep helper into the BuildIt.sh output. Also, printf = better echo (easier to use formatting and doesn't fork another process). Output formatting made so it is easier to follow the build process in general.