Enable NuGet Pack to be deterministic #6229

Closed
tmat opened this issue Nov 28, 2017 · 16 comments · Fixed by NuGet/NuGet.Client#2989
Labels
Functionality:Pack Type:DCR Design Change Request

@tmat

tmat commented Nov 28, 2017

Background
Some environments require builds to be deterministic, meaning that each tool involved in the build process must produce exactly the same output given the same inputs. Furthermore, the build should not depend on the ambient state of the environment, such as the current time, the state of a global random number generator, the name of the machine the build is running on, the root directory the repository is built from, etc.

For example, the Roslyn compilers support deterministic builds by implementing the /deterministic switch. This is also supported by the csc/vbc MSBuild tasks via the Deterministic property.

Issue
NuGet pack is not deterministic in the above sense. I identified several non-deterministic properties, but there might be more:

  1. ZipArchive.CreateEntry API is used for creating zip parts and the call is not followed by setting ZipArchiveEntry.LastWriteTime to a deterministic value. The CreateEntry API initializes this property to the current time. As a result the current time is written to the .nupkg for some parts:

https://github.com/NuGet/NuGet.Client/blob/dev/src/NuGet.Core/NuGet.Packaging/PackageCreation/Authoring/PackageBuilder.cs#L556
https://github.com/NuGet/NuGet.Client/blob/dev/src/NuGet.Core/NuGet.Packaging/PackageCreation/Authoring/PackageBuilder.cs#L770
https://github.com/NuGet/NuGet.Client/blob/dev/src/NuGet.Core/NuGet.Packaging/PackageCreation/Authoring/PackageBuilder.cs#L797
https://github.com/NuGet/NuGet.Client/blob/dev/src/NuGet.Core/NuGet.Packaging/PackageCreation/Authoring/PackageBuilder.cs#L839

  2. PhysicalPackageFile uses File.GetLastWriteTimeUtc or DateTimeOffset.UtcNow to determine what value to set ZipArchiveEntry.LastWriteTime to.

https://github.com/NuGet/NuGet.Client/blob/dev/src/NuGet.Core/NuGet.Packaging/PackageCreation/Authoring/PhysicalPackageFile.cs#L82

https://github.com/NuGet/NuGet.Client/blob/dev/src/NuGet.Core/NuGet.Packaging/PackageCreation/Authoring/PhysicalPackageFile.cs#L87

https://github.com/NuGet/NuGet.Client/blob/dev/src/NuGet.Core/NuGet.Packaging/PackageCreation/Authoring/PackageBuilder.cs#L576

If this is a required feature there should be a switch to disable it: e.g. adding a /deterministic command line argument to nuget.exe and respecting the MSBuild property Deterministic in PackTask.

  3. Random GUIDs are used for named entities written to the .nupkg:

https://github.com/NuGet/NuGet.Client/blob/dev/src/NuGet.Core/NuGet.Packaging/PackageCreation/Authoring/PackageBuilder.cs#L340,
https://github.com/NuGet/NuGet.Client/blob/dev/src/NuGet.Core/NuGet.Packaging/PackageCreation/Authoring/PackageBuilder.cs#L876.
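
The timestamp points above can be illustrated with a minimal sketch. It uses Python's standard zipfile module as a stand-in for .NET's ZipArchive; it is not NuGet code. Every entry's timestamp is pinned to a fixed value instead of coming from the clock or the file system. The fixed value (the earliest date the ZIP format can store), the sorted entry order, the pinned host-system field, and the names `FIXED_TIME` and `build_archive` are assumptions made for illustration.

```python
import io
import zipfile
from typing import Mapping

# Assumption: pin every entry to the earliest date the ZIP (MS-DOS) format can store.
FIXED_TIME = (1980, 1, 1, 0, 0, 0)


def build_archive(parts: Mapping[str, bytes]) -> bytes:
    """Write `parts` (entry name -> content) to an in-memory zip, independent of ambient state."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # Sort names so the entry order does not depend on how the caller built the mapping.
        for name in sorted(parts):
            info = zipfile.ZipInfo(filename=name, date_time=FIXED_TIME)
            info.compress_type = zipfile.ZIP_DEFLATED
            # Pin fields that would otherwise reflect the host platform.
            info.create_system = 3
            info.external_attr = 0o644 << 16
            archive.writestr(info, parts[name])
    return buffer.getvalue()
```

With these fields pinned, two calls with the same parts return identical bytes regardless of when they run. The compressed payload can still vary between compression library versions, which this sketch does not address.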

@tmat
Author

tmat commented Nov 28, 2017

FYI @jaredpar @jinujoseph

@emgarten
Member

emgarten commented Nov 28, 2017

What time should be used? For example when pack creates a new nupkg with a new nuspec, are you saying that shouldn't use the current time? Would a time need to be passed into pack from the user? How does the compiler solve this?

@tmat
Author

tmat commented Nov 28, 2017

@emgarten I'm not sure what the right answer is for nuget.

Re Roslyn: We decided to update the PE format specification to a) allow the time-stamp to be an arbitrary number b) add a record to debug directory that indicates the time-stamp is not actually time, so that new tools can handle the value accordingly. We hash the content of the PE file using SHA1. That gives us 20 bytes of deterministic hash derived (indirectly) from the inputs. We use 4 bytes for the time-stamp and 16 bytes for the MVID (module version guid) that also is emitted to a managed assembly. The reason why we use 4B of SHA1 of the content for the time-stamp (and not e.g. 0) is that some (legacy) tools use the time-stamp, the module name and the file size to "uniquely" id a PE file.
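
A rough sketch of the hash-splitting idea described above, written in Python for illustration only: it hashes the content with SHA-1 and splits the 20-byte digest into a 16-byte GUID and a 4-byte time-stamp value. Which bytes Roslyn assigns to which field is not stated in this thread, so the split used here (first 16 bytes for the GUID, last 4 for the time-stamp) and the function name `derive_ids` are assumptions.

```python
import hashlib
import struct
import uuid
from typing import Tuple


def derive_ids(content: bytes) -> Tuple[int, uuid.UUID]:
    """Derive a 32-bit pseudo time-stamp and a GUID from the content alone."""
    digest = hashlib.sha1(content).digest()  # always 20 bytes
    # Assumed layout: bytes 0..15 -> GUID, bytes 16..19 -> time-stamp field.
    guid = uuid.UUID(bytes_le=digest[:16])
    (timestamp,) = struct.unpack("<I", digest[16:])
    return timestamp, guid
```

The same content always yields the same pair, and any change to the content changes both values with overwhelming probability. That is what lets tools that key on the time-stamp keep telling files apart.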

I'm not sure exactly which time-stamp values the zip archive spec allows, or what the time-stamp is actually used for in practice. If it can be set to the min allowed value and nothing breaks then we can just do that. If anything we care about breaks, then having a /deterministic option, so that we don't break things by default, and following up with the maintainers of whatever tools broke would imo be the right approach.

@tmat
Author

tmat commented Nov 28, 2017

Re passing time stamp into pack task -- possible but it's just shifting the burden to the user, it doesn't really solve anything. I do not like doing that since most users do not want to spend time reading specifications and wondering what might break if they pass in whatever value they think makes sense.

@jaredpar

I'm optimistic that 0 can be used for the timestamp here. Roslyn had to abandon it due to some internal legacy tools which were looking at timestamps specifically to make sure they weren't 0. I doubt NuPkg will have that problem.

@rohit21agrawal
Contributor

what's the scenario you are trying to solve that you need this from nuget pack? @jaredpar @tmat

@jaredpar

@rohit21agrawal deterministic, reproducible builds. This is the ability to take a given source tree and build it consistently from machine to machine. More detailed info available here: https://reproducible-builds.org/

This is a practice that is becoming standard, and soon to be required, in most Linux distros. Hence it will apply to our source build efforts at some point in the future.

In addition to the correctness benefits it provides, it also allows for build systems to engage in optimizations like build output caching.

Note: even though it's not required at this point, all of the tools involved with our build chain today support determinism: C#, VB and F# compilers, MSBuild, etc ...

@enricosada

@jaredpar for reproducible builds you need also the exact list of packages to restore.
maybe ref #5602 if you want to 👍

the deterministic build is a step after.

A common strategy in package managers (ruby bundler, js npm5/yarn, etc) is having a real lock file https://fsprojects.github.io/Paket/lock-file.html to write down the list of packages used at compile time.

@jaredpar

jaredpar commented Nov 28, 2017

@enricosada agree. To have a reproducible build, all aspects of the build must be deterministic. For this particular issue, though, we're focusing on the details of the pack command. That is independent from restore.

@Thealexbarney

Has there been any more discussion on this? It would be useful to have reproducible builds.

Current builds could work around the zip timestamps by modifying them after running dotnet pack, but random GUIDs are still used in generating some files.

I've been packing reproducible zip files in some of my builds by setting the timestamps to DateTime.UnixEpoch.

I wouldn't be surprised if changing the zip timestamps in the .nupkg file didn't have an effect on package managers because the timestamps are independent from the PE timestamp, but I don't actually know if NuGet uses those timestamps for anything.
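
As a hedged illustration of the post-processing workaround mentioned above, the Python sketch below rewrites an existing zip so that every entry carries the same fixed timestamp. It is not the approach used by any NuGet tooling; the name `normalize_zip`, the fixed 1980-01-01 value (the earliest date the ZIP format can store), and the decision to re-sort entries are assumptions. It leaves the randomly named parts untouched, so on its own it would not make a .nupkg fully deterministic.

```python
import io
import zipfile

# Assumption: 1980-01-01 is used because it is the earliest date the ZIP format can store.
FIXED_TIME = (1980, 1, 1, 0, 0, 0)


def normalize_zip(data: bytes) -> bytes:
    """Return a copy of the zip in `data` with every entry stamped FIXED_TIME."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as source, zipfile.ZipFile(out, "w") as target:
        for name in sorted(source.namelist()):
            info = zipfile.ZipInfo(filename=name, date_time=FIXED_TIME)
            info.compress_type = zipfile.ZIP_DEFLATED
            # Pin host-dependent header fields as well as the timestamp.
            info.create_system = 3
            info.external_attr = 0o644 << 16
            target.writestr(info, source.read(name))
    return out.getvalue()
```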

@ctaggart

@Thealexbarney, how are you setting the timestamps?

@ctaggart

Probably want to support the SOURCE_DATE_EPOCH environment variable, and use a sensible default.

https://reproducible-builds.org/docs/source-date-epoch/
aspnet/BuildTools#452
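
A small Python sketch of what honoring SOURCE_DATE_EPOCH with a fallback could look like. The variable holds an integer number of seconds since the Unix epoch. The default chosen here (1980-01-01 UTC, the earliest date a zip entry can store) and the name `resolve_timestamp` are assumptions for illustration, not anything NuGet defines.

```python
import os
from datetime import datetime, timezone
from typing import Mapping, Optional

# Assumed default: 1980-01-01T00:00:00Z expressed as seconds since the Unix epoch.
DEFAULT_EPOCH = 315532800


def resolve_timestamp(environ: Optional[Mapping[str, str]] = None) -> datetime:
    """Return the timestamp to stamp on outputs, preferring SOURCE_DATE_EPOCH if set."""
    env = os.environ if environ is None else environ
    raw = env.get("SOURCE_DATE_EPOCH")
    seconds = int(raw) if raw else DEFAULT_EPOCH
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```

Passing the environment as a parameter keeps the function itself free of ambient state, which makes it easy to test.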

@Thealexbarney

@ctaggart It's pretty hacky, but here's an example of making NuGet packages deterministic for simple projects

https://github.com/Thealexbarney/LibHac/blob/master/build/RepackNuget.cs
Zip example

@ctaggart

Thanks @Thealexbarney & @tmat. I took a crack at a pull request: NuGet/NuGet.Client#2775

nkolev92 added this to the Backlog milestone Aug 6, 2019
nkolev92 self-assigned this Aug 6, 2019
nkolev92 modified the milestones: Backlog, 5.3 Aug 13, 2019
rrelyea changed the title from "NuGet Pack is not deterministic" to "Enable NuGet Pack to be deterministic" Aug 27, 2019
@ctaggart

This is the first github.com link that comes up when searching Bing for "deterministic nuget packages", so I'm going to add a link here for documentation. @clairenovotny tweeted this example, which is great!

https://github.com/clairernovotny/DeterministicBuilds

@Smaug123

Same reasoning as above: this is the first github.com link when Google searching "deterministic nuget pack", so here's a link to the latest status #8601

9 participants