Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incremental build improvements for generated AssemblyInfo.cs #1007

Conversation

dasMulli
Copy link
Contributor

@dasMulli dasMulli commented Mar 18, 2017

Addresses #967 by:

  1. Splitting CoreGenerateAssemblyInfo into:
  • _CalculateAssemblyAttributes which produces AssemblyAttribute items.
  • _CalculateGeneratedAssemblyInfoFileName which uses the present AssemblyAttribute items to product the name of the generated assembly info file, using the Hash task introduced in Improve incremental build in presence of globs msbuild#1328:
    • VersionPrefix=1.0.0 => obj/Debug/netstandard2.0/lib1.AssemblyInfo.e32ef0.cs
    • VersionPrefix=1.0.1 => obj/Debug/netstandard2.0/lib1.AssemblyInfo.5be252.cs
    • All Generate*AssemblyAttribute set to false => obj/Debug/netstandard2.0/lib1.AssemblyInfo.cs
  1. CoreGenerateAssemblyInfo then uses the generated file name as Outputs and is executed on incremental builds only when any of the AssemblyAttribute items change (even custom ones not generated by the SDK).


<PropertyGroup>
<GeneratedAssemblyInfoFile Condition=" '$(_AssemblyAttributeContentHash)' != '' ">
$(IntermediateOutputPath)$(MSBuildProjectName).AssemblyInfo.$(_AssemblyAttributeContentHash.SubString(0,6))$(DefaultLanguageSourceExtension)
Copy link
Contributor

@nguerrera nguerrera Mar 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I appreciate the effort to keep line lengths down, the whitespace is going to be part of the property value, which will affect log readability, etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is using only 6 chars of hash to avoid long path issues?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I was trying to figure out how much is actually needed to have a high probability of changing but not get too long and ugly. Since this is autogenerated and purely aesthetic (like short git commit hashes), i might was well use the full string.
Any preference?

<PropertyGroup>
<GeneratedAssemblyInfoFile Condition=" '$(_AssemblyAttributeContentHash)' != '' ">
$(IntermediateOutputPath)$(MSBuildProjectName).AssemblyInfo.$(_AssemblyAttributeContentHash.SubString(0,6))$(DefaultLanguageSourceExtension)
</GeneratedAssemblyInfoFile>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess everything chomps input strings that are file paths, but I'd probably rather see this on one long awful line than have the newline+whitespace in the property.

var assemblyPath = Path.Combine(incrementalBuildCommand.GetOutputDirectory(targetFramework).FullName, "HelloWorld.dll");
var info = AssemblyInfo.Get(assemblyPath);

info["AssemblyVersionAttribute"].Should().Be("1.2.4.0");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also like to see a check that the previous hashed-name generated AssemblyInfo file is deleted (by IncrementalClean but that is an implementation detail) after changing it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rainersigwald thx, restructured the test completely.

@dasMulli dasMulli force-pushed the feature/incremental-assembly-attributes-using-hash branch from aa528ff to 59b01e7 Compare March 20, 2017 21:37
@dasMulli dasMulli force-pushed the feature/incremental-assembly-attributes-using-hash branch from 59b01e7 to f233bec Compare March 20, 2017 21:41
@dasMulli
Copy link
Contributor Author

@dotnet-bot test Ubuntu16.04 Release please

Unable to obtain lock file access on '/tmp/NuGetScratch/lock/f031cb549b4aa7cbc2977507b297c62627fd4faf' 

@nguerrera
Copy link
Contributor

Regarding hash length, I'd prefer if we included all of it just to avoid any question of truncating too aggressively. I am a little concerned about the path length, though. I was thinking we could drop the project name since the file name is already unique for its contents. AssemblyInfo.[hash].cs would be shorter than TemporaryGeneratedFile_[GUID].cs that is dropped in the same directory. But then in the edge case where two libs building in parallel have the same assembly info and the same obj dir (currently blocked by project.assets.json conflict but there is a bug tracking fixing that), one might read it before the other is done writing it. Perhaps we drop AssemblyInfo or shorten it to Info instead, or include the project name (or even full path while we're at it) in the data hashed?

@dasMulli
Copy link
Contributor Author

dasMulli commented Mar 21, 2017

@nguerrera i see your concern. If it doesn't have to be human "understandable" at all, $(MSBuildProjectFullPath) could be added to the hash. Then i can also drop the condition for when there are no items to hash - in which case it falls back to the version without hash which would then collide with other projects using the same name and intermediate output path.

@dasMulli
Copy link
Contributor Author

@nguerrera Seeing that the file name pops up in the compile error when colliding with attributes defined in user code (e.g. trying to add a custom assembly info file or not using the right Generate*Attribute properties), I think the project name should still show up in the file name.
Since the file name shouldn't grow too much, some truncation of the hash will be necessary.
Luckily, the possible entropy shrinks proportionally to how much of the hash is truncated. Since the purpose is not to avoid collisions, but to have some difference in between two inputs, I think the required entropy is extremely low. I haven't done any math on this though.

TL;DR After some thinking I believe the current form of this PR is what should be done, maybe add some more characters to the hash...
Any thoughts?

@AArnott
Copy link
Contributor

AArnott commented Apr 22, 2017

Why do all the complex hashing stuff? Isn't it faster to just regenerate the file on every build? Sort of like package restore does for project.lock.json where it generates it at an alternate location and copies it over the one actually used to compile if its contents are different, thereby preserving a skip-build feature when the file doesn't actually need to change?

@nguerrera
Copy link
Contributor

nguerrera commented May 4, 2017

Isn't it faster to just regenerate the file on every build

That's not obvious to me. I'd like to see measurements. The I/O followed by a full comparison could be slower. @davkean was very concerned about this hitting the disk too much on every design-time build when it was first implemented.

@nguerrera
Copy link
Contributor

On further thought, if we stick with hashing, we should not include a full source file path. The source file name will go in to the PDB and break PDB determinism if it changes only because the local path to the repo is different. (It can also break the assembly determinism if the PDB is embedded.) $(MSBuildProjectFile) is good enough because that's how other things in obj disambiguate themselves. It has never worked to have to projects with the same file name and the same obj directory.

@nguerrera
Copy link
Contributor

Since the purpose is not to avoid collisions, but to have some difference in between two inputs

I don't follow. If there is no difference in the hash fragment between any two inputs, then those two inputs have colliding hash fragments. A collision of the hash fragments means that the wrong assembly info will be used.

@davkean
Copy link
Member

davkean commented May 4, 2017

A while back when I was looking at UI delays, I was seeing a non-trivial amount of tracers where I/O was extremely slooow, probably due to devs opening projects on network shares and the like. You'd hope that %TEMP% was fast (if that's where we'd generate the throwaway) - but if you can avoid hitting the disk that would be best.

@dasMulli
Copy link
Contributor Author

dasMulli commented May 4, 2017

I don't follow.

Sry that explanation of mine was sh*.
I meant to say that the chance of hitting a collision when changing the inputs is much lower than the chance of hitting a collision between two arbitrary inputs. That's why cutting the hash is (probably) appropriate to some degree.
Using the full hash will certainly result in maxpath exceptions for someone - i guess even if it's just [hash].cs - and i don't really like removing the project name from the generated file name, since the compiler may display errors for the file and i think that errors like Duplicate XX in [some-hash].cs are not helpful.. But i'm not very attached to this so @nguerrera it's your call :)

@AArnott
Copy link
Contributor

AArnott commented May 8, 2017

probably due to devs opening projects on network shares and the like. You'd hope that %TEMP% was fast (if that's where we'd generate the throwaway) - but if you can avoid hitting the disk that would be best.

Devs who have projects on networks shares are doomed to a meaningless life of drudgery and boredom. Avoiding I/O during a build to cater to them is ... I mean come on, build is all about I/O.

Making design-time builds fast is goodness, of course.

Copy link
Member

@dsplaisted dsplaisted left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. @nguerrera are we ready to merge this or do we need to think more about the hash length vs path length limits?

@nguerrera
Copy link
Contributor

How about using Base64 to encode the hash? Is that short enough to alleviate file length concerns? I don't want to be in the business of picking how many bits are good enough.

@dasMulli
Copy link
Contributor Author

Base64 is a good idea. It would need 28 characters for a SHA1 hash.
Is there any telemetry about project file paths available?

Another alternative:
Leave output file name alone, but add a stub file that only has the hash as name (where the project name is part of the has).

@nguerrera
Copy link
Contributor

I like "another alternative" a lot. :)

@nguerrera
Copy link
Contributor

Then there's no sign of ugly name in PDBs, error messages, just in a stub file for bookkeeping.

@dasMulli
Copy link
Contributor Author

dasMulli commented May 19, 2017

That stub file could even be project.AssemblyInfo.cs.lock containing the hash using <WriteLinesToFile>'s new WriteOnlyWhenDifferent property ✨
(but I'll need to smoke test if that confuses VS design time build)

@enricosada
Copy link
Contributor

Dont use lock extension on 'nix is used for lock files (things to not delete).

Maybe project.AssemblyInfo.cs.sha1 like nuget does

@nguerrera
Copy link
Contributor

Closing in favor of #1255

@nguerrera nguerrera closed this May 25, 2017
mmitche pushed a commit to mmitche/sdk that referenced this pull request Jun 5, 2020
…015.2 (dotnet#1007)

- Microsoft.DotNet.Arcade.Sdk - 5.0.0-beta.19515.2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants