Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Bundler version 2 #33413

Merged
merged 6 commits into from
Mar 12, 2020
Merged

Conversation

swaroop-sridhar
Copy link
Contributor

Implement support for single-file version 2 layout as described in https://github.com/dotnet/designs/blob/master/accepted/2020/single-file/bundler.md

The supporting changes are:

  • Implement new HostModel interfaces to communicate additional information from the SDK to the HostModel
    • The SDK changes will be done in an upcoming PR.
    • Some depricated APIs are maintained so that SDK build doesn't break in the meantime]
  • Handle various bundling options described in https://github.com/dotnet/designs/blob/master/accepted/2020/single-file/design.md#optional-settings
    • This requires ability to recognize native binaries for various architectures.
    • Added ability to minimaly parse ELF binaries. PE/MachO support already exists.
    • Refactored out PE processing from BinaryUtils, so that PE-ELF-MachO parsers have similar abstractions.
  • Create bundles with the appropriate layout (v1 for netcoreapp3.0, v2 for net5)
  • Consume the new layout from the host bundle process/extraction code.
  • Test cases

Implement support for single-file version 2 layout as described in https://github.com/dotnet/designs/blob/master/accepted/2020/single-file/bundler.md

The supporting changes are:
* Implement new HostModel interfaces to communicate additional information from the SDK to the HostModel
   * The SDK changes will be done in an upcoming PR.
   * Some depricated APIs are maintained so that SDK build doesn't break in the meantime]
* Handle various bundling options described in https://github.com/dotnet/designs/blob/master/accepted/2020/single-file/design.md#optional-settings
   * This requires ability to recognize native binaries for various architectures.
   * Added ability to minimaly parse ELF binaries. PE/MachO support already exists.
   * Refactored out PE processing from BinaryUtils, so that PE-ELF-MachO parsers have similar abstractions.
* Create bundles with the appropriate layout (`v1` for `netcoreapp3.0`, `v2` for `net5`)
* Consume the new layout from the host bundle process/extraction code.
* Test cases
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the area-HostModel Microsoft.NET.HostModel issues label Mar 10, 2020
Some of the AppHost rewriter files imported from the SDK repo
had the license banner in a differnt format.
Make them consistent with rest of the files in the installer partition.
@swaroop-sridhar
Copy link
Contributor Author

CC: @wli3 @dsplaisted

Copy link
Member

@vitek-karas vitek-karas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General concern: The bundler "blindly" uses file extensions and probing to determine the behavior for a file. There doesn't seem to be a way to override this behavior.
Is there a way (combined with SDK and such) to do things like:

  • Include a select native library as a "data file" - so that it's bundled but not extracted
  • Include file with .pdb extension as a "data file" - bundled but not extracted - when combined with "don't bundle symbols" setting

I'm not sure we need to support these scenarios, but I think we should be explicit about whatever behavior we chose. Maybe the guidance for these cases would be to embed such files as managed resources instead?

Comment on lines +10 to +11
/// BundleOptions: Optional settings for configuring the type of files
/// included in the single file bundle.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// BundleOptions: Optional settings for configuring the type of files
/// included in the single file bundle.
/// Optional settings for configuring the type of files
/// included in the single file bundle.

DefaultOptions = BundleOptions.BundleAllContent;
break;

case "net5":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how the discussion around .NET 5 TFMs ended up (if it already happened), but I vaguely remember that there were mote than one .NET 5 TFM. Will the SDK make sure to always pass "net5" for all such TFMs, or should we support all the other .NET 5 TFMS here?

Copy link
Member

@filipnavara filipnavara Mar 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spec and discussion here: dotnet/designs#92

Yes, there are multiple TFMs, it is better to use the pre-parsed identifiers and versions.

Copy link
Contributor Author

@swaroop-sridhar swaroop-sridhar Mar 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote down the framework strings to make it evident that the bundle versions are tied to TFMs. I'll change it to accept the version number only (3, 3.1, 5). This way, the SDK can efficiently pass the TargetFrameworkVersionWithoutV as input.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: ">= 5" for the last case. :-) I'm sure there is going to be some next version at some point and we want it to default to the new code path.

{
none = 0,
needs_extraction = 1,
netcoreapp3_compat_mode = 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest I don't like the compat mode flag. Can't we deduce it from the missing V2 header instead? I find it really weird that we would have the new bundle, but try to mimic the old behavior.
Basically - I would expact behavior like this:

  • 3.* SDK will always use 3.* apphost and also V1 header only
  • 5.* SDK building netcoreapp3.* app will use 3.* apphost and V1 header only
  • 5.* SDK building net5 app will always use 5.* apphost and V2 header

The presence of the flag means that there will be a combination of 5.* apphost used on netcoreapp3.* - I find that wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that apphosts will always match the TFM, exactly as you've noted above, regardless of the SDK.

The netcoreapp3_compat_mode flag is set on a .net5 app, which chooses to build single-file apps in .netcore3.x compat mode, through the IncludeAllContentInSingleFile setting.

This means:

  • All files are bundled into the app; some of them will be extracted to disk.
  • AppContext.BaseDirectory is set to the extraction directory. Please see the associated design and discussion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, can you please add a comment into the code describing at least roughly what it does?
Do we think we will "get rid of it" in 6 - or deprecate it? I know we can't remove the bit from the struct, but maybe in 6 it would always be false or something. I would very much like to have a strategy how to get rid of it 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can certainly deprecate it in upcoming .net versions when:

  • A static host with linked runtime is available on all platforms, (for self-contained single-file apps). This is not available for Windows in .net5.
  • With the upcoming corssgen2, we'll likely have cross platform native tools available for linking user's native dependencies.

src/installer/corehost/cli/apphost/bundle/header.h Outdated Show resolved Hide resolved
Comment on lines 12 to 23
bool header_fixed_t::is_valid(bool exact_match) const
{
// The AppHost expects the bundle_header to be an exact_match for which it was built.
// The framework accepts backwards compatible header versions.

return num_embedded_files > 0 &&
((major_version < header_t::major_version) ||
(major_version == header_t::major_version && minor_version <= header_t::minor_version));
(exact_match) ?
(major_version == header_t::major_version) && (minor_version == header_t::minor_version) :
((major_version < header_t::major_version) ||
(major_version == header_t::major_version && minor_version <= header_t::minor_version));

}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you write this in multiple if statements with an early return style? This is starting to get difficult to read.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, please check if the version in the header is either 1 or 2 before any other test, and return false otherwise. Just a sanity check before making any assumptions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've rewritten the check using if() {} else {} blocks.

Also, please check if the version in the header is either 1 or 2 before any other test

I think this check is implicit, because major_version is a uint, that's bounded by header.major_version -- the constant 2.


unsafe public static bool IsElfImage(string filePath)
{
using (var mappedFile = MemoryMappedFile.CreateFromFile(filePath, FileMode.Open, mapName: null, capacity: 0, MemoryMappedFileAccess.Read))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be better for this function to use a good ol' read() instead, as you're only interested in reading 4 bytes of a file. You won't probably even need this method to be unsafe also.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @lpereira, I've fixed this for all three binary utils now.

@swaroop-sridhar
Copy link
Contributor Author

General concern: The bundler "blindly" uses file extensions and probing to determine the behavior for a file. There doesn't seem to be a way to override this behavior.
Is there a way (combined with SDK and such) to do things like:

  • Include a select native library as a "data file" - so that it's bundled but not extracted
  • Include file with .pdb extension as a "data file" - bundled but not extracted - when combined with "don't bundle symbols" setting

I'm not sure we need to support these scenarios, but I think we should be explicit about whatever behavior we chose. Maybe the guidance for these cases would be to embed such files as managed resources instead?

@vitek-karas: For files other than assemblies and some config files, there are only three possible outcomes:

  • Embed them as resources -- as you've recommended above
  • Leave them beside the app -- the SDK today supports the ExcludeFromSingleFile annotation on any file, which can be used to achieve this purpose,
  • Use .net core 3 compatibility mode through IncludeAllContentInSingleFile in which case they are all bundled and extracted to disk, (So, specific type doesn't matter).

We can certainly add support for specifying types in the input FileSpecs but I don't think it is necessary. We can come back to it if there's a customer scenario.

@dsplaisted
Copy link
Member

We've had bugs in the past in cases where we relied on file extensions for this type of thing. I believe the information about how the files are being used is available via the MSBuild item types or metadata in the build, so you should theoretically be able to pass it down.

@swaroop-sridhar
Copy link
Contributor Author

swaroop-sridhar commented Mar 10, 2020

@dsplaisted can you please elaborate about what information is available from meta-data?

We use filename/extension only in the following two cases:

We recognize the app's deps.json/runtimeconfig.json files by name-- which I think is reasonable.
For .pdb files:

  • The SDK recognizes the app's PDB by name. Actually, no processing is necessary for app.pdb in
    the HostModel because the SDK doesn't pass it down at all (when IncludeSymbolsInSingleFile is
    not set).
  • The HostModel recognizes any other .pdb files passed in and excludes them too. I spoke with
    @tmat last year about recognizing PDB files, and he'd recommended using the extension in this
    case.

I think a reasonable approach to take here is for HostModel to not treat symbols files specially at all:

  • The app.pdb case is handled by the SDK already
  • All other PDB files will be treated as content files and left behind -- that is, IncludeSymbolsInSingleFile will have no impact on these.

I think IncludeSymbolsInSingleFile is not a very useful switch; we're just keeping it for compatibility with netcoreapp3 settings.

@vitek-karas
Copy link
Member

I think IncludeSymbolsInSingleFile is not a very useful switch; we're just keeping it for compatibility with netcoreapp3 settings.

While I might not like it - I think we will be surprised:

We have a bug in the trimming (ILLinker) which makes it so that stack traces in exceptions don't get source line info even in debug builds. First time I learned about this I was like "Why do you trim the debug build" - but the answer is basically "We want to do the same everywhere". Another answer I got was - it's not a debug build - "we want source line info in production".

So I would not be surprised to see similar asks for source line info in single-file - which requires pdbs.

@swaroop-sridhar
Copy link
Contributor Author

I think IncludeSymbolsInSingleFile is not a very useful switch; we're just keeping it for compatibility with netcoreapp3 settings.

While I might not like it - I think we will be surprised:

We have a bug in the trimming (ILLinker) which makes it so that stack traces in exceptions don't get source line info even in debug builds. First time I learned about this I was like "Why do you trim the debug build" - but the answer is basically "We want to do the same everywhere". Another answer I got was - it's not a debug build - "we want source line info in production".

So I would not be surprised to see similar asks for source line info in single-file - which requires pdbs.

The recommended solution for this is to build IL with embedded PDBs.
I'd like to see this flag also go away in subsequent releases.

@dsplaisted
Copy link
Member

@dsplaisted can you please elaborate about what information is available from meta-data?

@swaroop-sridhar What you've described sounds reasonable. I think where we ran into trouble before was where we were trying to use a heuristic to determine if an asset was a native library, where the extension varies by platform (and I'm not sure if it's a fixed set).

@swaroop-sridhar
Copy link
Contributor Author

I think IncludeSymbolsInSingleFile is not a very useful switch; we're just keeping it for compatibility with netcoreapp3 settings.

While I might not like it - I think we will be surprised:
We have a bug in the trimming (ILLinker) which makes it so that stack traces in exceptions don't get source line info even in debug builds. First time I learned about this I was like "Why do you trim the debug build" - but the answer is basically "We want to do the same everywhere". Another answer I got was - it's not a debug build - "we want source line info in production".
So I would not be surprised to see similar asks for source line info in single-file - which requires pdbs.

The recommended solution for this is to build IL with embedded PDBs.
I'd like to see this flag also go away in subsequent releases.

I complete agree that we should generate correct PDBs -- but don't know of a case where the PDB should be bundled within the single-file and cannot use IL-embedded-PDBs.

Given this, and the fact that we're trying to move away from extraction, I suggest that we only support IncludeSymbolsInSingleFile when IncludeAllContentInSingleFile (which is netcoreapp3.0 compatibility mode).

@vitek-karas
Copy link
Member

vitek-karas commented Mar 11, 2020

Is there a simple way to tell the build system to include the symbols in the IL? I must admit that this is the first time I've heard about this possibility.

@swaroop-sridhar
Copy link
Contributor Author

Is there a simple way to tell the build system to include the symbols in the IL? I must admit that this is the first time I've heard about this possibility.

I was talking about the csc -debug:embedded ... option, which embeds the PDB within the target .dll. This only works for portable PDBs. If the developer wants to bundle Windows PDBs, then they'll have to fall back to IncludeSymbolsInSingleFile.

@swaroop-sridhar
Copy link
Contributor Author

@lpereira I've addressed your feedback in d844b72. Please let me know if you have any other comments. Thanks.

return false;
}

byte[] eIdent = reader.ReadBytes(4);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Newbie C# question: can you read directly on a struct ElfHeader and just return header.IsValid()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Binary Reader/Writer doesn't support structs (even explicit layout).
I looked through the docs, but couldn't find a way to do it.

So, I decided to just read the 4 bytes; we anyway didn't have the full header structure coded in this class -- just enough to check the magic bytes in the beginning.

Copy link
Contributor

@lpereira lpereira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a question

@swaroop-sridhar swaroop-sridhar merged commit 7f52377 into dotnet:master Mar 12, 2020
swaroop-sridhar added a commit to swaroop-sridhar/sdk that referenced this pull request Mar 12, 2020
Update GenerateBundle task and a few others to use the new APIs in HostModel intruduced by dotnet/runtime#33413.
This change is in preparation for moving to new PublishSingleFile semantics in .net 5.

In GenerateBundle task, the SDK constructs the bundler with `BundleAllContent` option for all builds.
This keeps the behavior of PublishSingleFile unchanged until Host/Runtime components of the feature are ready.
ghost pushed a commit to dotnet/sdk that referenced this pull request Mar 13, 2020
Update GenerateBundle task and a few others to use the new APIs in HostModel intruduced by dotnet/runtime#33413.
This change is in preparation for moving to new PublishSingleFile semantics in .net 5.

In GenerateBundle task, the SDK constructs the bundler with `BundleAllContent` option for all builds.
This keeps the behavior of PublishSingleFile unchanged until Host/Runtime components of the feature are ready.
swaroop-sridhar added a commit to swaroop-sridhar/runtime that referenced this pull request Mar 13, 2020
Some depricated APIs were maintained in dotnet#33413 so that SDK build is green.
Now that the SDK is updated dotnet/sdk#10849, remove the unused APIs.

Also move out a test-only methods from the product to the test helpers.
swaroop-sridhar added a commit that referenced this pull request Mar 13, 2020
Some depricated APIs were maintained in #33413 so that SDK build is green.
Now that the SDK is updated dotnet/sdk#10849, remove the unused APIs.

Also move out a test-only methods from the product to the test helpers.
@ghost ghost locked as resolved and limited conversation to collaborators Dec 10, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-HostModel Microsoft.NET.HostModel issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants