NGEN-generated PDB support #153

Open
vvuk opened this issue May 2, 2024 · 8 comments

@vvuk

vvuk commented May 2, 2024

I'm working to improve some profiling tools (samply specifically) that use the pdb crate under the hood. Part of what I need is the ability to handle symbols for .NET, specifically symbols from Crossgen2-built ReadyToRun assemblies. I think that's where these come from, anyway: files that end in .ni.pdb, which I think are written here, with some comments about DiaSymReader and others: https://github.com/dotnet/runtime/blob/fc76b1cac3f02cc9729f6682d6850fd7982e9fe5/src/coreclr/tools/aot/ILCompiler.Diagnostics/PdbWriter.cs#L199

Here's an example of this type of PDB, from Microsoft's symbol server: dotnet.ni.dll. Also, just in case: this isn't a Portable PDB. It's a normal PDB, but I think written in a very limited way. It contains just the symbol information.

When read by the pdb crate, these PDBs show up as having no section information. That means the crate can't build an address map, so I end up with no way of translating RVAs to symbols. Section contribution information is there though, e.g. here's dia2dump -x:

*** SECTION CONTRIBUTION

    RVA        Address       Size    Module
  00001000  0001:00000000  00275000  C:\Users\cloudtest\AppData\Local\Temp\5egk1fj3.cpa\dotnet.dll
  00276000  0002:00000000  00034000  C:\Users\cloudtest\AppData\Local\Temp\5egk1fj3.cpa\dotnet.dll
  002AA000  0003:00000000  00004000  C:\Users\cloudtest\AppData\Local\Temp\5egk1fj3.cpa\dotnet.dll

These section contributions map directly to the 3 sections in the actual dotnet.dll. I have no idea where dia2dump is getting the RVA from above, as it's not in the section contribution information. I do see some places in PdbWriter.cs where sections are written, but I have no idea where that info is going!

In case it's useful, there's one module in this PDB (again from dia2dump):

** Module: C:\Users\cloudtest\AppData\Local\Temp\5egk1fj3.cpa\dotnet.dll

CompilandDetails:
        Language: MSIL
        Target processor: ARM64
        Compiled for edit and continue: no
        Compiled without debugging info: no
        Compiled with LTCG: no
        Compiled with /bzalign: no
        Managed code present: no
        Compiled with /GS: no
        Compiled with /sdl: no
        Compiled with /hotpatch: no
        Converted by CVTCIL: no
        MSIL module: no
        Frontend Version: Major = 0, Minor = 0, Build = 0, QFE = 0
        Backend Version: Major = 8, Minor = 0, Build = 424, QFE = 16909
        Version string: Crossgen2 - 8.0.4+2d7eea252964e69be94cb9c847b371b23e4dd470
@vvuk
Author

vvuk commented May 2, 2024

DBIExtraStreams from pdb.extra_streams() is just full of None here.

@JustasMasiulis

I have no idea where dia2dump is getting the RVA from above

#17 (comment)

DIA uses the section map/OMF segment map (same thing, different names in different sources) to aid the translation. Section headers are not necessary to do the translation and this library simply doesn't implement the address translation this way.

@vvuk
Author

vvuk commented May 2, 2024

Ah ha! I just made my way there, but was trying to figure out how to use that data. Sounds like I'm on the right track, at least for a limited use case.

@vvuk
Author

vvuk commented May 2, 2024

Hmm, could maybe use another hint here @JustasMasiulis :) In this PDB, there isn't any omap data. So all I've got is the section_map.

DebugInformation { stream: Stream { source_view: ReadView(421 bytes) }, header:
 DBIHeader { signature: 4294967295, version: V70,
    age: 1, gs_symbols_stream: StreamIndex(8), internal_version: 36390,
    ps_symbols_stream: StreamIndex(9), pdb_dll_build_version: 33135,
    symbol_records_stream: StreamIndex(10), pdb_dll_rbld_version: 0,
    module_list_size: 140, section_contribution_size: 88,
    section_map_size: 84,
    file_info_size: 20, type_server_map_size: 0, mfc_type_server_index: 0, debug_header_size: 0, ec_substream_size: 25, flags: 0, machine_type: 0, reserved: 0 }, header_len: 64 }
// debug_header_size is 0, but just in case:
DBIExtraStreams { fpo: StreamIndex(None), exception: StreamIndex(None), fixup: StreamIndex(None), omap_to_src: StreamIndex(None), omap_from_src: StreamIndex(None), section_headers: StreamIndex(None), token_rid_map: StreamIndex(None), xdata: StreamIndex(None), pdata: StreamIndex(None), framedata: StreamIndex(None), original_section_headers: StreamIndex(None) }

If I parse the section_map as an OMFSegMapDesc (roughly from microsoft-pdb), I get this:

sec_count: 4, sec_count_log: 4
OMFSegMapDesc { flags: 269, ovl: 0, group: 0, frame: 1, seg_name_index: 65535, class_name_index: 65535, offset: 0, size: 2576384 }
OMFSegMapDesc { flags: 269, ovl: 0, group: 0, frame: 2, seg_name_index: 65535, class_name_index: 65535, offset: 0, size: 212992 }
OMFSegMapDesc { flags: 269, ovl: 0, group: 0, frame: 3, seg_name_index: 65535, class_name_index: 65535, offset: 0, size: 16384 }
OMFSegMapDesc { flags: 520, ovl: 0, group: 0, frame: 0, seg_name_index: 65535, class_name_index: 65535, offset: 0, size: 4294967295 }
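For reference, here is a minimal Rust sketch of reading the section map substream this way. It assumes the layout implied by the numbers above: a 4-byte header (sec_count, sec_count_log) followed by 20-byte little-endian entries, which matches section_map_size: 84 = 4 + 4 × 20. The names `SegMapDesc` and `parse_section_map` are hypothetical and are not part of the pdb crate's API.

```rust
/// One 20-byte section map entry; field names follow the OMFSegMapDesc dump above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SegMapDesc {
    pub flags: u16,
    pub ovl: u16,
    pub group: u16,
    pub frame: u16,
    pub seg_name_index: u16,
    pub class_name_index: u16,
    pub offset: u32,
    pub size: u32,
}

/// Size in bytes of one entry: six u16 fields followed by two u32 fields.
const ENTRY_SIZE: usize = 20;

fn u16_at(b: &[u8], i: usize) -> u16 {
    u16::from_le_bytes([b[i], b[i + 1]])
}

fn u32_at(b: &[u8], i: usize) -> u32 {
    u32::from_le_bytes([b[i], b[i + 1], b[i + 2], b[i + 3]])
}

/// Parses the raw section map bytes (hypothetical helper).
/// Returns None if the buffer is shorter than the header claims.
pub fn parse_section_map(data: &[u8]) -> Option<Vec<SegMapDesc>> {
    if data.len() < 4 {
        return None;
    }
    let sec_count = u16_at(data, 0) as usize;
    // data[2..4] holds sec_count_log, which is not needed to walk the entries.
    let body = &data[4..];
    if body.len() < sec_count * ENTRY_SIZE {
        return None;
    }
    let entries = body
        .chunks_exact(ENTRY_SIZE)
        .take(sec_count)
        .map(|e| SegMapDesc {
            flags: u16_at(e, 0),
            ovl: u16_at(e, 2),
            group: u16_at(e, 4),
            frame: u16_at(e, 6),
            seg_name_index: u16_at(e, 8),
            class_name_index: u16_at(e, 10),
            offset: u32_at(e, 12),
            size: u32_at(e, 16),
        })
        .collect();
    Some(entries)
}
```

Feeding it the 84 bytes of this PDB's section map should reproduce the four entries printed above, including the final frame 0 entry with size 4294967295.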

If I parse it as a DbiSectionMap from syzygy I get:

DBISectionMapItem { flags: 13, section_type: 1, unknown_data_1: 0, section_number: 1, unknown_data_2: 4294967295, rva_offset: 0, section_length: 2576384 }
DBISectionMapItem { flags: 13, section_type: 1, unknown_data_1: 0, section_number: 2, unknown_data_2: 4294967295, rva_offset: 0, section_length: 212992 }
DBISectionMapItem { flags: 13, section_type: 1, unknown_data_1: 0, section_number: 3, unknown_data_2: 4294967295, rva_offset: 0, section_length: 16384 }
DBISectionMapItem { flags: 8, section_type: 2, unknown_data_1: 0, section_number: 0, unknown_data_2: 4294967295, rva_offset: 0, section_length: 4294967295 }

DbiSectionMap splits the 16-bit OMFSegMapDesc flags field into flags and section_type (269 = 0x010D gives section_type 1 and flags 13), OK. But rva_offset is still 0 here. What am I missing?
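The relationship between the two dumps can be checked with a short sketch. It assumes, based only on the numbers printed above, that the low byte of the 16-bit field is the syzygy-style flags and the high byte is section_type. `split_flags` is a hypothetical helper name.

```rust
/// Splits the 16-bit OMFSegMapDesc flags value into
/// (flags, section_type): low byte first, high byte second.
pub fn split_flags(raw: u16) -> (u8, u8) {
    let flags = (raw & 0x00FF) as u8;
    let section_type = (raw >> 8) as u8;
    (flags, section_type)
}
```

With the values from this PDB, 269 splits into flags 13 and section_type 1, and 520 splits into flags 8 and section_type 2, matching the DBISectionMapItem output.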

@JustasMasiulis

JustasMasiulis commented May 2, 2024

But rva_offset is still 0 here.

That is correct and this value is used as it is.

Hmm, could maybe use another hint here

For your specific PDB the segment frame is always 1, so there will be no section RVA "synthesis" (which is needed when there are no section headers) beyond adding 0x1000 (since there is no OMAP from) and your rva_offset (which is 0) to the symbol.offset.

@vvuk
Author

vvuk commented May 2, 2024

For your specific PDB the segment frame is always 1,

Hm, how do I know this? (And apologies, I'm still figuring out all the PDB details, so I'm not 100% familiar with what the "segment frame" is. Is it equivalent to the section here? And thank you for your help!)

beyond adding 0x1000 (since there is no OMAP from) and your rva_offset (which is 0) to the symbol.offset
Ok, so 0x1000 is assumed if there is no other information (+ the rva_offset from the section map)? What about the other two section map entries?

All the public symbols do fit within the first section's range, so it's a moot point here, but where is, e.g., 002AA000 coming from for the third entry in the contributions map?

I hacked a version of this into the crate that it turns out I'm actually using in samply (so many pdbs); thanks for your help.

@vvuk
Author

vvuk commented May 2, 2024

(Also to be clear, happy to do a PR for this upstream version of the crate as well if there's interest)

@JustasMasiulis

JustasMasiulis commented May 2, 2024

For your specific PDB the segment frame is always 1,

Hm, how do I know this? (And apologies, I'm still figuring out all the PDB details, so I'm not 100% familiar with what the "segment frame" is. Is it equivalent to the section here? And thank you for your help!)

OMFSegMapDesc.frame from one of your previous samples. I wasn't clear about this, but I was looking only at symbols and their address translation.

beyond adding 0x1000 (since there is no OMAP from) and your rva_offset (which is 0) to the symbol.offset
Ok, so 0x1000 is assumed if there is no other information (+ the rva_offset from the section map)? What about the other two section map entries?

All the public symbols do fit within the first section's range, so it's a moot point here, but where is, e.g., 002AA000 coming from for the third entry in the contributions map?

Both the second and third entries refer to frame > 1 and need extra work to synthesize beyond just adding 0x1000: you also need to add the sum of the sizes of the preceding OMFSegMapDesc entries.
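As a rough illustration of the rule described in this thread, the sketch below translates a section:offset pair to an RVA using only section map entries. It is not the pdb crate's implementation. It assumes that entries are listed in frame order, that entries with frame 0 (such as the final one in this PDB) are skipped, that the first section starts at 0x1000, and that sizes are already aligned as they are in this PDB. `SegMapEntry` and `to_rva` are hypothetical names.

```rust
/// Minimal view of a section map entry: only the fields used here.
#[derive(Debug, Clone, Copy)]
pub struct SegMapEntry {
    pub frame: u16,
    pub offset: u32, // the "rva_offset" discussed above
    pub size: u32,
}

/// Assumed RVA of the first section when no section headers exist.
const FIRST_SECTION_RVA: u32 = 0x1000;

/// Translates section:offset to an RVA by summing the sizes of the
/// entries that precede the matching frame. Returns None if no entry
/// has that frame or if the arithmetic would overflow.
pub fn to_rva(entries: &[SegMapEntry], section: u16, offset: u32) -> Option<u32> {
    let mut base = FIRST_SECTION_RVA;
    for e in entries {
        if e.frame == 0 {
            // Assumption: frame 0 entries do not describe a real section.
            continue;
        }
        if e.frame == section {
            return base.checked_add(e.offset)?.checked_add(offset);
        }
        base = base.checked_add(e.size)?;
    }
    None
}
```

For the entries in this PDB, the third section works out to 0x1000 + 0x275000 + 0x34000 = 0x2AA000, which agrees with the RVA column in the dia2dump section contribution output.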

vvuk added a commit to vvuk/pdb2 that referenced this issue May 22, 2024