Add OMAP-based address translation #17
Comments
Hmm. Dissecting the symbol record by hand and using only the
This agrees with the
I conclude that the PDB really does say that Trying to collate this PDB to that executable, I note that they both have the same GUID – |
It could be, but I've dumped the symbol using another parser (rabin2 from radare2 framework):
Which corresponds to the real offset of the function. Also, I forced IDA to use that PDB and the addresses were recognized correctly. There is something really weird going on behind the scenes :/ |
Oh. Oh. The DBI stream in this PDB has a Okay, so: taking the The LLVM PDB DBI page says:
…whelp, it's observed now. |
Wow, at some point I figured there had to be an intermediate conversion. Are those structs available through pdb.rs, or should I just parse them myself? |
This is all news to me, so
I'm thinking we want an
|
This sounds like more work than I thought. Regarding the translation, I thought exactly the same while using PDB: depending on an external parser over the executable to extract information the PDB already contains felt weird. I'll read up on PDB and this project in case I can help in some way. Thanks for your time and your fast answers. |
This is the first step to fixing #17.
|
Everything is gross, but I am pleased to report that this code --

    assert_eq!(pubsym.segment, 0x000c);
    assert_eq!(pubsym.offset, 0x0004aeb0);
    let addr = sections[pubsym.segment as usize - 1].virtual_address + pubsym.offset;
    eprintln!("{:#x} => {:#x}", addr, table.lookup(addr));

-- combines the symbol table entry and original section headers, and passes the result into a binary search on the OMAP table in stream 9 --
-- which ultimately returns the same RVA that you reported upthread. I need to clean this up and pack it into an |
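The lookup described in the comment above can be sketched roughly as follows. This is a minimal illustration, not the crate's actual code: `OmapEntry`, `Section`, `original_rva`, and `omap_lookup` are hypothetical names, and the sketch assumes the usual OMAP convention that the table is sorted by source address and that a target of 0 marks a range with no counterpart.

```rust
/// One OMAP record (hypothetical type): addresses from `source` up to the
/// next record map to `target` plus the distance from `source`.
/// Assumption: a `target` of 0 marks a range with no counterpart.
#[derive(Clone, Copy, Debug)]
struct OmapEntry {
    source: u32,
    target: u32,
}

/// Hypothetical stand-in for an original section header; only the field
/// needed for the translation is modelled.
struct Section {
    virtual_address: u32,
}

/// Turn a 1-based `segment:offset` pair into an RVA in the original image.
/// Returns None for segment 0, an out-of-range segment, or overflow.
fn original_rva(sections: &[Section], segment: u16, offset: u32) -> Option<u32> {
    let index = (segment as usize).checked_sub(1)?;
    sections.get(index)?.virtual_address.checked_add(offset)
}

/// Binary-search a table sorted by `source` for the last record whose
/// `source` is at or below `addr`, then apply that record's displacement.
fn omap_lookup(table: &[OmapEntry], addr: u32) -> Option<u32> {
    // partition_point returns how many leading records have source <= addr.
    let count = table.partition_point(|e| e.source <= addr);
    // No record at or below `addr` means the address is not covered.
    let entry = table.get(count.checked_sub(1)?)?;
    if entry.target == 0 {
        return None; // assumed convention: range has no translated address
    }
    entry.target.checked_add(addr - entry.source)
}
```

With made-up toy data (not taken from the attached PDB), a symbol at segment 2, offset 0x20, in a section whose original virtual address is 0x5000 yields 0x5020; a record mapping 0x5000 to 0x3000 then translates it to 0x3020.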
FYI there's code in Breakpad that deals with OMAP tables if you want to compare notes: |
Google's syzygy tool also has code for handling OMAP tables, and a pile of other PDB-reading code that doesn't use the DIA SDK (which might be useful for reference): I believe syzygy can be used to rewrite PE binaries and generate OMAP tables, although I can't find a succinct example. |
OMAP address translation has been released with 0.2.2, this issue can be closed now. |
Do you know if DIA/dbghelp.dll uses any other parts of the PDB to do these OMAP translations? I'm going to reference this repo heavily as I recreate the OMAP streams. We have a binary rewriting/transformation framework, and I want to rebuild these OMAP streams so that transformed/obfuscated binaries can still be debugged with a PDB. Are there any other components of the PDB involved in OMAP translation besides these streams?
|
Section/omf map is another one that's used in address translation.
If you want full correctness, you'll likely want to reverse DIA. I don't think that there is any open source code that goes out of its way to do the address translation like Microsoft/DIA does. There are tons of branches and edge cases handled in DIA code, here is a snippet from my personal attempts to do it correctly from a few years ago (with a couple of safety checks removed):

    std::optional<uint32_t> translate_address( uint32_t segment, uint32_t offset ) {
        const auto segment_index = segment - 1;
        const auto frame = _segment_frame( segment_index ); // go through section/omf map if present
        if ( omap_from ) {
            if ( frame ) {
                if ( original_section_headers ) {
                    offset += original_section_headers[frame - 1].virtual_address;
                } else {
                    // use section map or else new section headers
                    offset += _synthesize_image_offset( segment_index );
                }
            }
            // my/DIA logic differs from PDB crate in OMAP entry search as well.
            return _resolve_trough_omap( omap_from, num_omap_from, offset, false );
        } else {
            if ( frame )
                // section map or else 0
                offset += _segment_offset( segment_index )
                    // use new section headers or else section map
                    + _synthesize_section_va( section_headers, frame - 1 );
            return offset;
        }
    }

Have fun! |
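For readers wondering what the section/OMF map consulted by `_segment_frame` looks like, here is a rough Rust sketch of reading it. The record layout follows the section map description in the LLVM DBI stream documentation as I understand it (a 4-byte header followed by 20-byte little-endian records). All names here are hypothetical, and `segment_frame` is only a guess at what the helper in the snippet above might do, not DIA's actual logic.

```rust
/// One 20-byte section map record; field order assumed from the LLVM DBI docs.
#[derive(Debug, PartialEq)]
struct SectionMapEntry {
    flags: u16,
    overlay: u16,
    group: u16,
    frame: u16,
    section_name: u16,
    class_name: u16,
    offset: u32,
    section_length: u32,
}

/// Read a little-endian u16 at `at`, or None if out of bounds.
fn read_u16(bytes: &[u8], at: usize) -> Option<u16> {
    let raw = bytes.get(at..at.checked_add(2)?)?;
    Some(u16::from_le_bytes([raw[0], raw[1]]))
}

/// Read a little-endian u32 at `at`, or None if out of bounds.
fn read_u32(bytes: &[u8], at: usize) -> Option<u32> {
    let raw = bytes.get(at..at.checked_add(4)?)?;
    Some(u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]))
}

/// Parse the header (record count, logical record count) and then `count`
/// records. Returns None if the buffer is too short.
fn parse_section_map(bytes: &[u8]) -> Option<Vec<SectionMapEntry>> {
    let count = read_u16(bytes, 0)? as usize;
    let _logical_count = read_u16(bytes, 2)?; // parsed but unused in this sketch
    let mut entries = Vec::with_capacity(count);
    for i in 0..count {
        let base = 4 + i * 20;
        entries.push(SectionMapEntry {
            flags: read_u16(bytes, base)?,
            overlay: read_u16(bytes, base + 2)?,
            group: read_u16(bytes, base + 4)?,
            frame: read_u16(bytes, base + 6)?,
            section_name: read_u16(bytes, base + 8)?,
            class_name: read_u16(bytes, base + 10)?,
            offset: read_u32(bytes, base + 12)?,
            section_length: read_u32(bytes, base + 16)?,
        });
    }
    Some(entries)
}

/// Guess at a `_segment_frame`-style helper: the frame stored in the record
/// at the 0-based segment index, if such a record exists.
fn segment_frame(entries: &[SectionMapEntry], segment_index: usize) -> Option<u16> {
    entries.get(segment_index).map(|e| e.frame)
}
```

The frame value is 1-based in the snippet above (it is used as `frame - 1` to index section headers), so a missing record or a frame of 0 would both mean "no section" under this reading.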
Brutal. Is that section headers stream the same as the "section headers stream" from the DBIExtraStream? https://llvm.org/docs/PDB/DbiStream.html#optional-debug-header-stream Looks like I have a hot date with IDA... I've never seen anything more over-engineered than this file format. Edit: Looks like they are two separate streams entirely. Virtual insanity. Job security through obscurity... Anyway, do you know if this section map works like this^, where logical entries point back into the section map itself to the actual descriptor? Can I just remove all entries in the section map and thus force DIA to use the section headers for address translation, or would that break other shit? |
If I were you I would look at syzygy (linked in a previous comment), which does binary rewriting and is already similar to what you're trying to achieve. |
syzygy is indeed a great reference; they have good comments for the OMAP streams. Sadly, I don't think they recreate these streams. Maybe I'm wrong, but I can't see where they write that information back into the PDB. They have this PDB mutator concept: https://github.com/google/syzygy/blob/master/syzygy/pdb/pdb_mutator.cc and all the mutators are in here: https://github.com/google/syzygy/tree/master/syzygy/pdb/mutators Edit: Google's crashpad also has good comments. I think that was linked before. Sadly, as @JustasMasiulis mentioned, it is indeed true that DIA uses other components of the PDB during translation. Going to go paul walker mode on these components. |
Depends... DIA/PDB has a lot of redundancy and deleting the section/OMF segment map would likely have no impact for 99.99% of binaries and that's kind of evident by the fact that there are a bunch of open-source PDB parsing codebases that all do address translation differently and it kind of works for everyone. I would suggest spending some time reversing the |
After having a very pleasant dinner date with ms Ida, I can say that the OMAP streams are being rebuilt correctly and that DIA can resolve the addresses. The Visual Studio debugger and x64dbg both display correct symbol information. Also, ms Ida has her own PDB parser. For a demo I moved the first function to some padding bytes. I nuked the section map substream; if that becomes a problem in the future, I'll have another date with ms Ida. I would just like to take a moment to say thank you to @JustasMasiulis and @luser for coming back to this issue 5 years after it was closed. |
There is an error parsing the PDB for a Windows 7 kernel binary, something related to the offset. If I do the same with a Windows 10 ntoskrnl.exe, it is OK.
Parsing NtWaitForSingleObject reports offset 0x4aeb0 in section C (12), which is wrong. It should say that the offset is 0x000ac8c0. I've tested other symbols with the same result.
Here are the attached files so you can test it:
windows7kernel.zip
I will try to figure out what's happening, but I'm not that familiar with the PDB internals.