Support partial objects for DW_OP_piece/DW_OP_bit_piece #322

Open
Tracked by #321
osandov opened this issue Jul 3, 2023 · 0 comments
Labels
debuginfo Support for debugging information formats

Background

DWARF specifies the location of a variable (local or global) with a "location description". See section 2.6 in the DWARF 5 specification. A location description is essentially a series of instructions that, when executed, gives the address or value of the desired variable. drgn has code that evaluates a location description and translates it into a drgn object: see drgn_object_from_dwarf_location() and drgn_eval_dwarf_expression().

DWARF location descriptions have two operations, DW_OP_piece and DW_OP_bit_piece (section 2.6.1.2 in the DWARF 5 spec), that describe a piece of an object instead of the whole object. These can even be used to describe an object whose value is partially known, partially unknown, and/or partially in memory.

Example

Consider the following (contrived) source file compiled with gcc -O2:

#include <stdlib.h>

int main(void)
{
	struct { int a, b; } s = { 1, rand() };
	return s.a + s.b;
}

And the generated assembly code:

0000000000401040 <main>:
  401040:       48 83 ec 08             sub    $0x8,%rsp
  401044:       e8 e7 ff ff ff          call   401030 <rand@plt>
  401049:       48 83 c4 08             add    $0x8,%rsp
  40104d:       83 c0 01                add    $0x1,%eax
  401050:       c3                      ret

Note that s is not actually present in memory. However, its value can still be recovered, as the DWARF information shows:

$ eu-readelf --debug-dump=info a.out
...
 [    be]      variable             abbrev: 8
               name                 (string) "s"
               decl_file            (data1) test.c (1)
               decl_line            (data1) 5
               decl_column          (data1) 23
               type                 (ref4) [    a2]
               location             (sec_offset) location list [    12]
...
$ eu-readelf --debug-dump=loc a.out
...
  Offset: 12, Index: 6
    base_address 0x401040
      0x0000000000401040 <main>
    offset_pair 0, 9
      0x0000000000401040 <main>..
      0x0000000000401048 <main+0x8>
        [ 0] lit1
        [ 1] stack_value
        [ 2] piece 4
        [ 4] piece 4
    offset_pair 9, 10
      0x0000000000401049 <main+0x9>..
      0x000000000040104f <main+0xf>
        [ 0] lit1
        [ 1] stack_value
        [ 2] piece 4
        [ 4] reg0
        [ 5] piece 4
    offset_pair 10, 11
      0x0000000000401050 <main+0x10>..
      0x0000000000401050 <main+0x10>
        [ 0] lit1
        [ 1] stack_value
        [ 2] piece 4
        [ 4] breg0 -1
        [ 6] stack_value
        [ 7] piece 4
    end_of_list
...

s.a is always 1: the lit1, stack_value, piece 4 sequence at the beginning of every location description means that the first 4-byte piece of s has the value 1. s.b varies throughout the function, but the part relevant to this issue is the first address range, 0x401040-0x401048, which runs from the beginning of the function up to and including the call rand instruction. The lone piece 4 in that range means that the second 4-byte piece of s has an unknown value. (In the other two address ranges, the value of s.b can be recovered, and the location description defines how to do that.)
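
To make the reading of the location list concrete, here is a minimal sketch that transcribes the three entries into a table and looks up the entry covering a given program counter. The types and function names (piece_kind, loc_entry, find_loc_entry, known_bytes) are hypothetical and exist only for illustration; they are not drgn or elfutils APIs. The offset_pair operands are read as hexadecimal, which matches the address ranges printed in the dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification of one piece; not a drgn or elfutils type. */
enum piece_kind {
	PIECE_UNKNOWN,  /* bare DW_OP_piece: value unavailable */
	PIECE_IMPLICIT, /* <expr>, DW_OP_stack_value: value computed by the expression */
	PIECE_REGISTER, /* DW_OP_reg0: value held in a register */
};

struct piece {
	enum piece_kind kind;
	uint32_t size; /* in bytes */
};

/* One offset_pair entry: [start, end) relative to the base address. */
struct loc_entry {
	uint64_t start, end;
	struct piece pieces[2];
};

static const uint64_t base_address = 0x401040;

/* Transcribed by hand from the eu-readelf dump of the location list for s. */
static const struct loc_entry s_loclist[] = {
	{ 0x0, 0x9, { { PIECE_IMPLICIT, 4 }, { PIECE_UNKNOWN, 4 } } },
	{ 0x9, 0x10, { { PIECE_IMPLICIT, 4 }, { PIECE_REGISTER, 4 } } },
	{ 0x10, 0x11, { { PIECE_IMPLICIT, 4 }, { PIECE_IMPLICIT, 4 } } },
};

/* Return the entry covering pc, or NULL if s has no location there. */
const struct loc_entry *find_loc_entry(uint64_t pc)
{
	size_t n = sizeof(s_loclist) / sizeof(s_loclist[0]);
	for (size_t i = 0; i < n; i++) {
		const struct loc_entry *e = &s_loclist[i];
		if (pc >= base_address + e->start && pc < base_address + e->end)
			return e;
	}
	return NULL;
}

/* Count the bytes of s whose value is recoverable under this entry. */
uint32_t known_bytes(const struct loc_entry *entry)
{
	assert(entry != NULL);
	uint32_t total = 0;
	for (size_t i = 0; i < 2; i++) {
		if (entry->pieces[i].kind != PIECE_UNKNOWN)
			total += entry->pieces[i].size;
	}
	return total;
}
```

With this table, any program counter in the first range (for example 0x401044, the call instruction) selects an entry where only 4 of the 8 bytes of s are recoverable, while the other two ranges cover all 8 bytes.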

Problem Statement

drgn can handle cases of DW_OP_piece/DW_OP_bit_piece where the entire object's value can be recovered. However, for more complicated cases, drgn loses precision and represents the "least common denominator":

drgn/libdrgn/dwarf_info.c

Lines 5056 to 5071 in c69e5b1

/*
 * TODO: there are a few cases that a DWARF location can
 * describe that can't be represented in drgn's object model:
 *
 * 1. An object that is partially known and partially unknown.
 * 2. An object that is partially in memory and partially a
 *    value.
 * 3. An object that is in memory at non-contiguous addresses.
 * 4. A pointer object whose pointer value is not known but
 *    whose referenced value is known (DW_OP_implicit_pointer).
 *
 * For case 1, we consider the whole object as absent. For cases
 * 2 and 3, we convert the whole object to a value. Case 4 is
 * not supported at all. We should add a way to represent all of
 * these situations precisely.
 */
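
The fallback rules in that comment can be modeled roughly as follows. This is a simplified sketch, not the code in dwarf_info.c: the piece and object_kind types and the collapse_pieces function are hypothetical, and the rule that a fully in-memory, contiguous object becomes a reference is an assumption about the case the comment does not list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical piece description; drgn's internal types differ. */
enum piece_kind { PIECE_UNKNOWN, PIECE_VALUE, PIECE_MEMORY };

struct piece {
	enum piece_kind kind;
	uint64_t address; /* only meaningful for PIECE_MEMORY */
	uint64_t size;    /* in bytes */
};

/* Stand-ins for the three kinds of drgn object: absent, value, reference. */
enum object_kind { OBJECT_ABSENT, OBJECT_VALUE, OBJECT_REFERENCE };

/* Collapse a list of pieces into the single kind the whole object gets. */
enum object_kind collapse_pieces(const struct piece *pieces, size_t n)
{
	assert(n > 0);
	bool all_contiguous_memory = true;
	for (size_t i = 0; i < n; i++) {
		/* Case 1: any unknown piece makes the whole object absent. */
		if (pieces[i].kind == PIECE_UNKNOWN)
			return OBJECT_ABSENT;
		if (pieces[i].kind != PIECE_MEMORY)
			all_contiguous_memory = false; /* case 2 */
		else if (i > 0 && pieces[i - 1].kind == PIECE_MEMORY &&
			 pieces[i - 1].address + pieces[i - 1].size !=
			 pieces[i].address)
			all_contiguous_memory = false; /* case 3 */
	}
	/* Assumption: only contiguous memory can stay a reference. */
	return all_contiguous_memory ? OBJECT_REFERENCE : OBJECT_VALUE;
}
```

Under this model, the first address range in the example (one value piece followed by one unknown piece) collapses to an absent object, which is exactly the loss of precision this issue is about.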

(Note that case 4 is #173.) In other words, the second and third address ranges above could be represented exactly by drgn, but the first address range would be returned as entirely unknown even though s.a is known. This is mainly because drgn doesn't have a way to represent a value that is partially known, partially unknown, and/or partially in memory. To support this, we need to:

  1. Extend drgn's object model to support representing pieces of objects.
  2. Represent these cases of DW_OP_piece and DW_OP_bit_piece using the extended object model.
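
One possible shape for step 1, purely as an illustration: a value buffer paired with a per-byte "known" flag, so that reading a member succeeds only when all of its bytes were recovered. The partial_value type and both functions are hypothetical and are not a proposed drgn API. The sketch tracks whole bytes only, so DW_OP_bit_piece would need finer granularity, and it does not cover pieces that remain in memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PARTIAL_MAX_SIZE 16

/*
 * Hypothetical partially known value: bytes plus a per-byte "known" flag.
 * size must not exceed PARTIAL_MAX_SIZE.
 */
struct partial_value {
	size_t size;
	uint8_t bytes[PARTIAL_MAX_SIZE];
	bool known[PARTIAL_MAX_SIZE];
};

/* Record a piece whose value was recovered, at the given byte offset. */
void partial_set_known(struct partial_value *obj, size_t offset,
		       const void *src, size_t size)
{
	assert(offset <= obj->size && size <= obj->size - offset);
	memcpy(&obj->bytes[offset], src, size);
	for (size_t i = 0; i < size; i++)
		obj->known[offset + i] = true;
}

/* Copy out a member only if every one of its bytes is known. */
bool partial_read(const struct partial_value *obj, size_t offset,
		  void *dst, size_t size)
{
	if (offset > obj->size || size > obj->size - offset)
		return false;
	for (size_t i = 0; i < size; i++) {
		if (!obj->known[offset + i])
			return false;
	}
	memcpy(dst, &obj->bytes[offset], size);
	return true;
}
```

For the first address range in the example, an 8-byte partial_value for s would have bytes 0 through 3 set from the constant 1 and bytes 4 through 7 left unknown, so reading s.a at offset 0 succeeds while reading s.b at offset 4 reports that the value is unavailable.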
@osandov osandov added the debuginfo Support for debugging information formats label Jul 5, 2023