Add non-module support to offline traces #2062
Can you give me some pointers as to what would be needed to get this working? Also, should we indicate (via warnings, etc.) that jitted code is not supported in offline traces? Right now the only indication I can see is this message in debug builds:
That instru_offline comment describes one approach. The memrefs would end up looking like the online ones with full info.
Looks like instru_offline.cpp does not have any logging at all. I have changed the verbosity in the post-processor log from 3 to 1. The change suggested in the instru_offline comment would only provide the instruction length, and only for up to 10 instructions -- is that correct? If we wanted to get the opcodes, would we have to do something similar to what is done for vdso, i.e., save the code to disk and use that for post-processing?
I think it means there would be a separate entry for each set of 10 instructions. (If we wanted to only allow 10 that is also doable by splitting the blocks.)
Right, the proposal in the comment is only for enough info for cache simulation. For core simulation you would have to either include opcode (or some simulators would want the full encoding?) info with each instruction or as you said save the code separately, but saving the code gets tricky if it's actively modified and different instructions occupy the same addresses at different times. A hybrid approach that saves the code separately just once for written-once code combined with per-entry encoding info for multi-write code might be worth the effort. Combining with hints from the JIT (as in https://dynamorio.org/page_jitopt.html) would be best where possible.
Currently offline traces do not support code that is not in a library (e.g. JIT compiled code). Increased the log severity from 3 to 1 so that it is more easily noticeable. Issue: #2062
Do you have a sense of how common this is? Regarding saving code, for the vdso case we already have a module, but for jitted code I think we will need to add a new module entry. What do you think? I didn't see a drmodtrack API that we could use. Would we need to add this support?
This can happen in multiple scenarios, such as:
An application unloading one library and loading a new one at an overlapping address is handled by drmodtrack via a new module entry (but core DR has to treat it like modified code). As for how common it is: I don't know how common it is for, say, today's most popular Java VMs, as I have not looked at Java in a long time.
If we had the annotations pointed at above and knew that a memory region was a JIT region and was only appended to and never modified in the middle, this makes sense: add an entry, save the contents at the end (using the assumption of no changes).
Is there a way to detect modified code using either DR or other tools? We are trying the approach of saving code to a file. This code gets saved in a raw binary format but during post-processing I think the module loader expects an executable format like ELF, so dr_map_executable_file fails. Any suggestions on how to handle this?
Xref #409 on DR providing an event when it flushes code due to modifications. DR's two-pronged scheme of page protection and sandboxing is described in "Maintaining Consistency and Bounding Capacity of Software Code Caches" linked at https://dynamorio.org/page_publications.html.
If the data were embedded like for VDSO it would be plain-mmapped with the rest of the file and would not need any new code to handle. If it's separate it could be written with executable file headers; or custom_module_data_t could be expanded; or maybe drmodtrack itself should get in the game and provide something, which would help other uses like drcov -- but drcov's use case mapping it back to sources is murky. Not sure what the best thing is; various other issues as well such as how often storing the code works out (i.e., how often generated code address ranges are unchanged after first execution across the whole run for typical JITs).
Adds a drmemtrace test which generates code outside of any module. Adds a proper warning about this scenario at level 0 (with a do-once to avoid spew) to both the tracer and raw2trace (previously it would silently be traced without any indication of a problem, and the raw2trace warning was at level 1). Updates the clang-format used in GA CI to version 12 to fix a bug where version 9 failed to put a function type on its own line. It is harder to get version 9 on recent development machines so it is good to upgrade. Issue: #2062
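The "do-once to avoid spew" idea in the commit message above can be illustrated with a small sketch. This is not the code from the commit: `do_once_t`, `warn_non_module_code`, and the warning text are all hypothetical names made up for illustration, and the sketch assumes a plain C++11 atomic flag is an acceptable stand-in for whatever mechanism the tracer and raw2trace actually use.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical helper: first() returns true exactly once, even if several
// threads race to call it.
class do_once_t {
public:
    bool
    first()
    {
        // exchange() returns the previous value, so only one caller sees false.
        return !done_.exchange(true);
    }

private:
    std::atomic<bool> done_{ false };
};

// Prints a warning (hypothetical wording) only on the first call.
// Returns whether the warning was actually printed.
bool
warn_non_module_code(do_once_t &once, uint64_t pc)
{
    if (!once.first())
        return false;
    std::fprintf(stderr, "WARNING: code at 0x%llx is not in any module\n",
                 static_cast<unsigned long long>(pc));
    return true;
}
```

With one `do_once_t` per warning site, the first non-module block prints the message and every later block is silent, which keeps a level-0 warning from flooding the output of a JIT-heavy application.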
Augments drmemtrace to emit the instruction encodings for non-module code while tracing. The instru_offline_t class emits them into a global buffer and from there into a new file in the raw/ subdirectory. Encodings are obtained by encoding each app or emulated instruction in the block. Increases the offline file version to make it possible to know whether to expect an encoding file or not. Adds a new drmemtrace interface function drmemtrace_get_encoding_path() to obtain the path to the file where the encodings are stored. Adds a sanity check that the file exists to the gencode test. The file reader and raw2trace_directory_t are updated to be aware of the encoding file. Raw2trace support for parsing the encoding file will be added in a separate commit to keep things small and incremental. Issue: #2062
Adds raw2trace parsing of the encoding file used by the tracer to store instruction encodings for generated code. This involves the following changes:
+ Adds encoding file parsing to module_mapper_t.
+ Changes module map queries to use new module_mapper_t interfaces instead, which handle generated code.
+ Changes block lookup to use the modidx,modoffs pair as the key rather than the absolute pc. This runs into problems on 32-bit where the hashtable_t key is limited to pointer-sized. To solve this, on 32-bit we use unordered_map, via a wrapper class block_hashtable_t to abstract away the differences.
The changes are compatibility-breaking for raw2trace_t which now takes an encoding file parameter in the middle of existing parameters. Updates existing uses. For module_mapper_t the encoding file is added last with a default value to preserve compatibility for existing analysis tools like opcode_mix and view. It is assumed that encodings for generated code will be added to the final trace file and thus these tools will not need a module_mapper_t interface for generated code. Augments the tool.drcacheoff.gencode test to post-process the trace and ensure the generated code PC is observed. Fixes a -loglevel 4 signal dump_unmaksed() crash on detach i#5618 hit in the gencode test; confirmed the test is crash-free at loglevel 4 with the fix. Issue: #2062 Fixes #5618
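To make the keying change concrete, here is a minimal sketch of a block table keyed by a (modidx, modoffs) pair on top of std::unordered_map. It is not the block_hashtable_t from the commit: `block_key_t`, `block_key_hash_t`, and `block_table_sketch_t` are hypothetical names, the two fields are assumed to be 64-bit for simplicity, and the hash-combining formula is just a common idiom rather than anything raw2trace is known to use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Hypothetical key: a pair that does not fit in a pointer-sized key on 32-bit.
struct block_key_t {
    uint64_t modidx;
    uint64_t modoffs;
    bool
    operator==(const block_key_t &rhs) const
    {
        return modidx == rhs.modidx && modoffs == rhs.modoffs;
    }
};

// Combines the hashes of both fields so that equal pairs hash equally.
struct block_key_hash_t {
    size_t
    operator()(const block_key_t &key) const
    {
        size_t h1 = std::hash<uint64_t>()(key.modidx);
        size_t h2 = std::hash<uint64_t>()(key.modoffs);
        return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));
    }
};

// Thin wrapper that hides the container choice behind add() and lookup().
template <typename T> class block_table_sketch_t {
public:
    void
    add(uint64_t modidx, uint64_t modoffs, const T &value)
    {
        table_[block_key_t{ modidx, modoffs }] = value;
    }
    // Returns nullptr if no block was recorded for this pair.
    const T *
    lookup(uint64_t modidx, uint64_t modoffs) const
    {
        auto it = table_.find(block_key_t{ modidx, modoffs });
        return it == table_.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<block_key_t, T, block_key_hash_t> table_;
};
```

Because callers only see add() and lookup(), a wrapper like this could pack the pair into a single pointer-sized key on 64-bit and fall back to the map on 32-bit without changing any call sites, which is the kind of abstraction the commit message describes.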
Adds a new trace_entry_t type TRACE_TYPE_ENCODING. Encodings are stored in one or more consecutive such entries by raw2trace for the first occurrence of each instruction or modified instruction, with encodings repeated in new chunks. The trace version is bumped for this new type. Adds a new encoding field to the end of memref_instr_t. The reader_t class caches the trace_entry_t encodings and uses them to fill in this field. A new file type OFFLINE_FILE_TYPE_ENCODINGS indicates whether encodings are present (to support stripping them out). Augments the opcode_mix and view tools to use the new encoding records and only need the module_mapper for legacy traces. Updates the documentation and unit tests. Still remaining is adding markers when code changes, joint with #2062. Fixes #5520
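The caching behavior described above (encodings appear only before the first occurrence of an instruction or after it is modified, and the reader fills them in for every later occurrence) can be sketched as follows. All of the types here are hypothetical simplifications: `entry_t` stands in for trace_entry_t, `instr_record_t` for memref_instr_t, and `encoding_cache_reader_t` for the relevant part of reader_t. Chunk boundaries and record sizes are ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

enum class entry_type_t { ENCODING, INSTR };

// Hypothetical simplified trace entry.
struct entry_t {
    entry_type_t type;
    uint64_t pc;                // Meaningful for INSTR entries only.
    std::vector<uint8_t> bytes; // Meaningful for ENCODING entries only.
};

// Hypothetical simplified instruction record handed to analysis tools.
struct instr_record_t {
    uint64_t pc = 0;
    std::vector<uint8_t> encoding;
};

class encoding_cache_reader_t {
public:
    // Returns true and fills in `out` when `entry` completes an instruction.
    bool
    process(const entry_t &entry, instr_record_t &out)
    {
        if (entry.type == entry_type_t::ENCODING) {
            // One instruction's encoding may span several consecutive entries.
            pending_.insert(pending_.end(), entry.bytes.begin(), entry.bytes.end());
            return false;
        }
        if (!pending_.empty()) {
            // New or modified instruction: replace any older cached encoding.
            cache_[entry.pc] = pending_;
            pending_.clear();
        }
        out.pc = entry.pc;
        auto it = cache_.find(entry.pc);
        out.encoding = it == cache_.end() ? std::vector<uint8_t>() : it->second;
        return true;
    }

private:
    std::vector<uint8_t> pending_;
    std::unordered_map<uint64_t, std::vector<uint8_t>> cache_;
};
```

A later encoding entry for the same PC overwrites the cached bytes, which is how a modified instruction would be picked up in this sketch; an instruction with no encoding ever seen gets an empty encoding, as would be the case for a trace with the encodings stripped out.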
Changes the kernel interruption PC for 64-bit from a modidx+modoffs scheme that was trying to avoid a 2nd record but which failed to handle a non-module PC to use the absolute PC (a recently added assert fires in this case; previously we could assert or crash or continue with a bogus value in raw2trace depending on the uninitialized value of modidx). Bumps the raw offline version number. Updates raw2trace to handle both the old version as modidx+modoffs plus the new absolute PC version. Adds new unit tests for both. Adds a SIGILL to the burst_gencode trace which triggers the new tracer assert and passes with this fix. To build the test, adds 0-valued entries to operand enums: DR_EXTEND_DEFAULT and DR_OPND_DEFAULT, to avoid C++ compiler warnings in INSTR_CREATE_dc_ivac(). #6000 covers using those in all the AArch64 creation macros. Issue: #2062
Removes the vdso raw bytes we were storing in the module file for offline drmemtraces. Switches to using per-block encodings instead. This avoids problems with hooked vsysenter on 32-bit AMD. Tested on tool.drcacheoff.simple on 32-bit AMD on a machine where that test failed every time before this fix. Removes the unused offline_instru_t::get_modoffs() rather than updating it for the vdso change. Issue: #6416, #2062 Fixes #6416
Adds support for instrumenting non-module code instr-by-instr instead of whole-block-at-a-time. This is required for the L0 filter mode which instruments individual instructions rather than one pc entry for the whole block. Under the new scheme, the modoffs field in each trace PC entry points to each non-mod instr separately, rather than just the top of the non-mod block. Specifically, we store the cumulative encoding length of instrs stored to the encoding file prior to the recorded instr. This may potentially allow too few gencode blocks for JIT apps (we'd support 8G of JIT code); we could use multiple modidx to point to gencode (growing downward from PC_MODIDX_INVALID) to help with that in the future. Added a TODO. Adds file type to the encoding file header. Adds a new encoding file version to indicate the presence of file type. Adds a new file type bit that denotes that the trace was filtered and that the module_mapper_t should interpret the modoffs as pointing to a single instr as described above. When this bit is not set, we use the existing scheme of interpreting the pc modoffs as just the non-mod block-idx. Note that even under the new scheme, we still write the encoding file one non-module block at a time because it is too inefficient to write one encoding_entry_t for just one non-module instr. Modifies record_instr_encodings to skip writing anything to the encoding file for blocks without any app instr. Adds a new variant of the tool.drcacheoff.gencode test that runs with the L0_filter enabled. Modifies the test to add a sequence of instrs that, without this fix, produces an error in raw2trace due to an apparently out-of-block memref. Issue: #2062
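The per-instruction modoffs described above amounts to a running sum of encoding lengths. A minimal sketch, assuming the only inputs are the number of encoding bytes already written and the lengths of the instructions in the new block: `compute_instr_modoffs` and `kMaxGencodeBytes` are hypothetical names, and the 8G bound is simply the figure quoted in the commit message.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// From the commit message: the scheme would support 8G of generated code.
constexpr uint64_t kMaxGencodeBytes = 8ULL << 30;

// For each instruction in a non-module block, returns the value that would go
// in modoffs under the per-instr scheme: the total number of encoding bytes
// stored before that instruction.
std::vector<uint64_t>
compute_instr_modoffs(uint64_t bytes_already_written,
                      const std::vector<uint8_t> &instr_lengths)
{
    std::vector<uint64_t> modoffs;
    uint64_t cumulative = bytes_already_written;
    for (uint8_t length : instr_lengths) {
        // Every recorded offset must stay within the representable range.
        assert(cumulative < kMaxGencodeBytes);
        modoffs.push_back(cumulative);
        cumulative += length;
    }
    return modoffs;
}
```

For example, a block of three instructions of lengths 3, 1, and 5 written after 100 earlier bytes would get offsets 100, 103, and 104, and the next block would start at 109. Since the offsets grow with total encoded bytes rather than with block count, the limit is reached by code volume, which matches the concern raised about JIT-heavy apps.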
Split from #1729
What about DGC from a JIT, or vsyscall?
We need to store the type (read/write/prefetch*) and size for a memref, and
the size of every single ifetch.
For memrefs, we may have to use double entries, with the first one having
an escape type and the address and the second having the full type and
size.
For ifetch, perhaps we can fit multiple sizes in one entry. Modidx will be
a sentinel, so we have the 45 modoffs bits. We will want 4 bits for each
instr size (ok to assume 16-byte max instr, even though technically there
can be 17 and an illegal instr could be even bigger) so we can fit the
sizes of 11 instrs there. We'll zero the rest.
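The packing proposed above (4 bits per instruction size, 11 sizes in 45 bits, remaining slots zeroed) can be sketched like this. The function and constant names are hypothetical. One assumption to note: this sketch stores the size directly and treats a zero nibble as "no instruction", so it only covers lengths 1 through 15; representing a 16-byte instruction in 4 bits would need a different convention, such as storing size minus one alongside an explicit count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Numbers taken from the text above.
constexpr int kModoffsBits = 45;
constexpr int kBitsPerSize = 4;
constexpr int kMaxInstrsPerEntry = kModoffsBits / kBitsPerSize; // 11

// Packs up to 11 instruction lengths into the low 44 bits of the result.
// Unused slots stay zero.
uint64_t
pack_instr_sizes(const std::vector<uint8_t> &sizes)
{
    assert(sizes.size() <= static_cast<size_t>(kMaxInstrsPerEntry));
    uint64_t packed = 0;
    for (size_t i = 0; i < sizes.size(); ++i) {
        // Assumption of this sketch: zero is reserved as the end marker.
        assert(sizes[i] >= 1 && sizes[i] <= 15);
        packed |= static_cast<uint64_t>(sizes[i]) << (i * kBitsPerSize);
    }
    return packed;
}

// Recovers the lengths, stopping at the first zero slot.
std::vector<uint8_t>
unpack_instr_sizes(uint64_t packed)
{
    std::vector<uint8_t> sizes;
    for (int i = 0; i < kMaxInstrsPerEntry; ++i) {
        uint8_t size = static_cast<uint8_t>((packed >> (i * kBitsPerSize)) & 0xf);
        if (size == 0)
            break;
        sizes.push_back(size);
    }
    return sizes;
}
```

A block with more than 11 instructions would need several such entries, one per group of 11.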
The vdso shows up as a module but is harder to decode.
We have two choices for vdso:
to look for "[vdso]" which will require changing drmodtrack.
decode from its own vdso via special code in raw2trace.