Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support setting custom vmcoreinfo #396

Merged
merged 6 commits into from
May 30, 2024
Merged

Conversation

brenns10
Copy link
Contributor

@brenns10 brenns10 commented May 24, 2024

This branch has sat on the back burner for a while, but new discussion in #350 led me to update it and get it ready to merge. It adds the ability to give drgn some data to be interpreted as the vmcoreinfo when the Program is created. Later, when a core dump is attached, drgn will respect the pre-existing info rather than trying to get it from the core dump.

There's a bit of a delicate dance to be had with libkdumpfile, because it also wants to use the vmcoreinfo to find the page tables. For vmcores missing the vmcoreinfo, it seems that libkdumpfile doesn't know how to find their architecture, so we need to manually provide the architecture, then the dump FD, then the vmcoreinfo.

While I was at it, I threw in RISC-V support for libkdumpfile, which has been available since 0.5.4, but since we have some if-statements that enumerate all the libkdumpfile architectures, we never updated those and thus haven't been able to use it (I think... I don't actually have a RISC-V core dump handy).


Testing this is of course the important part. One obvious candidate where this code makes drgn work is hypervisor core dumps that are missing vmcoreinfo. QEMU will produce them if you don't have -device vmcoreinfo, which drgn already notes and produces a nice error message about.

Thankfully, drgn's own test framework runs QEMU without that option, so it's easy to generate a good test vmcore:

python -m vmtest.vm -k 6.9*
...
# Issue key sequence: Ctrl-A C to switch to the QEMU monitor
(qemu) dump-guest-memory file.elf
## OR
(qemu) dump-guest-memory -z file.kdump.flat

Running drgn against it will fail:

$ drgn -c file.kdump.flat -s build/vmtest/x86_64/kernel-6.9.1-vmtest29.1default/build/vmlinux
drgn 0.0.26 (using Python 3.9.18, elfutils 0.190, with libkdumpfile)
warning: the given file is in the makedumpfile flattened format; if open fails or is too slow, reassemble it with 'makedumpfile -R newfile <oldfile'
Traceback (most recent call last):
  File "/home/stepbren/.local/bin/drgn", line 33, in <module>
    sys.exit(load_entry_point('drgn', 'console_scripts', 'drgn')())
  File "/usr/lib64/python3.9/site-packages/drgn/cli.py", line 270, in _main
    prog.set_core_dump(args.core)
Exception: kdump_vmcoreinfo_raw: linux.vmcoreinfo.raw is not set

$ drgn -c foo.elf -s build/vmtest/x86_64/kernel-6.9.1-vmtest29.1default/build/vmlinux
drgn 0.0.26 (using Python 3.9.18, elfutils 0.190, with libkdumpfile)
error: unrecognized QEMU memory dump; for Linux guests, run QEMU with '-device vmcoreinfo', compile the kernel with CONFIG_FW_CFG_SYSFS and CONFIG_KEXEC, and load the qemu_fw_cfg kernel module before dumping the guest memory (requires Linux >= 4.17 and QEMU >= 2.11)

But with this branch, you can use my dumpphys tool like below to get a text file containing the vmcoreinfo:

dumpphys -c file.kdump.flat -o file.vmcoreinfo -i
## or
dumpphys -c file.elf -o file.vmcoreinfo -i

And then, you can go ahead and run drgn, but add new arguments --vmcoreinfo file.vmcoreinfo to specify the vmcoreinfo, and --architecture X86_64 to specify the architecture (since libkdumpfile cannot detect it). At that point, drgn works like normal.


For ELF vmcores, drgn requires some added bootstrapping/knowledge to find the physical address of the root page table (i.e. swapper_pg_dir). I've implemented this ability within the x86_64 page table iterator. Other architectures will need this support added. libkdumpfile already seems to have this ability for x86_64, so for other architectures it stands to reason that DRGN_USE_LIBKDUMPFILE_FOR_ELF=1 might help them avoid the following error:

>>> prog["slab_caches"]
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/stepbren/repos/drgn/drgn/cli.py", line 141, in _displayhook
    text = value.format_(columns=shutil.get_terminal_size((0, 0)).columns)
_drgn.FaultError: recursive address translation; page table may be missing from core dump: 0xffffffff9662aff8

@iostapyshyn
Copy link
Contributor

iostapyshyn commented May 27, 2024

The current known issue with this branch is that this won't work with ELF vmcores that drgn opens natively. Drgn expects to have virtual address mappings provided for it, at least for the linear/direct map. When they're not there, drgn will fail with a recursive page translation error:

The same issue exists with guest memory dumps from QEMU that don't include VM translations. Currently, drgn (without libkdumpfile) works out of the box with QEMU dumps generated with the paging flag (dump-guest-memory -p). With the flag, QEMU performs page table walks and replicates the VM mapping via ELF segments in the dump. This appears to be slow and highly inefficient: In the best case this takes about a minute and the resulting ELF has just a little over 60000 segments, but in some cases the QEMU process hangs indefinitely (I suppose while the guest is executing a process with a huge address space).

The physical memory dumps (without paging) are much faster and work with drgn when DRGN_USE_LIBKDUMPFILE_FOR_ELF=1.

Without libkdumpfile, read_memory_via_pgtable starts from swapper_pg_dir, which is stored in VMCOREINFO as a virtual address. If the core dump doesn't have a translation for swapper_pg_dir, the page table walk will attempt to translate that address again via read_memory_via_pgtable, resulting in recursive address translation fault.

I propose applying the kernel mapping offset to swapper_pg_dir in such cases, which I've implemented in this commit as a PoC for x86_64, but haven't checked other architectures yet. I'll gladly work on a general solution though.

@brenns10
Copy link
Contributor Author

QEMU dumps generated with the paging flag (dump-guest-memory -p)

Ooof, yeah, this option is heavily discouraged due to performance. Also, a malicious guest could cause a bit of a denial of service by finagling the page tables in just the right way so that the page mappings are way too large & complex for QEMU to handle.

But, drgn should also work out-of-the-box with vmcores generated by dump-guest-memory without -p, so long as the host QEMU is started with -device vmcoreinfo and the guest Linux version is > 4.17 configured with CONFIG_FW_CFG_SYSFS and CONFIG_KEXEC (which are pretty universally enabled). See this message. If that's not the case, that's definitely a new bug to investigate. ((Note that DRGN_USE_LIBKDUMPFILE_FOR_ELF=1 may still be required for such ELF vmcores.))

I propose applying the kernel mapping offset to swapper_pg_dir in such cases, which I've implemented in this commit as a PoC for x86_64, but haven't checked other architectures yet. I'll gladly work on a general solution though.

Oops, I hope you didn't spend too long on that since I had a similar (but less polished) workaround in my vmcoreinfo_wip branch. Glad to know we came a similar solution :D

That said, I like your idea of moving the fixes into the page table iterator, that can allow the architecture code to be self-contained.

-			table_physical = false;
+			table_physical = true;

The above is from your changes to the x86_64 pgtable iterator, I think there's a risk that unconditionally setting table_physical = true here may result in incorrect behavior when reading userspace pages, since in that case the page table comes from the pgd field of struct mm which is always a virtual address. You could probably adjust this to only happen when pgtable == swapper_pg_dir, something like this:

diff --git a/libdrgn/arch_x86_64.c b/libdrgn/arch_x86_64.c
index a0a51c36..d7a2a4a4 100644
--- a/libdrgn/arch_x86_64.c
+++ b/libdrgn/arch_x86_64.c
@@ -616,7 +616,10 @@ linux_kernel_pgtable_iterator_next_x86_64(struct drgn_program *prog,
 	for (;; level--) {
 		uint64_t table;
 		bool table_physical;
-		if (level == levels) {
+		if (level == levels && it->it.pgtable == prog->vmcoreinfo.swapper_pg_dir) {
+			table = it->it.pgtable + prog->vmcoreinfo.phys_base - START_KERNEL_MAP;
+			table_physical = true;
+		} else if (level == levels) {
 			table = it->it.pgtable;
 			table_physical = false;
 		} else {

That way the changes are entirely contained within the arch-specific section of the page table code, and we can add similar code for other architectures as we see fit. I've pushed a new commit with this change and added a coauthor tag for you on it.

@brenns10 brenns10 force-pushed the vmcoreinfo branch 2 times, most recently from ea77cc5 to 8d97df1 Compare May 29, 2024 17:26
Copy link
Owner

@osandov osandov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this! I kind of hate that this is necessary, but that's not your fault, and I'm happy with how you handled it. A few minor comments throughout. The last commit also needs a commit message.

libdrgn/python/program.c Outdated Show resolved Hide resolved
_drgn.pyi Show resolved Hide resolved
drgn/cli.py Outdated Show resolved Hide resolved
drgn/cli.py Outdated Show resolved Hide resolved
drgn/cli.py Outdated Show resolved Hide resolved
libdrgn/kdump.c Outdated
attr.val.blob = blob;
ks = kdump_set_attr(ctx, "linux.vmcoreinfo.raw", &attr);
if (ks != KDUMP_OK) {
free(vmcoreinfo);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my understanding of the libkdumpfile documentation, blob owns vmcoreinfo now, so this would cause kdump_block_decref() to double free, right?

Suggested change
free(vmcoreinfo);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you're definitely correct, and it really obviously looks wrong too. Good catch.

More interestingly however is that the libkdumpfile code seems to indicate that even in the error case, the reference to the blob is already stolen, and thus we cannot free it in our error path. See this comment. However, there are clearly cases of errors inside of kdump_set_attr() where the function returns and the blob is not decref'd. In summary: on some failures, we can't even decref the blob. On other failures, we should. And there's no way to tell 😜

So my vote here (I know it's gross) is to heavily comment this and get rid of both the free(vmcoreinfo) and the kdump_blob_decref().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I filed ptesarik/libkdumpfile#81 for Petr, we'll see if he agrees too. Assuming he does, I'll remove the error handling and document the leak .

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If kdump_set_attr() is supposed to take ownership, then yes, we should definitely remove kdump_blob_decref(), and I don't think we even need a comment; we just need to report any leaks as a libkdumpfile bug.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, dropped the comment and the kdump_blob_decref(), we can revisit it if Petr says the ownership semantic is supposed to be different.

libdrgn/drgn_program_parse_vmcoreinfo.inc.strswitch Outdated Show resolved Hide resolved
libdrgn/arch_x86_64.c Outdated Show resolved Hide resolved
libdrgn/arch_x86_64.c Outdated Show resolved Hide resolved
libdrgn/program.h Show resolved Hide resolved
Copy link
Owner

@osandov osandov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple more things I noticed in this round.

libdrgn/kdump.c Outdated Show resolved Hide resolved
libdrgn/program.c Show resolved Hide resolved
libdrgn/program.c Outdated Show resolved Hide resolved
brenns10 and others added 6 commits May 29, 2024 17:25
Currently set_core_dump() expects to be initializing the vmcoreinfo
itself. But it could be beneficial to let callers set the vmcoreinfo
with something else, e.g. if the vmcoreinfo can't be found in the ELF
notes or kdump metadata, but has been extracted via other means. So
update these initialization steps to only setup vmcoreinfo information
if it's not already present.

Signed-off-by: Stephen Brennan <[email protected]>
Ideally these could be put under some sort of "advanced" settings so
they don't confuse new users.

Signed-off-by: Stephen Brennan <[email protected]>
It's not immediately obvious from the API, but libkdumpfile allows
setting the vmcoreinfo attribute. However, setting the vmcoreinfo is not
enough, we must also set the platform information given by the user.
Further, we need to specify these elements in the correct order with
respect to the file descriptor.

If done correctly, then libkdumpfile can successfully handle a core
whose vmcoreinfo is not present in the diskdump or ELF metadata. Of
course, the user must find the vmcoreinfo note and manually give this to
Drgn, along with the platform architecture.

Signed-off-by: Stephen Brennan <[email protected]>
Most core dumps contain some virtual address mappings: usually at a
minimum, the kernel's direct map is represented in ELF vmcores via a
segment. So normally, drgn can rely on the vmcore to read the virtual
address of swapper_pg_dir. However, some vmcores only contain physical
address information, so when drgn reads memory at swapper_pg_dir, it
needs to first translate that address, thus causing a recursive
translation error like below:

>>> prog["slab_caches"]
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/stepbren/repos/drgn/drgn/cli.py", line 141, in _displayhook
    text = value.format_(columns=shutil.get_terminal_size((0, 0)).columns)
_drgn.FaultError: recursive address translation; page table may be missing from core dump: 0xffffffff9662aff8

Debuggers like crash, as well as libkdumpfile, contain fallback code
which can translate swapper_pg_dir in order to bootstrap this address
translation. In fact, the above error does not occur in drgn when using
libkdumpfile. So, let's add this fallback case to drgn as well. Other
architectures will need to have equivalent support added.

Co-authored-by: Illia Ostapyshyn <[email protected]>
Signed-off-by: Stephen Brennan <[email protected]>
@osandov osandov merged commit 5b39bfb into osandov:main May 30, 2024
33 checks passed
@osandov
Copy link
Owner

osandov commented May 30, 2024

Merged, thanks!

@brenns10 brenns10 deleted the vmcoreinfo branch May 30, 2024 19:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants