Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stack table symbol resolution uses wrong path for containerized processes #3487

Closed
d3dave opened this issue Jun 13, 2021 · 7 comments
Closed
Assignees

Comments

@d3dave
Copy link

d3dave commented Jun 13, 2021

When calling BPFStackTable::get_stack_symbol(stack_id, pid) with a PID of a containerized process, symbols are resolved incorrectly, if at all. I tracked the issue down to ProcSyms::load_exe where the path to the executable of the process within the container is treated as a path on the host (https://github.com/iovisor/bcc/blob/master/src/cc/bcc_syms.cc#L128). Modifying get_pid_exe to simply return the /proc/<pid>/exe path instead of the read link appears to work:

 std::string get_pid_exe(pid_t pid) {
-  char exe_path[4096];
-  int res;
-
-  std::string exe_link = tfm::format("/proc/%d/exe", pid);
-  res = readlink(exe_link.c_str(), exe_path, sizeof(exe_path));
-  if (res == -1)
-    return "";
-  if (res >= static_cast<int>(sizeof(exe_path)))
-    res = sizeof(exe_path) - 1;
-  exe_path[res] = '\0';
-  return std::string(exe_path);
+  return tfm::format("/proc/%d/exe", pid);
 }

Note this is only relevant for the executable itself and not any loaded modules because _add_module properly handles the enter_ns argument. This is not a 100% proper solution as the result of get_pid_exe is also used by some USDT-related code which appears to rely on the error check of readlink.

@Jongy
Copy link
Contributor

Jongy commented Jun 13, 2021

Alternatively, since current code already assumes pid is running, it can return /proc/%d/root/path/to/exe.

Using /proc/%d/exe has the benefit of being accessible also when the executable was already deleted, so it's a superior solution to /proc/%d/root/ IMO. See a similar fix I made in py-spy to allow profiling "deleted" Python binaries.

@yonghong-song
Copy link
Collaborator

@d3dave Just want to clarify my understanding. In this case, the bcc tool is running inside the container or outside the container? I assume it should be outside the container, right? Why readlink result won't work here? Maybe you can give a step-by-step explanation which will be clear what is the issue?

@davemarchevsky davemarchevsky self-assigned this Jun 15, 2021
@davemarchevsky
Copy link
Collaborator

I think this is a TL;DR of the case @d3dave is describing. Please let me know if I'm not understanding correctly.

When the containerized process is created, it's pulling the exe and libs from paths within the mountns of the container. So map paths in /proc/PID/maps are container-mountns-relative, as is the path returned by readlink on /proc/PID/exe. Some existing code which iterates over all executable maps in /proc/PID/exe is aware of this issue and prepends /proc/PID/root/ to the module path coming from /proc/PID/maps so that a process running in root mount namespace can successfully open the file.

The ProcSyms class in bcc_syms.cc uses this existing code but has some special handling for the main executable that doesn't do this mountns-relative stuff and just tries to open the path coming from get_pid_exe and setup a module symbolizer for the main exe before it even loops through /proc/PID/maps.

I think the presented solution would work. An alternative that may be worth considering would be to stop treating the exe differently and rely on the map iteration code to pick it up when it looks at the first line in /proc/PID/maps. That wouldn't solve the case @Jongy presented, though.

Regardless, this is definitely a bug and since I added a lot of the ns-relative stuff when a similar thing bit me a few years ago I'll work on a fix.

@d3dave
Copy link
Author

d3dave commented Jun 17, 2021

Hey @yonghong-song,

Just want to clarify my understanding. In this case, the bcc tool is running inside the container or outside the container? I assume it should be outside the container, right?

Yes, bcc is running outside the container and is being used to resolve symbols of a pid that is in a container.

Why readlink result won't work here? Maybe you can give a step-by-step explanation which will be clear what is the issue?

readlink returns a path that is relative to the root of the process (/proc/<pid>/root), but bcc reads the symbols from the path as-is, without prepending /proc/<pid>/root. This results in two possible scenarios:

  1. There is no file at that path. In this case things actually turn out ok, because bcc will just load the symbols later via maps.
  2. A file exists at the same path on the host as the path returned by readlink. In this case, the outcome actually depends on a number of things. Mainly whether the file is a valid ELF executable, and its memory layout (as that is what affects the symbol resolution).

Note that in the second case, the file that exists on the same path on the host is not the same file as the file at that path in the container.


Hey @davemarchevsky, thanks for taking your time to explain. I think your explanation is spot on.

An alternative that may be worth considering would be to stop treating the exe differently and rely on the map iteration code to pick it up when it looks at the first line in /proc/PID/maps. That wouldn't solve the case @Jongy presented, though.

I prefer this alternative, only because using /proc/<pid>/exe does not solve the case where other mapped paths are deleted/replaced. The only way to access them that I know of is via map_files. In such a case I see a proper solution as addressing both concerns or deciding to deal with neither.

Regardless, this is definitely a bug and since I added a lot of the ns-relative stuff when a similar thing bit me a few years ago I'll work on a fix.

Thanks!

@yonghong-song
Copy link
Collaborator

@d3dave @davemarchevsky thanks for explanation. Looks like /proc/PID/root/<readlink_path> should work.

@Jongy
Copy link
Contributor

Jongy commented Jun 18, 2021

I agree with @davemarchevsky and @d3dave , the easiest & cleanest thing would be to remove the special treatment of /proc/pid/exe.

Should anyone need to handle the (deleted) case in the future, they can work on the /proc/pid/exe and/or map_files-based solution.

davemarchevsky added a commit to davemarchevsky/bcc that referenced this issue Jun 19, 2021
symbolizing

As reported in iovisor#3487, when `/proc/PID/exe`'s symlink points to a
mountns-relative path from a different mountns than the tracing process,
we can fail to open it as we don't prepend `/proc/PID/root` .

A few potential solutions were discussed in that issue, we settled on
treating the main exe like any other map in `/proc/PID/maps`. Since it's
always the first map we can reuse existing code and get rid of
exe-specific helpers.
@davemarchevsky
Copy link
Collaborator

Should be fixed by #3498

brho pushed a commit to brho/bcc that referenced this issue Nov 3, 2021
symbolizing

As reported in iovisor#3487, when `/proc/PID/exe`'s symlink points to a
mountns-relative path from a different mountns than the tracing process,
we can fail to open it as we don't prepend `/proc/PID/root` .

A few potential solutions were discussed in that issue, we settled on
treating the main exe like any other map in `/proc/PID/maps`. Since it's
always the first map we can reuse existing code and get rid of
exe-specific helpers.
CrackerCat pushed a commit to CrackerCat/bcc that referenced this issue Jul 31, 2024
symbolizing

As reported in iovisor#3487, when `/proc/PID/exe`'s symlink points to a
mountns-relative path from a different mountns than the tracing process,
we can fail to open it as we don't prepend `/proc/PID/root` .

A few potential solutions were discussed in that issue, we settled on
treating the main exe like any other map in `/proc/PID/maps`. Since it's
always the first map we can reuse existing code and get rid of
exe-specific helpers.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants