Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add WaitStatus::PtraceSyscall for use with PTRACE_O_TRACESYSGOOD #566

Merged
merged 2 commits into from
Jul 21, 2017

Conversation

geofft
Copy link
Contributor

@geofft geofft commented Mar 28, 2017

The recommended way to trace syscalls with ptrace is to set the
PTRACE_O_TRACESYSGOOD option, to distinguish syscall stops from
receiving an actual SIGTRAP. In C, this would cause WSTOPSIG to return
SIGTRAP | 0x80, but nix wants to parse that as an actual signal.

Add another wait status type for syscall stops (in the language of the
ptrace(2) manpage, "PTRACE_EVENT stops" and "Syscall-stops" are
different things), and mask out bit 0x80 from signals before trying to
parse it.

Closes #550

@geofft
Copy link
Contributor Author

geofft commented Mar 28, 2017

r? @kamalmarhubi
cc @chaosagent, this supersedes #549

I've tested this locally with an (in-progress) syscall tracer, on Linux x86-64. I think adding this as a separate type to WaitStatus is most in the spirit of how the ptrace manpage thinks about things.

Also, maybe these two WaitStatus types should be available on all platforms, just unused? They don't use any types that aren't available cross-platform. It seems like it would be annoying to have cross-platform code that may or may not exhaustively match the enum depending on which platform you're compiling it for.

@kamalmarhubi
Copy link
Member

@chaosagent can you take a look and see if this fits your need?

They don't use any types that aren't available cross-platform. It seems like it would be annoying to have cross-platform code that may or may not exhaustively match the enum depending on which platform you're compiling it for.

I'd prefer we be explicit about this. Someone writing cross-platform code can have a catchall match arm if they need to. The compiler error should be really clear, too.

Overall I'm just not familiar enough with ptrace to know what makes sense here. It'll take me some time to read man pages and ideally find some C code that uses this feature.

@nix-rust/nix-maintainers anyone else familiar enough with ptrace to look at this?

@geofft
Copy link
Contributor Author

geofft commented Apr 5, 2017

Overall I'm just not familiar enough with ptrace to know what makes sense here. It'll take me some time to read man pages and ideally find some C code that uses this feature.

The project this (and the other PR) is for is https://github.com/geofft/gtrace, a Rust implementation of strace. It's still very much in progress but it works well enough to decode a half-dozen common system calls. See in particular the implementation of Tracee::step for a match arm on WaitStatus. Also there are a few links in the README worth reading up on for background on ptrace. nelhage's blog post is a good overview of a simplistic strace in C (no decoding, it just tells you the numerical syscall and argument values). See its wait_for_syscall function, which dispatches on the return value of waitpid using the C WIF... macros.

I'm more curious about code that doesn't care about ptrace that exhaustively matches on WaitStatus variants. I think it will be more confusing to have variants exist on some platforms and not others, and I don't really see a downside to making it exist on all platforms and adding some comments about how you'll only get these wait statuses on Linux.

Also, to be fair, it's pretty well-acknowledged that ptrace overloading wait status is a bad design. The Linux manpage says "The ptrace API (ab)uses the standard UNIX parent/child signaling over waitpid(2)," and the original BSD manpage is even harsher:

ptrace() is unique and arcane; it should be replaced with a special file which can be opened and read and written. The control functions could then be implemented with ioctl(2) calls on this file. This would be simpler to understand and have much higher performance.

Unfortunately, as you probably already know, UNIX.

@Susurrus
Copy link
Contributor

@geofft Can you rebase this?

@geofft
Copy link
Contributor Author

geofft commented Apr 17, 2017

Rebased. Should I remove the #[cfg] on the Linux-specific variants? I think that's where the discussion was leaning.

@Susurrus
Copy link
Contributor

@geofft I haven't really followed this PR, just saw it needed a rebase to get run on our new CI stack. But I think I would suggest we keep those constants Linux/Android-specific. It is "technically" wrong to expose them on platforms where they aren't supported, and since nix aims to be as thin as possible of an API over libc, I'd suggest we keep it in. If it becomes so onerous for downstream consumers of this API, they can file an issue and we'll revisit it. @kamalmarhubi Any comments from you here?

@geofft
Copy link
Contributor Author

geofft commented Apr 18, 2017

My argument is that these aren't constants, these are enum variants, and it seems good for user code that matches on a WaitStatus to compile on any nix-supported system, regardless of whether it was originally written on Linux. There's no API (on any platform) that accepts a WaitStatus, just APIs that return them. So having an enum variant that is never going to get constructed is, I think, harmless.

@Susurrus
Copy link
Contributor

Yes, you're correct these are enum variables, but my point still stands: these types are only valid on specific platforms. By exposing them on other platforms you have fake types where they aren't necessary. You can still write cross-platform code using this as #cfg[] works on match arms.

From my viewpoint exposing these types on all platforms is "technically" wrong, can be easily handled by client code with platform-specific match arms, and could leave to confusion as clients might try to match on values never returned on their platform. Therefore I suggest we leave it as is.

@geofft
Copy link
Contributor Author

geofft commented Apr 20, 2017

Oh, I didn't realize #[cfg] works on match arms. OK, then that seems like a reasonable path forward.

Is this good for merge or do you want to do more review / want more sample code / something? I could add a #[cfg(target_os="linux")] test case that handles PtraceEvent and PtraceSyscall.

Also @chaosagent does this work for your use case in #550?

@Susurrus
Copy link
Contributor

Re: merging, this is quite close. I'd request adding a CHANGELOG entry, providing doc comments for stop_additional and syscall_stop, and adding a test case would be definitely welcome! We want to have all functions get docs so I'm going to be enforcing that policy for new functions. While you didn't change stop_signal, I'd also appreciate if you added comments to that one if it's not too difficult!

Otherwise #590 should land soon and I'd like to wait for that to clear if it happens in the next few days, otherwise we'll push this through once you resolve the above.

@Susurrus
Copy link
Contributor

Susurrus commented Jun 2, 2017

@geofft @chaosagent I think your feedback is required here to finish this off. Can you guys chime in here?

@acidghost
Copy link

@geofft @kamalmarhubi @Susurrus what's the status of this PR? I'm building a strace-like module and having this PR merged would let me proceed.. Otherwise, is there any workaround?

@Susurrus
Copy link
Contributor

Susurrus commented Jun 18, 2017

@geofft At least a few things need to be addressed:

  • A CHANGELOG entry needs to be added
  • Add a doccomment for the new public function you added (assuming it's supposed to be public, but even if it's not.
  • And a comment that explains why you're bitmasking in the code. I like to have more documentation in the code when there are things like this.

Hopefully updating the PR will get buildbot working again.

@Susurrus
Copy link
Contributor

@geofft Are you still interested in working to get this merged?

@geofft
Copy link
Contributor Author

geofft commented Jun 20, 2017

Yeah, I'm doing that now! :) I'm writing docs for all of sys/wait.rs 'cause it needs it.

@Susurrus
Copy link
Contributor

@geofft 👍

@geofft geofft force-pushed the tracesysgood branch 4 times, most recently from d2318f8 to d0fa32f Compare June 20, 2017 05:56
@geofft
Copy link
Contributor Author

geofft commented Jun 20, 2017

Rebased, and added a CHANGELOG entry, a test case that exercises both WaitStatus::PtraceSyscall and WaitStatus::PtraceEvent, and docs on the WaitStatus enum as a separate commit.

The new public function isn't actually crate-public. It's within a non-public module, so it's only accessible within nix::sys::wait itself. So I did not document it. If you want docs on these functions anyway I could do that, but I think the docs on WaitStatus are basically the same content and more useful.

Given that we're adding new wait statuses, I think "don't exhaustively match on the enum" is the right documentation for now, but I suppose we could dig a bit to see if any half-popular UNIXes have any other wait statuses, since in theory nix 1.0 is going to want to stabilize exactly what the enum variants are.

Also, the nifty thing about adding tests is that we notice when things break:

  • The third commit in this series fixes ptrace_setoptions(), which any code using either PtraceEvent or PtraceSyscall relies upon. See Added ptrace utilities. #614; I think this is the right short-term fix but there's probably a better long-term fix.
  • From the Travis log, it looks like qemu-user only implements ptrace on x86 and x86_64. I added a #[cfg] + // FIXME following Skip failing tests for Linux/MIPS/PowerPC #590, is this the right way to handle this?

@@ -79,7 +79,7 @@ pub fn ptrace(request: ptrace::PtraceRequest, pid: pid_t, addr: *mut c_void, dat

match request {
PTRACE_PEEKTEXT | PTRACE_PEEKDATA | PTRACE_PEEKUSER => ptrace_peek(request, pid, addr, data),
PTRACE_GETSIGINFO | PTRACE_GETEVENTMSG | PTRACE_SETSIGINFO | PTRACE_SETOPTIONS => Err(Error::UnsupportedOperation),
PTRACE_GETSIGINFO | PTRACE_GETEVENTMSG | PTRACE_SETSIGINFO => Err(Error::UnsupportedOperation),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the wrong solution here, instead ptrace_setoptions should be fixed to not use ptrace and instead ffi::ptrace

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other functions here also need to be fixed, would you be willing to have a commit before your code that fixes all of the specialized ptrace_*() functions to call ffi::ptrace correctly and then have a commit that adds your changes? If you're feeling very benevolent, having tests for these other ptrace commands as part of that same commit so that we know all of it works would be even more awesome! I'm not too familiar with the ptrace API which is why I'm asking.


fn ptrace_child() -> Result<()> {
try!(ptrace(PTRACE_TRACEME, 0, ptr::null_mut(), ptr::null_mut()));
// As recommended by ptrace(2), raise SIGTRAP to pause the child
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments need to be properly indented.

Signal::from_c_int((status & 0x7F00) >> 8).unwrap()
}

pub fn syscall_stop(status: i32) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs doccomments for the function.

@@ -81,7 +122,17 @@ mod status {
}

pub fn stop_signal(status: i32) -> Signal {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function needs doc comments as well.

@Susurrus
Copy link
Contributor

This also needs a rebase. We've had quite a few things land and also support more platforms so I want to make sure we test those with this as well.

vincenthage added a commit to vincenthage/nix that referenced this pull request Jul 18, 2017
@Susurrus
Copy link
Contributor

@geofft Would you have time here soon to rebase this so it can be merged? I'd like to get this in the 0.9 release, which we'd like to do in the next week or so.

@geofft
Copy link
Contributor Author

geofft commented Jul 19, 2017

@Susurrus rebased. (Dropped the ptrace unbreak fix because #686 landed, yay.)

stop_signal and syscall_stop and friends aren't public (they're in a private module) so they don't need docs.

src/sys/wait.rs Outdated
@@ -40,14 +40,54 @@ libc_bitflags!(
target_os = "android"))]
const WSTOPPED: WaitPidFlag = WUNTRACED;

/// Possible return values from `wait()` or `waitpid()`, indicating a state
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is too long of a first sentence. Please trim it down and put a empty comment line between it and the more expansive description text. If you run cargo doc --no-deps --open you'll see why this is necessary. I believe the same applies to the enum variants below.

CHANGELOG.md Outdated
@@ -24,6 +24,9 @@ This project adheres to [Semantic Versioning](http://semver.org/).
and nix::Error::UnsupportedOperation}`
([#614](https://github.com/nix-rust/nix/pull/614))
- Added `cfmakeraw`, `cfsetspeed`, and `tcgetsid`. ([#527](https://github.com/nix-rust/nix/pull/527))
- Added a `PtraceSyscall` variant to `nix::sys::wait::WaitStatus`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be more clear if it stated something like "wait/waitpid now support PTRACE_SYSCALL through the new WaitStatus::PtraceSyscall(pid) type." Your current comment explains what changed in the code, but the CHANGELOG should be geared towards what changed in user-facing functionality, which my above suggestion does.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also want to make it clear for consumers who don't care about PTRACE_O_TRACESYSGOOD and just want to know why a new enum variant showed up, but yeah, this could use some rephrasing. Try the new version on for size?

@Susurrus
Copy link
Contributor

I'm of the opinion that every function needs docs especially the internal functions where it's not always clear when or if they can be reused elsewhere in internal code. I'd appreciate docs but I won't hold up this PR based on them.

Have you run this PR through rustdoc and looked at how the generated docs look? I haven't seen the link syntax you're using before in doccomments so wanted to make sure you looked that over.

@geofft
Copy link
Contributor Author

geofft commented Jul 19, 2017

I'm of the opinion that every function needs docs especially the internal functions where it's not always clear when or if they can be reused elsewhere in internal code. I'd appreciate docs but I won't hold up this PR based on them.

There's not a ton to say - they're equivalents of the C macros (as mentioned in the WaitStatus docs). I could just mention the corresponding C macros. These functions basically only exist so they can be #[cfg]'d differently on platforms, otherwise they'd be inline in decode(). They definitely shouldn't be used anywhere outside of decode(), and if I understand the module visibility system right, the functions are private to nix::sys::wait and can't be used elsewhere in nix even if you wanted to.

Or, put another way, I'd rather spend time adding docs to wait and waitpid :)

Have you run this PR through rustdoc and looked at how the generated docs look? I haven't seen the link syntax you're using before in doccomments so wanted to make sure you looked that over.

Docs for both of my PRs are temporarily at https://kerberos.club/doc/nix/ (this is a symlink to my git checkout so it'll change in a few hours, but afk for a little bit). Seems to work, and I mostly based it on what rustc itself does.

CHANGELOG.md Outdated
@@ -24,6 +24,10 @@ This project adheres to [Semantic Versioning](http://semver.org/).
and nix::Error::UnsupportedOperation}`
([#614](https://github.com/nix-rust/nix/pull/614))
- Added `cfmakeraw`, `cfsetspeed`, and `tcgetsid`. ([#527](https://github.com/nix-rust/nix/pull/527))
- On Linux, added support for receiving `PTRACE_O_TRACESYSGOOD`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's condense this a bit and add Android: "On Linux & Android added support for receiving PTRACE_O_TRACESYSGOOD events from wait/waitpid using WaitStatus::PtraceSyscall"

The absolute path isn't necessary as the function signature makes it clear what enum we're referring to as well.

@Susurrus
Copy link
Contributor

Or, put another way, I'd rather spend time adding docs to wait and waitpid :)

Alright, then I'll expect more docs in this area of nix! :-)

Docs for both of my PRs are temporarily at https://kerberos.club/doc/nix/ (this is a symlink to my git checkout so it'll change in a few hours, but afk for a little bit). Seems to work, and I mostly based it on what rustc itself does.

Thanks for hosting that, but you didn't need to do it on my account. I just clone your branch and generate docs locally (I do this for all PRs). I was merely asking if you've done it so in theory everything's good, but I was still planning on checking it anyways.

Let's do one more change to the CHANGELOG entry. Then I'd like @asomers to sign off on the test (it looks fine to me), then I think we can merge this!

use std::ptr;

fn ptrace_child() -> Result<()> {
try!(ptrace(PTRACE_TRACEME, Pid::from_raw(0), ptr::null_mut(), ptr::null_mut()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't call try! in the child, because it returns into the test harness where it can deadlock. Instead, you should sys::process::exit(1) on failure. For the same reason, you should change the return type to ()

// As recommended by ptrace(2), raise SIGTRAP to pause the child
// until the parent is ready to continue
try!(raise(SIGTRAP));
unsafe {_exit(0)}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not necessary to skip all the C-level cleanup stuff with _exit(0). Instead, you can simply use std::process::exit(0) and eliminate the unsafe block.

// Wait for the raised SIGTRAP
assert_eq!(waitpid(child, None), Ok(WaitStatus::Stopped(child, SIGTRAP)));
// We want to test a syscall stop and a PTRACE_EVENT stop
try!(ptrace_setoptions(child, PTRACE_O_TRACESYSGOOD | PTRACE_O_TRACEEXIT));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it confusing to mix try! and assert_eq! in the same function. Why not just assert!(ptrace(...).is_ok())?

@geofft geofft force-pushed the tracesysgood branch 2 times, most recently from f8ace90 to 74527c7 Compare July 21, 2017 06:26
The recommended way to trace syscalls with ptrace is to set the
PTRACE_O_TRACESYSGOOD option, to distinguish syscall stops from
receiving an actual SIGTRAP. In C, this would cause WSTOPSIG to return
SIGTRAP | 0x80, but nix wants to parse that as an actual signal.

Add another wait status type for syscall stops (in the language of the
ptrace(2) manpage, "PTRACE_EVENT stops" and "Syscall-stops" are
different things), and mask out bit 0x80 from signals before trying to
parse it.

Closes nix-rust#550
@geofft
Copy link
Contributor Author

geofft commented Jul 21, 2017

Fixed CHANGELOG and resolved @asomers' comments on the test (thanks!). I also needed to change it from #[cfg_attr(not(...,) ignore)] to #[cfg(...)] because otherwise the test failure was poisoning the fork mutex for other tests. :( Should be good now hopefully!

@asomers
Copy link
Member

asomers commented Jul 21, 2017

I think that assuages all of our concerns. Thanks so much for finishing this. This was the last PR blocking the next release.

bors r+

bors bot added a commit that referenced this pull request Jul 21, 2017
566: Add WaitStatus::PtraceSyscall for use with PTRACE_O_TRACESYSGOOD r=asomers

The recommended way to trace syscalls with ptrace is to set the
PTRACE_O_TRACESYSGOOD option, to distinguish syscall stops from
receiving an actual SIGTRAP. In C, this would cause WSTOPSIG to return
SIGTRAP | 0x80, but nix wants to parse that as an actual signal.

Add another wait status type for syscall stops (in the language of the
ptrace(2) manpage, "PTRACE_EVENT stops" and "Syscall-stops" are
different things), and mask out bit 0x80 from signals before trying to
parse it.

Closes #550
@bors
Copy link
Contributor

bors bot commented Jul 21, 2017

@bors bors bot merged commit d292984 into nix-rust:master Jul 21, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants