Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

macOS: killed processes report different exit code in 2.1-preview2 #26010

Closed
rainersigwald opened this issue Apr 27, 2018 · 14 comments
Closed

macOS: killed processes report different exit code in 2.1-preview2 #26010

rainersigwald opened this issue Apr 27, 2018 · 14 comments
Assignees
Labels
area-System.Diagnostics.Process bug os-mac-os-x macOS aka OSX tenet-compatibility Incompatibility with previous versions or .NET Framework
Milestone

Comments

@rainersigwald
Copy link
Member

using System;
using System.Diagnostics;

namespace killed_process_return
{
    class Program
    {
        static void Main(string[] args)
        {
            var p = Process.Start("sleep", "10");

            p.Kill();

            p.WaitForExit();

            Console.WriteLine(p.ExitCode);
        }
    }
}
bash-3.2$ dotnet run -f netcoreapp2.0
137
bash-3.2$ dotnet run -f netcoreapp2.1
128

The 2.0 behavior matches my shells and some searching: it's SIGKILL (9) + exited-due-to-signal (128).

This was discovered when we updated MSBuild to use the preview2 runtime for testing (worked around with dotnet/msbuild@a476aa5). The repro there is fairly complicated, since we don't use Process.Kill(), instead manually killing an entire process tree.

In MSBuild, we sent SIGTERM and expect 143 instead of 137, but I assume it's the same underlying cause.

Mono does not exhibit this behavior on macOS High Sierra, so I don't think it's an OS change.

dotnet --info
bash-3.2$ dotnet --info
.NET Core SDK (reflecting any global.json):
 Version:   2.1.300-preview2-008530
 Commit:    822ae6d43a

Runtime Environment:
 OS Name:     Mac OS X
 OS Version:  10.13
 OS Platform: Darwin
 RID:         osx.10.13-x64
 Base Path:   /usr/local/share/dotnet/sdk/2.1.300-preview2-008530/

Host (useful for support):
  Version: 2.1.0-preview2-26406-04
  Commit:  6833f3026b

.NET Core SDKs installed:
  1.1.0-preview1-005104 [/usr/local/share/dotnet/sdk]
  2.0.0-preview3-006845 [/usr/local/share/dotnet/sdk]
  2.0.0 [/usr/local/share/dotnet/sdk]
  2.1.3 [/usr/local/share/dotnet/sdk]
  2.1.4 [/usr/local/share/dotnet/sdk]
  2.1.300-preview2-008530 [/usr/local/share/dotnet/sdk]

.NET Core runtimes installed:
  Microsoft.AspNetCore.All 2.1.0-preview2-final [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.All]
  Microsoft.AspNetCore.App 2.1.0-preview2-final [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 1.0.5 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 1.1.2 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 2.0.0 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 2.0.4 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 2.0.5 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 2.1.0-preview2-26406-04 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]

To install additional .NET Core runtimes or SDKs:
  https://aka.ms/dotnet-download
@karelz
Copy link
Member

karelz commented Apr 27, 2018

Do you know if it is High Sierra only behavior, or all Mac OSX / Linux as well?
@wtgodbe @joperezr can you please take a look?

I think there were changes in the area from @tmds, but I am not 100% sure ...

@rainersigwald
Copy link
Member Author

It doesn't happen on Linux but I haven't tried on 10.12 or older Mac OS versions.

@tmds
Copy link
Member

tmds commented Apr 27, 2018

I think there were changes in the area from @tmds, but I am not 100% sure ...

The implementation changed from using the waitpid system call to the waitid system call.

The exit code is determined here:

https://github.com/dotnet/corefx/blob/a2142d9998a5fa27d27b3e3da2c3b527be359b00/src/Native/Unix/System.Native/pal_process.cpp#L453-L483

For a signal we should hit:

https://github.com/dotnet/corefx/blob/a2142d9998a5fa27d27b3e3da2c3b527be359b00/src/Native/Unix/System.Native/pal_process.cpp#L478

Which adds 128 to si_status.

si_status should contain the signal number (http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/signal.h.html):

int si_status | If si_code is equal to CLD_EXITED, then si_status holds the exit value of the process; otherwise, it is equal to the signal that caused the process to change state. The exit value in si_status shall be equal to the full exit value (that is, the value passed to _exit(), _Exit(), or exit(), or returned from main()); it shall not be limited to the least significant eight bits of the value.
-- | --

@akoeplinger
Copy link
Member

I can confirm that the same thing happens on macOS Sierra (10.12) with dotnet 2.1.300-preview2-008530 too.

@tmds
Copy link
Member

tmds commented Apr 28, 2018

Can someone with a mac please run the equivalent of strace on this console app?

e.g. on Linux I get this:

/tmp/console$ strace -qq -f -e waitid dotnet bin/Release/netcoreapp2.1/console.dll
[pid  4967] +++ killed by SIGKILL +++
[pid  4959] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=4967, si_uid=1000, si_status=SIGKILL, si_utime=0, si_stime=0} ---
[pid  4966] waitid(P_ALL, -1, {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=4967, si_uid=1000, si_status=SIGKILL, si_utime=9148900439124850944, si_stime=0}, WNOHANG|WEXITED|WNOWAIT, NULL) = 0
[pid  4966] waitid(P_PID, 4967, {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=4967, si_uid=1000, si_status=SIGKILL, si_utime=9148900439124850944, si_stime=140180686306816}, WNOHANG|WEXITED, NULL) = 0
[pid  4966] waitid(P_ALL, -1, 0x7f7ef13c0b70, WNOHANG|WEXITED|WNOWAIT, NULL) = -1 ECHILD (No child processes)
137

You can see si_status=SIGKILL show up here.

@akoeplinger
Copy link
Member

akoeplinger commented Apr 30, 2018

Ok, I digged into this:

First of all you need to allow dtrace since it's blocked by System Integrity Protection by default: http://blog.shalman.org/enabling-dtrace-on-macos-sierra/

Using this dtrace script we can see the waitid() values of a dotnet process started in another shell instance:

$ sudo dtrace -n '
syscall::waitid:entry
{
    self->arg0 = arg0;
    self->arg1 = arg1;
    self->arg2 = arg2;
}
syscall::waitid:return
{
    this->info = (siginfo_t*)copyin(self->arg2, sizeof(siginfo_t));

    printf("%s(%i): %s(0x%x%s, %i, {si_signo: 0x%X%s, si_code: 0x%X%s, si_pid: %i, si_status: 0x%X}\n",
            execname,
            pid,
            probefunc,
            self->arg0,
            self->arg0 == 1 ? " (P_PID)" : self->arg0 == 0 ? " (P_ALL)" : "",
            self->arg1,
            this->info->si_signo,
            this->info->si_signo == 20 ? " (SIGCHLD)" : "",
            this->info->si_code,
            this->info->si_code == 2 ? " (CLD_KILLED)" : "",
            this->info->si_pid,
            this->info->si_status);
}
'

Output:

dtrace: description '
syscall::waitid:entry
' matched 2 probes
CPU     ID                    FUNCTION:NAME
  2    503                    waitid:return dotnet(2386): waitid(0x0 (P_ALL), 4294967295, {si_signo: 0x14 (SIGCHLD), si_code: 0x2 (CLD_KILLED), si_pid: 2387, si_status: 0x0}

  2    503                    waitid:return dotnet(2386): waitid(0x1 (P_PID), 2387, {si_signo: 0x14 (SIGCHLD), si_code: 0x2 (CLD_KILLED), si_pid: 2387, si_status: 0x0}

  5    503                    waitid:return dotnet(2386): waitid(0x0 (P_ALL), 4294967295, {si_signo: 0x1067B900, si_code: 0x5004B600, si_pid: 32749, si_status: 0x0}

It looks like waitid() indeed doesn't return the signal number in si_status like it does on Linux. A simple C program seems to confirm this too.

@stephentoub
Copy link
Member

cc: @danmosemsft, @ViktorHofer

@tmds
Copy link
Member

tmds commented Apr 30, 2018

@akoeplinger thank you for investigating further! We can fix this regression by using waitpid for reaping (and continue to use waitid for finding children to reap). @stephentoub you can assign the issue to me if you want.

@stephentoub
Copy link
Member

Thanks, @tmds. Can you get to this quickly and keep the change to the minimum necessary to fix this (plus test(s))? I'd like to try to get this into 2.1.

tmds referenced this issue in tmds/corefx Apr 30, 2018
dotnet#26291 changed process reaping
from using waitpid to waitid. This caused a regression on mac, since
for processes that are killed, (on mac) waitpid does not return the
signal number that caused the process to terminated.

We change back to waitpid for reaping children and determining the
exit code. waitid is used to terminated children.

Fixes https://github.com/dotnet/corefx/issues/29370
tmds referenced this issue in tmds/corefx Apr 30, 2018
dotnet#26291 changed process reaping
from using waitpid to waitid. This caused a regression on mac, since
for processes that are killed, (on mac) waitid does not return the
signal number that caused the process to terminated.

We change back to waitpid for reaping children and determining the
exit code. waitid is used to terminated children.

Fixes https://github.com/dotnet/corefx/issues/29370
tmds referenced this issue in tmds/corefx Apr 30, 2018
dotnet#26291 changed process reaping
from using waitpid to waitid. This caused a regression on mac, since
for processes that are killed, (on mac) waitid does not return the
signal number that caused the process to terminated.

We change back to waitpid for reaping children and determining the
exit code. waitid is used to terminated children.

Fixes https://github.com/dotnet/corefx/issues/29370
tmds referenced this issue in tmds/corefx Apr 30, 2018
dotnet#26291 changed process reaping
from using waitpid to waitid. This caused a regression on mac, since
for processes that are killed, (on mac) waitid does not return the
signal number that caused the process to terminated.

We change back to waitpid for reaping children and determining the
exit code. waitid is used to find terminated children.

Fixes https://github.com/dotnet/corefx/issues/29370
@karelz
Copy link
Member

karelz commented Apr 30, 2018

@wtgodbe can you please work with @tmds to get it into 2.1 through the process? (let's fix it in master first of course)

@wtgodbe
Copy link
Member

wtgodbe commented Apr 30, 2018

Sure thing, I'll track dotnet/corefx#29407 & help get it ported once merged.

stephentoub referenced this issue in dotnet/corefx May 1, 2018
* Fix Process.ExitCode on mac for killed processes

#26291 changed process reaping
from using waitpid to waitid. This caused a regression on mac, since
for processes that are killed, (on mac) waitid does not return the
signal number that caused the process to terminated.

We change back to waitpid for reaping children and determining the
exit code. waitid is used to find terminated children.

Fixes https://github.com/dotnet/corefx/issues/29370

* TestExitCodeKilledChild: remove runtime check

* TestExitCodeKilledChild: remove greater than assert
@stephentoub stephentoub reopened this May 1, 2018
@wtgodbe
Copy link
Member

wtgodbe commented May 1, 2018

dotnet/corefx#29407 is in, I can put up a port to 2.1 once I get access to my machine (should be tomorrow).

joperezr referenced this issue in joperezr/corefx May 1, 2018
* Fix Process.ExitCode on mac for killed processes

dotnet#26291 changed process reaping
from using waitpid to waitid. This caused a regression on mac, since
for processes that are killed, (on mac) waitid does not return the
signal number that caused the process to terminated.

We change back to waitpid for reaping children and determining the
exit code. waitid is used to find terminated children.

Fixes https://github.com/dotnet/corefx/issues/29370

* TestExitCodeKilledChild: remove runtime check

* TestExitCodeKilledChild: remove greater than assert
@joperezr
Copy link
Member

joperezr commented May 1, 2018

I can take care of this now so that we don't need to wait till tomorrow. Will send the PR soon.

stephentoub referenced this issue in dotnet/corefx May 2, 2018
* Fix Process.ExitCode on mac for killed processes

#26291 changed process reaping
from using waitpid to waitid. This caused a regression on mac, since
for processes that are killed, (on mac) waitid does not return the
signal number that caused the process to terminated.

We change back to waitpid for reaping children and determining the
exit code. waitid is used to find terminated children.

Fixes https://github.com/dotnet/corefx/issues/29370

* TestExitCodeKilledChild: remove runtime check

* TestExitCodeKilledChild: remove greater than assert
@joperezr
Copy link
Member

joperezr commented May 2, 2018

This was fixed in release/2.1 branch via PR dotnet/corefx#29445

@joperezr joperezr closed this as completed May 2, 2018
@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 2.1.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 17, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Diagnostics.Process bug os-mac-os-x macOS aka OSX tenet-compatibility Incompatibility with previous versions or .NET Framework
Projects
None yet
Development

No branches or pull requests

8 participants