Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sage is killed on startup via sage-cleaner #27769

Closed
sagetrac-andy mannequin opened this issue May 5, 2019 · 12 comments · Fixed by #35204
Closed

sage is killed on startup via sage-cleaner #27769

sagetrac-andy mannequin opened this issue May 5, 2019 · 12 comments · Fixed by #35204

Comments

@sagetrac-andy
Copy link
Mannequin

sagetrac-andy mannequin commented May 5, 2019

Under rare conditions, sage is killed at startup by sage-cleaner.

This requires:

  1. Files left over in ~/.sage/temp/HOSTNAME/PID
  2. A pid in ~/.sage/temp/HOSTNAME/PID/spawned_processes must exist, and be started at init time. Its process group id will be 0

When sage-cleaner tries to clean up, it will do a "kill 0", which will kill sage.

Files will only be left in ~/.sage/temp/HOSTNAME if sage is not shutdown properly. Most likely because of a reboot.

The attached patch to sage-cleaner prevents it from doing 'kill 0'

With the fix, the ~/.sage/temp/HOSTNAME/cleaner.log

Starting sage-cleaner with PID 16954
Checking PIDs [809, 31312]
Process 809 is no longer running, so we clean up
Killing 809's spawned jobs
--> Killing 'Singular' with PID 814 and parent PID 809
--> Process group of PID 814 is 0. Not killing process group
Deleting /home/andy/.sage/temp/dokodemo/809
Contents of ~/.sage/temp/dokodemo/809/spawned_processes: 
814 Singular
ps auxww|grep 814
root       814  0.0  0.0      0     0 ?        S<   Apr30   0:00 [loop11]

CC: @jdemeyer

Component: scripts

Keywords: sage-cleaner

Author: Andy Howell

Branch/Commit: u/andy/sage_is_killed_on_startup_via_sage_cleaner @ 8be49ef

Issue created by migration from https://trac.sagemath.org/ticket/27769

@sagetrac-andy sagetrac-andy mannequin added this to the sage-8.8 milestone May 5, 2019
@sagetrac-andy sagetrac-andy mannequin added c: scripts labels May 5, 2019
@sagetrac-andy
Copy link
Mannequin Author

sagetrac-andy mannequin commented May 5, 2019

Attachment: sage-cleaner.patch.gz

Patch to prevent sage-cleaner from doing a kill 0

@dimpase
Copy link
Member

dimpase commented May 5, 2019

comment:1

just to clarify the development guidelines - one posts the branch with fixes, not the branch where the error occured. So the fix in the attachment should go into the branch to be posted.

@jdemeyer
Copy link

jdemeyer commented May 5, 2019

comment:3

But how could the PID be 0? I feel like this patch is fixing the symptom rather than the root cause.

@sagetrac-andy
Copy link
Mannequin Author

sagetrac-andy mannequin commented May 5, 2019

comment:4

Replying to @jdemeyer:

But how could the PID be 0? I feel like this patch is fixing the symptom rather than the root cause.

Its not actually the PID, but the process group id PGID.

sage-cleaner is getting a pid out of a stale spawned_processes file, and retrieving the process group id for that. It just so happens that the pid 814 is owned by root, and its process group id is 0.

ps axo pid,pgrp,user,ppid,comm -q 814
  PID  PGRP USER      PPID COMMAND
  814     0 root         2 loop11

sage-cleaner is calling getpgid():

>>> import os
>>> print(os.getpgid(814))
0

This confluence of events is almost as rare as unicorns. My motivation in posting it was that it took a long time for me to figure out what was wrong. I wanted to save others the trouble.

One could argue that the root cause is the failure of sage-cleaner to remove everything under ~/.sage/temp/HOSTNAME/ at startup.

Would that be a better solution?

@sagetrac-andy
Copy link
Mannequin Author

sagetrac-andy mannequin commented May 5, 2019

Changed commit from d765ee2 to none

@sagetrac-andy
Copy link
Mannequin Author

sagetrac-andy mannequin commented May 5, 2019

Changed branch from develop to none

@sagetrac-andy sagetrac-andy mannequin self-assigned this May 5, 2019
@sagetrac-andy
Copy link
Mannequin Author

sagetrac-andy mannequin commented May 5, 2019

@sagetrac-andy
Copy link
Mannequin Author

sagetrac-andy mannequin commented May 5, 2019

comment:7

Replying to @dimpase:

just to clarify the development guidelines - one posts the branch with fixes, not the branch where the error occured. So the fix in the attachment should go into the branch to be posted.

Sorry. Hopefully I have done it correctly now. After jdemeyer's comment, I changed my approach. The attachment is not needed. Checked in changes via git trac push.

Thanks,

Andy

@sagetrac-andy
Copy link
Mannequin Author

sagetrac-andy mannequin commented May 5, 2019

Commit: 8be49ef

@embray
Copy link
Contributor

embray commented Jun 14, 2019

comment:8

As the Sage-8.8 release milestone is pending, we should delete the sage-8.8 milestone for tickets that are not actively being worked on or that still require significant work to move forward. If you feel that this ticket should be included in the next Sage release at the soonest please set its milestone to the next release milestone (sage-8.9).

@embray
Copy link
Contributor

embray commented Jun 14, 2019

comment:9

As the Sage-8.8 release milestone is pending, we should delete the sage-8.8 milestone for tickets that are not actively being worked on or that still require significant work to move forward. If you feel that this ticket should be included in the next Sage release at the soonest please set its milestone to the next release milestone (sage-8.9).

@embray embray removed this from the sage-8.8 milestone Jun 14, 2019
@tornaria
Copy link
Contributor

I was just hit by this bug and I wouldn't call its effect "trivial". In fact, after a reboot sage-cleaner went on a killing spree. I'm almost tempted to start running sage in a jail because of this.

I expect the commit here would have prevented this.

vbraun pushed a commit that referenced this issue Mar 26, 2023
…cleaner won't kill random processes

    
<!-- ^^^^^
Please provide a concise, informative and self-explanatory title.
Don't put issue numbers in there, do this in the PR body below.
For example, instead of "Fixes #1234" use "Introduce new method to
calculate 1+1"
-->
### 📚 Description

From #27769:

Under rare conditions, sage is killed at startup by sage-cleaner.

This requires:

    Files left over in ~/.sage/temp/HOSTNAME/PID
    A pid in ~/.sage/temp/HOSTNAME/PID/spawned_processes must exist, and
be started at init time. Its process group id will be 0

When sage-cleaner tries to clean up, it will do a "kill 0", which will
kill sage.

Files will only be left in ~/.sage/temp/HOSTNAME if sage is not shutdown
properly. Most likely because of a reboot.

Even if the situation doesn't result on a "kill 0",  sage-cleaner would
still go on a killing spree for random processes which is awful.

The current PR removes everything under ~/.sage/temp/HOSTNAME/ at sage-
cleaner startup so this doesn't happen.

Since sage-cleaner won't run more than once at a time, this seems safe.

Fixes #27769

<!-- Describe your changes here in detail -->
<!-- Why is this change required? What problem does it solve? -->
<!-- If it resolves an open issue, please link to the issue here. For
example "Closes #1337" -->

### 📝 Checklist

<!-- Put an `x` in all the boxes that apply. -->
<!-- If your change requires a documentation PR, please link it
appropriately -->
<!-- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->

- [x] I have made sure that the title is self-explanatory and the
description concisely explains the PR.
- [x] I have linked an issue or discussion.
    
URL: #35204
Reported by: Gonzalo Tornaría
Reviewer(s):
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants