supercronic logs "error running command: waitid: no child processes" regularly since v0.2.31 #171

Closed
ceesgeene opened this issue Sep 17, 2024 · 3 comments · Fixed by #172

@ceesgeene
ceesgeene commented Sep 17, 2024

Since https://github.com/aptible/supercronic/releases/tag/v0.2.31 we regularly see supercronic logging errors with the following message: "error running command: waitid: no child processes"

In that release ramr/go-reaper was added to automatically reap zombie processes (see 8e8973c ).

For that library there already exists a similar issue: ramr/go-reaper#11

One of the suggestions in ramr/go-reaper#11 is to use a more verbose chunk of code so that the reaper runs inside a different process (see https://github.com/ramr/go-reaper?tab=readme-ov-file#into-the-woods). I'm not sure whether this is applicable to supercronic.

@UserNotFound
Member

Hi @ceesgeene, can you clarify if there is any impact beyond logging the error?

Pinging @qianlongzt in case you're able to assist with a triage or fix.

@qianlongzt
Contributor

The impact is just random waitid errors (not on every run). The CronsFailCounter may be incorrect as a result.

Perhaps the fix is to run the reaper first and then start supercronic as its child?

https://github.com/aptible/supercronic/blob/master/cron/cron.go#L256
https://github.com/aptible/supercronic/blob/master/cron/cron.go#L119

qianlongzt added a commit to qianlongzt/supercronic that referenced this issue Sep 20, 2024
@jonasgeiler

I am having the same problem. Funnily enough it seems like using --init when running the container mitigates this problem? Or maybe I just never witnessed it when using --init. As @qianlongzt said, it is very random.

joshraker added a commit that referenced this issue Oct 10, 2024
* fix: random waitid error

fix #171

* fix(reap): forward signal

* refactor: modify reaper to get supercronic exitStatus

* fix(reaper): unify signal list & fix signal forward

* chore: replace ioutil to io

* fix(test): ci timeout

* opt-out with no-reap flag

Co-authored-by: Josh Raker <[email protected]>

* fix: typo on signal

* fix: args pass to supercronic

* fix(test): remove removed flag

* chore: remove misleading comment

---------

Co-authored-by: Josh Raker <[email protected]>