Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Actually cancel running jobs on worker shutdown to prevent blocking shutdown #11120

Merged
merged 2 commits into from
Aug 18, 2023

Conversation

rtibbles
Copy link
Member

@rtibbles rtibbles commented Aug 17, 2023

Summary

  • The worker code was only calling cancel on the stored future, but not interacting with the actual task cancellation mechanisms that are required to interrupt threads
  • Marks any running jobs as canceling, and lets the existing cancellation handling logic finish up
  • In the course of testing this, I ran into a couple of edge cases in our zeroconf handling where things would get cleaned up before certain code executed, so I added some defensive programming checks to handle these without errors

References

Fixes #10249

Reviewer guidance

Start then immediately stop the devserver - observe that it actually finishes up fairly quickly now rather than stalling for a while.

For a more acid test, run the server, start a long running content import task and then shut it down - it should still shut down relatively quickly.


Testing checklist

  • Contributor has fully tested the PR manually
  • If there are any front-end changes, before/after screenshots are included
  • Critical user journeys are covered by Gherkin stories
  • Critical and brittle code paths are covered by unit tests

PR process

  • PR has the correct target branch and milestone
  • PR has 'needs review' or 'work-in-progress' label
  • If PR is ready for review, a reviewer has been added. (Don't use 'Assignees')
  • If this is an important user-facing change, PR or related issue has a 'changelog' label
  • If this includes an internal dependency change, a link to the diff is provided

Reviewer checklist

  • Automated test coverage is satisfactory
  • PR is fully functional
  • PR has been tested for accessibility regressions
  • External dependency files were updated if necessary (yarn and pip)
  • Documentation is updated
  • Contributor is in AUTHORS.md

@rtibbles rtibbles added the TODO: needs review Waiting for review label Aug 17, 2023
@rtibbles rtibbles requested review from bjester and vkWeb August 17, 2023 19:50
@github-actions github-actions bot added DEV: backend Python, databases, networking, filesystem... SIZE: small labels Aug 17, 2023
@dbnicholson
Copy link
Contributor

I gave this a spin locally and it worked nicely:

DEBUG    2023-08-18 10:18:26,748 JOBCHECKER shut down event received; closing.
INFO     2023-08-18 10:18:26,752 Canceling job id 88bbb5b183b342f7a35728428c3d75e2.
DEBUG    2023-08-18 10:18:26,820 https://studio.learningequality.org:443 "HEAD /content/storage/a/6/a60a1eec38d8a5ca61de29074662a879.png HTTP/1.1" 200 0
DEBUG    2023-08-18 10:18:26,910 https://studio.learningequality.org:443 "GET /content/storage/a/6/a60a1eec38d8a5ca61de29074662a879.png HTTP/1.1" 206 113392
INFO     2023-08-18 10:18:26,930 Canceling import: https://studio.learningequality.org/content/storage/a/6/a60a1eec38d8a5ca61de29074662a879.png
INFO     2023-08-18 10:18:26,933 Canceling import: https://studio.learningequality.org/content/storage/a/6/a60a1eec38d8a5ca61de29074662a879.png
DEBUG    2023-08-18 10:18:27,096 https://studio.learningequality.org:443 "HEAD /content/storage/d/7/d705f01ec547a4cc209f0dd167c2c20a.png HTTP/1.1" 200 0
DEBUG    2023-08-18 10:18:27,177 https://studio.learningequality.org:443 "GET /content/storage/d/7/d705f01ec547a4cc209f0dd167c2c20a.png HTTP/1.1" 206 100840
INFO     2023-08-18 10:18:27,446 Canceling import: https://studio.learningequality.org/content/storage/d/7/d705f01ec547a4cc209f0dd167c2c20a.png
INFO     2023-08-18 10:18:27,447 Canceling import: https://studio.learningequality.org/content/storage/d/7/d705f01ec547a4cc209f0dd167c2c20a.png
INFO     2023-08-18 10:18:27,488 Setting availability to True of 17 LocalFile objects based on passed in checksums
INFO     2023-08-18 10:18:27,509 Setting availability of non-topic ContentNode objects based on LocalFile availability in 1 batches of 10000
INFO     2023-08-18 10:18:27,517 Annotating ContentNode objects with children for 2 levels
INFO     2023-08-18 10:18:27,520 Annotating ContentNode objects with children for level 2
INFO     2023-08-18 10:18:27,524 Annotating ContentNode objects with children for level 1
DEBUG    2023-08-18 10:18:27,526 Recursive topic tree annotation took 0 seconds
INFO     2023-08-18 10:18:27,555 Joining 'ProcessControlPlugin'
INFO     2023-08-18 10:18:28,325 Stopped thread 'ProcessControlPlugin'.
INFO     2023-08-18 10:18:28,326 Bus state: IDLE
INFO     2023-08-18 10:18:28,327 Bus state: EXIT
INFO     2023-08-18 10:18:28,337 Waiting for child threads to terminate...
INFO     2023-08-18 10:18:28,338 Bus state: EXITED

Copy link
Member

@vkWeb vkWeb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These code changes were very much needed! 💯

Checked manually, working alright, green tick from my side ✔️

@bjester
Copy link
Member

bjester commented Aug 18, 2023

Thanks @vkWeb! I concur

@bjester bjester merged commit 5342834 into learningequality:release-v0.16.x Aug 18, 2023
@rtibbles rtibbles deleted the stop_drop_and_roll branch August 18, 2023 19:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
DEV: backend Python, databases, networking, filesystem... SIZE: small TODO: needs review Waiting for review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Kolibri shutdown blocked by running cancellable task
4 participants