Interrupt Handling Fixes #167

Merged
merged 2 commits into from
Sep 21, 2022


ikreymer

Improvements to interrupt handling, for Browsertrix Cloud

  • Scope check is performed on each URL before it is crawled, in case the crawl scope has changed
  • SIGUSR1 waits for the current page to finish, then marks the crawler as done to prevent a restart (for a full crawl stop)
  • SIGUSR2 waits for the current page to finish, then exits immediately (for scale-down)
  • SIGTERM/SIGINT behave the same way: both trigger a graceful shutdown after the page load
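
The per-URL scope check described above could look roughly like the following. This is a minimal Python sketch for illustration only, not the crawler's actual code: `in_scope`, `crawl_queue`, `get_scope`, and `load_page` are hypothetical names, and the regex-based include/exclude rules are an assumption about how scope is expressed.

```python
import re
from collections import deque


def in_scope(url, include_patterns, exclude_patterns=()):
    """Return True if url matches an include rule and no exclude rule."""
    if any(re.search(p, url) for p in exclude_patterns):
        return False
    return any(re.search(p, url) for p in include_patterns)


def crawl_queue(queue, get_scope, load_page):
    """Drain the queue, re-checking scope right before each page load.

    get_scope is a hypothetical callable returning the *current*
    (include, exclude) patterns, so a scope change made after a URL
    was queued is still honored.
    """
    skipped = []
    while queue:
        url = queue.popleft()
        include, exclude = get_scope()  # re-read scope on every URL
        if not in_scope(url, include, exclude):
            skipped.append(url)  # queued earlier, but no longer in scope
            continue
        load_page(url)
    return skipped
```

Checking at dequeue time rather than only at enqueue time is what lets a URL that was valid when queued be dropped once the scope is tightened.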

Improvements to remote state / parallel crawlers (for browsertrix-cloud):
- SIGUSR1 before SIGINT/SIGTERM ensures data is saved and marks the crawler as done; for use when gracefully stopping a crawl
- SIGUSR2 before SIGINT/SIGTERM ensures data is saved but does not mark the crawler as done; for use when scaling down a single crawler
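
One way to read the signal ordering above is as a small state machine: SIGUSR1/SIGUSR2 record how the crawler should exit, and a later SIGINT/SIGTERM requests the stop once the current page finishes. The Python sketch below illustrates that reading only. It is not the crawler's implementation; `InterruptState`, `run`, and the flag names are hypothetical, and it assumes a Unix platform where `SIGUSR1` and `SIGUSR2` exist.

```python
import signal


class InterruptState:
    """Flags set by signals and read by the crawl loop (hypothetical)."""

    def __init__(self):
        self.interrupted = False  # stop once the current page finishes
        self.save_state = False   # make sure data is saved before exit
        self.mark_done = False    # mark crawler as done to prevent restart

    def handle(self, signum, frame=None):
        if signum == signal.SIGUSR1:
            # Full crawl stop: save data and mark the crawler as done.
            self.save_state = True
            self.mark_done = True
        elif signum == signal.SIGUSR2:
            # Scale-down: save data, but leave the crawler not done.
            self.save_state = True
            self.mark_done = False
        elif signum in (signal.SIGINT, signal.SIGTERM):
            # Graceful shutdown after the current page load.
            self.interrupted = True

    def install(self):
        """Register the handler for all four signals (Unix only)."""
        for sig in (signal.SIGUSR1, signal.SIGUSR2,
                    signal.SIGINT, signal.SIGTERM):
            signal.signal(sig, self.handle)


def run(pages, state, load_page):
    """Load pages in order; the in-flight page always completes."""
    loaded = []
    for url in pages:
        load_page(url)
        loaded.append(url)
        if state.interrupted:
            break  # checked only between pages, never mid-page
    return loaded
```

In a real process, `state.install()` would be called once at startup. After `run` returns, the caller would consult `state.save_state` and `state.mark_done` to decide whether to save data and whether to report the crawler as done.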
@ikreymer ikreymer merged commit 65933c6 into main Sep 21, 2022
@ikreymer ikreymer deleted the interrupt-fix branch September 21, 2022 00:09