Investigate high celery-worker memory #4864
Comments
Kibana logs show many recurring instances of the worker running out of memory, going back anywhere from 90 days to 1 year. The out-of-memory errors in the worker are not confined to a specific case or document.
celery v5.0.1 comes with a few breaking changes:
Tried upgrading the celery, kombu, click, and vine packages to the most compatible versions in this feature branch:
ScheduleB/EFile downloads could be a symptom of the worker running out of memory. The changes go live (deploy to Prod) during the innovation release on 06/15. Will monitor worker memory after the SB/EFile change is deployed to production.
After considering our current cloud.gov org memory limit, @lbeaufort and @fec-jli advised making a few adjustments on
WIP PR with celery package upgrades: #4895
Monitor celery worker memory in Production. No additional work needed.
What we’re after
On 5/20/21, MUR 7284 didn't appear more than an hour after publishing.
Celery-worker instances have been throwing "Worker exited prematurely: signal 9 (SIGKILL)" errors increasingly over the past month, which seems to correlate with celery-worker memory creeping up from 800MB/1GB to >=1GB/1GB. We should either:
Example:
Kibana app health tracking example.
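For reference, one commonly used mitigation for memory creep in Celery prefork workers is to recycle child processes before the container limit is reached. This is not necessarily the approach chosen in this ticket. The sketch below derives values for the Celery settings `worker_max_tasks_per_child` and `worker_max_memory_per_child` (the latter is expressed in kilobytes). The helper name `recycling_settings`, the 80% headroom, the concurrency of 2, and the 100-task cap are all assumptions for illustration, not values taken from this project.

```python
def recycling_settings(container_limit_mb, concurrency, headroom=0.8, max_tasks=100):
    """Build Celery worker settings that recycle children before an OOM kill.

    Hypothetical helper: the headroom fraction and task cap are illustrative
    assumptions and should be tuned against real memory graphs.
    """
    # Split the usable share of the container limit across child processes.
    # worker_max_memory_per_child is expressed in kilobytes (KiB).
    per_child_kib = int(container_limit_mb * 1024 * headroom / concurrency)
    return {
        "worker_concurrency": concurrency,
        # Replace a child after it has executed this many tasks.
        "worker_max_tasks_per_child": max_tasks,
        # Replace a child once its resident memory exceeds this many KiB
        # (checked after the current task finishes).
        "worker_max_memory_per_child": per_child_kib,
    }


if __name__ == "__main__":
    # Example: a 1 GB instance running two child processes.
    settings = recycling_settings(container_limit_mb=1024, concurrency=2)
    print(settings)
    # With a Celery app object, these could be applied via
    # app.conf.update(settings)  # 'app' is assumed to exist elsewhere
```

The per-child figure is only an approximation, since the parent worker process also consumes memory, so the headroom value would need to be checked against actual usage in Kibana or cloud.gov metrics.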
Related ticket(s)
(Include the tickets that either came before, after, or are happening in tandem with this new ticket)
Action item(s)
(These are the smaller tasks that should happen in order to complete this work)
Completion criteria
(What does the end state look like - as long as this task(s) is done, this work is complete)
References/resources/technical considerations
(Is there sample code or a screenshot you can include to highlight a particular issue? Here is where you reinforce why this work is important)