Investigate Celery outage #2459
Comments
Initial thoughts: |
Also going to check: |
Tracked down the line that failed. It is in the export query step, which is being handled by celery once. |
I am noting here that there is a 2G limit on disk space. If this gives us problems again, we might want to check and see if we can refactor our download code so it writes to S3 in a streaming manner. I have not investigated that yet. |
Had a good meeting with Josh, Carlo, Prya and Rohan about celery. It seems like we are going to do a two-pronged attack,
|
We are still experiencing issues with celery-worker and celery-beat
|
|
OK, we are making progress on this issue.
We are going to scale Celery horizontally: We have a script to test a bunch of downloads and it is working on dev with celery scaled horizontally. (Thanks @jontours and @pkfec)
Queries won't error because the source table is updated: @vrajmohan and @ccostino adjusted the psql vars so that we can get rid of the queries that were getting canceled because the master table was updated.
Working on streaming files: @vrajmohan is looking into streaming the writing of zipfiles so we don't use disk.
Updating celery: @ccostino has made a PR to update the celery version and that is on dev. We still need to confirm this gets the nightly update back in order. |
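For anyone following along, here is a rough sketch of what "streaming the writing of zipfiles so we don't use disk" could look like. It is not the code from this repo. `PartSink`, `stream_zip`, and the `upload_part` callback are hypothetical names, and the callback stands in for whatever would actually send one part to S3 (for example a multipart upload, where every part except the last must be at least 5 MiB). The sketch relies on the standard library `zipfile` module being able to write to a non-seekable file object.

```python
import zipfile


class PartSink:
    """Write-only, non-seekable sink that emits fixed-size parts.

    `upload_part` is a hypothetical callback standing in for whatever
    sends one part to object storage; nothing here touches local disk.
    """

    def __init__(self, upload_part, part_size=5 * 1024 * 1024):
        self._upload_part = upload_part
        self._part_size = part_size
        self._buffer = bytearray()

    def write(self, data):
        # Buffer incoming bytes and hand off every full part immediately.
        self._buffer += data
        while len(self._buffer) >= self._part_size:
            self._upload_part(bytes(self._buffer[:self._part_size]))
            del self._buffer[:self._part_size]
        return len(data)

    def flush(self):
        # Nothing to do: a partial part is held back until finish().
        pass

    def finish(self):
        # Send whatever is left as the final (possibly short) part.
        if self._buffer:
            self._upload_part(bytes(self._buffer))
            self._buffer.clear()


def stream_zip(members, sink):
    """Write {member name: iterable of bytes chunks} as a zip onto `sink`."""
    with zipfile.ZipFile(sink, mode="w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, chunks in members.items():
            # ZipFile.open(..., mode="w") lets us feed a member chunk by chunk.
            with archive.open(name, mode="w") as member:
                for chunk in chunks:
                    member.write(chunk)
    # The central directory is written when the archive closes; flush the tail.
    sink.finish()
```

With this shape, memory use is bounded by roughly one part plus the compressor's buffers, so the 2G disk limit would no longer apply to the export itself. Whether it fits the existing download code is an open question.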
Thanks, @LindsayYoung! Related issue here for other adjustments/improvements/fixes: #2553 |
We have seen celery choke with a "no space left on device" error but the general app stats look ok. This can break downloads so we need to figure out what is causing the error and how to deal with it.
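As a starting point for narrowing this down, a small sketch like the one below could be logged from inside a worker task. The helper names `free_fraction` and `is_disk_full_error` are made up for illustration, and which path to check (the worker's temp directory is only a guess) is an assumption. "No space left on device" corresponds to `errno.ENOSPC`, so the second helper can tell that failure apart from other `OSError`s.

```python
import errno
import shutil


def free_fraction(path):
    """Return the fraction of the filesystem holding `path` that is free."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.free / usage.total


def is_disk_full_error(exc):
    """True if `exc` is the OS-level 'No space left on device' error."""
    return isinstance(exc, OSError) and exc.errno == errno.ENOSPC
```

Logging `free_fraction` right before and after the export step would show whether the worker's own disk fills up even when the general app stats look ok.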