No new processing tasks after redeploy/restart #392
Comments
Well that seems bad... Is this a new data processing server running Ubuntu 24.04, the 23.10 stopgap we had for a few months, or is it the older 18.04 one? (If it's 18.04, definitely deploy a new one using the launch script.) I'll take a look, but I don't think we've had a complete system collapse like that (Are supervisord and celery processes/subprocesses running? might need to run the |
Hi Eli Thanks for getting back to me. In December I noticed that the worker OS was out of support, and I had some extra time so I went ahead and updated it to 20.04.6 LTS. This process was not as painless as what I would have hoped for due to some of the required libs not playing ball, but I managed to get the service up and running again. I left the data-manager on 18.04.6 LTS due to the issue above.
Ahh yes, the launch script 💡. Maybe it would be more straightforward to just bring up a new worker from scratch. Would it be safe to also spin up a new data processing manager? That is, first remove the existing manager, then add via: |
(I'll respond to your actual issue in the next response, bear with.) Oh, apologies for that struggle. I guess it's not as clear as it should be, and that's an important piece of feedback: part of the intent of the platform is to lower the barrier to entry with respect to the system administration load, to make it less intense for people who are not the world's expert in Beiwe. I'm going to make a note that this needs to be surfaced better. I do periodic updates to the launch script when there are platform level upgrades, and any time there are high level changes like an updated This March/April I did a bunch of work updating dependencies and the Ubuntu platform; there was a transient 23.10 version while we waited on 24.04, which should now be the version that is deployed on the main branch. The intended pattern-of-work for you is:
I also build launch script commands that apply to major technical and migration hassles, like the Python 3.6->3.8 Elastic Beanstalk platform update has a I'm going to make a new tag for relevant issues, "Infrastructure" (will tend to be paired with the "ANNOUNCEMENT" tag) for system admin level questions. Maybe that will help some. I also try to pin big items. It seems we need to find some more ways to make this clear and encourage posts about those tasks to improve the SEO. |
Yes. Follow the pattern I described above. If you run into any immediate problems you are welcome to email me directly - username at gmail - and post an issue. Deployment problems will get very high priority; if something is absolutely unfathomable I am allowed some time for direct debugging outside of platform development. |
I also JUST merged the new Heartbeat feature into main, you can see a description of it on this announcement post: (so make sure to pull) |
Hi Eli Thank you for the details, this insight is valuable. With that it would seem that our deployment process has not been the best. I followed your instructions above and removed the old manager and worker ( Things seem to be working as expected 🎉. Thank you again for your efforts. |
Closing this issue, thank you for the help. |
(reopening because I need to remember to update documentation/readme) |
readme has been updated on a branch, re-closing this issue. |
Hi Eli/Team
I hope you have had a good weekend.
We are at a bit of an impasse at the moment. I came across an unexpected error in the processing of the celery push notification task. From the log file (celery_push_send.log):
Initially I thought this was just down to an old version of the code, so I redeployed the latest version of main (without error). Our setup is the “scalable deployment”: 1 EB webserver, 1 data processing manager and 1 data processing server. I then restarted rabbit (service rabbitmq-server restart) and the data processing (processing-restart) for good measure.
But over the last few days we have not seen any processing (data and push notifications).
According to rabbitmqctl status the service is running:
And on the data processing server the celery workers are up (ps -ef | grep celery):
From tail -50 celery_push_send.log:
It would appear that it is connected to Rabbit. The output of tail celery_processing.log is similar. There have been no errors in Sentry since deploying/restarting. Disk space on all the machines has not been exhausted.
Have you perhaps seen something like this before? Are there any additional logs I can check?
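For anyone repeating the checks above, here is a minimal sketch of the same two questions in Python: are any celery processes present, and can the worker open a TCP connection to the broker on the manager? The function names are hypothetical helpers, not part of Beiwe, and the host and port you pass in are assumptions you need to fill in for your own deployment. A successful TCP connection only shows that the port is open; it does not prove that tasks are being queued or consumed.

```python
import socket


def count_celery_processes(ps_output: str) -> int:
    """Count lines of `ps -ef` output that mention celery.

    Lines containing "grep" are skipped so that a `grep celery`
    process in the listing is not counted as a worker.
    """
    return sum(
        1
        for line in ps_output.splitlines()
        if "celery" in line and "grep" not in line
    )


def broker_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    This is only a network-level check (hypothetical helper): it says
    nothing about credentials, vhosts, or queue contents.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False
```

To use it, pass the text printed by ps -ef on the data processing server to count_celery_processes, and call broker_reachable with the manager's address and whatever port your RabbitMQ instance listens on (RabbitMQ's default AMQP port is 5672, but a deployment may configure a different one).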
I have limited Python service hosting experience so perhaps I am missing something trivial in this regard. Any assistance in getting our Beiwe service (and associated studies) back in shape will be much appreciated. Please let me know if I can provide any additional details.
Related doc: https://github.com/onnela-lab/beiwe-backend/wiki/Celery-troubleshooting
Thank you.
Cheers
Russell