Celery troubleshooting
Celery is the task queue that Beiwe uses for sending push notifications, processing/batching data, and running Forest. On a scalable deployment, Celery runs on the "data processing" aka "worker" server.
This page is a work in progress. Please add to it if you come up with new helpful info.
- Read the most recent messages in `mail` with the command `nano /var/mail/ubuntu` and then jump to the end of the file (Ctrl + End). That should show any error messages output by Cron.
- Read the most recent messages in `~/celery_processing.log`, `~/celery_push_send.log`, and `~/celery_forest.log` to see if any of them include an error.
- Check if the server is out of disk space by running `df`. If your main partition is close to 100% used, that's likely your problem. Here's how to increase your disk and partition size:
  - In the AWS web console, find your EC2 instance, use that to find the Volume, and increase the size of the volume. AWS documentation here.
  - When SSHed into the server, extend the partition and then extend the filesystem. AWS documentation is here.
    - You'll likely run into the error `mkdir: cannot create directory '/tmp/growpart.xxxx': No space left on device`. The easiest way around that is to delete some unnecessary files; one way is to purge everything older than 30 days from the systemd journal: `sudo journalctl --vacuum-time=30days`.
- Check that `supervisord` is still up, and restart it if not. It's probably sufficient to just run `processing-start` and/or `sudo processing-restart`.
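The log and disk checks above can be scripted so they are quick to repeat. The following is a minimal sketch, not part of Beiwe: the function names `disk_used_pct` and `recent_errors` are hypothetical, and the search pattern (`error`, `traceback`, `critical`) is an assumption about what a problem looks like in these logs.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the first-pass checks; nothing here modifies the server.

# Print the percentage of space used on the filesystem that holds the given path.
disk_used_pct() {
    # df -P prints one POSIX-format line per filesystem; column 5 is e.g. "87%".
    df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Print lines that mention an error within the last N lines (default 200) of a log.
recent_errors() {
    local file=$1 lines=${2:-200}
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
    tail -n "$lines" "$file" | grep -iE 'error|traceback|critical'
}

# Example usage on the worker server:
#   [ "$(disk_used_pct /)" -ge 95 ] && echo "main partition is nearly full"
#   for f in ~/celery_processing.log ~/celery_push_send.log ~/celery_forest.log; do
#       recent_errors "$f" 50
#   done
```

`recent_errors` returns a non-zero status when nothing matches, so it can also be used as a condition in an `if`.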
If Celery isn't running, one symptom is that `ArchivedEvent`s aren't being created.
To check if Celery is running, SSH into the worker server and run `htop`. If Celery is running, you should see something like this in the htop console:
|- /usr/bin/python /usr/bin/supervisord
| |- python3 -m celery -A services.celery_push_notification...
| |- python3 -m celery -A services.celery_data_processing w...
| | |- python3 -m celery -A services.celery_data_processin...
| | |- python3 -m celery -A services.celery_data_processin...
| |- python3 -m celery -A services.celery_forest worker -Q ...
| |- python3 -m celery -A services.celery_forest worker ...
| |- python3 -m celery -A services.celery_forest worker ...
If you don't see that, then Celery probably isn't running. One likely culprit is `supervisord` not running. Start it by running `processing-start`.
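If you would rather not read the htop tree by eye, the same check can be approximated from `ps` output. This is a rough sketch: `celery_worker_counts` is a hypothetical helper, and it assumes the worker command lines contain `celery -A services.celery_...` as in the htop listing above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count Celery processes per service module.
# Reads process command lines on stdin and prints "<module> <count>" per line.
celery_worker_counts() {
    grep -oE 'celery -A services\.celery_[a-z_]+' \
        | awk '{ n[$3]++ } END { for (s in n) print s, n[s] }' \
        | sort
}

# Example usage on the worker server:
#   ps -eo args | celery_worker_counts
# Empty output means no matching Celery processes were found.
```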
If you open the Beiwe Celery logs in the home directory (`~/celery_processing.log`, `~/celery_push_send.log`, or `~/celery_forest.log`) and see this message, then RabbitMQ is probably down:
[ERROR/MainProcess] consumer: Cannot connect to amqp://beiwe:**@HOSTNAME:50000//: [Errno 111] Connection refused.
Trying again in 32.00 seconds... (16/100)
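To find out whether that message is recent without opening each log, something like the following may help. It is only a sketch: `last_amqp_failure` is a hypothetical name, and it matches on the `Cannot connect to amqp://` text shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the most recent "Cannot connect to amqp://" line
# from each readable log file given, prefixed with the file name.
last_amqp_failure() {
    local f
    for f in "$@"; do
        [ -r "$f" ] || continue
        grep -H 'Cannot connect to amqp://' "$f" | tail -n 1
    done
}

# Example usage:
#   last_amqp_failure ~/celery_processing.log ~/celery_push_send.log ~/celery_forest.log
```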
Try running `sudo rabbitmqctl status`. If the status output tells you that the node isn't running, check the logs in `/var/log/rabbitmq`. One procedure for restarting RabbitMQ is here. It's basically:
- Back up what you want from the RabbitMQ logs directory, and then delete the contents of the directory: `rm /var/log/rabbitmq/*`
- `sudo service rabbitmq-server start`
- `sudo rabbitmqctl start_app`. If this gives you an error, then run: `sudo service rabbitmq-server restart`
- Finally, run `sudo rabbitmqctl status` again to confirm that it's running.
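The restart steps above could be collected into one function along these lines. Treat it as a sketch under assumptions: `run`, `restart_rabbitmq`, the `DRY_RUN` switch, and the default backup directory are all hypothetical, and copying the entire logs directory is just one reading of "back up what you want". By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# DRY_RUN=1 (the default here) prints each command; set DRY_RUN=0 to execute.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

restart_rabbitmq() {
    # Hypothetical backup location; pass a different directory as the first argument.
    local backup_dir=${1:-"$HOME/rabbitmq-logs-backup"}
    run mkdir -p "$backup_dir"
    run sudo cp -a /var/log/rabbitmq/. "$backup_dir"/
    # The glob is expanded by the root shell, which can read the logs directory.
    run sudo sh -c 'rm -f /var/log/rabbitmq/*'
    run sudo service rabbitmq-server start
    # Fall back to a full service restart only if start_app reports an error.
    run sudo rabbitmqctl start_app || run sudo service rabbitmq-server restart
    run sudo rabbitmqctl status
}

# Example usage:
#   restart_rabbitmq                 # print the commands only
#   DRY_RUN=0 restart_rabbitmq       # actually run them
```

Reviewing the dry-run output first is a cheap way to confirm the backup path before any log files are deleted.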
- Does your participant have a current FCM token? (Check using the database shell.)
- Did an `ArchivedEvent` get created? When a push notification fails to send, it often creates an `ArchivedEvent` in the database with a `status` message that gives some information.
- If your events are failing with the error message `google.auth.exceptions.RefreshError: ('invalid_grant: Invalid JWT Signature.', '{"error":"invalid_grant","error_description":"Invalid JWT Signature."}')`, your backend Firebase credentials key is probably invalid. Go to the IAM & Admin console. (To get there from the Firebase Console, click the Gear Icon -> Project Settings -> Service Accounts -> n service accounts [under "All service accounts"].) Once you're in the IAM & Admin console, click "Service Accounts" and then "Manage Keys" on the relevant Service Account.