DevOps: Troubleshooting 502 Bad Gateway Nginx 1.21.1 log writing failed. No space left on device @ io_write ‐ home nginx app log production.log
The first response to this error is to:
- Check the monitoring tool. As of 8/14/2023 we use Dynatrace to monitor our Production and Staging servers. It gives you an overview of which issues to investigate further, most often disk space or CPU problems.
- Access the Jenkins UI at http://internal-dm-devops-523064655.us-gov-west-1.elb.amazonaws.com/, click Jobs, and open the job for the server that is currently down.
- Click Build Now to deploy the last commit merged to the master branch. Once the build has run, open the console output and read through the log entries. This helps you narrow down which files or directories to inspect on the server.
- Disk space issues commonly surface as bad gateway errors. If that is the case here, you will find a disk space error in the logs that reads
log writing failed. No space left on device @ io_write - /home/nginx/app/log/production.log
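The log check above can be sketched as a small shell helper. This is only an illustration: the function name find_disk_full_errors and the file name console.log are hypothetical, and it assumes the console output has been saved to a file or is piped in on stdin.

```shell
# Hypothetical helper: print every line that carries the disk-full signature.
# Reads the file given as $1, or stdin when no file is given.
# Exit status is 0 when at least one match is found, 1 otherwise.
find_disk_full_errors() {
  if [ "$#" -gt 0 ]; then
    # -n: prefix matches with line numbers; -F: treat the phrase as a fixed string.
    grep -nF "No space left on device" -- "$1"
  else
    grep -nF "No space left on device"
  fi
}
```

For example, find_disk_full_errors console.log prints the matching lines with their line numbers, and prints nothing if the build failed for another reason.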
- Access the AWS console, ssh into the server experiencing the downtime, and switch to the ec2-user account with
sudo su ec2-user
- Run the command
df -h
This gives an overview of which directories or mount points have exhausted their disk space.
- If the /home directory is the one that is full, run the command
docker ps
to see the Docker containers running on the server.
- Copy the ID of the second container, the one with the image name
dm:vaec
- Run the command
docker exec -it <container id> /bin/bash
This opens a shell inside the container as the nginx user.
- cd into /home/nginx/app/log/ and run ls. You will see the files production.log, web.stder.log and web.stdout.log.
- Run the command
du -h filename
to see how much disk space each file takes up.
- The aim is to restore space to the /home directory. Let the developers know that you will be clearing the contents of those files.
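Instead of running du on one file at a time, the size check can be sketched as a helper that ranks everything in the log directory. The function name largest_files is hypothetical. It uses du -sk and sort -rn rather than du -h, so that the output sorts correctly on systems without GNU sort.

```shell
# Hypothetical helper: list the entries of a directory, largest first.
# Sizes are printed in 1 KiB blocks, followed by the path.
largest_files() {
  dir="${1:-.}"
  # du -sk: one summary line per argument, in KiB.
  # sort -rn: numeric order, largest first.
  du -sk "$dir"/* 2>/dev/null | sort -rn
}
```

Running largest_files /home/nginx/app/log inside the container would put the file that is consuming the most space on the first line.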
- Once the team has agreed, run the following command
truncate -s 0 /home/nginx/app/log/production.log
to clear the contents of the log file. Repeat for the other log files, replacing production.log with each file name.
- After this is done, access the Jenkins server and repeat the same Build Now steps on the affected environment.
- Once the Build Now job has deployed successfully, open the environment's webpage in your browser.
- The webpage should be back up.
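The cleanup step can be sketched as two small helpers: one that empties several log files in a single call, and one that reports how full a filesystem is so you can confirm that space was actually freed. Both function names are hypothetical, and the sketch assumes the truncate utility from GNU coreutils is available, as in the command above.

```shell
# Hypothetical helper: empty every log file passed as an argument.
truncate_logs() {
  for f in "$@"; do
    if [ -f "$f" ]; then
      # Same effect as the single-file command: the size drops to zero
      # while the file itself stays in place.
      truncate -s 0 -- "$f" && echo "truncated: $f"
    else
      echo "skipped (not a regular file): $f" >&2
    fi
  done
}

# Hypothetical helper: print the Use% figure (without the % sign)
# for the filesystem that holds the given path.
usage_percent() {
  # df -P prints one POSIX-format line per filesystem; column 5 is the usage.
  df -P "$1" | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}
```

Inside the container this might be called as truncate_logs /home/nginx/app/log/production.log /home/nginx/app/log/web.stdout.log, followed by usage_percent /home to check the result before rebuilding in Jenkins.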