Deal with issues when the disk is full #2328

Open
2 of 12 tasks
patrickelectric opened this issue Jan 22, 2024 · 11 comments
Labels
enhancement New feature or request P1 - Important Priority High priority task

Comments

@patrickelectric
Member

patrickelectric commented Jan 22, 2024

Check: #2327, #2323, #2326, #1015

Docker is able to start, but everything after that behaves unstably.
Some of the points you suggested are already tracked as issues; others are relevant to recovering the system.

  • We should delete old logs during rotation when we notice that the disk is almost full.
  • We should stop logging if the disk is almost full.
    • This conflicts with the rotation configuration in loguru.
  • We should clean up old Docker containers that are not being used.
  • We should clean up old Docker artifacts that are not being used.
  • We should allow the user to delete all unused Docker images.
  • We should warn the user through Cockpit that the companion computer's disk is almost full.
  • We should erase older tlog or bin files if the disk is almost full.
  • We should warn the user through the BlueOS header that the disk is almost full and in a critical state.
  • We may prevent the user from arming the vehicle if the disk is almost full.
  • We may do some of these steps automatically when the system starts, to try to recover it.
  • We may need a Filelight-like page in BlueOS to help identify the root cause of such problems.
  • We should limit journald's maximum size.
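Two of the points above (noticing that the disk is almost full, and erasing older tlog or bin files) can be sketched in Python as below. This is a minimal illustration under assumptions, not BlueOS code: the function names, the 10% free-space threshold, and the per-directory byte budget are all hypothetical.

```python
import shutil
from pathlib import Path
from typing import List


def disk_almost_full(path: str = "/", min_free_fraction: float = 0.10) -> bool:
    """Return True when free space on the filesystem holding `path` is below the threshold.

    The 10% default is an arbitrary placeholder, not an existing BlueOS setting.
    """
    usage = shutil.disk_usage(path)
    return usage.free / usage.total < min_free_fraction


def prune_oldest(directory: str, pattern: str, max_total_bytes: int, keep_newest: int = 1) -> List[Path]:
    """Delete the oldest files matching `pattern` until their combined size fits the budget.

    The `keep_newest` most recent files are never deleted, since the newest one may
    still be open for writing. Returns the deleted paths, oldest first.
    """
    files = [p for p in Path(directory).glob(pattern) if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime)  # oldest first, by modification time
    total = sum(p.stat().st_size for p in files)
    candidates = files[: max(len(files) - keep_newest, 0)]
    deleted: List[Path] = []
    for path in candidates:
        if total <= max_total_bytes:
            break
        size = path.stat().st_size
        path.unlink()
        total -= size
        deleted.append(path)
    return deleted
```

A caller would run `prune_oldest(log_dir, "*.tlog", budget)` only when `disk_almost_full()` returns True, so logs are kept as long as there is room for them.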

Originally posted by @patrickelectric in #2325 (comment)

@joaoantoniocardoso
Member

I know it might be trickier to manage the installation, but another valid strategy is to put /var in another partition.

@patrickelectric
Member Author

#2359

@patrickelectric patrickelectric added enhancement New feature or request P1 - Important Priority High priority task labels Feb 19, 2024
@voorloopnul
Contributor

I installed one extension [Nortek Nucleus] that grew the Docker log to 18 GB in about a week. Maybe kraken could add a limit to the size of the Docker logs:

https://docs.docker.com/config/containers/logging/configure/#configure-the-default-logging-driver

--log-opt max-size=100m
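The same limit can also be set once for all containers through the Docker daemon configuration described on the linked page, instead of per container with `--log-opt`. A minimal Python sketch that adds the limits to an existing `/etc/docker/daemon.json` mapping; the `max-file` value of 3 is an assumption, and whether kraken should manage this file is left open.

```python
import json
from typing import Any, Dict


def with_log_limits(config: Dict[str, Any], max_size: str = "100m", max_file: int = 3) -> Dict[str, Any]:
    """Return a copy of a daemon.json mapping that caps each container's log size.

    `max-size` and `max-file` are options of Docker's json-file logging driver, so the
    driver is set explicitly. daemon.json expects log-opts values as strings.
    """
    updated = dict(config)  # shallow copy; the caller's mapping is left untouched
    updated["log-driver"] = "json-file"
    log_opts = dict(updated.get("log-opts", {}))
    log_opts["max-size"] = max_size
    log_opts["max-file"] = str(max_file)
    updated["log-opts"] = log_opts
    return updated


def render_daemon_json(config: Dict[str, Any]) -> str:
    """Serialize the limited mapping as it would be written to /etc/docker/daemon.json."""
    return json.dumps(with_log_limits(config), indent=2, sort_keys=True)
```

Daemon-level logging defaults take effect only after the Docker daemon is restarted, and only for containers created afterwards; existing containers keep the settings they were created with.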

@joaoantoniocardoso
Member

joaoantoniocardoso commented May 10, 2024

I've just freed ~12 GB here by doing:

sudo docker system prune -a # ~9 GB
sudo journalctl --vacuum-time=2d  # ~1 GB
sudo apt-get clean  # ~1 GB
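A minimal Python sketch of running these same three cleanup steps unattended and recording how much space each one freed, which is what an automatic cleanup or a "free up space" button would need. Assumptions: it runs as root, it measures the root filesystem, and it adds `-f` to `docker system prune -a` so the command does not stop at Docker's confirmation prompt. Note that `-a` removes all unused images, not only dangling ones. The `runner` and `free_bytes` hooks are hypothetical and exist only so the logic can be exercised without touching a real system.

```python
import shutil
import subprocess
from typing import Callable, Dict, List, Sequence

# The three commands from the comment above. "-f" skips the confirmation prompt of
# "docker system prune"; "-a" removes all unused images, not only dangling ones.
CLEANUP_COMMANDS: List[List[str]] = [
    ["docker", "system", "prune", "-a", "-f"],
    ["journalctl", "--vacuum-time=2d"],
    ["apt-get", "clean"],
]


def _run(cmd: Sequence[str]) -> None:
    subprocess.run(list(cmd), check=True)


def _free_bytes(path: str) -> int:
    return shutil.disk_usage(path).free


def run_cleanup(
    commands: Sequence[Sequence[str]] = CLEANUP_COMMANDS,
    path: str = "/",
    runner: Callable[[Sequence[str]], object] = _run,
    free_bytes: Callable[[str], int] = _free_bytes,
) -> Dict[str, int]:
    """Run each command in order and return the free space gained per command, in bytes.

    The numbers are approximate: anything else writing to the disk meanwhile skews them.
    """
    freed: Dict[str, int] = {}
    for cmd in commands:
        before = free_bytes(path)
        runner(cmd)
        freed[" ".join(cmd)] = free_bytes(path) - before
    return freed
```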

@patrickelectric
Member Author

@joaoantoniocardoso how did we end up with 1 GB of unnecessary data in our apt cache?

@rafaellehmkuhl
Member

rafaellehmkuhl commented May 10, 2024

The Docker prune is especially important, as a lot of leftover overlays stay there forever.
It would be good to do it automatically, or at least to put a button in BlueOS to do that.

@joaoantoniocardoso
Member

Maybe I've installed many things on mine, but it'd be good to check how the apt cache looks on a fresh install.
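To check where the space goes on a fresh install (for example under `/var/cache/apt`, which `apt-get clean` empties), a small du-like helper could list the largest entries of a directory, in the spirit of the Filelight-like page mentioned in the issue. This is a sketch under assumptions: the function names are hypothetical, it counts apparent file sizes rather than allocated blocks (so totals can differ from `du`), and hard-linked files are counted once per link.

```python
import os
import stat
from typing import List, Tuple


def tree_size(path: str) -> int:
    """Sum of the apparent sizes of regular files under `path`; symlinks are not followed."""
    total = 0
    for root, _dirs, files in os.walk(path):  # os.walk does not descend into symlinked dirs by default
        for name in files:
            try:
                st = os.lstat(os.path.join(root, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total


def largest_entries(path: str, top: int = 5) -> List[Tuple[str, int]]:
    """Return the `top` largest direct children of `path` as (name, bytes), largest first."""
    sizes: List[Tuple[str, int]] = []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((entry.name, tree_size(entry.path)))
            elif entry.is_file(follow_symlinks=False):
                sizes.append((entry.name, entry.stat(follow_symlinks=False).st_size))
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes[:top]
```

Starting from `largest_entries("/var")` and drilling into the biggest child repeatedly is usually enough to find the culprit.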

@goasChris

goasChris commented Jul 5, 2024

We are discussing this subject in our project, as we run robots 24/7 and they can run for several days, maybe even weeks, without restarting or power cycling.

I've not done very thorough digging, but I think it would be very nice to have some kind of parameter (maybe even user-facing) that lets you set a target age for tlog files. If I set 7 days, any tlog files older than 7 days would be purged automatically (not sure how often that should run). This may be a bit out of scope for this issue, but it should help nonetheless.
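The age-based purge described above could look roughly like this Python sketch. The function name, the recursive `*.tlog` match, and the 7-day default are assumptions taken from the example in this comment, not an existing BlueOS parameter; something else would still have to decide how often to call it.

```python
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 24 * 60 * 60


def purge_older_than(
    directory: str,
    max_age_days: float = 7.0,
    pattern: str = "*.tlog",
    now: Optional[float] = None,
) -> List[Path]:
    """Delete files matching `pattern` (searched recursively) last modified before the cutoff.

    Returns the deleted paths in sorted order. `now` can be fixed for testing.
    """
    cutoff = (time.time() if now is None else now) - max_age_days * SECONDS_PER_DAY
    deleted: List[Path] = []
    for path in sorted(Path(directory).rglob(pattern)):  # sorted() collects matches before deleting
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            deleted.append(path)
    return deleted
```

Because the check uses modification time, a tlog that is still being written keeps a recent mtime and is never purged; a single file that keeps growing for weeks therefore needs the rotation mentioned in the EDIT below.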

I can make a separate issue if that is better.

EDIT:
Also, before this can even happen, we would need mavlink-router to automatically split/rotate files, maybe every 200 MB or 12 hours.
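To illustrate the size-or-age rotation asked for here, a Python sketch of the policy follows. It is not mavlink-router code (the upstream discussion is in the mavlink-router issue linked later in this thread); the class name, file naming, and defaults (200 MB, 12 hours, mirroring the numbers in this comment) are hypothetical. The rotation check runs before each write, so a record is never split across files and a file can exceed `max_bytes` by at most one record.

```python
import time
from pathlib import Path
from typing import BinaryIO, Callable, Optional


class RotatingTlogWriter:
    """Append records to numbered files, starting a new file by size or by age."""

    def __init__(
        self,
        directory: str,
        prefix: str = "flight",
        max_bytes: int = 200 * 1024 * 1024,  # "every 200 MB"
        max_age_s: float = 12 * 60 * 60,     # "or 12 hours"
        clock: Callable[[], float] = time.time,
    ) -> None:
        self.directory = Path(directory)
        self.prefix = prefix
        self.max_bytes = max_bytes
        self.max_age_s = max_age_s
        self.clock = clock
        self.index = 0
        self._file: Optional[BinaryIO] = None
        self._size = 0
        self._opened_at = 0.0

    def _needs_rotation(self) -> bool:
        return (
            self._file is None
            or self._size >= self.max_bytes
            or self.clock() - self._opened_at >= self.max_age_s
        )

    def _open_next(self) -> None:
        self.close()
        path = self.directory / f"{self.prefix}-{self.index:05d}.tlog"
        self.index += 1
        self._file = open(path, "wb")
        self._size = 0
        self._opened_at = self.clock()

    def write(self, record: bytes) -> None:
        """Write one whole record; rotation is checked before writing, never mid-record."""
        if self._needs_rotation():
            self._open_next()
        assert self._file is not None
        self._file.write(record)
        self._size += len(record)

    def close(self) -> None:
        if self._file is not None:
            self._file.close()
            self._file = None
```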

@patrickelectric
Member Author

patrickelectric commented Jul 5, 2024

Hi @goasChris, thanks for your input. Indeed, tlogs are also important for us to track, and they are already on the list. Let us know if there is anything that is not being tracked at the moment.

@patrickelectric
Member Author

patrickelectric commented Jul 5, 2024

About tlogs: mavlink-router/mavlink-router#426

@JoaoMario109
Collaborator

Some things that take up a lot of space and can be removed:

  • Use docker system prune -a to delete unnecessary overlays in /var/lib/docker/overlay2. However, we need some way to ensure that the factory image is tagged, because right now it is not, and it got deleted along with the other overlays.
  • Limit or remove .tlog files, as they can accumulate and take up significant space.
  • Remove all unused BlueOS version images, except for the factory image and the currently running image.
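The selection rule in the last point (keep the factory image and the currently running image, remove other unused BlueOS images) can be sketched in Python. Assumptions: images are identified by `repository:tag` strings as printed by `docker images --format '{{.Repository}}:{{.Tag}}'`, the factory image has already been given a stable tag (for example with `docker tag`), and the repository and tag names used are illustrative only. Untagged entries such as `<none>:<none>` are skipped because they cannot be removed by that name.

```python
import subprocess
from typing import Iterable, List


def list_local_images() -> List[str]:
    """Return local images as "repository:tag" strings using the docker CLI."""
    out = subprocess.run(
        ["docker", "images", "--format", "{{.Repository}}:{{.Tag}}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def images_to_remove(
    images: Iterable[str],
    repository: str,
    current: str,
    factory: str,
    in_use: Iterable[str] = (),
) -> List[str]:
    """Pick images of `repository` that are not current, not factory, and not used by a container."""
    keep = {current, factory, *in_use}
    return sorted(
        image
        for image in set(images)
        if image.startswith(repository + ":")
        and "<none>" not in image  # untagged entries cannot be removed by name
        and image not in keep
    )
```

Each selected name could then be passed to `docker image rm`; Docker itself refuses to remove an image that a container still uses, which acts as a second safety net.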

Projects
None yet
Development

No branches or pull requests

6 participants