Principal=>subordinate coordination for logrotation (grafana-agent causing a lot of unreleased files and huge syslog) #153
Comments
@simskij do you have any good ideas on how to proceed here?
@taurus-forever Do you think this would be solved by not getting all the files from |
@lucabello AFAIK, no. The grafana-agent charm / promtail binary reads (or at least used to read?) some log files from disk to send them to Loki. The PostgreSQL charm rotates logs but does not send any signal to COS to close the current descriptor and reopen the file after the old one has been moved to the archive folder. IMHO, we have two options:
|
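For illustration only (this is not one of the options above, and not how grafana-agent or promtail is actually implemented): a minimal Python sketch of a reader that notices a rename-style rotation by comparing inodes, drains the old file, closes its descriptor, and reopens the path. The function name `read_new_lines` and the `state` dict are hypothetical.

```python
import os


def read_new_lines(path, state):
    """Return lines appended since the last call, reopening `path` after rotation.

    `state` is a plain dict that keeps the open file object and its inode
    between calls (hypothetical helper, not a grafana-agent API).
    """
    lines = []
    current = state.get("file")

    if current is not None:
        # Drain whatever was appended to the file we already hold open.
        # After a rename this is still the old (now archived) file.
        lines.extend(current.readlines())

    try:
        inode_on_disk = os.stat(path).st_ino
    except FileNotFoundError:
        # The writer has not recreated the file yet; try again later.
        return lines

    if current is None or state["inode"] != inode_on_disk:
        if current is not None:
            # Release the descriptor of the rotated file so it does not
            # linger as an unreleased file.
            current.close()
        current = open(path, "r")
        state["file"] = current
        state["inode"] = os.fstat(current.fileno()).st_ino
        lines.extend(current.readlines())

    return lines
```

Polling like this needs no signal from the principal charm, at the cost of one `stat` call per watched file per poll. It assumes the rotated file is renamed rather than deleted, so the old inode is not reused right away.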
I'd appreciate your input on the above @taurus-forever. |
Patroni should be using an extended version of Python's RotatingFileHandler. Python's docs indicate “rename and create” rotation. We can only configure the size and the number of files to keep. PostgreSQL is configured to keep a week's worth of per-minute logs that are truncated each minute. We configured this behaviour, but it is spec behaviour, so changing it would require discussion. Both Patroni and PostgreSQL try to keep about a week's worth of per-minute logs, so that should be about 10k files for each.
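A small sketch of what “rename and create” means for anything holding the old file open, using the stock `logging.handlers.RotatingFileHandler` from the standard library (not Patroni's extended handler). The file name `patroni.log`, the logger name, and the tiny `maxBytes`/`backupCount` values are made up for the demo. It also spells out the "about 10k files" arithmetic.

```python
import logging
import logging.handlers
import os

# One file per minute for a week: 7 days * 24 hours * 60 minutes.
MINUTES_PER_WEEK = 7 * 24 * 60


def rotate_once(log_dir):
    """Log, force one rollover, log again, and return the inodes involved."""
    path = os.path.join(log_dir, "patroni.log")  # hypothetical file name
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=64, backupCount=2
    )
    logger = logging.getLogger("rotation-demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(handler)
    try:
        logger.info("first message")
        inode_before = os.stat(path).st_ino

        # Rename the current file to "<path>.1" and create a fresh "<path>".
        handler.doRollover()
        logger.info("second message")

        inode_after = os.stat(path).st_ino
        backup_inode = os.stat(path + ".1").st_ino
    finally:
        logger.removeHandler(handler)
        handler.close()
    return inode_before, inode_after, backup_inode
```

After the rollover the backup file carries the inode the live file had before, and the live path points at a new inode. A reader that never reopens the path therefore keeps following the archived file.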
I don't know how the agent detects log changes, but for Patroni only the last few logs should be relevant, the deeper backlog was likely already synced and shouldn't be changing. For Postgresql things are trickier, since the files are the same ( |
Bug Description
Hi,
Please check the complete issue description in PostgreSQL repo:
canonical/postgresql-operator#524
TL;DR: the PostgreSQL charm rotates logs but doesn't send any signal to the subordinate grafana-agent, which leaves a lot of unreleased files and a huge syslog => downtime.
This is a cross-team ticket to work out a solution here.
To Reproduce
See steps to reproduce in canonical/postgresql-operator#524
Environment
See Versions in canonical/postgresql-operator#524
Relevant log output
Additional context
Proposals:
Better ideas are welcome!