ROX-16561: Do not export RDS/Postgres logs to CloudWatch for Probe instances #1020
Uh, but how do we debug issues with probe-created instances if we lose logs?
Unfortunately, it's not possible to set the retention period from here. If we export RDS logs to CloudWatch, the log groups never expire. To fix that, I have another PR that reduces the retention period for RDS logs, but at that point I can no longer tell whether a log belongs to a probe instance or a regular instance. In my opinion, these logs (i.e. the Postgres logs) are not that important for the probe service, as the probe doesn't do much with the DB. So far we have never needed them. If the probe hits a DB-related issue that can't be debugged with the fleetshard or Central logs, we can manually create a Central instance and the issue should reproduce there as well.
@cdu @connorgorman @dashrews78 Do you think it would be useful to keep the Postgres logs of the instances created by the probe service in CloudWatch? My intention here is to remove them: I think they are of little use, and they would create thousands of log groups in CloudWatch (i.e. the vast majority of the log groups would belong to such instances instead of the real customer ones). For context, the probe service continually creates instances, checks that they are healthy, then deletes them. If the probe detects a DB-related issue, it should be easy to reproduce by manually creating a new instance instead.
I agree, IMO they provide little value. They would simply muddy the water and potentially lead to people chasing ghosts (such as the rabbit holes I went down because I didn't understand what the probe service was). CloudWatch is hard enough to navigate without extra noise.
Oh, I didn't look at the code and didn't realize this was about RDS logs only, since it was mentioned in neither the PR title nor the description. No issues with that.
@vladbologa Tangentially related, but is it possible to tag the RDS instances as probe instances? I want to look into having those tags propagated to CloudWatch/Prometheus so we can ignore them when looking at customer averages.
Yeah, my bad, sorry. I wrote the title & description in a bit of a rush.
Yes, I could do that. Any preference on the tag itself?
Something like ACSInstanceType = regular | internal (or probe)? Yeah, maybe even just like
I'll go with regular & test then, seems self-explanatory enough.
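The tagging idea above could look roughly like the following. This is only a sketch under stated assumptions: the tag key `ACSInstanceType` and the values `regular` and `test` come from the discussion, while the `tag` type and the helpers `instanceTypeTag` and `isCustomerInstance` are hypothetical names made up for illustration, not code from this PR. Treating untagged instances as customer instances is also an assumption.

```go
package main

import "fmt"

// Tag key and values as proposed in the discussion above.
const (
	instanceTypeTagKey  = "ACSInstanceType"
	instanceTypeRegular = "regular"
	instanceTypeTest    = "test"
)

// tag is a hypothetical stand-in for a cloud provider key/value tag.
type tag struct {
	Key   string
	Value string
}

// instanceTypeTag (hypothetical helper) returns the tag to attach to an
// RDS instance, depending on whether the probe service created it.
func instanceTypeTag(createdByProbe bool) tag {
	if createdByProbe {
		return tag{Key: instanceTypeTagKey, Value: instanceTypeTest}
	}
	return tag{Key: instanceTypeTagKey, Value: instanceTypeRegular}
}

// isCustomerInstance (hypothetical helper) reports whether an instance
// should count towards customer averages. Assumption: instances without
// the tag predate it and are treated as customer instances.
func isCustomerInstance(tags []tag) bool {
	for _, t := range tags {
		if t.Key == instanceTypeTagKey {
			return t.Value == instanceTypeRegular
		}
	}
	return true
}

func main() {
	probeTags := []tag{instanceTypeTag(true)}
	customerTags := []tag{instanceTypeTag(false)}
	fmt.Println(isCustomerInstance(probeTags))
	fmt.Println(isCustomerInstance(customerTags))
}
```

A metrics pipeline that propagates the tag could then drop everything for which `isCustomerInstance` returns false before computing averages.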
[APPROVALNOTIFIER] This PR is APPROVED.
This pull-request has been approved by: johannes94, vladbologa
Description
Inspired by #1018, this PR stops exporting RDS logs to CloudWatch for Centrals created by the probe service.
These instances create a lot of useless log groups and basically spam CloudWatch.
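As a rough illustration of the change described here, the decision can be reduced to choosing which log types to export per instance. The sketch below is not the PR's actual code: `InstanceType`, its constants, and `logExportsFor` are hypothetical names. The only AWS-specific detail assumed is that the RDS create-cluster request accepts a list of log types to export to CloudWatch (`EnableCloudwatchLogsExports`), with `postgresql` as the log type for Postgres engines.

```go
package main

import "fmt"

// InstanceType is a hypothetical label for who created a Central instance.
type InstanceType string

const (
	Regular InstanceType = "regular" // customer-facing instance
	Test    InstanceType = "test"    // instance created by the probe service
)

// logExportsFor (hypothetical helper) returns the log types that would be
// passed to RDS as the CloudWatch log exports for a new DB cluster.
// Probe-created instances export nothing, so no log group is created.
func logExportsFor(t InstanceType) []string {
	if t == Test {
		return nil
	}
	return []string{"postgresql"}
}

func main() {
	fmt.Println(len(logExportsFor(Regular)))
	fmt.Println(len(logExportsFor(Test)))
}
```

Returning an empty list for probe instances keeps the export decision in one place, so the retention-period change for regular instances can be handled separately.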
Checklist (Definition of Done)
Unit and integration tests added
Added test description under Test manual
Documentation added if necessary (i.e. changes to dev setup, test execution, ...)
ROX-12345: ...
Discussed security and business related topics privately. Will move any security and business related topics that arise to private communication channel.
Add secret to app-interface Vault or Secrets Manager if necessary
Test manual
TODO: Add manual testing efforts