Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ROX-16561: Do not export RDS/Postgres logs to CloudWatch for Probe instances #1020

Merged
merged 1 commit into from
May 11, 2023

Conversation

vladbologa
Copy link
Contributor

@vladbologa vladbologa commented May 10, 2023

Description

Inspired by #1018 this PR does not export RDS logs to CloudWatch for Centrals created by the Probe service.

These instances are creating a lot of useless log groups and basically spamming CloudWatch.

Checklist (Definition of Done)

  • Unit and integration tests added
  • Added test description under Test manual
  • Documentation added if necessary (i.e. changes to dev setup, test execution, ...)
  • CI and all relevant tests are passing
  • Add the ticket number to the PR title if available, i.e. ROX-12345: ...
  • Discussed security and business related topics privately. Will move any security and business related topics that arise to private communication channel.
  • Add secret to app-interface Vault or Secrets Manager if necessary

Test manual

TODO: Add manual testing efforts

# To run tests locally run:
make db/teardown db/setup db/migrate
make ocm/setup OCM_OFFLINE_TOKEN=<ocm-offline-token> OCM_ENV=development
make verify lint binary test test/integration

@vladbologa vladbologa temporarily deployed to development May 10, 2023 10:13 — with GitHub Actions Inactive
@vladbologa vladbologa temporarily deployed to development May 10, 2023 10:13 — with GitHub Actions Inactive
@vladbologa vladbologa temporarily deployed to development May 10, 2023 10:13 — with GitHub Actions Inactive
@vladbologa vladbologa requested review from johannes94 and a team May 10, 2023 10:23
@porridge
Copy link
Collaborator

porridge commented May 10, 2023

Uh, but how do we debug issues with probe-created instances if we lose logs?
Could we maybe instead set a different retention period for these?

@vladbologa
Copy link
Contributor Author

vladbologa commented May 10, 2023

Uh, but how do we debug issues with probe-created instances if we lose logs? Could we maybe instead set a different retention period for these?

It's not possible to set the retention period from here unfortunately. If we export RDS logs to CW, the log groups never expire. To fix that, I have another PR where I reduce the retention period for RDS logs. But at that point I can't differentiate anymore if it's a probe log or a regular instance log.

In my opinion, these logs (i.e. Postgres logs) are not that important for the probe service, as the probe doesn't do that much with the DB. So far we never needed them. If the probe has a DB related issue that can't be debugged with the fleetshard or Central logs, we can manually create a Central instance and the issue should reproduce there as well.

@vladbologa
Copy link
Contributor Author

@cdu @connorgorman @dashrews78 Do you think it would be useful to keep the Postgres logs of the instances created by the probe service in CloudWatch?

My intention here is to remove them because I think they provide little use, but on the other hand they will create thousands of log groups in CloudWatch (i.e. the vast majority of the log groups will belong to such instances, instead of the real customer ones).

For context, the probe service continually creates instances, checks that they are healthy, then deletes them. If there is a DB related issue detected by the probe, it should be easily reproducible by manually creating a new instance instead.

@dashrews78
Copy link

@cdu @connorgorman @dashrews78 Do you think it would be useful to keep the Postgres logs of the instances created by the probe service in CloudWatch?

My intention here is to remove them because I think they provide little use, but on the other hand they will create thousands of log groups in CloudWatch (i.e. the vast majority of the log groups will belong to such instances, instead of the real customer ones).

For context, the probe service continually creates instances, checks that they are healthy, then deletes them. If there is a DB related issue detected by the probe, it should be easily reproducible by manually creating a new instance instead.

IMO I agree they provide little value. They would simply muddy the water and potentially lead to people chasing ghosts. (such as the rabbit holes I went down because I didn't understand what the probe service was). Cloudwatch is hard enough to navigate without extra noise.

@porridge
Copy link
Collaborator

Oh, I didn't look at the code and didn't realize this was about RDS logs only, since it was not mentioned in the PR title nor description. No issues with that.

@connorgorman
Copy link
Contributor

@vladbologa Tangentially related, but is it possible to tag the RDS instances as probe instances? I want to look into having those tags propagated to cloudwatch/promethes so we can ignore them when looking at customer averages

@vladbologa vladbologa changed the title ROX-16561: Do not export CloudWatch logs for Probe instances ROX-16561: Do not export RDS/Postgres logs to CloudWatch for Probe instances May 10, 2023
@vladbologa
Copy link
Contributor Author

Oh, I didn't look at the code and didn't realize this was about RDS logs only, since it was not mentioned in the PR title nor description. No issues with that.

Yeah my bad, sorry. Wrote the title & description in a bit of rush.

@vladbologa
Copy link
Contributor Author

@vladbologa Tangentially related, but is it possible to tag the RDS instances as probe instances? I want to look into having those tags propagated to cloudwatch/promethes so we can ignore them when looking at customer averages

Yes I could do that, any preference on the tag itself?
Something like ACSInstanceType = regular | internal (or probe)?

@connorgorman
Copy link
Contributor

connorgorman commented May 10, 2023

Something like ACSInstanceType = regular | internal (or probe)?

Yeah, maybe even just like test which is what the probe is doing? No strong feelings

@vladbologa
Copy link
Contributor Author

Something like ACSInstanceType = regular | internal (or probe)?

Yeah, maybe even just like test which is what the probe is doing? No strong feelings

I'll go with regular & test then, seems self-explanatory enough.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented May 11, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johannes94, vladbologa

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [johannes94,vladbologa]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@vladbologa vladbologa merged commit 9abece0 into main May 11, 2023
@vladbologa vladbologa deleted the vbologa/logs_export_probe branch May 11, 2023 07:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants