Dogfood the monitoring we ship with Sourcegraph #5370
Comments
@slimsag I believe this issue has been replaced by others or is mostly done. Re-open if still applicable. |
Reworded to match the current state of affairs |
@slimsag is it possible to get some more context here, or an example of where this is currently configured and working? |
Dear all, This is your release captain speaking. 🚂🚂🚂 Branch cut for the 3.16 release is scheduled for tomorrow. Is this issue / PR going to make it in time? Please change the milestone accordingly. Thank you |
Historically, Sourcegraph.com has had alerting that was not shared by our customers or other Sourcegraph instances, and that was regularly outdated or broken because Sourcegraph.com was being neglected.

Very recently, we created a new set of monitoring dashboards and alerting rules, which resulted in this generator: https://github.com/sourcegraph/sourcegraph/tree/master/monitoring

For the most part, we have unified the Prometheus and Grafana rules on Sourcegraph.com with this monitoring generator, but a few outliers remain:

One outlier is Site24x7, but this is expected; see https://github.com/sourcegraph/sourcegraph/issues/10742

The major outlier (what this issue is about) is that on Sourcegraph.com we still use Prometheus alertmanager to report alerts from Prometheus to OpsGenie. This is in contrast to using Grafana to send alerts to OpsGenie, which is what we advise all our customers to do (see docs here: https://docs.sourcegraph.com/admin/observability/alerting )

What we need to do to solve this issue is:
If you want to get started on https://github.com/sourcegraph/sourcegraph/issues/10641 that would be great. This issue, however, has many prerequisites and dependencies on me as noted above so it was a mistake to assign this to this milestone. Kicking it back. |
Added a |
@slimsag is this required? it seems that these alerts are already included in |
Eventually these need to be removed, so all alerts/monitoring items are defined in https://github.com/sourcegraph/sourcegraph/tree/master/monitoring In other words, for us to be considered fully using the new monitoring stack I think this is necessary. But if we can use both today and do that migration slowly, I am happy with that (just break that portion out of this issue by filing a new one) |
Dear all, This is your release captain speaking. 🚂🚂🚂 Branch cut for the 3.18 release is scheduled for tomorrow. Is this issue / PR going to make it in time? Please change the milestone accordingly. Thank you |
The last real piece in this pie is subscribing opsgenie to critical alerts. I will coordinate a time with @slimsag sometime this week or next week where we will take on-call and enable this, so that one of us can handle any pages that happen in that time and determine if they should just be silenced instead. With that in mind, I'm adding this to 3.19 alongside https://github.com/sourcegraph/sourcegraph/issues/12160, which tracks the other remaining TODO here separately. cc @pecigonzalo |
This banner is currently a significant source of confusion, and since we're still working on dogfooding alerts on sourcegraph.com (#5370), it's hard to say we're confident enough in all our alerts to display it so prominently (yet).
Sourcegraph.com is currently running our old alerting stack (see here), which is poor in several respects.
We should switch to the recently added monitoring stack that ships with Sourcegraph (see https://docs.sourcegraph.com/admin/observability/alerting ).
For more details see https://github.com/sourcegraph/sourcegraph/issues/5370#issuecomment-629406540