Channel mpsc_import_notification_stream #611
I took a closer look at the issue, and it is caused by this alerting rule: polkadot-sdk/substrate/scripts/ci/monitoring/alerting-rules/alerting-rules.yaml, lines 148 to 159 at commit 92d7751.
Therefore, we need to update the alert as follows:
These changes should help address the issue.
@dmitry-markin you worked on the
@BulatSaif Thank you for the investigation! I see that I introduced a breaking change in paritytech/substrate#13504. Namely, I stopped counting dropped messages as received, and that broke alerts. There are three options to get it fixed:
I would vote for either option 2 or 3, because considering dropped messages to be received seems counterintuitive to me. Also, the alert is likely triggered only after the node is stopped, because the messages in this channel can be dropped only when the node is terminated. Taking this into account, I don't think calculating the channel size in Prometheus alerts using [...] Option 2 is a little bit simpler, because we don't need to update [...]
@altonen What do you think?
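To make the breakage described above concrete, here is a minimal sketch of the counter arithmetic. The struct and its field names are hypothetical and are not the metric names used by the node. It assumes an alert that estimates channel backlog as `sent - received`: once dropped messages are no longer counted as received, that difference keeps including messages that have already left the channel.

```rust
/// Hypothetical per-channel counters, named for illustration only.
/// These are not the actual metric names exported by the node.
#[derive(Debug, Default, Clone, Copy)]
pub struct ChannelCounters {
    /// Messages pushed into the channel.
    pub sent: u64,
    /// Messages taken out of the channel by the receiver.
    pub received: u64,
    /// Messages discarded without being received (e.g. at shutdown).
    pub dropped: u64,
}

impl ChannelCounters {
    /// Backlog as seen by an alert that only computes `sent - received`.
    pub fn naive_backlog(&self) -> u64 {
        self.sent.saturating_sub(self.received)
    }

    /// Backlog once dropped messages are also subtracted:
    /// a dropped message is no longer queued in the channel.
    pub fn actual_backlog(&self) -> u64 {
        self.sent
            .saturating_sub(self.received)
            .saturating_sub(self.dropped)
    }
}
```

With `sent = 1000`, `received = 700` and `dropped = 300`, `naive_backlog()` reports 300 while `actual_backlog()` reports 0, which is the kind of gap that could trip a threshold alert after the node has stopped.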
I would go for option 3
It is better to avoid any metrics calculation at the application level, because it may cause unnecessary load. However, there is a catch when implementing the second option. Many of the [...]
Current Alert: [...]
If [...]
To address this, the new alert should be modified as follows:
New Alert: [...]
Here's the final expression:
In this specific case, channel size reporting is cheap. It is also more robust, because the dropped messages might not be reported on node shutdown (e.g., if the node shuts down fast enough that the metrics are not reported to Prometheus).
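As a rough illustration of why reporting the size directly can be cheap, here is a sketch of a size gauge kept next to a channel. It is an assumption-laden example, not the implementation from the referenced PRs: the type `ChannelSizeGauge` and its method names are hypothetical. Each send, receive, or drop costs one atomic update, and the value read at scrape time is already the backlog, so the alert does not depend on a separate dropped-message counter reaching the monitoring system.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical in-process gauge tracking how many messages are queued.
/// Illustration only; not the node's actual metrics code.
pub struct ChannelSizeGauge {
    size: AtomicU64,
}

impl ChannelSizeGauge {
    pub fn new() -> Self {
        Self { size: AtomicU64::new(0) }
    }

    /// Call when a message is pushed into the channel.
    pub fn on_send(&self) {
        self.size.fetch_add(1, Ordering::Relaxed);
    }

    /// Call when the receiver takes one message out.
    pub fn on_receive(&self) {
        self.decrement(1);
    }

    /// Call when `n` queued messages are discarded without being received.
    pub fn on_drop(&self, n: u64) {
        self.decrement(n);
    }

    /// Current number of queued messages, as it would be exported.
    pub fn get(&self) -> u64 {
        self.size.load(Ordering::Relaxed)
    }

    /// Saturating decrement so the gauge never wraps below zero.
    fn decrement(&self, n: u64) {
        let _ = self
            .size
            .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |v| {
                Some(v.saturating_sub(n))
            });
    }
}
```

An alert built on such a gauge would compare a single value against a threshold instead of subtracting counters in the alert expression.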
@BulatSaif Could you reconfigure the alerts to use a new metric
I created a PR with the alert fix: #1568
paritytech#1568)
# Description
Follow up for paritytech#1489. Closes paritytech#611
Previously, we calculated the channel size in the alert expression, but paritytech#1489 introduced a new metric that reports the channel size.
## Changes:
1. Updated the alert rule to use the new metric.
I get those messages below on a regular basis. It's much less than before 0.9.42, but it's still there once in a while on most/all my nodes.
Originally posted by @LukeWheeldon in #679