Expand metrics for errors and warnings #218
Discussed this out-of-band with @AgeManning and we agreed to pivot the design of how Discv5 metrics are to work. As such, this PR is going through a refactor.
What was the outcome of keeping per-crate metrics in lighthouse instead? What's the future of this PR in light of that outcome?
I believe the plan should be as follows: we keep this PR. We add metrics to discv5 (and to libp2p) that give us a count of errors by severity, i.e. the number of crits, errors and warns. The idea is that we can check our metrics dashboard to see whether there are any underlying problems, and then go to the log files to investigate further. This PR simply needs to count the error logs.
A simple way of doing this is to replace the warn!, error! and crit! macros with ones that increment a metric count. No further code would need to change. Perhaps @armaganyildirak would be interested in this.
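The macro-replacement idea can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not code from discv5: the counter names (`WARN_COUNT`, `ERROR_COUNT`, `CRIT_COUNT`), the `counted_log!` helper and `severity_counts()` are hypothetical, and `eprintln!` stands in for whatever logging macro the crate actually forwards to. Each wrapper counts the occurrence and ignores the message content, so existing call sites keep working unchanged.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical per-severity counters; in a real crate these would be
// registered with the metrics backend instead of being bare statics.
pub static WARN_COUNT: AtomicUsize = AtomicUsize::new(0);
pub static ERROR_COUNT: AtomicUsize = AtomicUsize::new(0);
pub static CRIT_COUNT: AtomicUsize = AtomicUsize::new(0);

// Bump `$counter`, then emit the message. Only the occurrence is counted;
// the message text never reaches the metric.
macro_rules! counted_log {
    ($counter:expr, $level:literal, $($arg:tt)*) => {{
        $counter.fetch_add(1, ::std::sync::atomic::Ordering::Relaxed);
        // Stand-in for the real logging call (assumption: e.g. a tracing macro).
        eprintln!("{} {}", $level, format_args!($($arg)*));
    }};
}

// Drop-in replacements with the same names as the logging macros.
macro_rules! warn {
    ($($arg:tt)*) => { counted_log!($crate::WARN_COUNT, "WARN", $($arg)*) };
}
macro_rules! error {
    ($($arg:tt)*) => { counted_log!($crate::ERROR_COUNT, "ERROR", $($arg)*) };
}
macro_rules! crit {
    ($($arg:tt)*) => { counted_log!($crate::CRIT_COUNT, "CRIT", $($arg)*) };
}

/// Snapshot of (warns, errors, crits), e.g. for exporting to a dashboard.
pub fn severity_counts() -> (usize, usize, usize) {
    (
        WARN_COUNT.load(Ordering::Relaxed),
        ERROR_COUNT.load(Ordering::Relaxed),
        CRIT_COUNT.load(Ordering::Relaxed),
    )
}
```

Each counter is independent, so `Ordering::Relaxed` is sufficient, and reading the counts for a dashboard never takes a lock.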
I expressed this via other channels but will leave it here for the record. I don't think this is the right approach to solve this issue. If we end up doing this, it should be as a last resort, since it is highly polluting to the code and very hard to maintain. I proposed a different approach: to build a … Since I have not seen this implemented yet, I'll give it a go and report back. If I find the proposed approach not doable, or equally hard to maintain, we could continue with providing these metrics from discv5. However, this should be done with macros, per log level, ignoring the log message to reduce the pollution and maintenance work (basically @AgeManning's last comment). Both scenarios diverge enough from the current PR that it isn't reasonable to build on top of this one, even in the second scenario. That being said, I'm closing this PR. The second scenario should start a new one if that's where we land.
Alternative here: sigp/lighthouse#4979. It ended up being fairly simple.
Description
Closes #213.
Notes & open questions
This PR expands the metrics collected by the Discv5 service to include errors and warnings that occur throughout the operation of the service. This is achieved by the introduction of a new type, `ErrorMetrics`, which stores four elements: …

A few technical notes:
… `AtomicUsize` … in a manner such as `self.errors.read().values().map(|at| at.load(Ordering::Relaxed)).sum()`. The additional memory overhead is considered negligible, as is the added locking overhead.
… `metrics` module. Assigning unique numeric identifiers was considered but would add complexity with little benefit (these will eventually be turned back into some string representation when they hit Grafana/Prometheus anyway).

Notes for reviewers:
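The list of `ErrorMetrics`'s four elements did not survive, so the following is only a hedged sketch of the storage pattern the technical notes describe: `AtomicUsize` counters keyed by strings, behind a read/write lock, with the total computed by summing `values()`. The method names `inc`, `get` and `total`, and the use of `&'static str` keys, are assumptions for illustration. The quoted snippet calls `read()` without `unwrap()`, which suggests a `parking_lot::RwLock`; `std::sync::RwLock` is used here so the sketch has no dependencies.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::RwLock;

/// Hypothetical `ErrorMetrics`-style store: one counter per string key.
#[derive(Default)]
pub struct ErrorMetrics {
    errors: RwLock<HashMap<&'static str, AtomicUsize>>,
}

impl ErrorMetrics {
    /// Increment the counter for `kind`, creating it on first use.
    pub fn inc(&self, kind: &'static str) {
        {
            // Fast path: the key already exists, so a shared read lock is
            // enough and the atomic handles concurrent increments.
            let map = self.errors.read().unwrap();
            if let Some(counter) = map.get(kind) {
                counter.fetch_add(1, Ordering::Relaxed);
                return;
            }
        } // read guard dropped here, before taking the write lock

        // Slow path: insert the key (another thread may have raced us,
        // in which case `entry` simply reuses its counter).
        self.errors
            .write()
            .unwrap()
            .entry(kind)
            .or_insert_with(|| AtomicUsize::new(0))
            .fetch_add(1, Ordering::Relaxed);
    }

    /// Current count for `kind`, or 0 if it has never been recorded.
    pub fn get(&self, kind: &str) -> usize {
        self.errors
            .read()
            .unwrap()
            .get(kind)
            .map_or(0, |at| at.load(Ordering::Relaxed))
    }

    /// Sum over all kinds, following the `values().map(..).sum()` pattern.
    pub fn total(&self) -> usize {
        self.errors
            .read()
            .unwrap()
            .values()
            .map(|at| at.load(Ordering::Relaxed))
            .sum()
    }
}
```

Keeping string keys matches the note above: the names end up as strings in Grafana/Prometheus anyway, so mapping them to numeric identifiers would only add a translation step.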
Change checklist
Tests if relevant