-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Export gossip with macro #73
base: prometheus
Are you sure you want to change the base?
Conversation
When referring to our node, we use simply `node` as opposed to `cluster`.
Instead of propagating `rpc_processor`, use the `block_commitment_cache` structure to get different bank confirmation levels
As ruuda pointed out, there is a deadlock if we acquire a read lock for both `block_commitment_cache` and `bank_forks`, in ruuda words: - We take the read lock on block_commitment_cache. - Some other thread takes a write lock on bank_forks. - We want to take the read lock on bank_forks, but this blocks because the other thread holds a write lock. - The other thread wants to take a write lock on block_commitment_cache, but that blocks because we hold the read lock. - Deadlock! Instead, we guard the `block_commitment_cache` read lock so we don't hold two read locks concurrently. This might lead, as ruuda also pointed out, to an inconsistency that's known to Solana and dealt already in the code the same way it's dealt in `rpc/src/rpc.rs`.
Should be "counter", not "count"
Also update Cargo.lock.
Down the rabbit hole from the arguments, propagating it to the prometheus part, we add a flag so we can track multiple vote accounts.
Modify how metrics are written to account for when there's nothing to report. Add metrics about votes: last voted and vote balance
Change variable and function names, and add information about vote credits
Slight change on how metrics are recorded
Co-authored-by: Ruud van Asseldonk <[email protected]>
Co-authored-by: Ruud van Asseldonk <[email protected]>
Co-authored-by: Ruud van Asseldonk <[email protected]>
With the epoch start slots and the number of slots in the epoch (and the current slot, which we already had), we can infer/estimate: * Epoch progress percentage * Slots left until the next epoch * Time left until the next epoch (from slot height increase) These are useful metrics to have about the network.
Derive a macro for `GossipStats`, a less invasive and more concise way of writing all metrics from it. Currently, the macro only accepts `Counter` type and will fail if used otherwise, but we can probably expand it in the future to work with more data types.
This function clears the gossip stats.
I'm not so sure about this PR, I like the procedural macro business and all but I don't know if we'd make sense of these gossip data, we'd need to also stop these metrics from clearing out, so I just commented the code that does it. Edit: Some metrics do look interesting, like |
Since we commented the previous line, doesn't make sense to update this variable.
Yep, it doesn’t really make sense to monitor something when we have no clue what it means. (Although sometimes if there is an outage and you see one of the metrics behaving in an unusual way around that time, you can learn something. But I prefer the thoughtful approach over the dump-all-the-data-approach.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice idea to do the macro. I have some suggestions.
submit_gossip_stats(&self.stats, &self.gossip, &stakes); | ||
*last_print = Instant::now(); | ||
// submit_gossip_stats(&self.stats, &self.gossip, &stakes); | ||
// *last_print = Instant::now(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So ”submit” actually resets the counters?
I can think of two ways to make this work upstream.
- Make logging metrics and Prometheus metrics mutually exclusive, and put the choice in a CLI flag. Then if Prometheus metrics are enabled, we would skip the log + reset here.
- If
GossipStats
holds only counters, we could implementstd::ops::Sub
for it, and keep two instances of it: the current stats, and the last logged stats. Then when it is time to log, we logcurrent - last_logged
, and then setlast_logged = current
. That way the metrics remain increasing for Prometheus, and the log output remains unchanged.
I think option 2 is kind of neat. It’s a bit more tedious in the GossipStats
, but it may be simpler in the end than threading a CLI flag through everything to this point. And it’s nicer to not have to choose.
out, | ||
&solana_prometheus_utils::MetricFamily { | ||
name: &format!("solana_gossip_{}", stringify!(#idents)), | ||
help: "Auto generated with Prometheus macro", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to get the doc comment from the field and put it here? Doc comments are secretly attributes, but I don’t know if syn
parses them that way.
Or maybe alternatively, we should require a #[metric(help = "blah", type = "counter")]
attribute on the fields that we want to export. It’s a more invasive change, but also more explicit about what it does.
@@ -0,0 +1,45 @@ | |||
extern crate proc_macro2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool idea to make a macro for this! It’s a nice approach on the definition side, and the macro itself looks pretty simple too, nice.
4c3ed65
to
fd2769a
Compare
fff5538
to
50104e4
Compare
…abs#86) windows: Use vcpkg for openssl dep (#73) (cherry picked from commit b78c070) Co-authored-by: Jon C <[email protected]>
…abs#86) windows: Use vcpkg for openssl dep (#73) (cherry picked from commit b78c070) Co-authored-by: Jon C <[email protected]>
Derive
ExportPrometheus
intoGossipStats
, it creates a function calledwrite_prometheus(.)
that writes all fields from the struct that derives it.Currently, we only support
Counter
metrics used inGossipStats
, but we probably could extend it in the future.Also,
submit_gossip_stats
clears the gossip stats, that's not very good for our metrics, so we could just accumulate it instead