-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Leader QoS service metrics #21708
Leader QoS service metrics #21708
Conversation
Codecov Report
@@ Coverage Diff @@
## master #21708 +/- ##
=========================================
- Coverage 81.3% 81.3% -0.1%
=========================================
Files 518 518
Lines 145734 145734
=========================================
- Hits 118539 118509 -30
- Misses 27195 27225 +30 |
ac658f0
to
65a145d
Compare
core/src/qos_service.rs
Outdated
thread::sleep(Duration::from_millis(100)); | ||
} | ||
} | ||
} | ||
|
||
#[derive(Default)] | ||
struct QosServiceMetrics { | ||
last_report: AtomicInterval, | ||
// bankign_stage creates one qos_service instance per working threads, which is uniquely |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: bankign_stage -> banking_stage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: qos_service -> QosService
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
core/src/qos_service.rs
Outdated
verified_txs_count: AtomicU64, | ||
|
||
// accumulated number of transactions been processed, includes those landed and those to be | ||
// retriued (due to AccountInUse, and other QoS related reasons) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: retriued -> retried
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, fixed
let running_flag = Arc::new(AtomicBool::new(true)); | ||
let metrics = Arc::new(QosServiceMetrics::default()); | ||
let metrics = Arc::new(QosServiceMetrics::new(id)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this mean that each banking thread will report a separate qos-service-stats
per slot? i.e. the metrics are not aggregated across all banking threads per slot?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, the usage of doing this is: gossip votes pinned on thread #0, TPU votes on thread #1, and transactions on the rest of threads. So on the influx query side, I can use and id>1
to separate transaction flow from gossip/tpu votes flow, vice versa. Can also use slot
in where
to zoom in to interested slots.
} | ||
|
||
pub fn report(&self, bank_slot: Slot) { | ||
if bank_slot != self.slot.load(Ordering::Relaxed) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The flow as I understand it is
- Every banking thread sends a
Bank
over a channel at the end ofprocess_and_record_transactions()
via a call toreport_metrics()
solana/core/src/banking_stage.rs
Line 968 in 65a145d
qos_service.report_metrics(bank.clone()); - There exists a unique
QosService
thread for that particular banking thread that will process that bank from the channel and call this functionreport
Might be good to add a comment for this at the top of this service to summarize this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, will do
cd14638
to
7fc97ff
Compare
…p/tpu votes and transactions; - qos_service metrics is reported with bank slot; - replaced timer-based reporting with signal via channel; removed async report test as qos_service now lives within a thread
7fc97ff
to
637658a
Compare
Pull request has been modified.
automerge label removed due to a CI failure |
This PR has been automatically locked since there has not been any activity in past 14 days after it was merged. |
Problem
It'd be better if TPU transaction counters are aggregated by slots, would make QoS investigation easier.
Summary of Changes
Fixes #