Leader QoS service metrics #21708

Merged: 2 commits into solana-labs:master from leader_vote_handling_metrics, Dec 22, 2021

tao-stones
Contributor

Problem

It would be better if TPU transaction counters were aggregated by slot, which would make QoS investigation easier.

Summary of Changes

  • qos_service metrics are tagged with leader thread ids to separate gossip/tpu votes from transactions;
  • qos_service metrics are reported with the bank slot;
  • replaced timer-based reporting with a signal sent via a channel; removed the async report test, as qos_service now lives within a thread;
  • added tpu live packets (i.e., packets that are not buffered) states to qos metrics reporting.
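
To illustrate what "tagged with thread ids" and "reported with bank slot" could look like, here is a minimal sketch, not the code from this PR. It assumes a simplified metrics struct with a single counter; the names SlotMetrics, Datapoint and add_verified are hypothetical, and only verified_txs_count, report(bank_slot) and the slot comparison mirror what appears in the diff fragments of this conversation.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

type Slot = u64;

/// Hypothetical, simplified stand-in for the per-thread metrics struct.
pub struct SlotMetrics {
    id: u32,                       // banking thread id, used as a tag
    slot: AtomicU64,               // slot the counters currently belong to
    verified_txs_count: AtomicU64, // one example counter
}

/// One datapoint as it might be handed to a metrics backend (hypothetical).
#[derive(Debug, PartialEq)]
pub struct Datapoint {
    pub id: u32,
    pub slot: Slot,
    pub verified_txs_count: u64,
}

impl SlotMetrics {
    pub fn new(id: u32) -> Self {
        SlotMetrics {
            id,
            slot: AtomicU64::new(0),
            verified_txs_count: AtomicU64::new(0),
        }
    }

    /// Accumulate into the counter for the slot currently being tracked.
    pub fn add_verified(&self, n: u64) {
        self.verified_txs_count.fetch_add(n, Ordering::Relaxed);
    }

    /// Flush the counters only when the bank slot has changed.
    /// The returned datapoint carries the slot that just ended.
    pub fn report(&self, bank_slot: Slot) -> Option<Datapoint> {
        let previous = self.slot.load(Ordering::Relaxed);
        if bank_slot == previous {
            return None;
        }
        let datapoint = Datapoint {
            id: self.id,
            slot: previous,
            // swap(0) reads the total and resets it for the new slot
            verified_txs_count: self.verified_txs_count.swap(0, Ordering::Relaxed),
        };
        self.slot.store(bank_slot, Ordering::Relaxed);
        Some(datapoint)
    }
}
```

In this sketch each thread owns its own SlotMetrics, so one datapoint per (id, slot) pair is produced rather than one aggregate per slot.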

Fixes #

codecov bot commented Dec 8, 2021

Codecov Report

Merging #21708 (637658a) into master (bf8fbf8) will decrease coverage by 0.0%.
The diff coverage is 90.2%.

@@            Coverage Diff            @@
##           master   #21708     +/-   ##
=========================================
- Coverage    81.3%    81.3%   -0.1%     
=========================================
  Files         518      518             
  Lines      145734   145734             
=========================================
- Hits       118539   118509     -30     
- Misses      27195    27225     +30     

@tao-stones tao-stones force-pushed the leader_vote_handling_metrics branch 2 times, most recently from ac658f0 to 65a145d Compare December 8, 2021 23:32
thread::sleep(Duration::from_millis(100));
}
}
}

#[derive(Default)]
struct QosServiceMetrics {
last_report: AtomicInterval,
// bankign_stage creates one qos_service instance per working threads, which is uniquely
Contributor

nit: bankign_stage -> banking_stage

Contributor

nit: qos_service -> QosService

Contributor Author

fixed

verified_txs_count: AtomicU64,

// accumulated number of transactions been processed, includes those landed and those to be
// retriued (due to AccountInUse, and other QoS related reasons)
Contributor

nit: retriued -> retried

Contributor Author

thanks, fixed

  let running_flag = Arc::new(AtomicBool::new(true));
- let metrics = Arc::new(QosServiceMetrics::default());
+ let metrics = Arc::new(QosServiceMetrics::new(id));
Contributor

Does this mean that each banking thread will report a separate qos-service-stats datapoint per slot, i.e. the metrics are not aggregated across all banking threads per slot?

Contributor Author

Yes. The reason for doing this: gossip votes are pinned to thread #0, TPU votes to thread #1, and transactions go to the rest of the threads. So on the Influx query side, I can add "and id>1" to separate the transaction flow from the gossip/TPU vote flow, and vice versa. I can also use slot in the where clause to zoom in on the slots of interest.
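
The filtering described above happens in the metrics database, not in Rust. Purely as an illustration of what the id and slot tags make possible, the following sketch applies the same predicates to an in-memory list. The QosDatapoint struct and both function names are hypothetical; the thread numbering (0 for gossip votes, 1 for TPU votes, the rest for transactions) is taken from the comment above.

```rust
/// Hypothetical in-memory stand-in for one reported datapoint.
#[derive(Debug, Clone, PartialEq)]
pub struct QosDatapoint {
    pub id: u32,   // banking thread id tag
    pub slot: u64, // bank slot tag
    pub verified_txs_count: u64,
}

/// Rough equivalent of filtering with "id > 1 and slot = <slot>":
/// datapoints from the transaction-processing threads for one slot.
pub fn transaction_flow(points: &[QosDatapoint], slot: u64) -> Vec<QosDatapoint> {
    points
        .iter()
        .filter(|p| p.id > 1 && p.slot == slot)
        .cloned()
        .collect()
}

/// The complementary filter: gossip votes (id 0) and TPU votes (id 1).
pub fn vote_flow(points: &[QosDatapoint], slot: u64) -> Vec<QosDatapoint> {
    points
        .iter()
        .filter(|p| p.id <= 1 && p.slot == slot)
        .cloned()
        .collect()
}
```

Because the datapoints are not pre-aggregated across threads, a per-slot total would have to be summed on the query side.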

}

pub fn report(&self, bank_slot: Slot) {
if bank_slot != self.slot.load(Ordering::Relaxed) {
Contributor

The flow, as I understand it, is:

  1. Every banking thread sends a Bank over a channel at the end of process_and_record_transactions() via a call to report_metrics():
    qos_service.report_metrics(bank.clone());
  2. A unique QosService thread exists for that particular banking thread; it receives that Bank from the channel and calls this function, report().

It might be good to add a comment at the top of this service summarizing this flow.
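
The two steps above can be sketched with a standard-library channel. This is a simplified illustration under stated assumptions, not the PR's implementation: the Reporter type and its join method are hypothetical, a bare slot number is sent instead of a Bank, and the worker merely records which slots it would have reported.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

type Slot = u64;

/// Hypothetical reporter: one instance (and one worker thread) per banking thread.
pub struct Reporter {
    sender: Sender<Slot>,
    worker: JoinHandle<Vec<Slot>>,
}

impl Reporter {
    pub fn new() -> Self {
        let (sender, receiver) = channel::<Slot>();
        let worker = thread::spawn(move || {
            let mut current: Option<Slot> = None;
            let mut reported = Vec::new();
            // Blocks until a signal arrives; no timer or sleep loop is needed.
            // The loop ends once every sender has been dropped.
            for slot in receiver {
                if let Some(previous) = current {
                    if previous != slot {
                        // Slot changed: this is where metrics would be flushed.
                        reported.push(previous);
                    }
                }
                current = Some(slot);
            }
            reported
        });
        Reporter { sender, worker }
    }

    /// Called by the banking thread after processing a batch.
    pub fn report_metrics(&self, bank_slot: Slot) {
        // A send error only means the worker has already exited.
        let _ = self.sender.send(bank_slot);
    }

    /// Close the channel and collect the slots the worker reported.
    pub fn join(self) -> Vec<Slot> {
        let Reporter { sender, worker } = self;
        drop(sender);
        worker.join().expect("reporting thread panicked")
    }
}
```

Note that in this sketch the final slot is never flushed, because a flush is triggered only by the arrival of a different slot.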

Contributor Author

Yes, will do

@tao-stones tao-stones force-pushed the leader_vote_handling_metrics branch 5 times, most recently from cd14638 to 7fc97ff Compare December 15, 2021 18:37
carllin previously approved these changes Dec 21, 2021
…p/tpu votes and transactions;

- qos_service metrics is reported with bank slot;
- replaced timer-based reporting with signal via channel; removed async report test as qos_service now lives within a thread
@tao-stones tao-stones force-pushed the leader_vote_handling_metrics branch from 7fc97ff to 637658a Compare December 22, 2021 07:58
@mergify mergify bot dismissed carllin’s stale review December 22, 2021 07:58

Pull request has been modified.

@tao-stones tao-stones added the automerge Merge this Pull Request automatically once CI passes label Dec 22, 2021
mergify bot commented Dec 22, 2021

automerge label removed due to a CI failure

@mergify mergify bot removed the automerge Merge this Pull Request automatically once CI passes label Dec 22, 2021
@tao-stones tao-stones added the automerge Merge this Pull Request automatically once CI passes label Dec 22, 2021
@mergify mergify bot merged commit dd80a52 into solana-labs:master Dec 22, 2021
github-actions bot commented Apr 4, 2022

This PR has been automatically locked since there has not been any activity in past 14 days after it was merged.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 4, 2022
Labels
automerge Merge this Pull Request automatically once CI passes locked PR
2 participants