feat: measured oneshots #3902

drahnr · 2021-09-22T11:16:10Z

Introduces oneshots that measure the delay between creation, first poll and completion or error (distinguishing between hard timeout and cancellation).

Also prints a warning based on a set soft timeout.

Closes #3648

drahnr · 2021-09-22T12:46:51Z

The API usage is as follows:

When creating a pair, it has to be given a name as well as two timeouts.

// 1. create
let (tx, rx) = metered::oneshot::channel("foo", Duration::from_secs(500), Duration::from_secs(3));

// 2. store the meter
let meter = rx.meter();

// 3.
let val = rx.await;

reasoning for this rather providing the metrics as part of the error or as tuple, is compat. This is a rather minimal change and we could add metrics as needed, yet just change all use-statements at once and adjust to the additional params without much chore.

node/metered-channel/src/oneshot.rs

Co-authored-by: Andronik Ordian <[email protected]>

node/metered-channel/src/oneshot.rs

Co-authored-by: Andronik Ordian <[email protected]>

node/metered-channel/src/oneshot.rs

ordian · 2021-09-22T14:42:11Z

The impl looks correct, but a bit too complicated to me. Can we just have a thin abstraction on top of rx.await that will do the following?

let now = Instant::now();
let result = rx.timeout(HARD_TIMEOUT).await?;
match result {
    None => {
        error!("Subsystem {name} is unresponsive", name=name);
    }
    Some(msg) if now.elapsed() > SOFT_TIMEOUT {
        warn!("subsystem {name} is slow", name=name);
    }
<snip>    
}

drahnr · 2021-09-22T14:46:41Z

The impl looks correct, but a bit too complicated to me. Can we just have a thin abstraction on top of rx.await that will do the following?

let now = Instant::now();
let result = rx.timeout(HARD_TIMEOUT).await?;
match result {
    None => {
        error!("Subsystem {name} is unresponsive", name=name);
    }
    Some(msg) if now.elapsed() > SOFT_TIMEOUT {
        warn!("subsystem {name} is slow", name=name);
    }
<snip>    
}

Essentially that is what is implemented explicitly within poll, since I'd rather not have the above code spread over all cases of oneshot metering ( + the above is not measuring anything yet).

The complexity arises from Option<impl Future>, since we need to measure from the time we poll for the first time rather than creation. Without that, it'd become much simpler, but that'd be slightly less helpful I think.

Another, alternate simplification would be to spawn Delay with a very large timeout. And then once polled for the first time, call reset. But that seems mildly hacky.

I very much appreciate the feedback!

Note that I did not consider the complexity of the above code that much, I was more concerned how invasive introducing it throughout the code base would be.

ordian · 2021-09-22T15:41:08Z

The complexity arises from Option, since we need to measure from the time we poll for the first time rather than creation. Without that, it'd become much simpler, but that'd be slightly less helpful I think.

What I'm suggesting is implementing a function (or a struct wrapper for more type-safety) with the code above (modulo bikeshed on naming)

async fn timed_await(rx: oneshot::Receiver<T>) -> Result<T> {...}

and replacing rx.await? with timed_await(rx).await?. The time we poll the first time is the same as the time we call this function, so I don't see a difference here.

But maybe I'm missing something?

drahnr · 2021-09-23T08:30:12Z

The complexity arises from Option, since we need to measure from the time we poll for the first time rather than creation. Without that, it'd become much simpler, but that'd be slightly less helpful I think.

What I'm suggesting is implementing a function (or a struct wrapper for more type-safety) with the code above (modulo bikeshed on naming)
async fn timed_await(rx: oneshot::Receiver<T>) -> Result<T> {...}
and replacing rx.await? with timed_await(rx).await?. The time we poll the first time is the same as the time we call this function, so I don't see a difference here.

But maybe I'm missing something?

What you describe would be the functional approach and at the first glance you indeed obtain similar measurements while avoiding some complexity.

With the struct- wrappers you will always end up with the complexity exhibited in this PR - one has to somehow mingle the futures and provide a Future trait impl for the wrapper type.

The complexity on the other handside allows to better measure the actual time between the first poll and the send rather than the time of the receiver end blocking. In the common case these will be almost identical, but I'd rather not simplify here, especially when one looks at grafana dashboards occluding the odd deltas.

This also allows to simply add a bunch of extra measurements as needed (what's the delay between receiver creation and first poll, the delta between the true send and the true conclusion, so machine overload conditions will not show as wrongly blocking durations, but these are just a few details that will eventually become important in our grafana views).

I'd prefer to merge this with the tad of added complexity rather than measuring almost the right thing :) - in fact this impl allows to measure both as needed.

node/metered-channel/src/oneshot.rs

…neshots

rphmeier · 2021-09-29T12:57:39Z

node/metered-channel/src/oneshot.rs

+	)
+}
+
+#[allow(missing_docs)]


why is this allow(missing_docs)? the measurements function is also without docs.

This in particular has missing docs, because iff the error is not self describing, there is something wrong already. The error annotation should be enough. This is a pattern we have in more places throughout the codebase, limited to Error types.

node/metered-channel/src/oneshot.rs

rphmeier

The code here looks OK, but I wonder about expected usage - if it's meant to be plugged into prometheus, then it feels fairly unwieldy to do so.

drahnr · 2021-09-29T13:24:13Z

The code here looks OK, but I wonder about expected usage - if it's meant to be plugged into prometheus, then it feels fairly unwieldy to do so.

How so?

Currently we have a struct Metrics per subsystem, with collector fn-implementations specific to the subsystem i.e. fn on_foo(u64) and fn on_bar(Hash).

Now the returned Result type of the receiver ( or more specific: Measurements) contains 3 values, which would need to be forwarded to a prometheus type like (in the most common case) prometheus::Histogram with i.e. fn observe(&self, v: f64).

So yes, the resulting evaluation is not perfect but it's also encapsulated and one does not have to differentiate between Ok and Err at the cost of a local let result = rx.await binding (or not, since Error and OutputWithMesasurements both impl Measurable).

There are trade-offs with this impl, but in a discussion /w @eskimor this seemed the most practical one, happy to be enlightened though

Lldenaurois · 2021-09-29T13:30:08Z

node/metered-channel/src/oneshot.rs

+		test_launch(
+			"starve_till_soft_timeout_then_food",
+			|tx| async move {
+				Delay::new(Duration::from_secs(2)).await;


Even though it's just a test, I think it would be better to make this a constant.

They are all used exactly once, and are really only relevant per test case (and each of them is super short), so I don't think that's necessary (though I generally agree with you).

…neshots

…acros

eskimor · 2021-09-29T15:39:48Z

node/network/gossip-support/src/tests.rs

 	tracing::debug!(
 		target: LOG_TARGET,
-		unconnected_authorities = %PrettyAuthorities(unconnected_authorities.iter()),
+		unconnected_authorities = %pretty,


Wasn't that pretty enough? 😛

Can't handle it, too pretty 🎭

drahnr · 2021-09-29T15:45:20Z

bot merge

ghost · 2021-09-29T15:45:24Z

Waiting for commit status.

* master: feat: measured oneshots (#3902) remove `AllSubsystems` and `AllSubsystemsGen` types (#3874) Companion for Substrate#9867 (#3938) Substrate Companion for #9552 (#3834) CI: run disputes tests (#3962) Bump parity-scale-codec from 2.3.0 to 2.3.1 (#3959) approval-voting: populate session cache in advance (#3954) Bump libc from 0.2.102 to 0.2.103 (#3950) fix master (#3955) Docker files chore (#3880) Bump nix from 0.19.1 to 0.20.0 (#3587) remove connected disconnected state, 3rd attempt (#3898) fix flaky chain-selection tests (#3948) Add benchmarking for parachain runtime initializer pallet (#3913) minor chore changes (#3944) disputes: reject single-sided disputes (#3903)

* master: (52 commits) Companion for substrate PR#9890 (#3961) Bump version, tx_version and spec_version in prep for v0.9.11 (#3970) Fix master compilation (#3977) Make most XCM APIs accept an Into<MultiLocation> where MultiLocation is accepted (#3627) fix disputes tests (#3974) Drop availability only for candidates that lose disputes (#3973) revert +1 change to be on the safer side (#3972) paras_inherent: reject only candidates with concluded disputes (#3969) feat: measured oneshots (#3902) remove `AllSubsystems` and `AllSubsystemsGen` types (#3874) Companion for Substrate#9867 (#3938) Substrate Companion for #9552 (#3834) CI: run disputes tests (#3962) Bump parity-scale-codec from 2.3.0 to 2.3.1 (#3959) approval-voting: populate session cache in advance (#3954) Bump libc from 0.2.102 to 0.2.103 (#3950) fix master (#3955) Docker files chore (#3880) Bump nix from 0.19.1 to 0.20.0 (#3587) remove connected disconnected state, 3rd attempt (#3898) ...

overseer: remove mut in connector

4d07c25

drahnr self-assigned this Sep 22, 2021

drahnr added A3-in_progress Pull request is in progress. No review needed at this stage. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. labels Sep 22, 2021

drahnr added this to the v0.9.11 milestone Sep 22, 2021

drahnr force-pushed the bernhard-measured-oneshots branch 4 times, most recently from 5f2ad61 to 8a0d1c6 Compare September 22, 2021 12:41

monitored oneshots

27f1e19

drahnr force-pushed the bernhard-measured-oneshots branch from 8a0d1c6 to 27f1e19 Compare September 22, 2021 12:43

fmt

bbdf127

drahnr added A0-please_review Pull request needs code review. and removed A3-in_progress Pull request is in progress. No review needed at this stage. labels Sep 22, 2021

remove debug log

3717fba

ordian reviewed Sep 22, 2021

View reviewed changes

node/metered-channel/src/oneshot.rs Outdated Show resolved Hide resolved

node/metered-channel/src/oneshot.rs Outdated Show resolved Hide resolved

node/metered-channel/src/oneshot.rs Outdated Show resolved Hide resolved

node/metered-channel/src/oneshot.rs Outdated Show resolved Hide resolved

Update node/metered-channel/src/oneshot.rs

1b8f259

Co-authored-by: Andronik Ordian <[email protected]>

ordian reviewed Sep 22, 2021

View reviewed changes

node/metered-channel/src/oneshot.rs Outdated Show resolved Hide resolved

drahnr and others added 3 commits September 22, 2021 16:01

Update node/metered-channel/src/oneshot.rs

7a0115b

Co-authored-by: Andronik Ordian <[email protected]>

avoid two Arcs

a7e70b4

add expects

b32bc30

ordian reviewed Sep 22, 2021

View reviewed changes

node/metered-channel/src/oneshot.rs Outdated Show resolved Hide resolved

alt impl

f6b0665

eskimor reviewed Sep 27, 2021

View reviewed changes

node/metered-channel/src/oneshot.rs Outdated Show resolved Hide resolved

node/metered-channel/src/oneshot.rs Show resolved Hide resolved

node/metered-channel/src/oneshot.rs Outdated Show resolved Hide resolved

drahnr added A3-in_progress Pull request is in progress. No review needed at this stage. and removed A0-please_review Pull request needs code review. labels Sep 29, 2021

drahnr added 2 commits September 29, 2021 10:48

Merge remote-tracking branch 'origin/master' into bernhard-measured-o…

002c720

…neshots

rework the measured oneshots

920afc8

drahnr added A0-please_review Pull request needs code review. and removed A3-in_progress Pull request is in progress. No review needed at this stage. labels Sep 29, 2021

Merge remote-tracking branch 'origin/master' into bernhard-measured-o…

291975c

…neshots

eskimor approved these changes Sep 29, 2021

View reviewed changes

rphmeier reviewed Sep 29, 2021

View reviewed changes

node/metered-channel/src/oneshot.rs Show resolved Hide resolved

address review comments, use Measurable where possible

7fc1e7d

rphmeier approved these changes Sep 29, 2021

View reviewed changes

Lldenaurois approved these changes Sep 29, 2021

View reviewed changes

drahnr added 2 commits September 29, 2021 15:49

Merge remote-tracking branch 'origin/master' into bernhard-measured-o…

a180330

…neshots

chore/tracing: move PrettyAuthorities construction out of tracing m…

7572f46

…acros

eskimor reviewed Sep 29, 2021

View reviewed changes

ghost merged commit 71a721e into master Sep 29, 2021

ghost deleted the bernhard-measured-oneshots branch September 29, 2021 16:06

This pull request was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: measured oneshots #3902

feat: measured oneshots #3902

drahnr commented Sep 22, 2021

drahnr commented Sep 22, 2021 •

edited

Loading

ordian commented Sep 22, 2021 •

edited

Loading

drahnr commented Sep 22, 2021 •

edited

Loading

ordian commented Sep 22, 2021 •

edited

Loading

drahnr commented Sep 23, 2021 •

edited

Loading

rphmeier Sep 29, 2021

drahnr Sep 29, 2021

rphmeier left a comment

drahnr commented Sep 29, 2021 •

edited

Loading

Lldenaurois Sep 29, 2021

drahnr Sep 29, 2021

eskimor Sep 29, 2021

drahnr Sep 29, 2021

drahnr commented Sep 29, 2021

ghost commented Sep 29, 2021

feat: measured oneshots #3902

feat: measured oneshots #3902

Conversation

drahnr commented Sep 22, 2021

drahnr commented Sep 22, 2021 • edited Loading

ordian commented Sep 22, 2021 • edited Loading

drahnr commented Sep 22, 2021 • edited Loading

ordian commented Sep 22, 2021 • edited Loading

drahnr commented Sep 23, 2021 • edited Loading

rphmeier Sep 29, 2021

Choose a reason for hiding this comment

drahnr Sep 29, 2021

Choose a reason for hiding this comment

rphmeier left a comment

Choose a reason for hiding this comment

drahnr commented Sep 29, 2021 • edited Loading

Lldenaurois Sep 29, 2021

Choose a reason for hiding this comment

drahnr Sep 29, 2021

Choose a reason for hiding this comment

eskimor Sep 29, 2021

Choose a reason for hiding this comment

drahnr Sep 29, 2021

Choose a reason for hiding this comment

drahnr commented Sep 29, 2021

ghost commented Sep 29, 2021

drahnr commented Sep 22, 2021 •

edited

Loading

ordian commented Sep 22, 2021 •

edited

Loading

drahnr commented Sep 22, 2021 •

edited

Loading

ordian commented Sep 22, 2021 •

edited

Loading

drahnr commented Sep 23, 2021 •

edited

Loading

drahnr commented Sep 29, 2021 •

edited

Loading