-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Conversation
5f2ad61
to
8a0d1c6
Compare
8a0d1c6
to
27f1e19
Compare
The API usage is as follows: When creating a pair, it has to be given a name as well as two timeouts. // 1. create
let (tx, rx) = metered::oneshot::channel("foo", Duration::from_secs(500), Duration::from_secs(3));
// 2. store the meter
let meter = rx.meter();
// 3.
let val = rx.await; reasoning for this rather providing the metrics as part of the error or as tuple, is compat. This is a rather minimal change and we could add metrics as needed, yet just change all |
Co-authored-by: Andronik Ordian <[email protected]>
Co-authored-by: Andronik Ordian <[email protected]>
The impl looks correct, but a bit too complicated to me. Can we just have a thin abstraction on top of let now = Instant::now();
let result = rx.timeout(HARD_TIMEOUT).await?;
match result {
None => {
error!("Subsystem {name} is unresponsive", name=name);
}
Some(msg) if now.elapsed() > SOFT_TIMEOUT {
warn!("subsystem {name} is slow", name=name);
}
<snip>
} |
Essentially that is what is implemented explicitly within The complexity arises from Another, alternate simplification would be to spawn I very much appreciate the feedback! Note that I did not consider the complexity of the above code that much, I was more concerned how invasive introducing it throughout the code base would be. |
What I'm suggesting is implementing a function (or a struct wrapper for more type-safety) with the code above (modulo bikeshed on naming) async fn timed_await(rx: oneshot::Receiver<T>) -> Result<T> {...} and replacing But maybe I'm missing something? |
What you describe would be the functional approach and at the first glance you indeed obtain similar measurements while avoiding some complexity. With the The complexity on the other handside allows to better measure the actual time between the first poll and the send rather than the time of the receiver end blocking. In the common case these will be almost identical, but I'd rather not simplify here, especially when one looks at grafana dashboards occluding the odd deltas. This also allows to simply add a bunch of extra measurements as needed (what's the delay between receiver creation and first poll, the delta between the true send and the true conclusion, so machine overload conditions will not show as wrongly blocking durations, but these are just a few details that will eventually become important in our grafana views). I'd prefer to merge this with the tad of added complexity rather than measuring almost the right thing :) - in fact this impl allows to measure both as needed. |
) | ||
} | ||
|
||
#[allow(missing_docs)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this allow(missing_docs)
? the measurements
function is also without docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This in particular has missing docs, because iff the error is not self describing, there is something wrong already. The error annotation should be enough. This is a pattern we have in more places throughout the codebase, limited to Error
types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code here looks OK, but I wonder about expected usage - if it's meant to be plugged into prometheus, then it feels fairly unwieldy to do so.
How so? Currently we have a Now the returned So yes, the resulting evaluation is not perfect but it's also encapsulated and one does not have to differentiate between There are trade-offs with this impl, but in a discussion /w @eskimor this seemed the most practical one, happy to be enlightened though |
test_launch( | ||
"starve_till_soft_timeout_then_food", | ||
|tx| async move { | ||
Delay::new(Duration::from_secs(2)).await; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though it's just a test, I think it would be better to make this a constant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are all used exactly once, and are really only relevant per test case (and each of them is super short), so I don't think that's necessary (though I generally agree with you).
tracing::debug!( | ||
target: LOG_TARGET, | ||
unconnected_authorities = %PrettyAuthorities(unconnected_authorities.iter()), | ||
unconnected_authorities = %pretty, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wasn't that pretty enough? 😛
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't handle it, too pretty 🎭
bot merge |
Waiting for commit status. |
* master: feat: measured oneshots (#3902) remove `AllSubsystems` and `AllSubsystemsGen` types (#3874) Companion for Substrate#9867 (#3938) Substrate Companion for #9552 (#3834) CI: run disputes tests (#3962) Bump parity-scale-codec from 2.3.0 to 2.3.1 (#3959) approval-voting: populate session cache in advance (#3954) Bump libc from 0.2.102 to 0.2.103 (#3950) fix master (#3955) Docker files chore (#3880) Bump nix from 0.19.1 to 0.20.0 (#3587) remove connected disconnected state, 3rd attempt (#3898) fix flaky chain-selection tests (#3948) Add benchmarking for parachain runtime initializer pallet (#3913) minor chore changes (#3944) disputes: reject single-sided disputes (#3903)
* master: (52 commits) Companion for substrate PR#9890 (#3961) Bump version, tx_version and spec_version in prep for v0.9.11 (#3970) Fix master compilation (#3977) Make most XCM APIs accept an Into<MultiLocation> where MultiLocation is accepted (#3627) fix disputes tests (#3974) Drop availability only for candidates that lose disputes (#3973) revert +1 change to be on the safer side (#3972) paras_inherent: reject only candidates with concluded disputes (#3969) feat: measured oneshots (#3902) remove `AllSubsystems` and `AllSubsystemsGen` types (#3874) Companion for Substrate#9867 (#3938) Substrate Companion for #9552 (#3834) CI: run disputes tests (#3962) Bump parity-scale-codec from 2.3.0 to 2.3.1 (#3959) approval-voting: populate session cache in advance (#3954) Bump libc from 0.2.102 to 0.2.103 (#3950) fix master (#3955) Docker files chore (#3880) Bump nix from 0.19.1 to 0.20.0 (#3587) remove connected disconnected state, 3rd attempt (#3898) ...
Introduces oneshots that measure the delay between creation, first poll and completion or error (distinguishing between hard timeout and cancellation).
Also prints a warning based on a set soft timeout.
Closes #3648