subsystem-bench: add emulated network latency for messages and requests coming from local validator #4611

Open
alindima opened this issue May 28, 2024 · 8 comments · May be fixed by #4627
Labels
T8-polkadot This PR/Issue is related to/affects the Polkadot network. T10-tests This PR/Issue is related to tests.

Comments

@alindima
Contributor

AFAICT from the code, we're only emulating peer latency when sending a message from an emulated peer to the local node.
We should add latency in the other direction as well (when sending a message from the local node to an emulated peer).

I validated that this is indeed the case by adding a 10s mean latency to the availability recovery bench; there was no impact on performance.
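
For illustration, a minimal sketch of what per-message latency in both directions could look like, assuming a normally distributed delay. The `PeerLatency` and `deliver_with_latency` names are illustrative, not the actual subsystem-bench API:

```rust
use rand_distr::{Distribution, Normal};
use std::time::Duration;

/// Illustrative latency config, not the real subsystem-bench struct.
#[derive(Clone, Copy)]
struct PeerLatency {
    mean_ms: f64,
    std_dev_ms: f64,
}

impl PeerLatency {
    /// Sample a per-message delay; negative samples are clamped to zero.
    fn sample(&self) -> Duration {
        let normal = Normal::new(self.mean_ms, self.std_dev_ms).expect("valid latency params");
        Duration::from_millis(normal.sample(&mut rand::thread_rng()).max(0.0) as u64)
    }
}

/// Wait out the sampled latency before handing the message over. The same
/// helper would be used for both directions: emulated peer -> local node and
/// local node -> emulated peer.
async fn deliver_with_latency<M>(latency: Option<PeerLatency>, message: M, deliver: impl FnOnce(M)) {
    if let Some(latency) = latency {
        tokio::time::sleep(latency.sample()).await;
    }
    deliver(message);
}
```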

@alindima alindima added T8-polkadot This PR/Issue is related to/affects the Polkadot network. T10-tests This PR/Issue is related to tests. labels May 28, 2024
@AndreiEres
Contributor

Thanks for the good observation. It's true, we probably should have applied the latency in both directions.

Could you explain what we gain from higher latency, or from latency at all? It does make the simulation more realistic, but it also just randomly increases the time spent awaiting inside the calls and distorts the result. Considering how noisy our results already are, I would rather compromise on realism for a more stable result.

@alindima
Contributor Author

It will allow us to model a real network more closely, which is desirable for finding limitations in our subsystems. For example, it would tell us how many recoveries we can actually process per block in a realistic case (the same reason we have the rate limiters and connectivity percentages).

I would have needed this, for example, to measure the potential impact of #3127. I ended up manually hacking in a 100ms sleep on each request.
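
For reference, the hack amounted to something along these lines (a rough sketch with placeholder types, not the actual patch):

```rust
use std::time::Duration;

// Placeholder types standing in for the real request/response messages.
struct ChunkRequest(u32);
struct ChunkResponse(u32);

/// Illustrative only: delay every emulated response by a fixed 100ms to mimic
/// request latency on the wire.
async fn answer_request(request: ChunkRequest) -> ChunkResponse {
    tokio::time::sleep(Duration::from_millis(100)).await;
    ChunkResponse(request.0)
}
```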

@alindima
Contributor Author

Considering how noisy our results already are, I would rather compromise on realism for a more stable result.

Maybe this makes sense for CI regression checks indeed 👍🏻, but we need the option to add latency for manual simulations.

@AndreiEres
Contributor

Then it's worth adding. As I understand it, you'll be using it during development, so we can always disable the latency in regression tests.

@sandreim
Contributor

sandreim commented May 28, 2024

I envisioned the peer latency to be the RTT latency, so this case should be covered. Looking at the code, it seems that in the HandleNetworkMessage::handle() implementation we directly send back the response. We should send back the response only after the latency has elapsed, so moving this into a closure might fix the problem, but maybe there is a better way to add the latency by using the proxy.
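
A sketch of that suggestion, assuming the handler holds a oneshot sender for the response (the function name and payload type are placeholders): instead of replying inline, return a future that waits out the emulated latency before sending.

```rust
use futures::channel::oneshot;
use std::time::Duration;

/// Illustrative: wrap the "send the response back" step in a future so the
/// emulated latency elapses before the response reaches the requester.
fn reply_after_latency(
    response_sender: oneshot::Sender<Vec<u8>>,
    response: Vec<u8>,
    latency: Option<Duration>,
) -> impl std::future::Future<Output = ()> {
    async move {
        if let Some(latency) = latency {
            tokio::time::sleep(latency).await;
        }
        // The requester may have gone away in the meantime; ignore the error.
        let _ = response_sender.send(response);
    }
}
```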

@alindima
Contributor Author

Looking at the code, it seems that in the HandleNetworkMessage::handle() implementation we directly send back the response. We should send back the response only after the latency has elapsed, so moving this into a closure might fix the problem, but maybe there is a better way to add the latency by using the proxy.

Yeah, I think this should be handled by the proxy rather than by every HandleNetworkMessage implementation (which would lead to duplication, and we'd probably forget to do it somewhere).
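
A hedged sketch of that approach (channel types and names are assumptions, not the real subsystem-bench proxy): a single forwarding task applies the latency to everything leaving the local node, so individual HandleNetworkMessage implementations don't each have to remember it.

```rust
use futures::{channel::mpsc, StreamExt};
use std::time::Duration;

/// Forward every outbound message through the proxy, delaying each one by the
/// emulated latency. Each delayed send is spawned so that one message's delay
/// does not hold back the messages behind it.
async fn proxy_outbound(
    mut from_node: mpsc::UnboundedReceiver<Vec<u8>>,
    to_peer: mpsc::UnboundedSender<Vec<u8>>,
    latency: Option<Duration>,
) {
    while let Some(message) = from_node.next().await {
        let to_peer = to_peer.clone();
        tokio::spawn(async move {
            if let Some(delay) = latency {
                tokio::time::sleep(delay).await;
            }
            // The peer side may have shut down already; ignore send errors.
            let _ = to_peer.unbounded_send(message);
        });
    }
}
```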

@alindima
Contributor Author

Also, notification protocols won't always perform a round-trip, right? Since they're not request-response, we should also have latency for notifications sent by the local validator (AFAICT we don't have it).

@AndreiEres
Contributor

AndreiEres commented May 29, 2024

@alindima please try running with #4627.

But if this PR is correct, it will shift all our baselines wherever latency is used in a regression test.
