-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
multitenant: demo startup performance is poor with multiple tenants and simulated latencies #76305
Comments
After doing a bit more digging, and chatting with @ajwerner, this is expected. At least one reason why we're seeing poor performance here is because we're performing startup migration for each of the new SQL servers that are being created. These startup migrations block the SQL server from starting up until they're completed. Additionally, it's a known issue (#73813) that these startup migrations are expensive, especially in cases where the cluster is widely distributed. This is an issue that we'll likely need to address before we can enable multi-region Serverless. @andy-kimball @vy-ton FYI. |
What do these startup migrations do? And why do they need to run on a brand new cluster? |
Cluster creation and tenant setup is chatty. That's an okay thing: we don't really care about cluster creation being that slow in general. In the case of demo when we want to simulate latency and use tenants, it was particularly painful. Starting the 9 tenants would take many minutes. This patch alleviates this problem by keeping latency between the simulated nodes low until just before we pass control to the user. Fixes cockroachdb#76305 Release note (cli change): `cockroach demo --global` will now start up more quickly. The latency which will be injected is not injected until after the initial cluster is set up internally.
91462: kvserver: Fix performance regression due to new call to collectSpansRead r=KaiSun314 a=KaiSun314 Fixes: #91374 Fixes: #91723 When we incorporated the use of response data in the load-based splitter, we called collectSpansRead, which is allocation heavy and computationally expensive, resulting in a performance regression. To address this, this patch performs 3 optimizations: 1. Remove the call to collectSpansRead; instead, add a custom function to iterate over the batch of requests / responses and calculate the true spans 2. Instead of constructing a *spanset.SpanSet and finding the union of spans (which uses O(batch_size) memory), we directly compute the union of spans while iterating over the batch resulting in only O(1) memory used 3. Lazily compute the union of true spans only when it is truly needed i.e. we are under heavy load (e.g. >2500QPS) and a load-based splitter has been initialized Cherry-picking this commit to the commit right before we incorporated response data in the load-based splitter (068845f) and running ``` ~/benchdiff/benchdiff --old=068845ff72315f8b64f0e930c17c48f078203bc4 --new=abf61ce75c47e16bc39ed0e714f2e46f1d97eb7c --count=20 --post-checkout='./dev generate go' --run='KV/././rows=1$$' ./pkg/sql/tests ``` the output is: <img width="909" alt="More Efficient Response Data Benchdiff" src="https://user-images.githubusercontent.com/28762332/200678800-d75240f9-a275-40cf-85f2-201aadca0355.png"> Release note: None 92231: cli,democluster: defer simulated latency until after cluster setup r=ajwerner a=ajwerner ### democluster,serverutils/regionlatency,rpc: extract code for simulating latency We'll want to leverage these helpers in some tests to measure behavior under simulated latency. ### cli,democluster: defer simulated latency until after cluster setup Cluster creation and tenant setup is chatty. That's an okay thing: we don't really care about cluster creation being that slow in general. In the case of demo when we want to simulate latency and use tenants, it was particularly painful. Starting the 9 tenants would take many minutes. This patch alleviates this problem by keeping latency between the simulated nodes low until just before we pass control to the user. Fixes #76305 Release note (cli change): `cockroach demo --global` will now start up more quickly. The latency which will be injected is not injected until after the initial cluster is set up internally. Co-authored-by: Kai Sun <[email protected]> Co-authored-by: Andrew Werner <[email protected]>
cockroach demo --global --nodes 9 --empty
takes several minutes to startup. When multi-tenancy is disabled, it takes only a few seconds. It's not clear what's causing this poor performance, but a quick peek into the logs shows that each tenant takes between 20 and 60 seconds to start.More investigation is required.
Epic: CRDB-18596
Jira issue: CRDB-13070
The text was updated successfully, but these errors were encountered: