kv: large-disk cluster has poor performance post-restart #56876
Comments
Hi @jbowens, please add a C-ategory label to your issue. Check out the label system docs. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.
I left this cluster for a bit, and eventually all but one node was OOM-killed. The logs showed persistent liveness errors and lots of duplicate connection gossip errors.
This may be the same issue with the raft scheduler that we saw in #56851, given the number of ranges.
#56943 is introducing a new
We have marked this issue as stale because it has been inactive for
I'll close this out, and we should revisit node density as a part of the planned scalability limits work (cc @williamkulju). I don't think this one data point from 3 years ago is providing much context, and we'll hopefully be able to identify specific concrete obstacles to high node density in that work.
Reproduction steps
Create a cluster with 10 TB disks, eg:
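The exact provisioning command was not captured in this scrape; a hedged sketch using roachprod (cluster name, node count, machine type, and the volume-size flag are all assumptions, not the original invocation):

```
# Illustrative only: flags and sizes are assumed, not from the original issue.
roachprod create $USER-bigdisk -n 4 \
  --gce-machine-type=n1-standard-16 \
  --gce-pd-volume-size=10000   # ~10 TB persistent disk per node
```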
Import a large bank dataset. (This takes ~28 hours.)
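The import command is also missing from the scrape; one plausible sketch uses the `workload` fixtures tooling, where the row count and payload size are assumptions chosen only to illustrate filling a large fraction of a 10 TB disk:

```
# Illustrative only: row count, payload size, and connection string are assumed.
cockroach workload fixtures import bank \
  --rows=1000000000 --payload-bytes=10240 \
  'postgres://root@localhost:26257?sslmode=disable'
```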
Run the bank workload:
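For example (duration flag and connection string are illustrative, not the original command):

```
# Illustrative only.
cockroach workload run bank \
  --duration=10m \
  'postgres://root@localhost:26257?sslmode=disable'
```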
Throughput is ~40 ops/sec. See https://docs.google.com/document/d/1rfWNGFZ6gulKqb6BMXeMzq9GkMgdVfXnC4SJ3FM3ilE/edit?usp=sharing
Jira issue: CRDB-2897