roachtest: double machine size for 8TB restore test #108350
This is going to change the test name, which will lose the roachperf history:
https://roachperf.crdb.dev/?filter=&view=restore%2Ftpce%2F8TB%2Faws%2Fnodes%3D10%2Fcpus%3D8&tab=aws
Can/should we request more memory without bumping the machine size? See spec.High
to request high-memory nodes.
Will defer to DR here, requested a review from them.
481429b to e546122
@erikgrinaker Good point. Changed the machine type to high-memory.
May want to add a comment as well saying why we request highmem, and link to the Raft OOM issues.
The test occasionally OOMs in the raft stack. Reduce the noise until we fix the
underlying causes of such OOMs.

Similar tests in GCP use n2 machines with 4 GB per CPU, and the AWS test before
this change used a 2 GB/CPU machine:

```
$ roachtest list restore | grep TB
restore/tpce/32TB/aws/nodes=15/cpus=16 [disaster-recovery]
restore/tpce/32TB/inc-count=400/aws/nodes=15/cpus=16 [disaster-recovery]
restore/tpce/32TB/inc-count=400/gce/nodes=15/cpus=16 [disaster-recovery]
restore/tpce/8TB/aws/nodes=10/cpus=8 [disaster-recovery]
```

After this commit, the 8TB restore test uses the m6i.2xlarge machine with 32 GB
of memory instead of c6i.2xlarge with 16 GB. Now all O(TB) restore tests use
machines with at least 32 GB of memory.

Epic: none
Release note: none
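For reference, the memory-per-CPU figures quoted in the commit message can be checked with a small standalone sketch. The `machine` type and `memPerCPU` helper are hypothetical names made up for this illustration and are not part of roachtest; the 8 vCPU count is taken from the `cpus=8` in the test name, and the memory sizes from the commit message.

```go
package main

import "fmt"

// machine is a hypothetical record used only in this sketch;
// it is not a roachtest or cloud-provider type.
type machine struct {
	name  string
	vcpus int
	memGB int
}

// memPerCPU returns the amount of memory (in GB) available per vCPU.
func memPerCPU(m machine) float64 {
	return float64(m.memGB) / float64(m.vcpus)
}

func main() {
	// Sizes as stated in the commit message; vCPU count from cpus=8.
	before := machine{name: "c6i.2xlarge", vcpus: 8, memGB: 16}
	after := machine{name: "m6i.2xlarge", vcpus: 8, memGB: 32}

	for _, m := range []machine{before, after} {
		fmt.Printf("%s: %d vCPUs, %d GB, %.0f GB/vCPU\n",
			m.name, m.vcpus, m.memGB, memPerCPU(m))
	}
}
```

Under these assumptions the ratio goes from 2 GB/CPU to 4 GB/CPU while the CPU count stays at 8, which matches the ratio cited for the GCP n2 machines.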
e546122 to f93d7e9
@erikgrinaker Added the comment. I noticed that the test still got renamed from
I see, I'll defer to DR on how important this is for them.
I've added a discussion topic for a DR meeting tomorrow. Could we wait to merge this until then?
Going to revert this as per discussion with the team.
The test occasionally OOMs in the raft stack. Reduce the noise until we fix the
underlying causes of such OOMs.
Similar tests in GCP use n2 machines with 4 GB per CPU, and the AWS test before
this change used a 2 GB/CPU machine.
After this commit, all O(TB) restore tests use machines with at least 32 GB of
memory.
Touches #106496
Epic: none
Release note: none