roachtest: increase memory for tests that see Raft OOMs #87809

Closed
erikgrinaker opened this issue Sep 12, 2022 · 1 comment · Fixed by #88444
Labels: A-testing Testing tools and infrastructure, O-qa

Comments

erikgrinaker (Contributor) commented Sep 12, 2022

The Raft layer does not have memory budgeting -- in particular, it needs a global memory budget shared across all Raft groups. This makes it vulnerable to OOMs if many Raft groups are seeing concurrent memory-intensive operations, typically large messages like SST ingestions. The work to resolve this is tracked in:

Until this work gets prioritized, we should increase the memory of the failing roachtests to avoid test flakes. This includes, but is not limited to:

Jira issue: CRDB-19547

erikgrinaker added the O-qa, A-testing Testing tools and infrastructure, and T-kv-replication labels on Sep 12, 2022
blathers-crl bot commented Sep 12, 2022

cc @cockroachdb/replication

craig bot pushed a commit that referenced this issue Sep 22, 2022
88346: roachtest: use n1-standard for 16-core GCE machines r=srosenberg a=erikgrinaker

Roachtest used `n1-highcpu` machines at 16 cores and beyond. However, this causes a memory cliff, because an `n1-standard-8` machine has ~30 GB memory (3.75 GB per core), but an `n1-highcpu-16` machine only has 14 GB memory (0.9 GB per core).

This patch makes 16-core machines use `n1-standard` as well, with 60 GB memory, and only switches to `n1-highcpu` at 32 cores (with 29 GB memory).

Touches #87809.

Release note: None

Co-authored-by: Erik Grinaker <[email protected]>
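
The memory cliff described in the commit message can be illustrated with a small Go sketch. This is not the roachtest code: `machineType` and its `highCPUFrom` parameter are hypothetical names used only for illustration, and the per-core figures are the approximate ones quoted above (3.75 GB for `n1-standard`, 0.9 GB for `n1-highcpu`).

```go
package main

import "fmt"

// Approximate memory per core for the two GCE N1 families,
// as quoted in the commit message.
const (
	standardGBPerCore = 3.75
	highCPUGBPerCore  = 0.9
)

// machineType is a hypothetical helper (not the actual roachtest function).
// It returns an N1 machine type name and its approximate memory in GB.
// highCPUFrom is the core count at which selection switches from
// n1-standard to n1-highcpu.
func machineType(cores, highCPUFrom int) (string, float64) {
	if cores >= highCPUFrom {
		return fmt.Sprintf("n1-highcpu-%d", cores), float64(cores) * highCPUGBPerCore
	}
	return fmt.Sprintf("n1-standard-%d", cores), float64(cores) * standardGBPerCore
}

func main() {
	for _, cores := range []int{8, 16, 32} {
		before, beforeGB := machineType(cores, 16) // threshold before the patch
		after, afterGB := machineType(cores, 32)   // threshold after the patch
		fmt.Printf("%2d cores: %s (%.1f GB) -> %s (%.1f GB)\n",
			cores, before, beforeGB, after, afterGB)
	}
}
```

With the threshold at 16, moving from 8 to 16 cores roughly halves the available memory. With the threshold at 32, memory grows with core count up to 16 cores and only drops at the switch to `n1-highcpu`.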
craig bot closed this as completed in 30ba7c3 on Sep 22, 2022