-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roachtest: kv/splits/nodes=3/quiesce=true failed #85112
Comments
Artifacts are lost. Is this a resurgence of #82334 (closed by stale bot)? Going off the analysis there, I'm guessing this is an OOM. @kvoli, let's add a sync.Pool around this for now and call it a day. To me this is as likely a culprit as any other. cockroach/pkg/kv/kvserver/replicastats/replica_stats.go Lines 93 to 99 in 4f96b1b
|
Better yet, see if you can avoid the heap allocations altogether. |
I reduced the number of replica stats structs slightly in #85178, by 2. I'll take a look and rethink/rework the memory alloc for this. |
It's possible that PR is why we're no longer seeing failures here in this roachtest. But yes, reducing allocs anyway seems a good idea. |
This patch removes some unused fields within the replica stats object. It also opts to allocate all the memory needed upfront for a replica stats object for better cache locality and less GC overhead. resolves cockroachdb#85112 Release justification: low risk, lowers memory footprint to avoid oom. Release note: None
This patch removes some unused fields within the replica stats object. It also opts to allocate all the memory needed upfront for a replica stats object for better cache locality and less GC overhead. resolves cockroachdb#85112 Release justification: low risk, lowers memory footprint to avoid oom. Release note: None
…87158 85354: sql: notices for NotVisible Indexes r=wenyihu6 a=wenyihu6 Optimizer now supports creating invisible indexes after this [PR](#85794). An important use case for not visible indexes is to test the behaviour of dropping an index by marking the index invisible. However, there are certain cases where users cannot expect dropping an index to behave exactly the same as marking an index invisible. More specifically, NotVisible indexes may still be used to police unique or foreign key constraint check behind the scene. In those cases, dropping the index might behave different from marking the index invisible. Prior to this commit, users do not know about this without reading the documentation. This commit adds some user-friendly notices when users are dropping or changing a not visible index that might be helpful for constraint check. There are two cases where we are giving this notice: 1. if this index is unique. 2. if this index is on child table and may help with FK check. More details on how this decision was made in docs/RFCS/20220628_invisible_index.md. Assists: #72576 See also: #85794 Release justification: low risk to the existing functionality; this commit just adds notices. Release note: none 86592: kvserver: rework memory allocation in replicastats r=kvoli a=kvoli This patch removes some unused fields within the replica stats object. It also opts to allocate all the memory needed upfront for a replica stats object for better cache locality and less GC overhead. This patch also removes locality tracking for the other throughput trackers to reduce per-replica memory footprint. resolves #85112 Release justification: low risk, lowers memory footprint to avoid oom. Release note: None 87024: sql: Prevent primary region being same as secondary region r=rafiss a=e-mbrown fixes #86879 We found that the primary region could be assigned the same region as the secondary region. This commit adds an error to prevent that. Release justification: Low risk high benefit change to existing functionality Release note: None 87110: ui: fixes to high contention copy in insight workload pages r=ericharmeling a=ericharmeling Previously, the High Contention insight type was labeled "High Contention Time", and the waiting transactions list was labeled in the incorrect tense. This commit fixes those typos. Release justification: bug fix Release note: None 87135: build: remove newly-added node_modules/ trees in ui-maintainer-clean r=rickystewart a=sjbarag A few recent features [1, 2] introduced new node_modules/ trees for dependencies, but didn't update the ui-maintainer-clean Make target to remove them. This allowed those directories to leak between TeamCity builds with Docker user permissions, preventing a `yarn install` in those packages from properly laying out a `node_modules/.bin` directory for executables like `tsc`. Remove the recently-introduced `node_modules/` directories as part of `make ui-maintainer-clean`, to restore a clean state between jobs. [1] d28c072 (ui: add eslint-plugin-crdb package with custom eslint rules, 2022-05-27) [2] c58279d (ui: reintroduce end-to-end UI tests with cypress, 2022-08-12) Release justification: Non-production code changes 87149: sql: clean up physical planning for system tenant r=yuzefovich a=yuzefovich This commit audits a couple of methods around the health and version of DistSQL nodes that are used only for the system tenant to make that more explicit. Additionally, it unexports `NodeStatuses` map from the planning context as well as removes some unnecessary short-circuiting behavior around checking the node health and version (it was unnecessary because we already short-circuit in `checkInstanceHealthAndVersionSystem`). Release justification: low-risk cleanup. Release note: None 87153: ui: ux improvements on stmt details page r=maryliag a=maryliag This commit adds a few improvements and bug fixes: - Handles the case where we hit a timeout on statement details, so it doesn't crash anymore and you can still see the time picker to be able to select a new time interval. - Updates the error message, to clarify it was a timeout error and increase the timeout from 30s to 30m on the details endpoint. Fixes #78979 - Updates the last error for statement details with the proper value, which previously was using the error for all statements endpoint, instead of the specific for that fingerprint id. - Adds a message when page takes longer to load. - Uses a proper count formatting for execution count. Release justification: bug fixes and smaller improvements Release note (ui change): Proper formatting of execution count under Statement Details page. Increase timeout for Statement Details page and shows proper timeout error when it happens, no longer crashing the page. 87155: github-post: allow for finding the test in a parent directory of the pkg r=srosenberg,rail a=rickystewart In some cases the Bazel test runner "incorrectly" reports the package path for tests. For example, we have [issues](#85376) where the name of the test is reported as `pkg/.../package/package_test` rather than `pkg/.../package` as we might expect. I suspect this is confusing `github-post` when it tries to find tests in the `package_test` directory rather than the `package` directory. We address this by allowing `github-post` to search up the directory tree for the test rather than expecting it to be in one particular directory. Also update a repro command to use `dev test` rather than `make stressrace`. Closes #85420. Release justification: Non-production code changes Release note: None 87156: ci: disable sharding in random syntax tests r=srosenberg a=rickystewart The different shards were trampling each other's test.json.txt, preventing failures from being reported accurately. Release justification: Non-production code changes Release note: None 87158: sql: clean up node dialer fields r=yuzefovich a=yuzefovich This commit removes no longer used `nodeDialer` field (for SQL - KV communication) as well as renames some of the similarly named fields to `podNodeDialer` to indicate that its only a SQL - SQL dialer. Release justification: low-risk cleanup. Release note: None Co-authored-by: wenyihu3 <[email protected]> Co-authored-by: Austen McClernon <[email protected]> Co-authored-by: e-mbrown <[email protected]> Co-authored-by: Eric Harmeling <[email protected]> Co-authored-by: Sean Barag <[email protected]> Co-authored-by: Yahor Yuzefovich <[email protected]> Co-authored-by: Marylia Gutierrez <[email protected]> Co-authored-by: Ricky Stewart <[email protected]>
roachtest.kv/splits/nodes=3/quiesce=true failed with artifacts on master @ 0f100f3a09246bc80ef2c3119e4ec1c24ef36801:
Parameters:
ROACHTEST_cloud=gce
,ROACHTEST_cpu=4
,ROACHTEST_ssd=0
Help
See: roachtest README
See: How To Investigate (internal)
This test on roachdash | Improve this report!
Jira issue: CRDB-18046
Epic CRDB-18656
The text was updated successfully, but these errors were encountered: