Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql/logictest: use of a closed engine's FS can result in a goroutine leak; TestLogic flaky #84437

Closed
nicktrav opened this issue Jul 14, 2022 · 6 comments
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-queries SQL Queries Team

Comments

@nicktrav
Copy link
Collaborator

nicktrav commented Jul 14, 2022

Describe the problem

A Pebble change in #81389 uncovered a handful of tests where an engine's filesystem (FS) is used after it is closed, which can result in goroutine leaks. This can increase the test flake rate.

There are two main paths to addressing. Option A - Eliminate the use of the FS after it is closed (preferable, but unclear how much refactoring / yak shaving is required). Option B - make use of the DisableFilesystemMiddlewareTODO engine option as a temporary workaround to bypass the code in Pebble that tickles the leak.

Related issues / comments:

Jira issue: CRDB-17660

@nicktrav nicktrav added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Jul 14, 2022
@nicktrav
Copy link
Collaborator Author

@yuzefovich - are you able to help triage this? I assume one of the {A,T}-sql-* labels?

@nicktrav nicktrav reopened this Jul 14, 2022
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Jul 14, 2022
@yuzefovich
Copy link
Member

Linking one of the flakes from the other issue. Probably the action item here is to first logic tests with -v --show-sql (possibly under stress) so that we know which of the logic test files leaks the goroutine.

@rytaft
Copy link
Collaborator

rytaft commented Jul 19, 2022

Action item: reproduce more deterministically to understand where the leak is coming from

@nicktrav nicktrav changed the title sql/logictest: use of a closed engine's FS makes can result in a goroutine leak; TestLogic flaky sql/logictest: use of a closed engine's FS can result in a goroutine leak; TestLogic flaky Jul 19, 2022
@yuzefovich
Copy link
Member

Ran into a leak in one of the CI builds on TestTraceAnalyzer (I don't think that the changes on my branch are to blame).

@rytaft
Copy link
Collaborator

rytaft commented Aug 23, 2022

@nicktrav are the disk health checks still running? We haven't seen a failure recently. Can we close this?

@nicktrav
Copy link
Collaborator Author

@rytaft - I think we probably can, yes. We still have DisableFilesystemMiddlewareTODO that's suppressing some failures elsewhere. I'm going to remove that in a branch now and see what else fails. I will create separate tickets, if required.

nicktrav added a commit to nicktrav/cockroach that referenced this issue Aug 23, 2022
The `DisableFilesystemMiddlewareTODO` Pebble option was added as a
temporary workaround to suppress errors arising from file descriptor
leaks in unit tests, when the disk-health checking filesystem is used.

Remove the temporary config option.

Touches cockroachdb#84437.

Release justification: Low risk, high benefit changes to existing
functionality.

Release note: None.
craig bot pushed a commit that referenced this issue Aug 25, 2022
86692: vendor: bump Pebble to 4cc0974fdade r=nicktrav,msbutler a=jbowens

```
4cc0974f *: propagate pointer to InternalIteratorStats
9c2bc013 metamorphic: print out correct options name on test failure
b83dd2c7 *: fix error handling in simpleLevelIter
70943038 *: use Equal for more efficient equality comparisons
7030aaab metamorphic: fix iterSeekLTWithlimit
89408b27 tool: support range keys
```

Release note: None
Release justification: Non-production code changes, low-risk
high-benefit changes to existing functionality and bug fixes for new
functionality.

86704: storage: remove `DisableFilesystemMiddlewareTODO` r=jbowens a=nicktrav

The `DisableFilesystemMiddlewareTODO` Pebble option was added as a
temporary workaround to suppress errors arising from file descriptor
leaks in unit tests, when the disk-health checking filesystem is used.

Remove the temporary config option.

Touches #84437.

Release justification: Low risk, high benefit changes to existing
functionality.

Release note: None.

Co-authored-by: Jackson Owens <[email protected]>
Co-authored-by: Nick Travers <[email protected]>
@mgartner mgartner moved this to Done in SQL Queries Jul 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-queries SQL Queries Team
Projects
Archived in project
Development

No branches or pull requests

3 participants