Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql/colcontainer: TestExternalDistinct leaks Queues, file descriptors #81413

Closed
jbowens opened this issue May 17, 2022 · 3 comments · Fixed by #81419
Closed

sql/colcontainer: TestExternalDistinct leaks Queues, file descriptors #81413

jbowens opened this issue May 17, 2022 · 3 comments · Fixed by #81419
Assignees
Labels
A-sql-execution Relating to SQL execution. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-queries SQL Queries Team

Comments

@jbowens
Copy link
Collaborator

jbowens commented May 17, 2022

In #81389, I originally moved where Pebble's disk-health checking is layered onto the filesystem. Disk-health checking monitors for excessively slow writes, crashing the process if writes appear to be stalled in the kernel. The changes in #81389 caused disk-health checking to be used in TestExternalDistinct. TestExternalDistinct began to fail with leaked goroutines in the disk-health checking. I dug into it a bit, and traced it to a diskQueue that is never Closed, causing its writeFile to never be closed.

For now I should be able to refactor #81389 to continue omitting disk-health checking in this code path and unblock the vendor bump, so this isn't a blocker for me.

Jira issue: CRDB-15202

@jbowens jbowens added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-sql-execution Relating to SQL execution. T-sql-queries SQL Queries Team labels May 17, 2022
jbowens added a commit to jbowens/cockroach that referenced this issue May 17, 2022
This commit includes non-trivial changes to account for the change
in the `vfs.WithDiskHealthChecks` function signature change.
Additionally, this PR uncovered cockroachdb#81413 and necessitated some changes to
work around it.

```
e567fec8 db: ensure Open closes opened directories on error
b8c9a560 internal/metamorphic: overwrite unused bounds buffers
7c5f0cbb db: copy user-provided bounds and optimize unchanged bounds
f8897076 *: Add IterOption to optionally read L6 filter blocks.
d79f9617 vfs: extend disk-health checking to filesystem metadata operations
5ae21746 db: remove newRangeKeyIter closure
6d975489 db: add BenchmarkIteratorScan
782d102e sstable: fix invariant check for sstable size estimation
738a7f0b db: fix NewIter regression resulting in extra memtable levels
37558663 *: Use keyspan.LevelIter for rangedels in compactions
e6c60c71 db: use sublevel level iters for all compactions out of L0
d8f63bef db: extend documentation on ingested file overlap; improve test cases
b9e970a8 internal/keyspan: correct and document MergingIter key stability
498177bb internal/keyspan: collapse fragmentBoundIterator into MergingIter
```

Release note (bug fix): Fix a gap in disk-stall detection. Previously,
disk stalls during filesystem metadata operations could go undetected,
inducing deadlocks.  Now stalls during these types of operations will
correctly fatal the process.
jbowens added a commit to jbowens/cockroach that referenced this issue May 17, 2022
This commit includes non-trivial changes to account for the change
in the `vfs.WithDiskHealthChecks` function signature change.
Additionally, this PR uncovered cockroachdb#81413 and necessitated some changes to
work around it.

```
e567fec8 db: ensure Open closes opened directories on error
b8c9a560 internal/metamorphic: overwrite unused bounds buffers
7c5f0cbb db: copy user-provided bounds and optimize unchanged bounds
f8897076 *: Add IterOption to optionally read L6 filter blocks.
d79f9617 vfs: extend disk-health checking to filesystem metadata operations
5ae21746 db: remove newRangeKeyIter closure
6d975489 db: add BenchmarkIteratorScan
782d102e sstable: fix invariant check for sstable size estimation
738a7f0b db: fix NewIter regression resulting in extra memtable levels
37558663 *: Use keyspan.LevelIter for rangedels in compactions
e6c60c71 db: use sublevel level iters for all compactions out of L0
d8f63bef db: extend documentation on ingested file overlap; improve test cases
b9e970a8 internal/keyspan: correct and document MergingIter key stability
498177bb internal/keyspan: collapse fragmentBoundIterator into MergingIter
```

Release note (bug fix): Fix a gap in disk-stall detection. Previously,
disk stalls during filesystem metadata operations could go undetected,
inducing deadlocks.  Now stalls during these types of operations will
correctly fatal the process.
jbowens added a commit to jbowens/cockroach that referenced this issue May 17, 2022
This commit includes non-trivial changes to account for the change
in the `vfs.WithDiskHealthChecks` function signature change.
Additionally, this PR uncovered cockroachdb#81413 and necessitated some changes to
work around it.

```
e567fec8 db: ensure Open closes opened directories on error
b8c9a560 internal/metamorphic: overwrite unused bounds buffers
7c5f0cbb db: copy user-provided bounds and optimize unchanged bounds
f8897076 *: Add IterOption to optionally read L6 filter blocks.
d79f9617 vfs: extend disk-health checking to filesystem metadata operations
5ae21746 db: remove newRangeKeyIter closure
6d975489 db: add BenchmarkIteratorScan
782d102e sstable: fix invariant check for sstable size estimation
738a7f0b db: fix NewIter regression resulting in extra memtable levels
37558663 *: Use keyspan.LevelIter for rangedels in compactions
e6c60c71 db: use sublevel level iters for all compactions out of L0
d8f63bef db: extend documentation on ingested file overlap; improve test cases
b9e970a8 internal/keyspan: correct and document MergingIter key stability
498177bb internal/keyspan: collapse fragmentBoundIterator into MergingIter
```

Release note (bug fix): Fix a gap in disk-stall detection. Previously,
disk stalls during filesystem metadata operations could go undetected,
inducing deadlocks.  Now stalls during these types of operations will
correctly fatal the process.
@yuzefovich yuzefovich self-assigned this May 18, 2022
@jbowens
Copy link
Collaborator Author

jbowens commented May 18, 2022

With the changes from #81419, I still saw a couple file descriptors leak in TestSQLLogic in CI: #81389 (comment)

For now, I'm going to continue disable disk-health checking for these in-memory unit test engines, so that we can merge this Pebble bump.

@yuzefovich
Copy link
Member

Thanks Jackson! Could you please drop a quick comment on how to enable those checks back so that I could debug?

@yuzefovich
Copy link
Member

Oh, I see, it's just storage.DisableFilesystemMiddlewareTODO knob.

@craig craig bot closed this as completed in 1e1ff14 May 18, 2022
@craig craig bot closed this as completed in #81419 May 18, 2022
craig bot pushed a commit that referenced this issue May 20, 2022
81389: vendor: bump Pebble to e567fec84c6e r=jbowens a=jbowens

This commit includes non-trivial changes to account for the change
in the `vfs.WithDiskHealthChecks` function signature change.
Additionally, this PR uncovered #81413 and necessitated some changes to
work around it.

```
e567fec8 db: ensure Open closes opened directories on error
b8c9a560 internal/metamorphic: overwrite unused bounds buffers
7c5f0cbb db: copy user-provided bounds and optimize unchanged bounds
f8897076 *: Add IterOption to optionally read L6 filter blocks.
d79f9617 vfs: extend disk-health checking to filesystem metadata operations
5ae21746 db: remove newRangeKeyIter closure
6d975489 db: add BenchmarkIteratorScan
782d102e sstable: fix invariant check for sstable size estimation
738a7f0b db: fix NewIter regression resulting in extra memtable levels
37558663 *: Use keyspan.LevelIter for rangedels in compactions
e6c60c71 db: use sublevel level iters for all compactions out of L0
d8f63bef db: extend documentation on ingested file overlap; improve test cases
b9e970a8 internal/keyspan: correct and document MergingIter key stability
498177bb internal/keyspan: collapse fragmentBoundIterator into MergingIter
```

Release note (bug fix): Fix a gap in disk-stall detection. Previously,
disk stalls during filesystem metadata operations could go undetected,
inducing deadlocks.  Now stalls during these types of operations will
correctly fatal the process.

81535: colmem: improve the behavior of ResetMaybeReallocate r=yuzefovich a=yuzefovich

Previously, `Allocator.ResetMaybeReallocate` would always grow the
capacity of the batch until `coldata.BatchSize()` (unless the memory
limit has been exceeded). This behavior allows us to have batches with
dynamic size when we don't know how many rows we need to process (for
example, in the `cFetcher` we start out with 1 row, then grow it
exponentially - 2, 4, 8, etc). However, in some cases we know exactly
how many rows we want to include into the batch, so that behavior can
result in us re-allocating a batch needlessly when the old batch already
have enough capacity.

This commit improves the situation by adding a knob to indicate that if
the desired capacity is satisfied by the old batch, then it should not
be re-allocated. All callers have been audited accordingly.

Release note: None

81552: roachtest: fix pgjdbc and typeorm r=otan a=rafiss

fixes #81515
fixes #81431

See individual commits.

81558: sql,stmtdiagnostics: remove some version gates r=yuzefovich a=yuzefovich

This commit addresses several TODOs with my name on them about removing
the version gates, mostly around the conditional statement diagnostics
introduced in 22.1 cycle.

Release note: None

81579: roachtest: allow TC_BUILDTYPE_ID to be accessible by Docker r=rickystewart a=renatolabs

In #81103, the process of generating TeamCity links in test failure
reports started relying on the `TC_BUILDTYPE_ID` environment
variable. While that variable was added to TeamCity builds, it was not
being passed down to Docker where the tests actually run. As a result,
links generated by the GitHub poster were broken (see, for example, #81572).

This commit makes `TC_BUILDTYPE_ID` accessible by Docker for every
build that was already passing `TC_BUILD_BRANCH`. This should be
sufficient to cover all existing cases and more, in case having
access to this variable becomes useful in the future.

Release note: None

Co-authored-by: Jackson Owens <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: Renato Costa <[email protected]>
@mgartner mgartner moved this to Done in SQL Queries Jul 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-execution Relating to SQL execution. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-queries SQL Queries Team
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants