release-21.2: colflow: release disk resources in hash router in all cases #81555

blathers-crl · 2022-05-19T21:55:58Z

Backport 1/1 commits from #81491 on behalf of @yuzefovich.

/cc @cockroachdb/release

Previously, it was possible for the disk-backed spilling queue used
by the hash router outputs to not be closed when the hash router exited.
Namely, this could occur if the router output was not fully exhausted
(i.e. it could still produce more batches, but the consumer of the
router output was satisfied and called DrainMeta). In such a scenario,
routerOutput.closeLocked was never called because a zero-length batch
was never given to addBatch, nor was the output canceled due to an
error. The flow cleanup also didn't save us because the router outputs
are not added to the ToClose slice.

The bug is now fixed by closing the router output in DrainMeta. This
behavior is acceptable because the caller is not interested in any more
data, and closing the output can be done multiple times (it is a no-op
on all calls except for the first one). There is no regression test
since one is quite tricky to write given that the behavior of router
outputs is non-deterministic, and I don't think it's worth introducing
special knobs inside of DrainMeta / Next for this.
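
The idempotent-close reasoning above can be illustrated with a minimal
sketch. This is not the actual colflow code: the spillingQueue type, the
closed flag, the releases counter, and the simplified DrainMeta signature
(the real method returns metadata) are assumptions made for illustration.
Only the names routerOutput, closeLocked, and DrainMeta come from the
description.

```go
package main

import (
	"fmt"
	"sync"
)

// spillingQueue is a hypothetical stand-in for the disk-backed queue. It only
// counts how many times its resources were actually released.
type spillingQueue struct {
	releases int
}

func (q *spillingQueue) close() { q.releases++ }

// routerOutput is a simplified, hypothetical model of a hash router output.
type routerOutput struct {
	mu     sync.Mutex
	closed bool
	queue  *spillingQueue
}

// closeLocked releases the queue on the first call and is a no-op on every
// later call. The caller must hold o.mu.
func (o *routerOutput) closeLocked() {
	if o.closed {
		return
	}
	o.closed = true
	o.queue.close()
}

// DrainMeta is called when the consumer wants no more data, so closing the
// output here is safe even if the output was not fully exhausted.
// Assumption: the real method also returns metadata, omitted in this sketch.
func (o *routerOutput) DrainMeta() {
	o.mu.Lock()
	defer o.mu.Unlock()
	o.closeLocked()
}

func main() {
	out := &routerOutput{queue: &spillingQueue{}}
	out.DrainMeta()
	out.DrainMeta() // second call must not release the queue again
	fmt.Println("releases:", out.queue.releases) // prints "releases: 1"
}
```

Because the closed flag guards the release, calling DrainMeta after the
output has already been closed through another path is harmless in this
sketch.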

The impact of not closing the spilling queue is that a file descriptor
might leak until the node restarts. Although the temporary directory is
deleted on flow cleanup, the bug would also result in a leak of disk
space, which is likewise "fixed" only by a node restart.

Fixes: #81490.

Release note: None

Release justification: bug fix.

blathers-crl · 2022-05-19T21:56:00Z

cockroach-teamcity · 2022-05-19T21:56:09Z

michae2

Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @cucaroach)

blathers-crl bot force-pushed the blathers/backport-release-21.2-81491 branch from 131794d to 39944c3 May 19, 2022 21:55

blathers-crl bot requested review from cucaroach and michae2 May 19, 2022 21:56

blathers-crl bot added the blathers-backport and O-robot labels May 19, 2022

blathers-crl bot assigned yuzefovich May 19, 2022

michae2 approved these changes May 19, 2022

cucaroach approved these changes May 20, 2022

yuzefovich merged commit 4e40ba1 into release-21.2 May 22, 2022

yuzefovich deleted the blathers/backport-release-21.2-81491 branch May 22, 2022 19:17
