release-21.2: colflow: release disk resources in hash router in all cases #81555
Conversation
Previously, it was possible for the disk-backed spilling queue used by the hash router outputs to not be closed when the hash router exited. Namely, this could occur if the router output was not fully exhausted (i.e. it could still produce more batches, but the consumer of the router output was satisfied and called `DrainMeta`). In such a scenario, `routerOutput.closeLocked` was never called because a zero-length batch was never given to `addBatch`, nor was the output canceled due to an error. The flow cleanup also didn't save us because the router outputs are not added to the `ToClose` slice.

The bug is now fixed by closing the router output in `DrainMeta`. This behavior is acceptable because the caller is not interested in any more data, and closing the output can be done multiple times (it is a no-op on all calls except for the first one). There is no regression test since it's quite tricky to come up with one given that the behavior of router outputs is non-deterministic, and I don't think it's worth introducing special knobs inside of `DrainMeta` / `Next` for this.

The impact of not closing the spilling queue is that it might leak a file descriptor until the node restarts. Although the temporary directory is deleted on the flow cleanup, the bug would also result in a leak of disk space, which is likewise "fixed" only by a node restart.

Release note: None
Commits 131794d to 39944c3
Thanks for opening a backport. Please check the backport criteria before merging:
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
Add a brief release justification to the body of your PR to justify this backport. Some other things to consider:
Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @cucaroach)
Backport 1/1 commits from #81491 on behalf of @yuzefovich.
/cc @cockroachdb/release
Fixes: #81490.
Release note: None
Release justification: bug fix.