execinfra: uncertainty error can be incorrectly swallowed during a distributed join stage #51458
Labels
A-sql-execution
Relating to SQL execution.
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Read uncertainty errors are sometimes swallowed if a limit is hit. This makes it so that a client does not have to retry if the results already satisfy the query (e.g. a limit). This issue is about this swallowing behavior occurring when it shouldn't be (i.e. returning incorrect results to the client).
A customer ran into an issue that I was able to deterministically reproduce in #51375. I think I understand what's going on. Here is the interesting part of the plan for reference:
TableReader/6
is the one that emits the uncertainty error. This is then propagated downstream as expected. Note that the hash router sends the error to only one output. In this case the chain looks likeTableReader/6
->MergeJoiner/7
->HashJoiner/9
->HashJoiner/13
. Everything works fine until we get toHashJoiner/13
which short circuits because its right side,TableReader/12
, is empty and then swallows the uncertainty error on the next iteration. This is technically valid because we're doing an inner join.However, the interesting part is that
HashJoiner/13
belongs to a join stage that includesHashJoiner/15
andHashJoiner/14
.HashJoiner/15
short circuits because its right side is empty, which again, is valid. The crucial part of this bug is thatHashJoiner/14
short-circuits because its left side is empty. Its left side includesHashJoiner/9
, which is the last time the uncertainty error was correctly propagated. This is invalid becauseHashJoiner/9
correctly invalidated the results pushed toHashJoiner/13
, but does not do the same withHashJoiner/14
orHashJoiner/15
because of how an error is propagated with the hash router. I think there are several ways to fix this:HashJoiner
drains the non-empty side and propagates any metadata without swallowing it before moving to draining. The disadvantage of this approach is that it is localized to short-circuiting behavior (not that I know of this issue anywhere else) and having to fully drain the non-empty side, which might defeat the point of short-circuiting in the first place.hashRouter
), error metadata must be copied and sent to all streams so that it is valid for processors to make the local decision of whether or not to swallow errors. The advantage is that this is a more general solution, the disadvantage is that I don't know if sending multiple errors has side effects. We could also make this behavior specific to uncertainty erros.ProcessorBase
, and make it a more "global"/specific decision. I'm not sure exactly what this would look like because we can't move this logic to a receiver (it doesn't know when a limit has been hit), and might mean that we need to move it to specific processors (e.g. limit, anything else?).The text was updated successfully, but these errors were encountered: