74491: colexecwindow: fix disk spilling in some cases r=yuzefovich a=yuzefovich

**colserde: fix possible data corruption scenario during disk spilling**

This commit fixes a possible data corruption scenario (which would result in either a silently wrong query result or an internal error) that could occur when data is serialized and deserialized in the vectorized engine. It would occur when the deserialized vectors were appended to, and it was most likely to occur with Bytes-like types (because their `Set`s can behave like appends to a certain degree).

We need to deserialize data in two paths: in the inbox after reading from the network, and during disk spilling. I believe that the former is safe (since we don't modify those batches) and the latter is mostly safe (since we tend not to modify the batches that we read from disk). I think the only exception is window functions.

Consider the following scenario: a batch with two Bytes vectors is serialized. Say
- the first vector is `{data:[foo], offsets:[0, 3]}`
- the second vector is `{data:[bar], offsets:[0, 3]}`.

After serializing both of them, we will have a flat buffer with something like `buf = {1foo031bar03}` (the ones represent the lengths of each vector). Now, when the first vector is being deserialized, its data slice will be something like `data = [foo031bar03]`, `len(data) = 3`, `cap(data) > 3`. If we don't explicitly cap the slice and deserialize it into a Bytes vector, then later, when we append to that vector, we will overwrite data that is actually part of the second serialized vector, thus corrupting it (or the next batch).

Release note (bug fix): Previously, CockroachDB could return incorrect results or internal errors on queries with window functions returning INT, FLOAT, BYTES, STRING, UUID, or JSON type when disk spilling occurred. The bug was introduced in 21.2.0 and is now fixed.
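The slice-capping problem in the `colserde` commit can be reproduced with plain Go slices. The sketch below is not the actual `colserde` code (which deserializes Arrow-formatted data); `deserializeVector` and `appendToFirst` are hypothetical names, and the buffer leaves out the length prefixes from the example above. It shows that a plain slice expression leaves the first vector's capacity overlapping the second vector's bytes, while a full slice expression `buf[lo:hi:hi]` forces `append` to reallocate.

```go
package main

import "fmt"

// deserializeVector (hypothetical) returns the sub-slice of buf holding one
// serialized vector's data. With capped == false it uses a plain slice
// expression, so the result's capacity still reaches into whatever follows in
// buf. With capped == true it uses a full slice expression (buf[lo:hi:hi]),
// which limits the capacity to the length so that any append must reallocate.
func deserializeVector(buf []byte, lo, hi int, capped bool) []byte {
	if capped {
		return buf[lo:hi:hi]
	}
	return buf[lo:hi]
}

// appendToFirst (hypothetical) lays out two vectors' data ("foo" then "bar")
// back to back in one flat buffer, deserializes both, appends to the first one
// (as a later Set on a Bytes vector might), and returns both contents.
func appendToFirst(capped bool) (first, second string) {
	buf := make([]byte, 0, 6)
	buf = append(buf, "foobar"...)
	f := deserializeVector(buf, 0, 3, capped)
	s := deserializeVector(buf, 3, 6, capped)
	// Without capping, this append fits in the shared capacity and writes
	// "XYZ" over the bytes that belong to the second vector.
	f = append(f, "XYZ"...)
	return string(f), string(s)
}

func main() {
	for _, capped := range []bool{false, true} {
		first, second := appendToFirst(capped)
		fmt.Printf("capped=%v first=%q second=%q\n", capped, first, second)
	}
}
```

Running it prints `capped=false first="fooXYZ" second="XYZ"` and then `capped=true first="fooXYZ" second="bar"`: only the capped slice keeps the second vector intact.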
**colexecwindow: make bytes-like output vector valid before spilling**

`bufferedWindowOp` is special in the sense that its output vector is appended to the batch by the `vectorTypeEnforcer` that is its input operator. That output vector is thus part of the input batch, and it is updated incrementally (as the results become ready). It is also possible that the input batch needs to be spilled to disk for the operator to make progress. Previously, the output vector could be in an invalid state (if it was bytes-like) because fewer elements had been set on the vector than the length of the batch. This is now fixed by making the output vector valid before spilling.

Fixes: #70715.

Release note: None (because the previous commit contains very similar info).

**colexecwindow: fix min/max optimized window functions**

This commit fixes a problem with MIN/MAX optimized window functions in the vectorized engine. The problem was that we forgot to make an explicit copy of the value before rewinding the spilling buffer (which can make the previously retrieved value invalid).

Fixes: #74476.

Release note (bug fix): CockroachDB could previously calculate MIN/MAX incorrectly in some cases when used as window functions after spilling to disk. The bug was introduced in 21.2.0 and is now fixed.

**distsql: force disk spilling in TestWindowFunctionsAgainstProcessor**

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
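To see why an incrementally filled bytes-like output vector can be invalid when the batch is spilled, consider a simplified flat layout in which all values share one `data` buffer and element `i` spans `data[offsets[i]:offsets[i+1]]`. The sketch below assumes that layout and assumes values are set in increasing index order; `flatBytes`, `fillUpTo`, and `makeValid` are hypothetical names, not the actual `coldata.Bytes` API. Until every element has been set, the tail offsets are still zero, so they are not non-decreasing and anything reading or serializing the full batch length would see a broken vector. `makeValid` fills the unset tail with empty values, which is the kind of step the commit describes taking before spilling.

```go
package main

import "fmt"

// flatBytes is a hypothetical, simplified flat bytes vector: element i is
// data[offsets[i]:offsets[i+1]], and len(offsets) == length+1.
// Invariant: offsets[numSet] == len(data).
type flatBytes struct {
	data    []byte
	offsets []int32
	numSet  int // elements [0, numSet) have been set
}

func newFlatBytes(length int) *flatBytes {
	return &flatBytes{offsets: make([]int32, length+1)}
}

// fillUpTo gives every unset element in [numSet, i) an empty value by copying
// the current end offset into offsets[numSet+1 .. i]. It is idempotent.
func (b *flatBytes) fillUpTo(i int) {
	end := b.offsets[b.numSet]
	for j := b.numSet + 1; j <= i; j++ {
		b.offsets[j] = end
	}
}

// set stores v at index i; skipped indices become empty values.
func (b *flatBytes) set(i int, v string) {
	if i < b.numSet {
		panic("flatBytes: values must be set in increasing index order")
	}
	b.fillUpTo(i)
	b.data = append(b.data, v...)
	b.offsets[i+1] = int32(len(b.data))
	b.numSet = i + 1
}

func (b *flatBytes) get(i int) string {
	return string(b.data[b.offsets[i]:b.offsets[i+1]])
}

// isValid reports whether offsets are non-decreasing and within data, which
// is what a reader or serializer of the whole vector relies on.
func (b *flatBytes) isValid() bool {
	for j := 1; j < len(b.offsets); j++ {
		if b.offsets[j] < b.offsets[j-1] {
			return false
		}
	}
	return int(b.offsets[len(b.offsets)-1]) <= len(b.data)
}

// makeValid fills the unset tail with empty values so that all elements are
// readable; setting later elements afterwards still works.
func (b *flatBytes) makeValid() {
	b.fillUpTo(len(b.offsets) - 1)
}

func main() {
	b := newFlatBytes(4)
	b.set(0, "foo")
	b.set(1, "ba")
	fmt.Println(b.offsets, b.isValid()) // [0 3 5 0 0] false
	b.makeValid()
	fmt.Println(b.offsets, b.isValid()) // [0 3 5 5 5] true
	b.set(3, "x")
	fmt.Printf("%q %q %v\n", b.get(2), b.get(3), b.isValid()) // "" "x" true
}
```

Because `fillUpTo` only copies the current end offset forward, calling `makeValid` before a spill does not prevent the operator from setting the remaining elements once it resumes.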