colfetcher: incorrect decoding of unique secondary indexes with multiple column families #66706
Comments
roachtest.tlp failed with artifacts on master @ eb7f3fe2991992472a68d701d65543f7a9c7fb56:
Reproduce
To reproduce, try: # From https://go.crdb.dev/p/roachstress, perhaps edited lightly.
caffeinate ./roachstress.sh tlp |
The last failure looks like a bug in the TLP generator. It should not be generating non-immutable predicates with |
This is fixed by #67194. |
Here is the log of SQL statements that resulted in the TLP failure: https://gist.github.com/mgartner/c4a99f034fefd97c55537a87cc5f89f1 |
Well... this is interesting:
|
Here's a partially minimized repro: https://gist.github.com/mgartner/3515443d50270d5d4232ce5753c2cb1b |
I managed to minimize this quite a bit. The following test will fail after a few rounds of stressing. It fails with both the computed column set as
|
Looks like it fails only for the vectorized engine. |
Another clue: it only happens with more than 1 FAMILY. The non-deterministic nature of the failure comes from the test apparatus, which is good. This test fails on every run:
|
EDIT: This problem exists on 20.1, 20.2, and 21.1. It does not exist on 19.2. |
Here's a simpler reproduction:
|
Some relevant code pointers: This block is not executed although it should be because cockroach/pkg/sql/colfetcher/cfetcher.go Lines 956 to 973 in e42fe76
cockroach/pkg/sql/colfetcher/cfetcher.go Lines 896 to 900 in e42fe76
The fix might be to require decoding of nullable columns indexed in unique secondary indexes by changing the logic that sets cockroach/pkg/sql/colfetcher/cfetcher.go Lines 496 to 505 in e42fe76
This commit may be related too: 1bcc7b8 |
Are we for sure fetching these nullable columns? As in, they're encoded in the byte slice, but we just skip past them during decoding? |
Yeah. I think this diff is what we want to do:
Checking whether we have a NULL value without fully decoding seems very cheap, so I'm thinking it's probably not worth plumbing the flag about unique secondary indexes with multiple column families to see whether we should be checking for NULL. |
I don't think we need to plumb a flag, we just need to change the logic where |
I don't think that's the best solution because that would make us fully decode those nullable columns, which is overkill in this case (unless I'm misunderstanding what |
Great point. On the other hand, your proposal would make us perform unnecessary null-decoding checks for unneeded columns in non-unique secondary indexes. I don't yet have intuition about which would be worse. |
I think it would happen for the unneeded columns from the primary indexes too. I guess plumbing the flag plus the diff above might be the safest option. |
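To make the trade-off above concrete, here is a minimal sketch of why a NULL check can be much cheaper than a full decode. It uses a made-up toy key encoding (a one-byte marker per column, with a length-prefixed payload for non-NULL values); the constants `markerNull` and `markerBytes` and the helpers `encodeCol`, `skipCol`, and `anyNull` are all hypothetical and are not CockroachDB's real encoding or the `cfetcher` code.

```go
package main

import "fmt"

// Hypothetical one-byte markers for a toy key encoding. These values are for
// illustration only and are not CockroachDB's real encoding constants.
const (
	markerNull  byte = 0x00
	markerBytes byte = 0x01
)

// encodeCol appends one column to a key: either a lone NULL marker, or a
// marker followed by a one-byte length and the value bytes (so values are
// limited to 255 bytes in this toy format).
func encodeCol(key []byte, val []byte, isNull bool) []byte {
	if isNull {
		return append(key, markerNull)
	}
	key = append(key, markerBytes, byte(len(val)))
	return append(key, val...)
}

// skipCol advances past one encoded column without materializing its value.
// Reporting whether the column was NULL costs a single byte comparison.
func skipCol(key []byte) (rest []byte, isNull bool, err error) {
	if len(key) == 0 {
		return nil, false, fmt.Errorf("empty key")
	}
	switch key[0] {
	case markerNull:
		return key[1:], true, nil
	case markerBytes:
		if len(key) < 2 || len(key) < 2+int(key[1]) {
			return nil, false, fmt.Errorf("truncated column")
		}
		return key[2+int(key[1]):], false, nil
	default:
		return nil, false, fmt.Errorf("unknown marker %#x", key[0])
	}
}

// anyNull walks numCols columns of the key and reports whether any is NULL,
// regardless of whether the query needs that column's value.
func anyNull(key []byte, numCols int) (bool, error) {
	found := false
	for i := 0; i < numCols; i++ {
		rest, isNull, err := skipCol(key)
		if err != nil {
			return false, err
		}
		found = found || isNull
		key = rest
	}
	return found, nil
}

func main() {
	var key []byte
	key = encodeCol(key, []byte("abc"), false)
	key = encodeCol(key, nil, true)
	hasNull, err := anyNull(key, 2)
	fmt.Println(hasNull, err)
}
```

In this toy format the skip path already has to read the marker byte to know how far to advance, so recording "was it NULL?" adds almost nothing. Whether the same holds for the real fetcher is the judgment call being debated in the comments above.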
roachtest.tlp failed with artifacts on master @ 7df66cb1840c263270bd2b1a690c8e9a7c025333:
Reproduce
To reproduce, try: # From https://go.crdb.dev/p/roachstress, perhaps edited lightly.
caffeinate ./roachstress.sh tlp |
`colfetcher` must detect `NULL` values in unique secondary index keys on tables with multiple column families in order to determine whether consecutive KVs belong to the same row or different rows. Previously, only the columns in the key that were needed by the query were decoded and checked for `NULL`. This caused incorrect query results when `NULL` column values were not detected because those columns were not needed. This commit fixes the issue by checking all columns for `NULL` when decoding unique secondary index keys on tables with multiple column families. Fixes cockroachdb#66706 Release note (bug fix): A bug has been fixed that caused incorrect query results when querying tables with multiple column families and unique secondary indexes. The bug only occurred if 1) vectorized execution was enabled for the query, 2) the query scanned a unique secondary index that contained columns from more than one column family, and 3) rows fetched by the query contained NULL values for some of the indexed columns. This bug was present since version 20.1.
I've confirmed that the last failure report is the same bug. |
68071: colfetcher: fix NULL checks during unique index decoding r=mgartner a=mgartner `colfetcher` must detect `NULL` values in unique secondary index keys on tables with multiple column families in order to determine whether consecutive KVs belong to the same row or different rows. Previously, only the columns in the key that were needed by the query were decoded and checked for `NULL`. This caused incorrect query results when `NULL` column values were not detected because those columns were not needed. This commit fixes the issue by checking all columns for `NULL` when decoding unique secondary index keys on tables with multiple column families. Fixes #66706 Release note (bug fix): A bug has been fixed that caused incorrect query results when querying tables with multiple column families and unique secondary indexes. The bug only occurred if 1) vectorized execution was enabled for the query, 2) the query scanned a unique secondary index that contained columns from more than one column family, and 3) rows fetched by the query contained NULL values for some of the indexed columns. This bug was present since version 20.1. Co-authored-by: Marcus Gartner
roachtest.tlp failed with artifacts on master @ 1b686aef9949c1c7ef930b55bd1fbc0ed2e8268a:
Reproduce
To reproduce, try:
# From https://go.crdb.dev/p/roachstress, perhaps edited lightly. caffeinate ./roachstress.sh tlp