Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ON CONFLICT leads to unexpected error from the vectorized engine: runtime error: index out of range [4] with length 4 #74795

Closed
dimpavloff opened this issue Jan 13, 2022 · 9 comments · Fixed by #74825
Assignees
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-community Originated from the community T-sql-queries SQL Queries Team X-blathers-triaged blathers was able to find an owner

Comments

@dimpavloff
Copy link

dimpavloff commented Jan 13, 2022

Describe the problem

CRDB returns an internal error with the below panic stack trace:

ERROR: internal error: unexpected error from the vectorized engine: runtime error: index out of range [4] with length 4
SQLSTATE: XX000                                                                                                                                                                       
DETAIL: stack trace:                                                                              
/go/src/github.com/cockroachdb/cockroach/pkg/sql/colexecerror/error.go:88: func1()
/usr/local/go/src/runtime/panic.go:965: gopanic()                                   
/usr/local/go/src/runtime/panic.go:88: goPanicIndex()                                      
/go/src/github.com/cockroachdb/cockroach/pkg/col/coldata/bytes.go:131: Get()                            
/go/src/github.com/cockroachdb/cockroach/pkg/sql/colexec/colexechash/hashtable_distinct.eg.go:6248: checkColForDistinctTuples()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/colexec/colexechash/hashtable.go:531: checkColsForDistinctTuples()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/colexec/colexechash/hashtable.go:635: CheckBuildForDistinct()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/colexec/colexechash/hashtable.go:456: FindBuckets()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/colexec/colexechash/hashtable.go:471: RemoveDuplicates()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/colexec/colexechash/hashtable.go:383: DistinctBuild()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/colexec/unordered_distinct.go:105: Next()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/colexec/disk_spiller.go:197: func1()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/colexecerror/error.go:91: CatchVectorizedRuntimeError()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/colexec/disk_spiller.go:195: Next()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/colexec/colexecutils/operator.go:137: Next()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/colexec/is_null_ops.eg.go:67: Next()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/colexec/colexecutils/operator.go:137: Next()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/colexec/is_null_ops.eg.go:67: Next()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/colexec/colexecbase/simple_project.go:125: Next()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/colflow/stats.go:98: Next()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/colexec/materializer.go:261: next()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/colexec/materializer.go:286: nextAdapter()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/colexecerror/error.go:91: CatchVectorizedRuntimeError()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/colexec/materializer.go:292: Next()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/row_source_to_plan_node.go:76: Next()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/insert.go:221: BatchedNext()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/plan_batch.go:173: startExec()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/plan.go:509: func2()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/walk.go:112: func1()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/walk.go:289: visitInternal()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/walk.go:79: visit()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/walk.go:43: walkPlan()

It seems that this happens only when inserting very specific data into the table; many variations of the data inserted result in successful query execution.
It also seems that removing the ON CONFLICT DO NOTHING clause or the index result in not hitting the problem.
The bug is NOT present on 21.1.13 but exists in 21.2.0-4

To Reproduce

  1. cockroach start-single-node --insecure --store=type=mem,size=1GB
  2. run the provided query (e.g. cat bug.sql | cockroach sql --url postgresql://root@localhost:26257/defaultdb?sslmode=disable )

Expected behavior
The query succeeds

Additional data / screenshots

cat > bug.sql <<EOF
BEGIN;                                                
CREATE SCHEMA auth;                          
CREATE TABLE auth.mytable (                 
    id TEXT PRIMARY KEY DEFAULT gen_random_uuid()::TEXT,
    account_id TEXT NOT NULL,                       
    permission TEXT NOT NULL,                  
    creation_timestamp TIMESTAMP DEFAULT (NOW() AT TIME ZONE 'UTC') NOT NULL,
    creation_request_id TEXT UNIQUE NOT NULL,                        
    deletion_timestamp TIMESTAMP,
    deletion_request_id TEXT UNIQUE
);       
CREATE UNIQUE INDEX account_id_permission_deletion_timestamp ON auth.mytable USING BTREE (account_id, permission, deletion_request_id) ;
COMMIT;        
                                                
BEGIN;   
INSERT INTO                                                                                                                               
auth.mytable                                                                                                            
(account_id, permission, creation_request_id)                                                                         
VALUES                                                                                                                              
('1332952395110132214', '/access_control_api.*', '1'),                                                                                
('1332952395110132214', '/audit_api.*', '2'),                                                                               
('1332952395110132214', '/core_api.*', '3'),                                                                                  
('1332952395110132214', '/data_loader_api.*', '4'),                                                                             
('1332952395110132214', '/multi_entity_api.*', '5'),                                                                
('1332952395110132214', '/payment_hub.*', '6'),                                                                                 
('1332952395110132214', '/products_api.*', '7')                                                                            
ON CONFLICT (account_id, permission, deletion_request_id) DO NOTHING;
COMMIT;
EOF

Note that even minor changes in the values above for the first or second column result in the query succeeding.

Environment:

  • CockroachDB version 21.2.0 - 21.2.4
  • Server OS: Linux / Ubuntu
  • Client app cockroach sql , psql

I am new to the CockroachDB codebase but if this is a simple bug for which you could give me some pointers, I would be interested in fixing.

@dimpavloff dimpavloff added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Jan 13, 2022
@blathers-crl
Copy link

blathers-crl bot commented Jan 13, 2022

Hello, I am Blathers. I am here to help you get the issue triaged.

Hoot - a bug! Though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here.

I have CC'd a few people who may be able to assist you:

  • @cockroachdb/sql-queries (found keywords: vectorized,plan)

If we have not gotten back to your issue within a few business days, you can try the following:

  • Join our community slack channel and ask on #cockroachdb.
  • Try find someone from here if you know they worked closely on the area and CC them.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl blathers-crl bot added O-community Originated from the community X-blathers-triaged blathers was able to find an owner labels Jan 13, 2022
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Jan 13, 2022
@mgartner
Copy link
Collaborator

Thanks for the detailed report! I've managed to reduce the reproduction a bit:

CREATE TABLE mytable (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    account_id TEXT NOT NULL,
    deletion_request_id TEXT,
    UNIQUE INDEX (account_id, deletion_request_id)
);

INSERT INTO mytable
    (account_id)
VALUES
    ('foo'),
    ('foo'),
    ('foo')
ON CONFLICT (account_id, deletion_request_id) DO NOTHING;

The query plan for the INSERT:

EXPLAIN (OPT, VERBOSE)
INSERT INTO mytable
    (account_id)
VALUES
    ('foo'),
    ('foo'),
    ('foo')
ON CONFLICT (account_id, deletion_request_id) DO NOTHING;
----
insert mytable
 ├── columns: <none>
 ├── arbiter indexes: mytable_account_id_deletion_request_id_key
 ├── insert-mapping:
 │    ├── id_default:7 => id:1
 │    ├── column1:6 => account_id:2
 │    └── deletion_request_id_default:8 => deletion_request_id:3
 ├── cardinality: [0 - 0]
 ├── volatile, mutations
 ├── stats: [rows=0]
 ├── cost: 0.3
 └── upsert-distinct-on
      ├── columns: column1:6 id_default:7 deletion_request_id_default:8
      ├── grouping columns: column1:6 deletion_request_id_default:8
      ├── cardinality: [1 - 3]
      ├── volatile
      ├── stats: [rows=1, distinct(6,8)=1, null(6,8)=0]
      ├── cost: 0.29
      ├── lax-key: (6,8)
      ├── fd: ()-->(8), (6,8)~~>(7)
      ├── project
      │    ├── columns: id_default:7 deletion_request_id_default:8 column1:6
      │    ├── cardinality: [3 - 3]
      │    ├── volatile
      │    ├── stats: [rows=3, distinct(6,8)=1, null(6,8)=0]
      │    ├── cost: 0.14
      │    ├── fd: ()-->(8)
      │    ├── values
      │    │    ├── columns: column1:6
      │    │    ├── cardinality: [3 - 3]
      │    │    ├── stats: [rows=3, distinct(6)=1, null(6)=0]
      │    │    ├── cost: 0.04
      │    │    ├── prune: (6)
      │    │    ├── ('foo',)
      │    │    ├── ('foo',)
      │    │    └── ('foo',)
      │    └── projections
      │         ├── gen_random_uuid() [as=id_default:7, volatile]
      │         └── CAST(NULL AS STRING) [as=deletion_request_id_default:8]
      └── aggregations
           └── first-agg [as=id_default:7, outer=(7)]
                └── id_default:7

I am new to the CockroachDB codebase but if this is a simple bug for which you could give me some pointers, I would be interested in fixing.

It looks like the stack trace points to the vectorized execution engine. I can't link to the file or line where the error occurs because the file, pkg/sql/colexec/colexechash/hashtable_distinct.eg.go, is a generated file that is not checked into source code, but if you've built CRDB from source it should exist.

The relevant lines look like:

// keyID of 0 is reserved to represent the end of the next chain.
keyID := ht.ProbeScratch.GroupID[toCheck]
if keyID != 0 {

  // ...
  
  buildIdx = int(keyID - 1)

  // ...


  // Error occurs here:
  buildVal := buildKeys.Get(buildIdx)

  // ...

I don't yet know exactly what invariants are being broken that are causing buildIdx to be out of bounds, but you're certainly welcome to investigate further. Be aware, though, that someone on our side with expertise in the execution engine may come up with a fix first - the bug seems fairly severe and may require an urgent fix.


As an aside, I noticed in your auth.mytable that the id is type TEXT and not UUID, with a default expression of gen_random_uuid()::TEXT. I'd advise against this because a UUID requires 16 bytes while the string representation of a UUID is 36 bytes. With UUID instead of TEXT, the size of tables on disk should be smaller and the query performance should be better.

@mgartner
Copy link
Collaborator

I was incorrect. The generated .eg.go files are checked in. The error originates here:

buildVal := buildKeys.Get(buildIdx)

@dimpavloff
Copy link
Author

dimpavloff commented Jan 13, 2022

Thanks for the lightning response!

I can't link to the file or line where the error occurs because the file, pkg/sql/colexec/colexechash/hashtable_distinct.eg.go, is a generated file that is not checked into source code

If people have access to GH's new code search, the file can be seen here .

Not sure if useful but here's the dlv output with a breakpoint on /go/src/github.com/cockroachdb/cockroach/pkg/sql/colexec/colexechash/hashtable_distinct.eg.go:6248 on the offending iteration :

(dlv) args
ht = ("*github.com/cockroachdb/cockroach/pkg/sql/colexec/colexechash.HashTable")(0xc001745900)
probeVec = github.com/cockroachdb/cockroach/pkg/col/coldata.Vec(*github.com/cockroachdb/cockroach/pkg/col/coldata.memColumn) 0xbeef000000000008
buildVec = github.com/cockroachdb/cockroach/pkg/col/coldata.Vec(*github.com/cockroachdb/cockroach/pkg/col/coldata.memColumn) 0xbeef000000000108
nToCheck = 2
probeSel = []int len: 2, cap: 0, [...]
(dlv) p probeVec
github.com/cockroachdb/cockroach/pkg/col/coldata.Vec(*github.com/cockroachdb/cockroach/pkg/col/coldata.memColumn) *{
        t: *github.com/cockroachdb/cockroach/pkg/sql/types.T {
                InternalType: (*"github.com/cockroachdb/cockroach/pkg/sql/types.InternalType")(0xacfc960),
                TypeMeta: (*"github.com/cockroachdb/cockroach/pkg/sql/types.UserDefinedTypeMetadata")(0xacfca00),},
        canonicalTypeFamily: BytesFamily (8),
        col: github.com/cockroachdb/cockroach/pkg/col/coldata.Column(*github.com/cockroachdb/cockroach/pkg/col/coldata.Bytes) *{
                data: []uint8 len: 6, cap: 128, [102,111,111,102,111,111],
                offsets: []int32 len: 3, cap: 3, [0,3,6],
                maxSetLength: 2,
                isWindow: false,},
        nulls: github.com/cockroachdb/cockroach/pkg/col/coldata.Nulls {
                nulls: []uint8 len: 1, cap: 1, [255],
                maybeHasNulls: false,},}
(dlv) p buildVec
github.com/cockroachdb/cockroach/pkg/col/coldata.Vec(*github.com/cockroachdb/cockroach/pkg/col/coldata.memColumn) *{
        t: *github.com/cockroachdb/cockroach/pkg/sql/types.T {
                InternalType: (*"github.com/cockroachdb/cockroach/pkg/sql/types.InternalType")(0xacfc960),
                TypeMeta: (*"github.com/cockroachdb/cockroach/pkg/sql/types.UserDefinedTypeMetadata")(0xacfca00),},
        canonicalTypeFamily: BytesFamily (8),
        col: github.com/cockroachdb/cockroach/pkg/col/coldata.Column(*github.com/cockroachdb/cockroach/pkg/col/coldata.Bytes) *{
                data: []uint8 len: 3, cap: 8, [102,111,111],
                offsets: []int32 len: 2, cap: 2, [0,3],
                maxSetLength: 1,
                isWindow: false,},
        nulls: github.com/cockroachdb/cockroach/pkg/col/coldata.Nulls {
                nulls: []uint8 len: 1, cap: 8, [255],
                maybeHasNulls: false,},}
(dlv) p probeSel
[]int len: 2, cap: 0, [0,1]
(dlv) locals
probeKeys = ("*github.com/cockroachdb/cockroach/pkg/col/coldata.Bytes")(0xc0017760c0)
buildKeys = ("*github.com/cockroachdb/cockroach/pkg/col/coldata.Bytes")(0xc00070f200)
probeIdx = 1
buildIdx = (unreadable could not find loclist entry at 0x24d59a6 for address 0x23a3d53)
toCheck = (unreadable could not find loclist entry at 0x24d59d9 for address 0x23a3d53)
keyID = 2

(dlv) p ht.ProbeScratch.GroupID
[]uint64 len: 2, cap: 2, [1,2]
(dlv) p ht.ProbeScratch.ToCheck
[]uint64 len: 2, cap: 2, [0,1]

(dlv) p probeKeys
*github.com/cockroachdb/cockroach/pkg/col/coldata.Bytes {
        data: []uint8 len: 6, cap: 128, [102,111,111,102,111,111],
        offsets: []int32 len: 3, cap: 3, [0,3,6],
        maxSetLength: 2,
        isWindow: false,}
(dlv) p buildKeys
*github.com/cockroachdb/cockroach/pkg/col/coldata.Bytes {
        data: []uint8 len: 3, cap: 8, [102,111,111],
        offsets: []int32 len: 2, cap: 2, [0,3],
        maxSetLength: 1,
        isWindow: false,}

I don't yet know exactly what invariants are being broken that are causing buildIdx to be out of bounds, but you're certainly welcome to investigate further. Be aware, though, that someone on our side with expertise in the execution engine may come up with a fix first - the bug seems fairly severe and may require an urgent fix.

No problem at all. When I noticed that the file was autogenerated, I thought it would be unlikely to be a simple code change :)

@mgartner
Copy link
Collaborator

Looks like the bug was introduced in d1cbf20.

@mgartner
Copy link
Collaborator

It seems like buildVec.MaybeHasNulls() should be returning true so we go into the first branch of the if/else. buildVec has values 'foo' and NULL, but for some reason buildVec.nulls incorrectly indicates that there are no null values.

@mgartner
Copy link
Collaborator

Hmm, my last comment seems incorrect. I now think the issue is that ht.ProbeScratch.ToCheck has length 2 when it should have length 1.

@yuzefovich
Copy link
Member

@dimpavloff we've identified the bug and have a fix that should be merged shortly. It'll be backported to 21.2 release branch and should be included in 21.2.5 (target release date January 31). In the meantime as a workaround you could do SET vectorize=off; <query>; RESET vectorize;.

@dimpavloff
Copy link
Author

Great, thank you both for the quick resolution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-community Originated from the community T-sql-queries SQL Queries Team X-blathers-triaged blathers was able to find an owner
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants