sql: unexpected duplicate key violations #7604

Closed
tamird opened this issue Jul 2, 2016 · 2 comments
Labels
S-1-stability Severe stability issues that can be fixed by upgrading, but usually don’t resolve by restarting

@tamird
Contributor

tamird commented Jul 2, 2016

(originally reported by @aquarat in #6053; I've deleted the original comment because it doesn't seem related to the original issue).

What version of CockroachDB are you using (cockroach version)?

$ cockroach version
Build Tag:   beta-20160519
Build Time:  2016/05/19 21:52:10
Platform:    linux amd64
Go Version:  go1.6
C Compiler:  gcc 4.9.2
(both nodes - using the same executable)

What operating system and processor architecture are you using?

Both nodes : Ubuntu 15.10 "Wily"

Node 1 : (local) (received most if not all writes)
Linux sqlrat 3.19.0-43-generic #49-Ubuntu SMP Sun Dec 27 19:43:07 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Node 2 : (remote) (saw few/no writes)
Linux parirat 4.2.0-25-generic #30-Ubuntu SMP Mon Jan 18 12:31:50 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

What flags/environment variables did you pass to cockroach start?

Node 1
$ ./cockroach start --insecure --port=25267 --join=127.0.0.1:25268 --alsologtostderr
Node 2
$ ./cockroach start --join=127.0.0.1:25267 --port=25268 --insecure --alsologtostderr --http-port=65530

Nodes are linked via SSH tunnelling (they're on different machines; one in France, the other in South Africa). Both machines are NTP synchronised.

ssh -NCL 25268:127.0.0.1:25268 -R 25267:127.0.0.1:25267 user@host -p 60022 -i mykey

Please describe the issue you observed:

2016/05/23 19:57:57 pq: context deadline exceeded
2016/05/23 20:00:24 pq: duplicate key value (rowid)=(144035022173011969) violates unique constraint "primary"
2016/05/23 20:00:24 pq: duplicate key value (rowid)=(144035021942947841) violates unique constraint "primary"

Initially the message "context deadline exceeded" was repeated many hundreds of times, a failure rate
of around 1% across ~1 million records. Towards the end of the "copy" process, the second type of
message was repeated hundreds of times.

What did you do?

I created and ran a Go application with the intention of using it to duplicate a table from
a Postgres database to a local CockroachDB node by selecting from the Postgres instance 
and inserting received records into the CockroachDB node. Initially this application was 
single-threaded but CockroachDB responded slowly to inserts, so I modified the 
application to execute inserts concurrently using a set of goroutines. I increased the number
of goroutine workers until I started to see more than 20% CPU usage on the CockroachDB
instance. My code is linked below... it was a very quick and dirty creation, apologies in advance. 
CockroachDB indicated 107 connections during the "copy" process. During this time no
write transactions were executed on the second node, but the occasional 
SELECT was executed, usually SELECT COUNT and SELECT * ... LIMIT 1;

Application performing the copy : https://play.golang.org/p/hhGzhNtmHL

The table structure in CockroachDB :
CREATE TABLE log (id INT, data STRING, entered TIMESTAMP, channel STRING);
(implying the creation of the rowid column)

What did you expect to see?

Limited or no error messages; I was copying an existing table and the table structure
in CockroachDB I was inserting into had a managed primary key (hidden rowid), so I
wasn't expecting any key conflicts.

What did you see instead?

Initially the odd "context deadline exceeded", and then, towards the end of the process
(after about 600k rows were inserted), errors indicating a duplicate key violation on the
rowid column. Both messages were repeated many times (in excess of 1000).
@petermattis petermattis modified the milestone: Q3 Jul 11, 2016
@tamird tamird added the S-1-stability Severe stability issues that can be fixed by upgrading, but usually don’t resolve by restarting label Sep 1, 2016
@tbg
Member

tbg commented Sep 9, 2016

From gamma-10. These are likely the result of retrying an operation that actually executed the first time (which has been logged b. There's a steady stream of those in the logs. Throughput is erratic, so this might be exacerbated by an underlying issue.

,block_num)=(5476931981080905971,'7abff5d0-7c8a-42a4-b802-3ee528d0eb9e',1843702) violates unique constraint "primary"
2016/09/09 20:24:55 error running blockwriter 12ad942e-ebc2-4da3-a270-1353666f1222: pq: duplicate key value (block_id,writer_id,block_num)=(2835414357791403270,'12ad942e-ebc2-4da3-a270-1353666f1222',1650680) violates unique constraint "primary"
2016/09/09 20:24:58 error running blockwriter 7abff5d0-7c8a-42a4-b802-3ee528d0eb9e: pq: context deadline exceeded
2016/09/09 20:25:00 error running blockwriter ab6bb027-e1c1-41f2-8411-053b5162d916: pq: duplicate key value (block_id,writer_id,block_num)=(3197035086659224716,'ab6bb027-e1c1-41f2-8411-053b5162d916',1668242) violates unique constraint "primary"
2016/09/09 20:25:00 error running blockwriter 6e092dc3-d7d9-4519-8b81-fe2f89766e47: pq: duplicate key value (block_id,writer_id,block_num)=(5046786100734609289,'6e092dc3-d7d9-4519-8b81-fe2f89766e47',1673754) violates unique constraint "primary"
2016/09/09 20:25:02 error running blockwriter 970ef882-b4f9-420d-98a6-66ebec579936: pq: duplicate key value (block_id,writer_id,block_num)=(5383691232940138385,'970ef882-b4f9-420d-98a6-66ebec579936',1686702) violates unique constraint "primary"
2016/09/09 20:25:15 error running blockwriter 12ad942e-ebc2-4da3-a270-1353666f1222: pq: duplicate key value (block_id,writer_id,block_num)=(6528984344439959498,'12ad942e-ebc2-4da3-a270-1353666f1222',1650769) violates unique constraint "primary"
2016/09/09 20:25:22 error running blockwriter 6e092dc3-d7d9-4519-8b81-fe2f89766e47: pq: duplicate key value (block_id,writer_id,block_num)=(501596765062273046,'6e092dc3-d7d9-4519-8b81-fe2f89766e47',1673855) violates unique constraint "primary"
2016/09/09 20:25:35 error running blockwriter 7abff5d0-7c8a-42a4-b802-3ee528d0eb9e: pq: duplicate key value (block_id,writer_id,block_num)=(8095020839265014353,'7abff5d0-7c8a-42a4-b802-3ee528d0eb9e',1844028) violates unique constraint "primary"
2016/09/09 20:25:38 error running blockwriter 12ad942e-ebc2-4da3-a270-1353666f1222: pq: context deadline exceeded
2016/09/09 20:25:38 error running blockwriter 7abff5d0-7c8a-42a4-b802-3ee528d0eb9e: pq: duplicate key value (block_id,writer_id,block_num)=(6838873562610097195,'7abff5d0-7c8a-42a4-b802-3ee528d0eb9e',1844030) violates unique constraint "primary"
2016/09/09 20:25:57 error running blockwriter 6e092dc3-d7d9-4519-8b81-fe2f89766e47: pq: duplicate key value (block_id,writer_id,block_num)=(8829626262417592569,'6e092dc3-d7d9-4519-8b81-fe2f89766e47',1673950) violates unique constraint "primary"
2016/09/09 20:26:19 error running blockwriter ab6bb027-e1c1-41f2-8411-053b5162d916: pq: duplicate key value (block_id,writer_id,block_num)=(2331127011208377399,'ab6bb027-e1c1-41f2-8411-053b5162d916',1668748) violates unique constraint "primary"
2016/09/09 20:26:19 error running blockwriter 7abff5d0-7c8a-42a4-b802-3ee528d0eb9e: pq: duplicate key value (block_id,writer_id,block_num)=(795059875892963747,'7abff5d0-7c8a-42a4-b802-3ee528d0eb9e',1844333) violates unique constraint "primary"
2016/09/09 20:26:23 error running blockwriter 6e092dc3-d7d9-4519-8b81-fe2f89766e47: pq: duplicate key value (block_id,writer_id,block_num)=(7529456395356892365,'6e092dc3-d7d9-4519-8b81-fe2f89766e47',1674186) violates unique constraint "primary"

spencerkimball added a commit that referenced this issue Oct 23, 2016
Introduce `AmbiguousCommitError` in the event that a batch with an
`EndTransaction` request is sent but the response is not available.

Fixes #6053, #7604, and #10023
spencerkimball added a commit that referenced this issue Oct 24, 2016
Introduce `AmbiguousCommitError` in the event that a batch with an
`EndTransaction` request is sent but the response is not available.

Fixes #6053, #7604, and #10023
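
To illustrate why a distinct "ambiguous" error is useful to callers, here is a minimal sketch under stated assumptions. `AmbiguousError`, `Table`, and `InsertOnce` are hypothetical names invented for this example; they are not the definition or API introduced by the commit. The idea shown is that a caller who can tell "the outcome is unknown" apart from "the write failed" can check the state before retrying instead of retrying blindly.

```go
package main

import (
	"errors"
	"fmt"
)

// AmbiguousError is a hypothetical type for this sketch: the write may or
// may not have been applied. It is not CockroachDB's actual definition.
type AmbiguousError struct{ Cause error }

func (e *AmbiguousError) Error() string { return "result is ambiguous: " + e.Cause.Error() }
func (e *AmbiguousError) Unwrap() error { return e.Cause }

var (
	errNoResponse = errors.New("response not available")
	ErrDuplicate  = errors.New("duplicate key")
)

// Table is a toy store. Its first `lost` inserts return an ambiguous
// error; applyLost decides whether those inserts were really applied.
type Table struct {
	rows      map[int64]string
	lost      int
	applyLost bool
}

func (t *Table) Insert(key int64, val string) error {
	if _, ok := t.rows[key]; ok {
		return ErrDuplicate
	}
	if t.lost > 0 {
		t.lost--
		if t.applyLost {
			t.rows[key] = val
		}
		return &AmbiguousError{Cause: errNoResponse}
	}
	t.rows[key] = val
	return nil
}

// Has reports whether key is present with the expected value.
func (t *Table) Has(key int64, val string) bool {
	got, ok := t.rows[key]
	return ok && got == val
}

// InsertOnce treats an ambiguous result as unknown: it reads the row back
// and retries only if the row is absent.
func InsertOnce(t *Table, key int64, val string) error {
	err := t.Insert(key, val)
	var amb *AmbiguousError
	if !errors.As(err, &amb) {
		return err // definite success or definite failure
	}
	if t.Has(key, val) {
		return nil // the first attempt did commit
	}
	return t.Insert(key, val)
}

func main() {
	for _, applied := range []bool{true, false} {
		tbl := &Table{rows: map[int64]string{}, lost: 1, applyLost: applied}
		err := InsertOnce(tbl, 7, "v")
		fmt.Printf("applied=%v err=%v rows=%d\n", applied, err, len(tbl.rows))
	}
}
```

The read-back check is only meaningful here because the key and value together identify the write; with concurrent writers or server-generated keys, a real application would need some other way to recognize its own write.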
@tamird
Contributor Author

tamird commented Oct 26, 2016

Closed by #10207.

@tamird tamird closed this as completed Oct 26, 2016

3 participants