
bulk-io: CTAS creates more rows than expected when row count in millions #43752

Closed

mattcrdb opened this issue Jan 6, 2020 · 3 comments

Labels: C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. · S-2 Medium-high impact: many users impacted, risks of availability and difficult-to-fix data errors


mattcrdb commented Jan 6, 2020

Describe the problem

A customer reported a bug where a CTAS on 19.2.1, run against a table with 35 million rows, produces a table that reports tens of millions more rows than expected.

They are not running the CTAS inside a transaction, so the bulk-io path is used to write the new table.

To Reproduce

  1. Create a local 3-node cluster.
  2. Generate dummy data with 35M rows (I used the Python script below).
#!/usr/bin/env python3

import uuid
import csv

# Write a header row followed by 35M rows of (random UUID, fixed payload).
with open('data.csv', mode='w') as data_file:
    data_writer = csv.writer(data_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    data_writer.writerow(['id', 'data'])
    for i in range(35000000):
        data_writer.writerow([uuid.uuid4(), 'Testrowdata'])
  3. Create extern directories in each local node store.
  4. Copy data.csv to each node's respective extern folder.
  5. In the SQL shell:
root@:26257/defaultdb> import table test_table (
id string primary key,
data string)
csv data('nodelocal:///data.csv');
        job_id       |  status   | fraction_completed |   rows   | index_entries | system_records |   bytes
+--------------------+-----------+--------------------+----------+---------------+----------------+------------+
  518623560092418049 | succeeded |                  1 | 35000001 |             0 |              0 | 2100000019
(1 row)

Time: 5m23.902097s

root@:26257/defaultdb> select count(1) from test_table;
   count
+----------+
  35000001
(1 row)

Time: 8.174131s

root@:26257/defaultdb> create table test_table2 as select * from test_table;
CREATE TABLE AS

Time: 1m24.952231s

root@:26257/defaultdb> select count(1) from test_table2;
   count
+----------+
  42190239
(1 row)

Time: 19.979619s

root@:26257/defaultdb> select count(1) from test_table2;
   count
+----------+
  70000002
(1 row)

Time: 16.955686s

Expected behavior
A table is created with exactly the same number of rows and the same data as the source table.

Additional data / screenshots
Possibly related to #28842.

The bug does not occur when a primary key is specified explicitly:

root@:26257/defaultdb> select count(1) from test_table;
   count
+----------+
  35000001
(1 row)

Time: 13.908131s

root@:26257/defaultdb> create table test_table3(id primary key,data) as select id,data from test_table;
CREATE TABLE AS

Time: 1m3.381818s

root@:26257/defaultdb> select count(1) from test_table;
   count
+----------+
  35000001
(1 row)

Time: 7.996564s

root@:26257/defaultdb> select count(1) from test_table3;
   count
+----------+
  35000001
(1 row)

Time: 13.764223s

root@:26257/defaultdb> select count(1) from test_table3;
   count
+----------+
  35000001
(1 row)

Time: 17.487245s


Environment:

  • CockroachDB version 19.2.1
  • Server OS: macOS
  • Client app: Cockroach SQL
@RoachietheSupportRoach (Collaborator)

Zendesk ticket #4303 has been linked to this issue.

tim-o added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Jan 6, 2020
tim-o added the S-2 Medium-high impact: many users impacted, risks of availability and difficult-to-fix data errors label Jan 6, 2020

dt commented Jan 8, 2020

Think I've figured out what is going on here.

TL;DR: more than one node is trying to backfill the new table at the same time.

In the schema changer, in the exec method, we check for and trigger a backfill of the new table here: https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/schema_changer.go#L935

That's right before we move tables that are being added into the public state (usually only needed for FKs, but also relied on by CREATE TABLE ... AS, since those tables use that same state).

However, the exclusive execution protection mechanism -- the schema-change lease that lives in tableDesc.Lease -- isn't acquired until much further down, here: https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/schema_changer.go#L975
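
For illustration only, here is a toy Go sketch of the race this ordering allows (made-up identifiers such as backfill, acquireLease, and leaseHeld; this is not CockroachDB code): two nodes both run the backfill because the exclusive lease is only checked afterwards, so the new table ends up with roughly double the expected rows.

// Hypothetical toy model of the ordering described above; none of these
// identifiers come from the CockroachDB codebase.
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

var (
	leaseMu     sync.Mutex
	leaseHeld   bool
	rowsWritten int64
)

// backfill stands in for writing the source rows into the new table.
func backfill(node int, srcRows int64) {
	atomic.AddInt64(&rowsWritten, srcRows)
	fmt.Printf("node %d backfilled %d rows\n", node, srcRows)
}

// acquireLease stands in for the schema-change lease: only one node may hold it.
func acquireLease() bool {
	leaseMu.Lock()
	defer leaseMu.Unlock()
	if leaseHeld {
		return false
	}
	leaseHeld = true
	return true
}

func main() {
	const srcRows = 35000001
	var wg sync.WaitGroup
	for node := 1; node <= 2; node++ {
		wg.Add(1)
		go func(node int) {
			defer wg.Done()
			// Current ordering: the backfill runs first...
			backfill(node, srcRows)
			// ...and the exclusive lease is only taken afterwards,
			// too late to stop the second writer.
			if !acquireLease() {
				fmt.Printf("node %d: lease already held\n", node)
			}
		}(node)
	}
	wg.Wait()
	// Prints 70000002 rows against an expected 35000001, mirroring the
	// counts seen in the reproduction above.
	fmt.Printf("rows in new table: %d (expected %d)\n", rowsWritten, srcRows)
}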

We could potentially move the backfill down inside the exclusive execution region, modify the existing make-public check to skip AS tables, and add a later publish step; or we could include a lease acquisition inside maybeBackfillAs.

What I'm not sure of, though, is why the lease isn't acquired much earlier, so that it also covers the earlier steps like making GC mutations and making tables public. Also, the tableDesc.Lease isn't really trusted: in general, exec doesn't do anything to ensure it is maintained, so longer schema changes are left to their own devices to maintain it (and often don't). As a result we have generally seen concurrent execution, and I think we've said that schema changes have to be okay with that? If so, then we'd need some other locking for the CTAS backfill, because it is decidedly not okay with concurrent execution.
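
To make the first option above concrete, here is a hypothetical reworking of the same toy model (still invented names, not the real schema-changer API): the lease is taken before the backfill, so only the holder writes rows and the second node skips the work entirely.

// Hypothetical sketch of the proposed ordering: acquire the exclusive lease
// before backfilling, so only one node writes rows.
package main

import (
	"fmt"
	"sync"
)

func main() {
	const srcRows = 35000001
	var (
		mu        sync.Mutex
		leaseHeld bool
		rows      int64
		wg        sync.WaitGroup
	)
	// acquireLease models taking the schema-change lease exactly once.
	acquireLease := func() bool {
		mu.Lock()
		defer mu.Unlock()
		if leaseHeld {
			return false
		}
		leaseHeld = true
		return true
	}
	for node := 1; node <= 2; node++ {
		wg.Add(1)
		go func(node int) {
			defer wg.Done()
			// Fixed ordering: the backfill only runs inside the exclusive region.
			if !acquireLease() {
				fmt.Printf("node %d: lease held elsewhere, skipping backfill\n", node)
				return
			}
			rows += srcRows
			fmt.Printf("node %d backfilled %d rows\n", node, srcRows)
		}(node)
	}
	wg.Wait()
	// Only one node writes, so the count matches the source table.
	fmt.Printf("rows in new table: %d (expected %d)\n", rows, srcRows)
}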


dt commented Mar 6, 2020

#44300

dt closed this as completed Mar 6, 2020