storage: manual testing of MVCC bulk ops #89276

Closed

erikgrinaker opened this issue Oct 4, 2022 · 2 comments

erikgrinaker (Contributor) commented Oct 4, 2022

In lieu of proper end-to-end testing, we should do some manual testing of the new MVCC bulk ops work.

Enable relevant settings (a SQL sketch follows this list):

  • storage.mvcc.range_tombstones.enabled = true
  • kv.bulk_io_write.sst_rewrite_concurrency.per_call = 4
  • kv.bulk_io_write.sst_require_at_request_timestamp.enabled = true
  • gc.ttlseconds = 1800
  • sql.gc_job.wait_for_gc.interval = '1m'
  • Also test that, after disabling MVCC range tombstones and waiting for existing ones to be GCed, no new ones are produced.
  • Also ensure that we get a reasonable amount of splits/merges during testing.
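
A minimal sketch of enabling these; note that gc.ttlseconds is a zone configuration rather than a cluster setting:

SET CLUSTER SETTING storage.mvcc.range_tombstones.enabled = true;
SET CLUSTER SETTING kv.bulk_io_write.sst_rewrite_concurrency.per_call = 4;
SET CLUSTER SETTING kv.bulk_io_write.sst_require_at_request_timestamp.enabled = true;
SET CLUSTER SETTING sql.gc_job.wait_for_gc.interval = '1m';
ALTER RANGE default CONFIGURE ZONE USING gc.ttlseconds = 1800;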

First, test import cancellation into an empty table (a sketch follows this list):

  • Create a table.
  • Start a changefeed.
  • Take a backup.
  • Import initial data. Cancel+retry a few times, then complete.
  • Run some queries and verify them (e.g. SELECT *, SELECT count(*), and SELECT ... WHERE filters).
  • Resume the changefeed, before/after the import timestamp.
  • Take a backup, full+incremental.
  • Run a consistency check. Verify that MVCC stats are correct.
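
A sketch of the cancel-and-retry flow, mirroring the notes further down; the table, fixture URL, and job ID are placeholders:

SET CLUSTER SETTING jobs.debug.pausepoints = 'import.after_ingest';
IMPORT INTO orders CSV DATA ('gs://cockroach-fixtures/tpch-csv/sf-100/orders.tbl.1?AUTH=implicit') WITH delimiter='|', detached;
CANCEL JOB ...;  -- cancel while paused at the pausepoint, then clear it and retry
SET CLUSTER SETTING jobs.debug.pausepoints = '';
IMPORT INTO orders CSV DATA ('gs://cockroach-fixtures/tpch-csv/sf-100/orders.tbl.1?AUTH=implicit') WITH delimiter='|', detached;

SELECT count(*) FROM orders;
SELECT * FROM crdb_internal.check_consistency(false, '', '') WHERE status != 'RANGE_CONSISTENT';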

Next, test import cancellation into a table that already has data:

  • Import additional data. Cancel+retry a few times, then complete.
  • Run some queries and verify them.
  • Resume the changefeed, before/after the import timestamp (see the cursor sketch after this list).
  • Take a backup, full+incremental.
  • Run a consistency check. Verify that MVCC stats are correct.
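
Resuming emission around a given timestamp can be approximated by starting a new changefeed with a cursor; the HLC timestamp here is a placeholder:

CREATE CHANGEFEED FOR orders INTO 'gs://grinaker-backup/cdc?AUTH=implicit'
  WITH updated, resolved, diff, cursor = '...';  -- set just before/after the import timestamp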

Test schema changes where the source has MVCC range tombstones (sketch below):

  • Build an index from the table. Verify that it's correct.
  • Run a schema change on the table. Verify that it's correct.
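
For example, mirroring the schema changes in the notes below (the index name is hypothetical):

CREATE INDEX o_comment_idx ON orders (o_comment);
ALTER TABLE orders ADD COLUMN o_foo INT NOT NULL DEFAULT 7;
ALTER TABLE orders DROP COLUMN o_foo;
SHOW EXPERIMENTAL_FINGERPRINTS FROM TABLE orders;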

Test backups (sketch after this list):

  • Restore the backups into separate tables.
  • Verify that final restore matches original table.
  • Verify that full/incremental backups match.
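
A sketch, as in the notes below; the new_db_name values are arbitrary:

RESTORE DATABASE tpch FROM LATEST IN 'gs://grinaker-backup/22.1?AUTH=implicit' WITH new_db_name = tpch_full, detached;
RESTORE DATABASE tpch FROM LATEST IN 'gs://grinaker-backup/22.1-incr?AUTH=implicit' WITH new_db_name = tpch_incr, detached;
-- then compare SHOW EXPERIMENTAL_FINGERPRINTS FROM TABLE ... across the original and restored databases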

Test GC (verification queries after this list):

  • Wait for GC. All range keys and garbage should be removed.
  • Run a consistency check. Verify that MVCC stats are correct.
  • Drop the table. Wait for it to be deleted.
  • Take an incremental backup (if possible?).
  • Run a consistency check. Verify that MVCC stats are correct.
  • Wait for schema GC to remove range contents.
  • Run a consistency check. Verify that MVCC stats are correct.
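
To verify that range keys are gone and stats are consistent, as in the notes below:

SELECT range_id, start_pretty, end_pretty,
       crdb_internal.range_stats(start_key)->'range_key_count' AS rangekeys
FROM crdb_internal.ranges_no_leases
WHERE (crdb_internal.range_stats(start_key)->'range_key_count')::int > 0;

SELECT * FROM crdb_internal.check_consistency(false, '', '') WHERE status != 'RANGE_CONSISTENT';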

Test PITR restores (sketch after this list):

  • Do point-in-time restores of the incremental backup at each backup time.
  • Verify that they match the full backups at those times.
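
A sketch, assuming the incremental backup chain was taken WITH revision_history; the timestamp and new_db_name are placeholders:

RESTORE DATABASE tpch FROM LATEST IN 'gs://grinaker-backup/22.1-incr?AUTH=implicit'
  AS OF SYSTEM TIME '...' WITH new_db_name = tpch_pitr, detached;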

Jira issue: CRDB-20185

Epic CRDB-2624

blathers-crl bot commented Oct 4, 2022

cc @cockroachdb/replication

erikgrinaker (Contributor, Author) commented Nov 5, 2022

Ran a bunch of manual tests. Did not do exhaustive backup/restore tests, since those are covered by #89761. No serious issues were found, only a few minor ones.

Rough notes, for posterity:

roachprod create -n 6 --gce-machine-type n2-standard-16 --gce-zones europe-west1-b --lifetime 168h grinaker-222
roachprod put grinaker-222 artifacts/workload
roachprod stage grinaker-222 release v21.1.21

roachprod start grinaker-222:1-5

ALTER RANGE default CONFIGURE ZONE USING gc.ttlseconds=3600;
SET CLUSTER SETTING kv.rangefeed.enabled = true;
SET CLUSTER SETTING server.consistency_check.max_rate = '256 MiB';

# Load data.
./workload init tpch --data-loader import --scale-factor 10 'postgres://[email protected]:26257?sslmode=disable' 'postgres://[email protected]:26257?sslmode=disable' 'postgres://[email protected]:26257?sslmode=disable' 'postgres://[email protected]:26257?sslmode=disable' 'postgres://[email protected]:26257?sslmode=disable'

ANALYZE customer;
ANALYZE lineitem;
ANALYZE nation;
ANALYZE orders;
ANALYZE part;
ANALYZE partsupp;
ANALYZE region;
ANALYZE supplier;

./workload run tpch --tolerate-errors 'postgres://[email protected]:26257?sslmode=disable' 'postgres://[email protected]:26257?sslmode=disable' 'postgres://[email protected]:26257?sslmode=disable' 'postgres://[email protected]:26257?sslmode=disable' 'postgres://[email protected]:26257?sslmode=disable'

# Upgrade nodes to 21.2 then 22.1.
(nodes 1-5)
export NODE=grinaker-222:1 VERSION=v21.2.17; roachprod stop $NODE && roachprod stage $NODE release $VERSION && roachprod start $NODE
export NODE=grinaker-222:1 VERSION=v22.1.10; roachprod stop $NODE && roachprod stage $NODE release $VERSION && roachprod start $NODE

# Take backups and start a changefeed.
BACKUP DATABASE tpch INTO 'gs://grinaker-backup/22.1?AUTH=implicit' WITH revision_history, detached;
BACKUP DATABASE tpch INTO 'gs://grinaker-backup/22.1-incr?AUTH=implicit' WITH revision_history, detached;

CREATE CHANGEFEED FOR lineitem, orders INTO 'gs://grinaker-backup/cdc?AUTH=implicit' WITH updated, resolved, diff, on_error='pause';

# Run consistency checks and gather fingerprints.
SELECT * FROM crdb_internal.check_consistency(false, '', '') WHERE status != 'RANGE_CONSISTENT';

SHOW EXPERIMENTAL_FINGERPRINTS FROM TABLE customer;
SHOW EXPERIMENTAL_FINGERPRINTS FROM TABLE lineitem;
SHOW EXPERIMENTAL_FINGERPRINTS FROM TABLE nation;
SHOW EXPERIMENTAL_FINGERPRINTS FROM TABLE orders;
SHOW EXPERIMENTAL_FINGERPRINTS FROM TABLE part;
SHOW EXPERIMENTAL_FINGERPRINTS FROM TABLE partsupp;
SHOW EXPERIMENTAL_FINGERPRINTS FROM TABLE region;
SHOW EXPERIMENTAL_FINGERPRINTS FROM TABLE supplier;

# Upgrade n1,n2 to 22.2.
(nodes 1-2)
export NODE=grinaker-222:1 VERSION=v22.2.0-beta.5; roachprod stop $NODE && roachprod stage $NODE release $VERSION && roachprod start $NODE

# Take mixed-version backups (full and incremental).
BACKUP DATABASE tpch INTO 'gs://grinaker-backup/22.1-mixed?AUTH=implicit' WITH revision_history, detached;
BACKUP DATABASE tpch INTO LATEST IN 'gs://grinaker-backup/22.1-incr?AUTH=implicit' WITH revision_history, detached; 

# Check changefeed emission.
UPDATE orders SET o_comment = 'foo' WHERE o_orderkey = 1;
UPDATE orders SET o_comment = 'bar' WHERE o_orderkey = 1;
UPDATE orders SET o_comment = 'baz' WHERE o_orderkey = 1;
UPDATE orders SET o_comment = 'inting small this evening. Cell ' WHERE o_orderkey = 1; -- original

# Run some (cancelled) imports and schema changes.
CREATE DATABASE import;
USE import;
CREATE TABLE orders (
  o_orderkey           INTEGER NOT NULL PRIMARY KEY,
  o_custkey            INTEGER NOT NULL,
  o_orderstatus        CHAR(1) NOT NULL,
  o_totalprice         FLOAT NOT NULL,
  o_orderdate          DATE NOT NULL,
  o_orderpriority      CHAR(15) NOT NULL,
  o_clerk              CHAR(15) NOT NULL,
  o_shippriority       INTEGER NOT NULL,
  o_comment            VARCHAR(79) NOT NULL,
  INDEX o_ck           (o_custkey ASC),
  INDEX o_od           (o_orderdate ASC)
);

IMPORT INTO orders CSV DATA ('gs://cockroach-fixtures/tpch-csv/sf-100/orders.tbl.1?AUTH=implicit') WITH delimiter='|', detached;
ANALYZE orders;
SHOW EXPERIMENTAL_FINGERPRINTS FROM TABLE orders;

SET CLUSTER SETTING jobs.debug.pausepoints = 'import.after_ingest';
IMPORT INTO orders CSV DATA ('gs://cockroach-fixtures/tpch-csv/sf-100/orders.tbl.2?AUTH=implicit') WITH delimiter='|', detached;
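-- Not in the original notes: one way to look up the paused job's ID for the CANCEL below:
SELECT job_id FROM [SHOW JOBS] WHERE job_type = 'IMPORT' AND status = 'paused';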
CANCEL JOB ...;
SET CLUSTER SETTING jobs.debug.pausepoints = '';

SHOW EXPERIMENTAL_FINGERPRINTS FROM TABLE orders;
SELECT * FROM crdb_internal.check_consistency(false, '', '') WHERE status != 'RANGE_CONSISTENT';

ALTER TABLE orders ADD COLUMN o_foo INT DEFAULT 7;
ALTER TABLE orders DROP COLUMN o_foo;
SHOW EXPERIMENTAL_FINGERPRINTS FROM TABLE orders;

# Complete upgrade to 22.2.
(nodes 3-5)
export NODE=grinaker-222:1 VERSION=v22.2.0-beta.5; roachprod stop $NODE && roachprod stage $NODE release $VERSION && roachprod start $NODE

# Enable MVCC range tombstones and other settings.
SET CLUSTER SETTING storage.mvcc.range_tombstones.enabled = true;
SET CLUSTER SETTING kv.bulk_io_write.sst_rewrite_concurrency.per_call = 4;
SET CLUSTER SETTING kv.bulk_io_write.sst_require_at_request_timestamp.enabled = true;
SET CLUSTER SETTING sql.gc_job.wait_for_gc.interval = '1m';

# Run a consistency check, restore backups, check backup fingerprints.
SELECT * FROM crdb_internal.check_consistency(false, '', '') WHERE status != 'RANGE_CONSISTENT';

RESTORE DATABASE tpch FROM LATEST IN 'gs://grinaker-backup/22.1?AUTH=implicit' WITH new_db_name = tpch221, detached;
RESTORE DATABASE tpch FROM LATEST IN 'gs://grinaker-backup/22.1-mixed?AUTH=implicit' WITH new_db_name = tpch221mixed, detached;

(USE tpch221, tpch221mixed)
SHOW EXPERIMENTAL_FINGERPRINTS FROM TABLE customer;
SHOW EXPERIMENTAL_FINGERPRINTS FROM TABLE lineitem;
SHOW EXPERIMENTAL_FINGERPRINTS FROM TABLE nation;
SHOW EXPERIMENTAL_FINGERPRINTS FROM TABLE orders;
SHOW EXPERIMENTAL_FINGERPRINTS FROM TABLE part;
SHOW EXPERIMENTAL_FINGERPRINTS FROM TABLE partsupp;
SHOW EXPERIMENTAL_FINGERPRINTS FROM TABLE region;
SHOW EXPERIMENTAL_FINGERPRINTS FROM TABLE supplier;

# Check changefeed emission.
UPDATE orders SET o_comment = 'foo' WHERE o_orderkey = 1;
UPDATE orders SET o_comment = 'bar' WHERE o_orderkey = 1;
UPDATE orders SET o_comment = 'baz' WHERE o_orderkey = 1;
UPDATE orders SET o_comment = 'inting small this evening. Cell ' WHERE o_orderkey = 1; -- original

# Run some schema changes and wait for them to get GCed.
ALTER RANGE default CONFIGURE ZONE USING gc.ttlseconds=300;

ALTER TABLE lineitem ADD COLUMN l_foo INT NOT NULL DEFAULT 7;
ALTER TABLE lineitem DROP COLUMN l_foo;

SELECT range_id, database_name, start_pretty, end_pretty, replicas, crdb_internal.range_stats(start_key)->'range_key_count' AS rangekeys
FROM crdb_internal.ranges_no_leases
WHERE (crdb_internal.range_stats(start_key)->'range_key_count')::int > 0;

SELECT * FROM crdb_internal.check_consistency(false, '', '') WHERE status != 'RANGE_CONSISTENT';

DROP DATABASE tpch221mixed;

# Run two imports: the first succeeds, the second is cancelled and rolled back.
CREATE DATABASE import;
USE import;
CREATE TABLE orders (
  o_orderkey           INTEGER NOT NULL PRIMARY KEY,
  o_custkey            INTEGER NOT NULL,
  o_orderstatus        CHAR(1) NOT NULL,
  o_totalprice         FLOAT NOT NULL,
  o_orderdate          DATE NOT NULL,
  o_orderpriority      CHAR(15) NOT NULL,
  o_clerk              CHAR(15) NOT NULL,
  o_shippriority       INTEGER NOT NULL,
  o_comment            VARCHAR(79) NOT NULL,
  INDEX o_ck           (o_custkey ASC),
  INDEX o_od           (o_orderdate ASC)
);

IMPORT INTO orders CSV DATA ('gs://cockroach-fixtures/tpch-csv/sf-100/orders.tbl.1?AUTH=implicit') WITH delimiter='|', detached;
ANALYZE orders;
SHOW EXPERIMENTAL_FINGERPRINTS FROM TABLE orders;

-[ RECORD 1 ]
index_name  | orders_pkey
fingerprint | 6801166197436044005
-[ RECORD 2 ]
index_name  | o_ck
fingerprint | 5694152486057133176
-[ RECORD 3 ]
index_name  | o_od
fingerprint | -5502862233525451451

SET CLUSTER SETTING jobs.debug.pausepoints = 'import.after_ingest';
IMPORT INTO orders CSV DATA ('gs://cockroach-fixtures/tpch-csv/sf-100/orders.tbl.2?AUTH=implicit') WITH delimiter='|', detached;
CANCEL JOB ...;
SET CLUSTER SETTING jobs.debug.pausepoints = '';

SELECT range_id, database_name, start_pretty, end_pretty, replicas, crdb_internal.range_stats(start_key)->'range_key_count' AS rangekeys
FROM crdb_internal.ranges_no_leases
WHERE (crdb_internal.range_stats(start_key)->'range_key_count')::int > 0;

SHOW EXPERIMENTAL_FINGERPRINTS FROM TABLE orders;
SELECT * FROM crdb_internal.check_consistency(false, '', '') WHERE status != 'RANGE_CONSISTENT';

ALTER TABLE orders ADD COLUMN o_foo INT DEFAULT 7;
ALTER TABLE orders DROP COLUMN o_foo;
SHOW EXPERIMENTAL_FINGERPRINTS FROM TABLE orders;

# Also repeat the import cancellation, wait for GC, and rerun fingerprint to verify GC.

# Drop all databases and wait for GC. Verify data is removed.
DROP DATABASE tpch221;
DROP DATABASE import;
