
sql: add support for ON UPDATE CASCADE for foreign key references #21329

Merged 1 commit on Jan 16, 2018

Conversation

BramGruneir
Member

Enables the addition of the ON UPDATE CASCADE action for foreign key references.

Major changes in rowwriter.go:

  • Added a quick check to see if a cascader is required at all (this code is in cascader.go); if not, the normal path is followed.
  • When a cascader is required, UpdateRow can only use a single batch per row. Unless we can read from batches without running them, this will be a requirement going forward.
  • Added the ability to skip foreign key checks for InsertRow.
  • Merged the DeleteRow() and deleteRowWithoutCascade() functions.
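The "is a cascader required at all" check amounts to scanning the table's inbound foreign key references for a cascading action. A minimal sketch (the type and function names here are hypothetical stand-ins; the real logic lives in cascader.go and works on table descriptors):

```go
package main

import "fmt"

// ForeignKeyRef is a simplified stand-in for a real FK descriptor.
type ForeignKeyRef struct {
	Name     string
	OnUpdate string // "CASCADE", "RESTRICT", ...
}

// needsUpdateCascader reports whether any inbound foreign key reference
// requires cascading on update. If it returns false, the caller can take
// the normal (batched) update path and never construct a cascader.
func needsUpdateCascader(refs []ForeignKeyRef) bool {
	for _, ref := range refs {
		if ref.OnUpdate == "CASCADE" {
			return true
		}
	}
	return false
}

func main() {
	refs := []ForeignKeyRef{
		{Name: "fk_b_a", OnUpdate: "RESTRICT"},
		{Name: "fk_c_a", OnUpdate: "CASCADE"},
	}
	fmt.Println(needsUpdateCascader(refs)) // true: at least one CASCADE ref
}
```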

Major changes in cascader.go:

  • Added the new makeCascader functions that check to see if a cascader is required at all.
  • Of course, added updateRows(), which performs all the updates.
  • Added the checking of foreign key constraints at the end of a cascadeAll() call.

Unlike with deletes, there is more to keep track of when looking for orphaned rows. In the worst case, rows can be updated multiple times: row A -> B, then B -> C, then C -> D. But we don't want to test the middle states for foreign key violations; we only want to test A -> D. This is accomplished by storing all transitions and looking forward through these updates to find whether a row was updated again. There is potential to improve this using a map of some sort, but that can be done in a further update. The normal case is that there is only a single update, and there is a quick path to check only that.
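The forward look through stored transitions can be sketched as follows. This is a toy model (plain strings stand in for tree.Datums rows, and `collapse` is a hypothetical name): each chain of updates is followed forward until no later transition continues it, so only the net A -> D pair would be checked for violations.

```go
package main

import "fmt"

// transition records one cascaded update: the row's value before and after.
type transition struct {
	from, to string
}

// collapse resolves a chain of updates to its net effect: for each starting
// value, follow the transitions forward until no later transition picks up
// where this one left off. Intermediate states (B and C below) are skipped.
func collapse(ts []transition) []transition {
	consumed := make([]bool, len(ts))
	var out []transition
	for i, t := range ts {
		if consumed[i] {
			continue
		}
		end := t.to
		// Scan forward for transitions that continue this chain.
		for j := i + 1; j < len(ts); j++ {
			if !consumed[j] && ts[j].from == end {
				end = ts[j].to
				consumed[j] = true
			}
		}
		out = append(out, transition{from: t.from, to: end})
	}
	return out
}

func main() {
	chain := []transition{{"A", "B"}, {"B", "C"}, {"C", "D"}}
	fmt.Println(collapse(chain)) // [{A D}]
}
```

As the commit message notes, this forward scan is quadratic in the worst case; a map keyed on the "from" value could make it linear, which is the suggested follow-up improvement.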

There is also a need to compare the contents of two rows. Using the example above, say after A -> B and then B -> C, there needs to be a way to determine that the first B is the equivalent of the second B. To accomplish this, a new function on tree.Datums, IsDistinctFrom(), was added that treats NULLs as equivalent.
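The key property of that comparison is SQL's IS DISTINCT FROM semantics: unlike `=`, two NULLs compare as equivalent, so a row containing NULLs can still be recognized as "the same row seen again." A minimal sketch, using `*string` (nil meaning NULL) as a stand-in for real datums:

```go
package main

import "fmt"

// isDistinctFrom is a toy model of tree.Datums.IsDistinctFrom: it compares
// two rows element-wise, treating two NULLs (nil pointers here) as
// equivalent rather than unknown, so chained updates can be matched up.
func isDistinctFrom(a, b []*string) bool {
	if len(a) != len(b) {
		return true
	}
	for i := range a {
		switch {
		case a[i] == nil && b[i] == nil:
			// NULL vs NULL: not distinct.
		case a[i] == nil || b[i] == nil:
			return true
		case *a[i] != *b[i]:
			return true
		}
	}
	return false
}

func main() {
	s := func(v string) *string { return &v }
	row1 := []*string{s("B"), nil}
	row2 := []*string{s("B"), nil}
	fmt.Println(isDistinctFrom(row1, row2)) // false: equivalent, NULLs match
}
```

The real implementation additionally needs an EvalContext for type-aware comparisons (collated strings, timestamps), which is what the review discussion below turns on.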

Release note (SQL): ON UPDATE CASCADE foreign key constraints are fully supported

@BramGruneir BramGruneir added this to the 2.0 milestone Jan 8, 2018
@BramGruneir BramGruneir requested review from dt, jordanlewis, knz and a team January 8, 2018 20:57
@cockroach-teamcity
Member

This change is Reviewable

@knz
Contributor

knz commented Jan 10, 2018

Very nice work. I like how well polished it is.

As a reviewer it is still hard to review this successfully however, due to the complex nature of the change. I would have preferred if the commit message could walk me through the story of what is being changed and why. Right now the commit message reads more like a description of the patch "after the fact". I'd rather have a story tell me "so here's the situation before this patch. To achieve on update cascade, we need to ensure X and Y. The place to do this is Z. So the patch does A and B in location Z, but then we also need to do C and D in location W." And so forth.

Also there's a typo in the commit message: wost -> worst.


Reviewed 14 of 14 files at r1.
Review status: all files reviewed at latest revision, 10 unresolved discussions, some commit checks pending.


pkg/sql/logictest/testdata/logic_test/cascade, line 95 at r1 (raw file):


query I
SELECT COUNT(*) FROM a

compress this:

query IIIIII
select 
    (select count(*) from a),
    (select count(*) from b1),
    (select count(*) from b2),
    (select count(*) from c1),
    (select count(*) from c2),
    (select count(*) from c3)
----
0 0 0 0 0 0

here and below


pkg/sql/logictest/testdata/logic_test/cascade, line 1150 at r1 (raw file):


statement ok
INSERT INTO b1 VALUES ('b1-pk1', 'original');

Make a single block, also use a single insert for adding multiple rows to the same table:

statement ok
INSERT INTO b1 ...; 
   INSERT INTO b2 ...;
   INSERT INTO c1 VALUES (...), (...), ...;
   INSERT INTO c2 VALUES (...), (...), ...

pkg/sql/logictest/testdata/logic_test/cascade, line 1643 at r1 (raw file):


statement ok
INSERT INTO a VALUES ('original'), ('updated');

ditto


pkg/sql/logictest/testdata/logic_test/cascade, line 1706 at r1 (raw file):


# Clean up after the test.
statement ok

compress: drop table c3, c2, c1, b2, b1, a


pkg/sql/logictest/testdata/logic_test/cascade, line 2051 at r1 (raw file):


query T
SELECT * FROM a

compress: select * from a, b, c, d, e, f -- with 1 row in each table you should get just 1 row in the cross product


pkg/sql/logictest/testdata/logic_test/cascade, line 2168 at r1 (raw file):


query T
SELECT * FROM a

ditto


pkg/sql/sem/tree/datum.go, line 179 at r1 (raw file):

			}
		} else {
			if val.Compare(evalCtx, other[i]) != 0 {

I fear there are some data types whose Compare method needs a non-default EvalContext (ts with timestamp, collated strings). Any way to pass the evalctx as argument to this function? And arrange it to be proper in the caller?


pkg/sql/sqlbase/cascader.go, line 612 at r1 (raw file):

	if traceKV {
		log.VEventf(ctx, 2,
			"cascading update from refIndex:%s, into table:%s, using index:%s from values:%s to values:%s",

If you mean this to appear in the output of show kv trace (I suppose you do, otherwise you wouldn't make it conditional on traceKV), you need to add the corresponding filtering condition in show_trace.go.


pkg/sql/sqlbase/cascader.go, line 892 at r1 (raw file):

			if elem.updatedValues != nil {
				log.VEventf(
					ctx, 2, "cascading into %s for original values:%s updated to:%s",

ditto


pkg/sql/sqlbase/rowwriter.go, line 630 at r1 (raw file):

	traceKV bool,
) ([]tree.Datum, error) {

remove the empty newline


Comments from Reviewable

@dt
Member

dt commented Jan 10, 2018

Reviewed 13 of 14 files at r1.
Review status: all files reviewed at latest revision, 13 unresolved discussions, some commit checks pending.


pkg/ccl/sqlccl/csv.go, line 606 at r1 (raw file):

				row,
				true, /* ignoreConflicts */
				sqlbase.CheckFKs,

Hm. IIRC we reject FKs in schemas passed to IMPORT so I think this might not matter in practice, but the we definitely aren't actually checking existing rows while constructing these SSTs (like, I don't think they have a real txn to work with or anything).


pkg/ccl/sqlccl/load.go, line 305 at r1 (raw file):

			return errors.Wrapf(err, "process insert %q", row)
		}
		if err := ri.InsertRow(ctx, b, row, true, sqlbase.CheckFKs, false /* traceKV */); err != nil {

ditto


pkg/sql/sem/tree/datum.go, line 172 at r1 (raw file):

		return true
	}
	evalCtx := &EvalContext{}

Huh, I haven't been keeping careful tabs on sql eval lately, so I have no idea, but is it correct to just make an empty one of these?


Comments from Reviewable

@BramGruneir
Member Author

Ok, all comments addressed, and I've spent some time trying to describe what changes were made and why. Let me know if it helps.

I think I might be able to remove the passing of the bytes monitor down since it's part of the evalCtx now, but I'll do that in a follow up PR.


Review status: 2 of 18 files reviewed at latest revision, 13 unresolved discussions.


pkg/ccl/sqlccl/csv.go, line 606 at r1 (raw file):

Previously, dt (David Taylor) wrote…

Hm. IIRC we reject FKs in schemas passed to IMPORT so I think this might not matter in practice, but the we definitely aren't actually checking existing rows while constructing these SSTs (like, I don't think they have a real txn to work with or anything).

In this case, since the only option was to check the FKs, I just left it as is. I'll add a TODO to investigate if that's required here.


pkg/ccl/sqlccl/load.go, line 305 at r1 (raw file):

Previously, dt (David Taylor) wrote…

ditto

also added a todo here.


pkg/sql/logictest/testdata/logic_test/cascade, line 95 at r1 (raw file):

Previously, knz (kena) wrote…

compress this:

query IIIIII
select 
    (select count(*) from a),
    (select count(*) from b1),
    (select count(*) from b2),
    (select count(*) from c1),
    (select count(*) from c2),
    (select count(*) from c3)
----
0 0 0 0 0 0

here and below

That's a lot cleaner. Done.


pkg/sql/logictest/testdata/logic_test/cascade, line 1150 at r1 (raw file):

Previously, knz (kena) wrote…

Make a single block, also use a single insert for adding multiple rows to the same table:

statement ok
INSERT INTO b1 ...; 
   INSERT INTO b2 ...;
   INSERT INTO c1 VALUES (...), (...), ...;
   INSERT INTO c2 VALUES (...), (...), ...

done. This looks a lot better and shrunk this file down a lot.


pkg/sql/logictest/testdata/logic_test/cascade, line 1643 at r1 (raw file):

Previously, knz (kena) wrote…

ditto

Done.


pkg/sql/logictest/testdata/logic_test/cascade, line 1706 at r1 (raw file):

Previously, knz (kena) wrote…

compress: drop table c3, c2, c1, b2, b1, a

Done.


pkg/sql/logictest/testdata/logic_test/cascade, line 2051 at r1 (raw file):

Previously, knz (kena) wrote…

compress: select * from a, b, c, d, e, f -- with 1 row in each table you should get just 1 row in the cross product

Done.


pkg/sql/logictest/testdata/logic_test/cascade, line 2168 at r1 (raw file):

Previously, knz (kena) wrote…

ditto

Done.


pkg/sql/sem/tree/datum.go, line 172 at r1 (raw file):

Previously, dt (David Taylor) wrote…

Huh, I haven't been keeping careful tabs on sql eval lately, so I have no idea, but is it correct to just make an empty one of these?

After consultations with @andreimatei, you are correct and the eval context is now passed down into the cascader.


pkg/sql/sem/tree/datum.go, line 179 at r1 (raw file):

Previously, knz (kena) wrote…

I fear there are some data types whose Compare method needs a non-default EvalContext (ts with timestamp, collated strings). Any way to pass the evalctx as argument to this function? And arrange it to be proper in the caller?

Yep, done. It was easier than I thought.


pkg/sql/sqlbase/cascader.go, line 612 at r1 (raw file):

Previously, knz (kena) wrote…

If you mean this to appear in the output of show kv trace (I suppose you do, otherwise you wouldn't make it conditional on traceKV), you need to add the corresponding filtering condition in show_trace.go.

Ah, excellent. Done and it's a lot cleaner now and I added a test for both update and delete cascading.


pkg/sql/sqlbase/cascader.go, line 892 at r1 (raw file):

Previously, knz (kena) wrote…

ditto

Done.


pkg/sql/sqlbase/rowwriter.go, line 630 at r1 (raw file):

Previously, knz (kena) wrote…

remove the empty newline

Done.


Comments from Reviewable

@knz
Contributor

knz commented Jan 16, 2018

Yes your commit message is now astonishingly good.

Of course with such quality content it becomes possible to formulate follow-up concerns/questions. With the overhead of storing two copies of the tuples in memory, the cost of a multi-row update (when there are cascade clauses specified) becomes non-trivial.

I think that either in this PR or a follow-up you should create an additional benchmark in pkg/bench that exercises this code path and measures memory+time consumption for a simple combination of tables, with one benchmark doing the updates with ON CASCADE and one issuing multiple separate DML statements to do the same work in the same transaction.

@knz
Contributor

knz commented Jan 16, 2018

(I forgot to mention, in case you also had forgotten you can get the mem stats with TESTFLAGS=-benchmem)

Then post the bench difference (as computed by benchstat) in the commit message for the commit where you also add the benchmark.

@knz
Contributor

knz commented Jan 16, 2018

Oh and just a nit: can you please wrap your commit messages to 80 (or better, 72) columns.
Otherwise git log in a terminal will become obnoxious.

@BramGruneir
Member Author

I'm planning on adding just such a benchmark. But the business need of getting the cascading operations working trumped the benchmarking for now.

I've never had anyone else complain about line length in commits before. I'll find a wrapper for all future ones.


Comments from Reviewable

@knz
Contributor

knz commented Jan 16, 2018

:lgtm:

Just a nit about the trace message.

Well done!


Reviewed 16 of 16 files at r2.
Review status: all files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.


pkg/sql/show_trace.go, line 97 at r2 (raw file):

   OR message LIKE 'execution failed: %'
   OR message LIKE 'r%: sending batch %'
   OR message LIKE 'Cascading %'

we don't usually do capitals in log/error messages. It makes them less composable with errors.Wrapf().


pkg/sql/sqlbase/cascader.go, line 615 at r2 (raw file):

	// Create the span to search for index values.
	if traceKV {
		log.VEventf(ctx, 2, "Cascading update into table: %d using index: %d",

see my comment about capitals


Comments from Reviewable

@BramGruneir
Member Author

TFTRs! I'm excited to get this in.


Review status: 15 of 18 files reviewed at latest revision, 5 unresolved discussions.


pkg/sql/show_trace.go, line 97 at r2 (raw file):

Previously, knz (kena) wrote…

we don't usually do capitals in log/error messages. It makes them less composable with errors.Wrapf().

Done.


pkg/sql/sqlbase/cascader.go, line 615 at r2 (raw file):

Previously, knz (kena) wrote…

see my comment about capitals

Done.


Comments from Reviewable

@dt
Member

dt commented Jan 16, 2018

:lgtm:


Review status: 15 of 18 files reviewed at latest revision, 5 unresolved discussions.


Comments from Reviewable

Enables the addition of the ON UPDATE CASCADE action for foreign key references.

Major changes in `rowwriter.go`:
* Added a quick check to see if a cascader is required at all (this code is in `cascader.go`); if not, the normal path is followed.
* When a cascader is required, UpdateRow can only use a single batch per row. Unless we can read from batches without running them, this will be a requirement going forward.
* Added the ability to skip foreign key checks for `InsertRow`.
* Merged the `DeleteRow()` and `deleteRowWithoutCascade()` functions.
* Pass the evalContext down into the row writer.

Major changes in `cascader.go`:
* Added the new `makeCascader` functions that check to see if a cascader is required at all.
* Of course, added `updateRows()`, which performs all the updates.
* Added the checking of foreign key constraints at the end of a `cascadeAll()` call.

This PR follows a similar route to the one that added ON DELETE CASCADE. The row updater creates a cascader and, after performing its initial update operation, calls cascadeAll() to perform the cascading updates. Inside the cascader, the index selection and cascading queue were already set up, so it was relatively easy to set up updateRows and get it to act in a similar way. However, this is where problems started to creep up immediately.

Unlike the deleter, if the original changes have not been run (that is, if the batch has not been executed), then the values that subsequent cascading updates need to read will not be found. Since batches cannot be read from, the initial changes must be run prior to calling into the cascader. This is problematic, as it would adversely affect the speed of any non-cascading updates, since each individual update would have to be run in a single batch. To get around this, the makeUpdateCascader function was added that only creates a cascader if a cascading update is possible. One was added for deleting as well. If there is no cascader, the original batch is used and the cascader is never called into.

When cascading a delete, each delete has two steps. First, fetch the primary key columns of the rows that need to be deleted. Second, fetch those rows, specifically only the columns required for the rowDeleter and then delete them. With updates, I mistakenly made the assumption that it would be possible to eschew the first lookup and to just cascade the update directly. This proved to be incorrect. Not only are the primary key columns required, but all columns that have foreign key constraints are.

Unlike with deletes, there is more to keep track of when looking for orphaned rows. In the worst case, rows can be updated multiple times. So let's say we have row A and it gets updated three times: A -> B, then B -> C, and then C -> D. We don't want to test the middle states for foreign key violations, as they will always fail; we only want to test A -> D. It's important to note that relying on primary keys is not good enough, as even they can be updated.

Sadly, there is no easy way of accomplishing this. After trying numerous iterations of how to perform this check, I settled on just storing all transitions as tuples, backed by two row containers per table. Each transition has an entry in both containers, one for the original value and one for the updated value. Resolving the net effect of a chain then means performing a forward scan through all transition pairs to find whether a row was updated again. The usual case is that there is only a single update, and there is a quick path to check only that. This is where the requirement to compare full rows arose: unless all values that could be updated are stored, it would not be possible to compare two versions of the same row.

But this leads to yet another issue: there is now a need to compare the contents of two rows. Using the example above, say after A -> B and then B -> C, there needs to be a way to determine that the first B is the equivalent of the second B. To accomplish this, a new function on `tree.Datums`, `IsDistinctFrom()`, was added. This, based on review comments, led to the need to pass an evalContext all the way down into the cascader. This was relatively simple but required a little bit of plumbing.

Release note (SQL): ON UPDATE CASCADE foreign key constraints are fully supported
@andreimatei
Contributor

Review status: 15 of 18 files reviewed at latest revision, 6 unresolved discussions, all commit checks successful.


pkg/sql/logictest/testdata/logic_test/cascade, line 854 at r3 (raw file):


# Update again but this time check show trace
statement ok

Bram, this line is an accident, right?
If so, I'm grateful for it.


Comments from Reviewable
