
sql: add support for ON UPDATE CASCADE for foreign key references #21329

Merged 1 commit on Jan 16, 2018

Conversation

BramGruneir
Member

Enables the addition of the ON UPDATE CASCADE action for foreign key references.

Major changes in rowwriter.go:

  • Added a quick check to see if a cascader is required at all (this code is in cascader.go); if not, the normal path is followed.
  • When a cascader is required, UpdateRow can only use a single batch per row. Unless we can read from batches without running them, this will be a requirement going forward.
  • Added the ability to skip foreign key checks for InsertRow.
  • Merged the DeleteRow() and deleteRowWithoutCascade() functions.
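The "is a cascader required at all" check amounts to scanning the table's inbound foreign key references for a cascading action. A minimal sketch (the type and function names here are hypothetical stand-ins; the real logic lives in cascader.go and works on table descriptors):

```go
package main

import "fmt"

// ForeignKeyRef is a simplified stand-in for a real FK descriptor.
type ForeignKeyRef struct {
	Name     string
	OnUpdate string // "CASCADE", "RESTRICT", ...
}

// needsUpdateCascader reports whether any inbound foreign key reference
// requires cascading on update. If it returns false, the caller can take
// the normal (batched) update path and never construct a cascader.
func needsUpdateCascader(refs []ForeignKeyRef) bool {
	for _, ref := range refs {
		if ref.OnUpdate == "CASCADE" {
			return true
		}
	}
	return false
}

func main() {
	refs := []ForeignKeyRef{
		{Name: "fk_b_a", OnUpdate: "RESTRICT"},
		{Name: "fk_c_a", OnUpdate: "CASCADE"},
	}
	fmt.Println(needsUpdateCascader(refs)) // true: at least one CASCADE ref
}
```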

Major changes in cascader.go:

  • Added the new makeCascader functions that check to see if a cascader is required at all.
  • Of course, added updateRows(), which performs all the updates.
  • Added the checking of foreign key constraints at the end of a cascadeAll() call.

Unlike with deletes, there is more to keep track of when looking for orphaned rows. In the worst case, rows can be updated multiple times: row A -> B, then B -> C, then C -> D. But we don't want to test the middle states for foreign key violations; we only want to test A -> D. This is accomplished by storing all transitions and looking forward through these updates to find whether a row was updated again. There is potential to improve this using a map of some sort, but that can be done in a further update. The normal case is that there is only a single update, and there is a quick path to check only that.
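The forward look through stored transitions can be sketched as follows. This is a toy model (plain strings stand in for tree.Datums rows, and `collapse` is a hypothetical name): each chain of updates is followed forward until no later transition continues it, so only the net A -> D pair would be checked for violations.

```go
package main

import "fmt"

// transition records one cascaded update: the row's value before and after.
type transition struct {
	from, to string
}

// collapse resolves a chain of updates to its net effect: for each starting
// value, follow the transitions forward until no later transition picks up
// where this one left off. Intermediate states (B and C below) are skipped.
func collapse(ts []transition) []transition {
	consumed := make([]bool, len(ts))
	var out []transition
	for i, t := range ts {
		if consumed[i] {
			continue
		}
		end := t.to
		// Scan forward for transitions that continue this chain.
		for j := i + 1; j < len(ts); j++ {
			if !consumed[j] && ts[j].from == end {
				end = ts[j].to
				consumed[j] = true
			}
		}
		out = append(out, transition{from: t.from, to: end})
	}
	return out
}

func main() {
	chain := []transition{{"A", "B"}, {"B", "C"}, {"C", "D"}}
	fmt.Println(collapse(chain)) // [{A D}]
}
```

As the commit message notes, this forward scan is quadratic in the worst case; a map keyed on the "from" value could make it linear, which is the suggested follow-up improvement.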

There is also a need to compare the contents of two rows. Using the example above, say after A -> B and then B -> C, there needs to be a way to determine that the first B is the equivalent of the second B. To accomplish this, a new function on tree.Datums, IsDistinctFrom(), was added that treats NULLs as equivalent.
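The key property of that comparison is SQL's IS DISTINCT FROM semantics: unlike `=`, two NULLs compare as equivalent, so a row containing NULLs can still be recognized as "the same row seen again." A minimal sketch, using `*string` (nil meaning NULL) as a stand-in for real datums:

```go
package main

import "fmt"

// isDistinctFrom is a toy model of tree.Datums.IsDistinctFrom: it compares
// two rows element-wise, treating two NULLs (nil pointers here) as
// equivalent rather than unknown, so chained updates can be matched up.
func isDistinctFrom(a, b []*string) bool {
	if len(a) != len(b) {
		return true
	}
	for i := range a {
		switch {
		case a[i] == nil && b[i] == nil:
			// NULL vs NULL: not distinct.
		case a[i] == nil || b[i] == nil:
			return true
		case *a[i] != *b[i]:
			return true
		}
	}
	return false
}

func main() {
	s := func(v string) *string { return &v }
	row1 := []*string{s("B"), nil}
	row2 := []*string{s("B"), nil}
	fmt.Println(isDistinctFrom(row1, row2)) // false: equivalent, NULLs match
}
```

The real implementation additionally needs an EvalContext for type-aware comparisons (collated strings, timestamps), which is what the review discussion below turns on.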

Release note (SQL): ON UPDATE CASCADE foreign key constraints are fully supported

@BramGruneir BramGruneir added this to the 2.0 milestone Jan 8, 2018
@BramGruneir BramGruneir requested review from dt, jordanlewis, knz and a team January 8, 2018 20:57
@cockroach-teamcity
Member

This change is Reviewable

@knz
Contributor

knz commented Jan 10, 2018

Very nice work. I like how well polished it is.

As a reviewer it is still hard to review this successfully however, due to the complex nature of the change. I would have preferred if the commit message could walk me through the story of what is being changed and why. Right now the commit message reads more like a description of the patch "after the fact". I'd rather have a story tell me "so here's the situation before this patch. To achieve on update cascade, we need to ensure X and Y. The place to do this is Z. So the patch does A and B in location Z, but then we also need to do C and D in location W." And so forth.

Also there's a typo in the commit message: wost -> worst.


Reviewed 14 of 14 files at r1.
Review status: all files reviewed at latest revision, 10 unresolved discussions, some commit checks pending.


pkg/sql/logictest/testdata/logic_test/cascade, line 95 at r1 (raw file):


query I
SELECT COUNT(*) FROM a

compress this:

query IIIIII
select 
    (select count(*) from a),
    (select count(*) from b1),
    (select count(*) from b2),
    (select count(*) from c1),
    (select count(*) from c2),
    (select count(*) from c3)
----
0 0 0 0 0 0

here and below


pkg/sql/logictest/testdata/logic_test/cascade, line 1150 at r1 (raw file):


statement ok
INSERT INTO b1 VALUES ('b1-pk1', 'original');

Make a single block, also use a single insert for adding multiple rows to the same table:

statement ok
INSERT INTO b1 ...; 
   INSERT INTO b2 ...;
   INSERT INTO c1 VALUES (...), (...), ...;
   INSERT INTO c2 VALUES (...), (...), ...

pkg/sql/logictest/testdata/logic_test/cascade, line 1643 at r1 (raw file):


statement ok
INSERT INTO a VALUES ('original'), ('updated');

ditto


pkg/sql/logictest/testdata/logic_test/cascade, line 1706 at r1 (raw file):


# Clean up after the test.
statement ok

compress: drop table c3, c2, c1, b2, b1, a


pkg/sql/logictest/testdata/logic_test/cascade, line 2051 at r1 (raw file):


query T
SELECT * FROM a

compress: select * from a, b, c, d, e, f -- with 1 row in each table you should get just 1 row in the cross product


pkg/sql/logictest/testdata/logic_test/cascade, line 2168 at r1 (raw file):


query T
SELECT * FROM a

ditto


pkg/sql/sem/tree/datum.go, line 179 at r1 (raw file):

			}
		} else {
			if val.Compare(evalCtx, other[i]) != 0 {

I fear there are some data types whose Compare method needs a non-default EvalContext (ts with timestamp, collated strings). Any way to pass the evalctx as argument to this function? And arrange it to be proper in the caller?


pkg/sql/sqlbase/cascader.go, line 612 at r1 (raw file):

	if traceKV {
		log.VEventf(ctx, 2,
			"cascading update from refIndex:%s, into table:%s, using index:%s from values:%s to values:%s",

If you mean this to appear in the output of show kv trace (I suppose you do, otherwise you wouldn't make it conditional on traceKV), you need to add the corresponding filtering condition in show_trace.go.


pkg/sql/sqlbase/cascader.go, line 892 at r1 (raw file):

			if elem.updatedValues != nil {
				log.VEventf(
					ctx, 2, "cascading into %s for original values:%s updated to:%s",

ditto


pkg/sql/sqlbase/rowwriter.go, line 630 at r1 (raw file):

	traceKV bool,
) ([]tree.Datum, error) {

remove the empty newline


Comments from Reviewable

@dt
Member

dt commented Jan 10, 2018

Reviewed 13 of 14 files at r1.
Review status: all files reviewed at latest revision, 13 unresolved discussions, some commit checks pending.


pkg/ccl/sqlccl/csv.go, line 606 at r1 (raw file):

				row,
				true, /* ignoreConflicts */
				sqlbase.CheckFKs,

Hm. IIRC we reject FKs in schemas passed to IMPORT so I think this might not matter in practice, but the we definitely aren't actually checking existing rows while constructing these SSTs (like, I don't think they have a real txn to work with or anything).


pkg/ccl/sqlccl/load.go, line 305 at r1 (raw file):

			return errors.Wrapf(err, "process insert %q", row)
		}
		if err := ri.InsertRow(ctx, b, row, true, sqlbase.CheckFKs, false /* traceKV */); err != nil {

ditto


pkg/sql/sem/tree/datum.go, line 172 at r1 (raw file):

		return true
	}
	evalCtx := &EvalContext{}

Huh, I haven't been keeping careful tabs on sql eval lately, so I have no idea, but is it correct to just make an empty one of these?


Comments from Reviewable

@BramGruneir
Member Author

Ok, all comments addressed, and I've spent some time trying to describe what changes were made and why. Let me know if it helps.

I think I might be able to remove the passing of the bytes monitor down since it's part of the evalCtx now, but I'll do that in a follow up PR.


Review status: 2 of 18 files reviewed at latest revision, 13 unresolved discussions.


pkg/ccl/sqlccl/csv.go, line 606 at r1 (raw file):

Previously, dt (David Taylor) wrote…

Hm. IIRC we reject FKs in schemas passed to IMPORT so I think this might not matter in practice, but the we definitely aren't actually checking existing rows while constructing these SSTs (like, I don't think they have a real txn to work with or anything).

In this case, since the only option was to check the FKs, I just left it as is. I'll add a TODO to investigate if that's required here.


pkg/ccl/sqlccl/load.go, line 305 at r1 (raw file):

Previously, dt (David Taylor) wrote…

ditto

also added a todo here.


pkg/sql/logictest/testdata/logic_test/cascade, line 95 at r1 (raw file):

Previously, knz (kena) wrote…

compress this:

query IIIIII
select 
    (select count(*) from a),
    (select count(*) from b1),
    (select count(*) from b2),
    (select count(*) from c1),
    (select count(*) from c2),
    (select count(*) from c3)
----
0 0 0 0 0 0

here and below

That's a lot cleaner. Done.


pkg/sql/logictest/testdata/logic_test/cascade, line 1150 at r1 (raw file):

Previously, knz (kena) wrote…

Make a single block, also use a single insert for adding multiple rows to the same table:

statement ok
INSERT INTO b1 ...; 
   INSERT INTO b2 ...;
   INSERT INTO c1 VALUES (...), (...), ...;
   INSERT INTO c2 VALUES (...), (...), ...

done. This looks a lot better and shrunk this file down a lot.


pkg/sql/logictest/testdata/logic_test/cascade, line 1643 at r1 (raw file):

Previously, knz (kena) wrote…

ditto

Done.


pkg/sql/logictest/testdata/logic_test/cascade, line 1706 at r1 (raw file):

Previously, knz (kena) wrote…

compress: drop table c3, c2, c1, b2, b1, a

Done.


pkg/sql/logictest/testdata/logic_test/cascade, line 2051 at r1 (raw file):

Previously, knz (kena) wrote…

compress: select * from a, b, c, d, e, f -- with 1 row in each table you should get just 1 row in the cross product

Done.


pkg/sql/logictest/testdata/logic_test/cascade, line 2168 at r1 (raw file):

Previously, knz (kena) wrote…

ditto

Done.


pkg/sql/sem/tree/datum.go, line 172 at r1 (raw file):

Previously, dt (David Taylor) wrote…

Huh, I haven't been keeping careful tabs on sql eval lately, so I have no idea, but is it correct to just make an empty one of these?

After consultations with @andreimatei, you are correct and the eval context is now passed down into the cascader.


pkg/sql/sem/tree/datum.go, line 179 at r1 (raw file):

Previously, knz (kena) wrote…

I fear there are some data types whose Compare method needs a non-default EvalContext (ts with timestamp, collated strings). Any way to pass the evalctx as argument to this function? And arrange it to be proper in the caller?

Yep, done. It was easier than I thought.


pkg/sql/sqlbase/cascader.go, line 612 at r1 (raw file):

Previously, knz (kena) wrote…

If you mean this to appear in the output of show kv trace (I suppose you do, otherwise you wouldn't make it conditional on traceKV), you need to add the corresponding filtering condition in show_trace.go.

Ah, excellent. Done and it's a lot cleaner now and I added a test for both update and delete cascading.


pkg/sql/sqlbase/cascader.go, line 892 at r1 (raw file):

Previously, knz (kena) wrote…

ditto

Done.


pkg/sql/sqlbase/rowwriter.go, line 630 at r1 (raw file):

Previously, knz (kena) wrote…

remove the empty newline

Done.


Comments from Reviewable

@knz
Contributor

knz commented Jan 16, 2018

Yes your commit message is now astonishingly good.

Of course with such quality content it becomes possible to formulate follow-up concerns/questions. With the overhead of storing two copies of the tuples in memory, the cost of a multi-row update (when there are cascade clauses specified) becomes non-trivial.

I think that either in this PR or a follow-up you should create an additional benchmark in pkg/bench that exercises this code path and measures memory+time consumption for a simple combination of tables, with one benchmark doing the updates with ON CASCADE and one issuing multiple separate DML statements to do the same work in the same transaction.

@knz
Contributor

knz commented Jan 16, 2018

(I forgot to mention, in case you also had forgotten you can get the mem stats with TESTFLAGS=-benchmem)

Then post the bench difference (as computed by benchstat) in the commit message for the commit where you also add the benchmark.

@knz
Contributor

knz commented Jan 16, 2018

Oh and just a nit: can you please wrap your commit messages to 80 (or better, 72) columns.
Otherwise git log in a terminal will become obnoxious.

@BramGruneir
Member Author

I'm planning on adding just such a benchmark. But the business need of getting the cascading operations working trumped the benchmarking for now.

I've never had anyone else complain about line length in commits before. I'll find a wrapper for all future ones.


Comments from Reviewable

@knz
Contributor

knz commented Jan 16, 2018

:lgtm:

Just a nit about the trace message.

Well done!


Reviewed 16 of 16 files at r2.
Review status: all files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.


pkg/sql/show_trace.go, line 97 at r2 (raw file):

   OR message LIKE 'execution failed: %'
   OR message LIKE 'r%: sending batch %'
   OR message LIKE 'Cascading %'

we don't usually do capitals in log/error messages. It makes them less composable with errors.Wrapf().


pkg/sql/sqlbase/cascader.go, line 615 at r2 (raw file):

	// Create the span to search for index values.
	if traceKV {
		log.VEventf(ctx, 2, "Cascading update into table: %d using index: %d",

see my comment about capitals


Comments from Reviewable

@BramGruneir
Member Author

TFTRs! I'm excited to get this in.


Review status: 15 of 18 files reviewed at latest revision, 5 unresolved discussions.


pkg/sql/show_trace.go, line 97 at r2 (raw file):

Previously, knz (kena) wrote…

we don't usually do capitals in log/error messages. It makes them less composable with errors.Wrapf().

Done.


pkg/sql/sqlbase/cascader.go, line 615 at r2 (raw file):

Previously, knz (kena) wrote…

see my comment about capitals

Done.


Comments from Reviewable

@dt
Member

dt commented Jan 16, 2018

:lgtm:


Review status: 15 of 18 files reviewed at latest revision, 5 unresolved discussions.


Comments from Reviewable

Enables the addition of the ON UPDATE CASCADE action for foreign key references.

Major changes in `rowwriter.go`:
* Added a quick check to see if a cascader is required at all (this code is in `cascader.go`); if not, the normal path is followed.
* When a cascader is required, UpdateRow can only use a single batch per row. Unless we can read from batches without running them, this will be a requirement going forward.
* Added the ability to skip foreign key checks for `InsertRow`.
* Merged the `DeleteRow()` and `deleteRowWithoutCascade()` functions.
* Pass the evalContext down into the row writer.

Major changes in `cascader.go`:
* Added the new `makeCascader` functions that check to see if a cascader is required at all.
* Of course, added `updateRows()`, which performs all the updates.
* Added the checking of foreign key constraints at the end of a `cascadeAll()` call.

This PR follows a similar route to the one that added ON DELETE CASCADE. The row updater creates a cascader and, after performing its initial update operation, calls cascadeAll() to perform the cascading updates. Inside the cascader, the index selection and cascading queue were already set up, so it was relatively easy to set up updateRows and get it to act in a similar way. However, this is where problems started to creep up immediately.

Unlike the deleter, if the original changes have not been run (that is, if the batch has not been executed), then the values that subsequent cascading updates need to read will not be found. Since batches cannot be read from, the initial changes must be run prior to calling into the cascader. This is problematic, as it would adversely affect the speed of any non-cascading updates, since each individual update would have to be run in a single batch. To get around this, the makeUpdateCascader function was added that only creates a cascader if a cascading update is possible. One was added for deleting as well. If there is no cascader, the original batch is used and the cascader is never called into.

When cascading a delete, each delete has two steps. First, fetch the primary key columns of the rows that need to be deleted. Second, fetch those rows, specifically only the columns required for the rowDeleter and then delete them. With updates, I mistakenly made the assumption that it would be possible to eschew the first lookup and to just cascade the update directly. This proved to be incorrect. Not only are the primary key columns required, but all columns that have foreign key constraints are.

Unlike with deletes, there is more to keep track of when looking for orphaned rows. In the worst case, rows can be updated multiple times. So let's say we have row A and it gets updated three times: A -> B, then B -> C, and then C -> D. We don't want to test the middle states for foreign key violations, as they will always fail; we only want to test A -> D. It's important to note that relying on primary keys is not good enough, as even they can be updated.

Sadly, there is no easy way of accomplishing this. After trying numerous iterations of how to perform this check, I settled on just storing all transitions as tuples, backed by two row containers per table. Each transition has an entry in both containers, one for the original value and one for the updated value. Resolving the net effect of a chain then means performing a forward scan through all transition pairs to find whether a row was updated again. The usual case is that there is only a single update, and there is a quick path to check only that. This is where the requirement to compare full rows arose: unless all values that could be updated are stored, it would not be possible to compare two versions of the same row.

But this leads to yet another issue: there is now a need to compare the contents of two rows. Using the example above, say after A -> B and then B -> C, there needs to be a way to determine that the first B is the equivalent of the second B. To accomplish this, a new function on `tree.Datums`, `IsDistinctFrom()`, was added. This, based on review comments, led to the need to pass an evalContext all the way down into the cascader. This was relatively simple but required a little bit of plumbing.

Release note (SQL): ON UPDATE CASCADE foreign key constraints are fully supported
@andreimatei
Contributor

Review status: 15 of 18 files reviewed at latest revision, 6 unresolved discussions, all commit checks successful.


pkg/sql/logictest/testdata/logic_test/cascade, line 854 at r3 (raw file):


# Update again but this time check show trace
statement ok

Bram, this line is an accident, right?
If so, I'm grateful for it.


Comments from Reviewable
