-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: add support for ON UPDATE CASCADE for foreign key references #21329
Conversation
Very nice work. I like how well polished it is. As a reviewer it is still hard to review this successfully however, due to the complex nature of the change. I would have preferred if the commit message could walk me through the story of what is being changed and why. Right now the commit message reads more like a description of the patch "after the fact". I'd rather have a story tell me "so here's the situation before this patch. To achieve on update cascade, we need to ensure X and Y. The place to do this is Z. So the patch does A and B in location Z, but then we also need to do C and D in location W." And so forth. Also there's a typo in the commit message: wost -> worst. Reviewed 14 of 14 files at r1. pkg/sql/logictest/testdata/logic_test/cascade, line 95 at r1 (raw file):
compress this:
here and below pkg/sql/logictest/testdata/logic_test/cascade, line 1150 at r1 (raw file):
Make a single block, also use a single insert for adding multiple rows to the same table:
pkg/sql/logictest/testdata/logic_test/cascade, line 1643 at r1 (raw file):
ditto pkg/sql/logictest/testdata/logic_test/cascade, line 1706 at r1 (raw file):
compress: pkg/sql/logictest/testdata/logic_test/cascade, line 2051 at r1 (raw file):
compress: pkg/sql/logictest/testdata/logic_test/cascade, line 2168 at r1 (raw file):
ditto pkg/sql/sem/tree/datum.go, line 179 at r1 (raw file):
I fear there are some data types whose Compare method needs a non-default EvalContext (ts with timestamp, collated strings). Any way to pass the evalctx as argument to this function? And arrange it to be proper in the caller? pkg/sql/sqlbase/cascader.go, line 612 at r1 (raw file):
If you mean this to appear in the output of pkg/sql/sqlbase/cascader.go, line 892 at r1 (raw file):
ditto pkg/sql/sqlbase/rowwriter.go, line 630 at r1 (raw file):
remove the empty newline Comments from Reviewable |
Reviewed 13 of 14 files at r1. pkg/ccl/sqlccl/csv.go, line 606 at r1 (raw file):
Hm. IIRC we reject FKs in schemas passed to IMPORT so I think this might not matter in practice, but the we definitely aren't actually checking existing rows while constructing these SSTs (like, I don't think they have a real txn to work with or anything). pkg/ccl/sqlccl/load.go, line 305 at r1 (raw file):
ditto pkg/sql/sem/tree/datum.go, line 172 at r1 (raw file):
Huh, I haven't been keeping careful tabs on sql eval lately, so I have no idea, but is it correct to just make an empty one of these? Comments from Reviewable |
Ok, all comments addressed and I've spent some time trying to describe what changes where made and why. Let me know if it helps. I think I might be able to remove the passing of the bytes monitor down since it's part of the evalCtx now, but I'll do that in a follow up PR. Review status: 2 of 18 files reviewed at latest revision, 13 unresolved discussions. pkg/ccl/sqlccl/csv.go, line 606 at r1 (raw file): Previously, dt (David Taylor) wrote…
In this case, since the only option was to check the FKs, I just left it as is. I'll add a TODO to investigate if that's required here. pkg/ccl/sqlccl/load.go, line 305 at r1 (raw file): Previously, dt (David Taylor) wrote…
also added a todo here. pkg/sql/logictest/testdata/logic_test/cascade, line 95 at r1 (raw file): Previously, knz (kena) wrote…
That's a lot cleaner. Done. pkg/sql/logictest/testdata/logic_test/cascade, line 1150 at r1 (raw file): Previously, knz (kena) wrote…
done. This looks a lot better and shrunk this file down a lot. pkg/sql/logictest/testdata/logic_test/cascade, line 1643 at r1 (raw file): Previously, knz (kena) wrote…
Done. pkg/sql/logictest/testdata/logic_test/cascade, line 1706 at r1 (raw file): Previously, knz (kena) wrote…
Done. pkg/sql/logictest/testdata/logic_test/cascade, line 2051 at r1 (raw file): Previously, knz (kena) wrote…
Done. pkg/sql/logictest/testdata/logic_test/cascade, line 2168 at r1 (raw file): Previously, knz (kena) wrote…
Done. pkg/sql/sem/tree/datum.go, line 172 at r1 (raw file): Previously, dt (David Taylor) wrote…
After consultations with @andreimatei, you are correct and the eval context is now passed down into the cascader. pkg/sql/sem/tree/datum.go, line 179 at r1 (raw file): Previously, knz (kena) wrote…
Yep, done. It was easier than I thought. pkg/sql/sqlbase/cascader.go, line 612 at r1 (raw file): Previously, knz (kena) wrote…
Ah, excellent. Done and it's a lot cleaner now and I added a test for both update and delete cascading. pkg/sql/sqlbase/cascader.go, line 892 at r1 (raw file): Previously, knz (kena) wrote…
Done. pkg/sql/sqlbase/rowwriter.go, line 630 at r1 (raw file): Previously, knz (kena) wrote…
Done. Comments from Reviewable |
Yes your commit message is now astonishingly good. Of course with such quality content it becomes possible to formulate follow-up concerns/questions. With the overhead of storing two copies of the tuples in memory, the cost of a multi-row update (when there are cascade clauses specified) becomes non-trivial. I think that either in this PR or a follow-up you should create an additional benchmark in |
(I forgot to mention, in case you also had forgotten you can get the mem stats with Then post the bench difference (as computed by |
Oh and just a nit: can you please wrap your commit messages to 80 (or better, 72) columns. |
I'm planning on adding just such a benchmark. But the business need of getting the cascading operations working trumped the benchmarking for now. I've never had anyone else complain about line length in commits before. I'll find a wrapper for all future ones. Comments from Reviewable |
Just a nit about the trace message.
Well done! Reviewed 16 of 16 files at r2. pkg/sql/show_trace.go, line 97 at r2 (raw file):
we don't usually do capitals in log/error messages. It makes less them composable with errors.Wrapf(). pkg/sql/sqlbase/cascader.go, line 615 at r2 (raw file):
see my comment about capitals Comments from Reviewable |
TFTRs! I'm excited to get this in. Review status: 15 of 18 files reviewed at latest revision, 5 unresolved discussions. pkg/sql/show_trace.go, line 97 at r2 (raw file): Previously, knz (kena) wrote…
Done. pkg/sql/sqlbase/cascader.go, line 615 at r2 (raw file): Previously, knz (kena) wrote…
Done. Comments from Reviewable |
Review status: 15 of 18 files reviewed at latest revision, 5 unresolved discussions. Comments from Reviewable |
Enables the addition of the ON UPDATE CASCADE action for foreign keys references. Major changes in `rowwriter.go`: * Added a quick check to see if a cascader is required at all (this code is in `cascader.go`). And if not, the normal path is followed. * When a cascader is required, UpdateRow can only use a single batch per row. Unless we can read from batches without running them, this will be a requirement going forward. * Added the ability to skip foreign key checks for `InsertRow`. * Merge the `DeleteRow()` And `deleteRowWithoutCascade()` functions. * Pass the evalContext down into the row writer. Major changes in `cascader.go`: * Added the new `makeCascader` functions that check to see if a cascader is required at all. * Of course, added `updateRows()`, which performs all the updates. * Added the checking of foreign key constraints at the end of a `cascadeAll()` call. This PR follows a similar route to the one that added ON DELETE CASCADE. The row updater creates a cascader and after performing its initial update operation, calls cascadeAll() to perform the cascading updates. Inside the cascader, the index selection and cascading queue were already setup, so it was relatively easy to setup updateRows and get it to act in a similar way. However, this is where problems started to creep up immediately. Unlike the deleter, if the original changes have not been run (as in the batch has been run), then future updates cascading values will not be found. Since batches cannot be read from, the initial changes must be run prior to calling into the cascader. This is problematic as it will adversely affect the speed of any non-cascading updates since each individual update will have to be run in a single batch. To get around this, the makeUpdateCascader function was added that only creates a cascader if a cascading update is possible. One was added for deleting as well. If there is no cascader the original batch is used and the cascader is never called into. When cascading a delete, each delete has two steps. First, fetch the primary key columns of the rows that need to be deleted. Second, fetch those rows, specifically only the columns required for the rowDeleter and then delete them. With updates, I mistakenly made the assumption that it would be possible to eschew the first lookup and to just cascade the update directly. This proved to be incorrect. Not only are the primary key columns required, but all columns that have foreign key constraints are. Unlike with deletes, there is more to keep track of when looking for orphaned rows. In the worst case, rows can be updated multiple times. So let say we have row A, and it gets updated 3 times. Row A -> B, then B -> C, and then C -> D. But we don't want to test the middle states for foreign key violations as they will always fail. We only want to test A -> D. It's important to note, that relying on primary keys is not good enough, as even they can be updated. Sadly, there is no easy way of accomplishing this. After trying numerous iterations on how to perform this check, I settled on just storing all transitions as a tuple based on two row containers per table. Each transition has an entry in both, one for the original and one for the updated value. Then by performing a forward look through all transitions pairs to find if a row was updated again. The usual case is that there is only a single update and there is a quick path to only check that. This is where the requirement to compare the full rows was needed. Unless all values that could be updated are stored, it would not be not possible to compare two versions of the same row. But this leads to yet another issue. Now there was a need to be able to compare the contents of two rows. Using the example above, say after A -> B and then B -> C, there needs to be a way to determine that the first B, is the equivalent of the second B. To accomplish this, a new function on `tree.Datums` `IsDistinctFrom()` was added. This then, based on review comments, leads to the need to pass an evalContext all the way down into the cascader. This was relatively simple but required a little bit of plumbing. Release note (SQL): ON UPDATE CASCADE foreign key constraints are fully supported
Review status: 15 of 18 files reviewed at latest revision, 6 unresolved discussions, all commit checks successful. pkg/sql/logictest/testdata/logic_test/cascade, line 854 at r3 (raw file):
Bram, this line is an accident, right? Comments from Reviewable |
Enables the addition of the ON UPDATE CASCADE action for foreign keys references.
Major changes in
rowwriter.go
:cascader.go
). And if not, the normal path is followed.InsertRow
.DeleteRow()
AnddeleteRowWithoutCascade()
functions.Major changes in
cascader.go
:makeCascader
functions that check to see if a cascader is required at all.updateRows()
, which performs all the updates.cascadeAll()
call. This of courseUnlike with deletes, there is more to keep track of when looking for orphaned rows. In the wost case, rows can be updated multiple times. So row A -> B, then B -> C, then C -> D. But we don't want to test the middle states for foreign key violations, and we only want to test A -> D. So this accomplished by storing all transitions and looking forward through these updates to find if a row was updated again. There is potential to improve this using a map of some sort, but that can be done in a further update. The normal case is that there is only a single update and there is a quick path to only check that.
There is also a need to compare the contents two rows. Using the example above, say after A -> B and then B -> C, there needs to be a way to determine that the first B is the equivalent of the second B. To accomplish this, a new function on
tree.Datums
IsDistinctFrom()
was added that treats nulls as equivalent.Release note (SQL): ON UPDATE CASCADE foreign key constraints are fully supported