Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: sequences created by SERIAL have weird semantics with regards to database #51090

Closed
ajwerner opened this issue Jul 7, 2020 · 18 comments · Fixed by #59864 or #60957
Closed

sql: sequences created by SERIAL have weird semantics with regards to database #51090

ajwerner opened this issue Jul 7, 2020 · 18 comments · Fixed by #59864 or #60957
Assignees
Labels
A-sql-pgcompat Semantic compatibility with PostgreSQL C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

Comments

@ajwerner
Copy link
Contributor

ajwerner commented Jul 7, 2020

Describe the problem

When we create SERIAL types and we have serial_normalization set to 'sql_sequence' we construct a new sequence. Unfortunately, the sequence we construct is in the current database rather than the database of the table. This leads to strange semantics where the table can only be written to if the current session is in the database of the sequence.

The problem here is that we use the session's database to resolve the sequence rather than the table's. In postgres, IIUC, the table's database and schema are used to resolve the sequence.

To Reproduce

root@localhost:26257/defaultdb> SET serial_normalization = 'sql_sequence';
SET

root@localhost:26257/defaultdb> CREATE DATABASE other_db;
CREATE DATABASE

root@localhost:26257/defaultdb> CREATE TABLE other_db.t (s SERIAL PRIMARY KEY, i INT);
CREATE TABLE

root@localhost:26257/defaultdb> INSERT INTO other_db.t (i) VALUES (1);
INSERT 1

root@localhost:26257/defaultdb> USE other_db;
SET

root@localhost:26257/other_db> INSERT INTO t (i) VALUES (2);
ERROR: nextval(): relation "t_s_seq" does not exist
SQLSTATE: 42P01

Even worse, if I create a sequence in that database with that name, it'll get used:

root@localhost:26257/other_db> CREATE SEQUENCE t_s_seq;
CREATE SEQUENCE

root@localhost:26257/other_db> INSERT INTO t (i) VALUES (2);
ERROR: duplicate key value (s)=(1) violates unique constraint "primary"
SQLSTATE: 23505

Expected behavior
The SERIAL column will work regardless of which database the current session is in. Furthermore, as we move to user-defined schemas, it should refer to a single sequence regardless of the current session's search_path.

@blathers-crl
Copy link

blathers-crl bot commented Jul 7, 2020

Hi @ajwerner, please add a C-ategory label to your issue. Check out the label system docs.

While you're here, please consider adding an A- label to help keep our repository tidy.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@ajwerner
Copy link
Contributor Author

ajwerner commented Jul 7, 2020

@arulajmani you'll probably find this one interesting. In postgres this works better because nextval takes a regclass and not a string. Curious to hear your thoughts.

@knz knz added A-sql-pgcompat Semantic compatibility with PostgreSQL C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. labels Jul 8, 2020
@knz
Copy link
Contributor

knz commented Jul 8, 2020

FWIW I recall there was a recent discussion to disallow cross-db references altogether.

@ajwerner
Copy link
Contributor Author

ajwerner commented Jul 8, 2020

FWIW I recall there was a recent discussion to disallow cross-db references altogether.

That's true. You can, however, take the above example and apply it to user-defined schemas and search path and if we don't change anything, we'll still have a problem.


This issue has several dimensions

  1. We use nextvar with a string instead of a regclass so the element being referenced is ambiguous because we do resolution at runtime.
  2. (The deeper issue) is that we store our default expressions as strings rather than a richer structure. In postgres, it seems, the regclass is stored in the default expression as its integer value but regclass values display as their resolved string names.
  3. Even if we had a way to store the default expr such that it was expressed as an integer regclass (which we very much do not), cockroach doesn't render regclass intergers by their resolved names.
postgres=# SELECT 16441::regclass;                                             
 regclass                              
----------                             
 foo 

Imagine today we do the following:

s INT8 NOT NULL DEFAULT nextval('foo_s_seq':::STRING)

I have two related concrete proposals that both have the same first:

  • We provide an overload for nextval that takes a regclass.

A. In the default expression we just use an integer cast to a regclass. For example:

s INT8 NOT NULL DEFAULT nextval(60::regclass)

B. In the default expression we generate for columns using SERIAL with 'sql_sequence' we start to use crdb_internal.create_regclass as the input to nextval. The advantage with this is that it's clearer which sequence is in use for the reader.

s INT8 NOT NULL DEFAULT nextval(crdb_internal.create_regclass(60, 'foo_s_seq'))

@arulajmani
Copy link
Collaborator

arulajmani commented Jul 8, 2020

Nice find! I guess this is currently blocking us from a logictest configuration that uses sequences for serial normalization.

I tried creating a sequence with a fully qualified name which would force the sequence to be created in the same database as the table, and storing the fully qualified name in the default expression, and that seems to fix the problem. I'll add some tests and put a PR out.

@ajwerner
Copy link
Contributor Author

ajwerner commented Jul 8, 2020

Yeah, that proposal makes sense. In the absence of a regclass overloaded nextval and a proper way to store that regclass in the default expr, using a FQN in the string seems like the right idea. Thanks!

@knz
Copy link
Contributor

knz commented Jul 8, 2020

Can you check that doesn't explode if/when the db gets renamed.

@ajwerner
Copy link
Contributor Author

ajwerner commented Jul 8, 2020

Can you check that doesn't explode if/when the db gets renamed.

That's a great point. We'll need to deal with this when renaming database (and schemas for UDS).

@arulajmani
Copy link
Collaborator

It doesn't implode, but it will prohibit renaming databases that use SERIAL and I'm not sure that's much better. There is a closed issue for this (#34416), but I don't think the issue has been fully addressed -- Referring to sequences by their FQNs when using them still results in an error.

[email protected]:59882/defaultdb> CREATE SEQUENCE foo;
CREATE SEQUENCE

Time: 3.217ms

[email protected]:59882/defaultdb> CREATE TABLE t(a INT DEFAULT nextval('defaultdb.foo'));
CREATE TABLE

Time: 25.509ms

[email protected]:59882/defaultdb> ALTER DATABASE defaultdb RENAME TO default_db;
ERROR: cannot rename database because relation "defaultdb.public.t" depends on relation "defaultdb.public.foo"
SQLSTATE: 2BP01
HINT: you can drop the column default "a" of "defaultdb.public.foo" referencing "defaultdb.public.t" or modify the default to not reference the database name "defaultdb"

With this rename issue + the fact that we'll be supporting UDS very soon, I'm leaning more towards Andrew's proposal of solving this by having default expressions store something more robust and overloading nextval. (as opposed to the earlier proposed quick win solution). Additionally, this would allow us to rename sequences which are used by columns, functionality that is currently not supported.

To me it seems like rendering the regclass to its string representation is the major issue there. I'm not too familiar with how/how much regclass compatibility we support, but this seems hard.

I wonder if as an intermediate step there is value in storing the sequence's descriptor ID and name (at creation) in the default expression which will allow us to implement Alternative B in @ajwerner's proposal above. We could be displaying stale information if the sequence is renamed, or we could continue to disallow sequence renames like we do today.

@ajwerner
Copy link
Contributor Author

I wonder if as an intermediate step there is value in storing the sequence's descriptor ID and name (at creation) in the default expression which will allow us to implement Alternative B in @ajwerner's proposal above. We could be displaying stale information if the sequence is renamed, or we could continue to disallow sequence renames like we do today.

I'm happy with B as a first step though do want to ask about sequence renames. It seems to me like getting sequence renames right isn't too terrible if all we need to do is update the table[s] which reference it. I was much more worried about needing to traverse all of the tables when renaming a database or schema.

@ajwerner
Copy link
Contributor Author

ajwerner commented Jul 30, 2020

Alright here's a perhaps real pathway forward, when do we need to know the name of referenced descriptor? Seems like only for display purposes. So the latest thinking on this proposal is these steps that each get us closer to postgres

  1. implement a regclass overload on nextval
  2. stop creating a new table when truncating because that makes table IDs non stable (and is the only thing as far as I know) sql: create new indexes rather than a new table in TRUNCATE #52170
  3. start encoding the expression with the integer regclass
  4. resolve the regclass to a name for display

Only 2 seems hard. This could be pretty good. This proposal is in many ways exactly what's asked for in #22212. Relates to #10083 as we'd want to do the same thing in view queries with names.

@knz
Copy link
Contributor

knz commented Jul 31, 2020

In fact making table ID stable would be good for pg compatibility. UI tools that use pg_catalog get confused if you truncate a table (and rotate the ID) from "under" them.

@ajwerner
Copy link
Contributor Author

in positive news here, we did the truncate thing so the path to a brighter future is now clearer. There are open questions with regards to migrations of existing expressions. I'm rather open to not migrating them.

Ideally we'd make the "default" overload for the function into the regclass so that new expressions which utilize the string get resolved to regclasses so that the same expression ends up with the better serialization. For existing string serialization the regclass resolution at runtime should end up doing exactly what we do today.

That might be wishful thinking, I don't know enough about the way the overloads work out with the constant expressions. Worst case we could put some hacks in place when to prefer resolving the strings to regclass integers when looking to serialize the sequence operations into descriptors.

@knz
Copy link
Contributor

knz commented Sep 16, 2020

@otan says that the issue in #53819 is a duplicate, here's the symptom

> create sequence s;
> create view a as select nextval('s'); -- OK!
> create materialized view am as select nextval('s');
ERROR: nextval(): relation "s" does not exist
SQLSTATE: 42P01

I don't actually understand why they are marked as duplicate to each other (see my question in #53819 (comment) )

and thus would appreciate an explanation thanks

@otan
Copy link
Contributor

otan commented Sep 16, 2020

The suggested solution to resolving this (#54410) is the same.

@knz
Copy link
Contributor

knz commented Sep 16, 2020

  1. that PR you just linkd to has been closed and not merged, so can you explain what is the actual suggested solution?

  2. we don't typically close issues as duplicate when they have the same solution, but rather when they have overlapping set of causes

@knz
Copy link
Contributor

knz commented Sep 16, 2020

(re my point 2: if there are separate causes but common solutions, we can have 1 PR that "closes" 2 or more issues at a time)

@ajwerner
Copy link
Contributor Author

@knz I agree with your assessment and have determined that while the above proposal would fix #53819, that is a different issue. In the short term I'd like to take the approach used by #42508 to somewhat resolve the issue. It's not pretty but it's clearer than what we see today.

craig bot pushed a commit that referenced this issue Feb 5, 2021
59396: sql: overload sequence operators to accept a regclass r=the-ericwang35 a=the-ericwang35

sql: overload sequence operators to accept a regclass
  
Previously, sequence operators (nextval, setval, currval)
only accepted a string to refer to the referenced sequence.
This patch allows these operators to accept a regclass as well.
Tests were also added to verify this.
This is the first step to fix this issue:
#51090.

Release note (sql change): overload sequence operators to accept a regclass

Co-authored-by: Eric Wang <[email protected]>
@the-ericwang35 the-ericwang35 linked a pull request Feb 23, 2021 that will close this issue
craig bot pushed a commit that referenced this issue Feb 23, 2021
60957: sql: add backup/restore tests for sequences r=the-ericwang35 a=the-ericwang35

This patch just adds tests for backup/restore with respect to sequences.
Now that all new sequences are referenced by ID and can be renamed,
we want to test that previously created sequences are not affected,
and still follow the old rules (i.e. cannot be renamed and
and are referenced by their names.
Part of #51090

In the future, we'll want to convert these old sequences to
new sequences (i.e. can be renamed and are referenced by ID, 
see #60942) which means these tests will need to change, but
for now, we'll keep the old sequences as is.


Release note (None): add backup/restore tests for sequences

60972: sql: error if SURVIVE is set with no regions r=ajstorm a=otan

Release justification: bug fix to new functionality

Release note: None

60993: colexec: remove duplication around bazel utility functions r=yuzefovich a=yuzefovich

Previously, we had essentially duplicated `.bzl` files in two packages
that need to generate code using `execgen`, but this can be avoided by
defining a separate `.bzl` file as an extension used in both places.

Release note: None

61008: TeamCity: update build agents to use the latest sudo r=rail a=rail

Release note: None

61024: authors: add erikgrinaker r=erikgrinaker a=erikgrinaker

Release note: none

61025: authors: add fqazi to authors r=fqazi a=fqazi

authors: add fqazi to authors

Release note: None

61026: authors: add abarganier to authors r=abarganier a=abarganier

Release note: None

Co-authored-by: Eric Wang <[email protected]>
Co-authored-by: Oliver Tan <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Rail Aliiev <[email protected]>
Co-authored-by: Erik Grinaker <[email protected]>
Co-authored-by: Faizan Qazi <[email protected]>
Co-authored-by: Alex Barganier <[email protected]>
@craig craig bot closed this as completed in #60957 Feb 23, 2021
craig bot pushed a commit that referenced this issue Mar 19, 2021
61126: sql: add version upgrade test for sequences r=ajwerner a=the-ericwang35

This patch just adds some version upgrade tests
for sequences. Now that all new sequences are
referenced by ID and can be renamed, we want to test
that previously created sequences are not affected,
and still follow the old rules (i.e. cannot be renamed and
and are referenced by their names.
This test creates some sequences in 20.2.5, upgrades the
cluster, and verifies this behaviour.
Part of #51090.

In the future, we'll want to convert these old sequences to
new sequences (i.e. can be renamed and are referenced by ID, 
see #60942) which means these tests will need to change, but
for now, we'll keep the old sequences as is.

Release note: None

Release justification: only adds tests for existing functionality

62203: sql: capture sequences used in cast exprs r=ajwerner a=the-ericwang35

Previously, `GetSequenceFromFunc` only looked
for `*tree.DString` and `*tree.DOid`. However,
it's also possible for the sequence to be given via
a `*tree.CastExpr` (for example, `nextval('s'::regclass)`).
In this case, the sequence was not being captured,
which meant no back references were created and
no sequence encoding was done.
This PR addresses this by adding an additional
case to capture these sequences.

This will probably need to be backported.

Release note: None

Co-authored-by: Eric Wang <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-pgcompat Semantic compatibility with PostgreSQL C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Projects
None yet
5 participants