Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: can't atomically swap names for catalog entries #54562

Closed
thoszhang opened this issue Sep 18, 2020 · 4 comments · Fixed by #70334
Closed

sql: can't atomically swap names for catalog entries #54562

thoszhang opened this issue Sep 18, 2020 · 4 comments · Fixed by #70334
Assignees
Labels
A-schema-changes A-schema-descriptors Relating to SQL table/db descriptor handling. A-schema-transactional C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@thoszhang
Copy link
Contributor

thoszhang commented Sep 18, 2020

These operations are allowed in postgres, but not in cockroach, where you'll get a relation ... already exists error:

lucy=# create table foo();
CREATE TABLE
lucy=# begin;
BEGIN
lucy=# alter table foo rename to bar;
ALTER TABLE
lucy=# create table foo();
CREATE TABLE
lucy=# commit;
COMMIT
lucy=# create table foo();
CREATE TABLE
lucy=# create table bar();
CREATE TABLE
lucy=# begin;
BEGIN
lucy=# drop table foo;
DROP TABLE
lucy=# alter table bar rename to foo;
ALTER TABLE
lucy=# commit;
COMMIT
lucy=# create table foo();
CREATE TABLE
lucy=# create table bar();
CREATE TABLE
lucy=# begin;
BEGIN
lucy=# alter table foo rename to temp;
ALTER TABLE
lucy=# alter table bar rename to foo;
ALTER TABLE
lucy=# alter table temp rename to bar;
ALTER TABLE
lucy=# commit;
COMMIT

#12123 is a special case of the same issue. Of course, this also applies to other things in the catalog (databases, schemas, types).

The problem is that we prohibit multiple names from referring to the same descriptor at the same time, and this is enforced through the draining names system. Can we lift this restriction?


Epic CRDB-10483

@blathers-crl
Copy link

blathers-crl bot commented Sep 18, 2020

Hi @lucy-zhang, please add a C-ategory label to your issue. Check out the label system docs.

While you're here, please consider adding an A- label to help keep our repository tidy.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@thoszhang thoszhang added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-schema-changes A-schema-descriptors Relating to SQL table/db descriptor handling. A-schema-transactional labels Sep 18, 2020
@thoszhang thoszhang changed the title sql: can't atomically swap table names sql: can't atomically swap names for catalog entries Oct 27, 2020
@ajwerner
Copy link
Contributor

Fixing this represents a trade off. Because of that way that cockroach's online schema change protocol works, if we allowed this, there would be some period of time where the name referred to both tables. Maybe that's acceptable.

@vy-ton this is an interesting case to think about regarding transactional schema changes. I think we will choose to make this trade off because the alternative is not supporting something we know that users do -- and in most cases the semantics are pretty sound. I think this is an example of being too cautious.

@vy-ton
Copy link
Contributor

vy-ton commented Feb 22, 2021

there would be some period of time where the name referred to both tables

There is some danger if the application code does not correctly handle object renames. Might worth revisiting this discussion about name aliasing.

@ajwerner
Copy link
Contributor

ajwerner commented Mar 2, 2021

I'm coming around to the idea that we should just let this anomaly happen. The compatibility burden is too high. There are fancy things we could consider one day doing to make it so that during a small window of time, transaction transactionally read the descriptor or something but that seems like an also bad tradeoff. Another big consideration for me is that we're going to need to do something about TRUNCATE. In a transactional world, the transaction is going to want to see an empty table after TRUNCATE. That means that it's going to write to the new indexes. If we want to remain offline, there will be a period of time where transactions which follow that transaction (though not the return of COMMIT) will see the old, pre-truncated data even though they may see writes to other tables by transactions which saw the post-truncated data. That's sort of unavoidable.

I say we boldly opt into this class of anomaly and just call it correct by our definition. I think this all remains pretty cogent. The claim is that operations which are concurrent with one of these schema changes (i.e. before COMMIT returns) then they might see some arbitrary interleaving of the versions. If people are mad about this then later we can add some barrier mode as an add-on that you opt into. I think most people will be very happy with this.

@jlinder jlinder added the T-sql-schema-deprecated Use T-sql-foundations instead label Jun 16, 2021
@postamar postamar self-assigned this Sep 13, 2021
postamar pushed a commit to postamar/cockroach that referenced this issue Sep 17, 2021
This refactoring commit is the first step to removing these fields and
their associated methods.

See issue cockroachdb#54562.

Release note: None
postamar pushed a commit to postamar/cockroach that referenced this issue Oct 15, 2021
This refactoring commit is the first step to removing these fields and
their associated methods.

See issue cockroachdb#54562.

Release note: None
postamar pushed a commit to postamar/cockroach that referenced this issue Oct 15, 2021
This commit introduces a new cluster version, keyed as
AvoidDrainingNames, which allows atomic name swaps between two
tables/databases/schemas/types. This means we're no longer populating
the draining names of a descriptor, instead we're updating the namespace
table in the same transaction.

The price to pay for all of this is that there is a period of time
following a name swap in which the name may refer to either of the two
descriptors.

Fixes cockroachdb#54562.

Release note (sql change): It's now possible to swap names (for tables,
etc.) in the same transaction. For example:

  CREATE TABLE foo();
  BEGIN;
  ALTER TABLE foo RENAME TO bar;
  CREATE TABLE foo();
  COMMIT;

Previously, we'd be getting a "relation ... already exists" error.
postamar pushed a commit to postamar/cockroach that referenced this issue Nov 3, 2021
This refactoring commit is the first step to removing these fields and
their associated methods.

See issue cockroachdb#54562.

Release note: None
postamar pushed a commit to postamar/cockroach that referenced this issue Nov 3, 2021
This commit introduces a new cluster version, keyed as
AvoidDrainingNames, which allows atomic name swaps between two
tables/databases/schemas/types. This means we're no longer populating
the draining names of a descriptor, instead we're updating the namespace
table in the same transaction.

The price to pay for all of this is that there is a period of time
following a name swap in which the name may refer to either of the two
descriptors.

Fixes cockroachdb#54562.

Release note (sql change): It's now possible to swap names (for tables,
etc.) in the same transaction. For example:

  CREATE TABLE foo();
  BEGIN;
  ALTER TABLE foo RENAME TO bar;
  CREATE TABLE foo();
  COMMIT;

Previously, we'd be getting a "relation ... already exists" error.
postamar pushed a commit to postamar/cockroach that referenced this issue Nov 4, 2021
This refactoring commit is the first step to removing these fields and
their associated methods.

See issue cockroachdb#54562.

Release note: None
craig bot pushed a commit that referenced this issue Nov 4, 2021
57208: sql: more allocation reductions r=yuzefovich a=jordanlewis

See individual commits for details.

```
name                                          old time/op    new time/op    delta
FlowSetup/vectorize=true/distribute=true-24      283µs ± 9%     279µs ± 6%    ~     (p=0.316 n=29+30)
FlowSetup/vectorize=true/distribute=false-24     275µs ± 4%     273µs ± 6%    ~     (p=0.107 n=29+30)

name                                          old alloc/op   new alloc/op   delta
FlowSetup/vectorize=true/distribute=true-24     37.9kB ± 1%    37.9kB ± 1%    ~     (p=0.503 n=24+24)
FlowSetup/vectorize=true/distribute=false-24    36.1kB ± 1%    36.0kB ± 0%  -0.22%  (p=0.009 n=25+25)

name                                          old allocs/op  new allocs/op  delta
FlowSetup/vectorize=true/distribute=true-24        361 ± 0%       358 ± 0%  -0.81%  (p=0.000 n=25+26)
FlowSetup/vectorize=true/distribute=false-24       348 ± 1%       346 ± 1%  -0.84%  (p=0.000 n=28+28)
```

70334: sql: allow atomic name swaps r=postamar a=postamar

This PR deprecates and disables usage of the draining names mechanism on catalog descriptors. When renaming or dropping a table/database/schema/type, the namespace entry is now modified in-transaction. This makes it possible to swap names in-transaction. On the other hand, names may be inconsistently resolved to either the old or the new target in specific circumstances.  

Fixes #54562.

71970: roachprod: support multiple local clusters r=RaduBerinde a=RaduBerinde

Many of these commits are cleanup and reorganization. The functional changes are:
- we no longer use ~/.roachprod/hosts; instead we use json files in ~/.roachprod/clusters
- we can now have multiple local clusters, with names like local-foo
- roachprod invocations using different GCE projects no longer step on each-other's toes 

When this PR merges, roachprod will lose track of any existing local cluster; it will need to be cleaned up and recreated.

#### roachprod: minor cleanup for cloud.Cloud

This change fills in some missing comments from `cloud.Cloud` and
improves the interface a bit. Some of the related roachprod code is
cleaned up as well.

Release note: None

#### roachprod: clean up local cluster metadata

The logic around how the local cluster metadata is loaded and saved is
very convoluted. The local provider is using `install.Clusters` and is
writing directly to the `.hosts/local` file.

This commit disentangles this logic: it is now up to the main program
to call `local.AddCluster()` to inject local cluster information. The
main program also provides the implementation for a new
`local.VMStorage` interface, allowing the code for saving the hosts
file to live where it belongs.

Release note: None

#### roachprod: clean up local cluster deletion

This change moves the code to destroy the local cluster to the local
provider. The hosts file is deleted through LocalVMStorage.

Release note: None

#### roachprod: rework clusters cache

This commit changes roachprod from using `hosts`-style files in
`~/.roachprod/hosts` for caching clusters to using json files in
`~/.roachprod/clusters`.

Like before, each cluster has its own file. The main advantage is
that we can now store the entire cluster metadata instead of
manufacturing it based on one-off parsing.

WARNING: after this change, the information in `~/.roachprod/hosts`
will no longer be used. If a local cluster exists, the new `roachprod`
version will not know about it. It is recommended to destroy any local
cluster before using the new version. A local cluster can also be
cleaned up manually using:
```
killall -9 cockroach
rm -rf ~/.roachprod/local
```

Release note: None

#### roachprod: use cloud.Cluster in SyncedCluster

This change stores Cluster metadata directly in SyncedCluster, instead
of making copies of various fields.

#### roachprod: store ports in vm.VM

This change adds `SQLPort` and `AdminUIPort` fields to `vm.VM`. This
allows us to remove the special hardcoded values for the local
cluster.

Having these fields stored in the clusters cache will allow having
multiple local clusters, each with their own set of ports.

Release note: None

#### roachprod: support multiple local clusters

This change adds support for multiple local clusters. Local cluster
names must be either "local" or of the form "local-foo".

When the cluster is named "local", the node directories stay in the
same place, e.g. `~/local/1`. If the cluster is named "local-foo",
node directories are like `~/local/foo-1`.

For local clusters we include the cluster name in the ROACHPROD
variable; this is necessary to distinguish between processes of
different local clusters. The relevant code is cleaned up to
centralize the logic related to the ROACHPROD variable.

Fixes #71945.

Release note: None

meh

#### roachprod: list VMs in parallel

This commit speeds up the slowest step of roachprod: listing VMs from
all providers. We now list the VMs in parallel across all providers
instead of doing it serially.

Release note: None

#### roachprod: fix behavior when mixing GCE projects

Currently roachprod has very poor behavior when used with different
projects on the same host. For example:
```
shell1: GCE_PROJECT=andrei-jepsen roachstress.sh ... // this will run ~forever
sometime later in shell2: roachprod sync (on the default project)
```

The sync on the default project removes the cached information for the
cluster on `andrei-jepsen`, which causes `roachprod` commands against
that cluster (from within the `roachstress.sh` script) to fail.

We fix this by ignoring any cached clusters that reference a project
that the provider was not configured for - both when loading clusters
into memory and when deleting stale cluster files during `sync`.

As part of the change, we also improve the output of `list` to remove
the colon after the cluster name and to include the GCE project:

```
$ roachprod list --gce-project cockroach-ephemeral,andrei-jepsen
Syncing...
Refreshing DNS entries...
glenn-anarest                  [aws]                      9  (142h41m39s)
glenn-drive                    [aws]                      1  (141h41m39s)
jane-1635868819-01-n1cpu4      [gce:cockroach-ephemeral]  1  (10h41m39s)
lin-ana                        [aws]                      9  (178h41m39s)
local-foo                      [local]                    4  (-)
radu-foo                       [gce:andrei-jepsen]        4  (12h41m39s)
radu-test                      [gce:cockroach-ephemeral]  4  (12h41m39s)
```

Release note: None

72071: sql: fix OIDs in RowDescription in some cases r=yuzefovich a=yuzefovich

Fixes: #71891.

Release note (bug fix): Previously, CockroachDB could not set the
`TableOID` and `TableAttributeNumber` attributes of `RowDescription`
message of pgwire protocol in some cases (these values would be left as
0), and this is now fixed.

Co-authored-by: Jordan Lewis <[email protected]>
Co-authored-by: Marius Posta <[email protected]>
Co-authored-by: Radu Berinde <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
@craig craig bot closed this as completed in 782ae3e Nov 4, 2021
@exalate-issue-sync exalate-issue-sync bot added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) and removed T-sql-schema-deprecated Use T-sql-foundations instead labels May 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-schema-changes A-schema-descriptors Relating to SQL table/db descriptor handling. A-schema-transactional C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants