Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: update EXPERIMENTAL_RELOCATE to support relocation of non-voters #62696

Merged

Conversation

aayushshah15
Copy link
Contributor

This PR contains 3 commits:

sql: stop spuriously removing non-voters on EXPERIMENTAL_RELOCATE

Previously, a call to EXPERIMENTAL_RELOCATE.. (when relocating
replicas, not the lease) would incorrectly remove all non-voters on the
target range(s). This commit fixes the bug.

Release justification: bug fix
Release note: None

sql, parser: add EXPERIMENTAL_RELOCATE VOTERS as alias of
EXPERIMENTAL_RELOCATE

Release note (sql change): EXPERIMENTAL_RELOCATE VOTERS is now an
alias of EXPERIMENTAL_RELOCATE (not EXPERIMENTAL_RELOCATE LEASES).

sql, parser: add EXPERIMENTAL_RELOCATE NON_VOTERS

This commit adds a new variant of the EXPERIMENTAL_RELOCATE statement
which enables relocation of non-voting replicas via SQL.

Release note (sql change): EXPERIMENTAL_RELOCATE now has a new
variant: EXPERIMENTAL_RELOCATE NON_VOTERS which allows users to
relocate non-voting replicas via SQL.

Previously, a call to `EXPERIMENTAL_RELOCATE..` (when relocating
replicas, not the lease) would incorrectly remove all non-voters on the
target range(s). This commit fixes the bug.

Release justification: bug fix
Release note: None
@aayushshah15 aayushshah15 requested a review from a team as a code owner March 28, 2021 03:46
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@aayushshah15 aayushshah15 requested review from nvanbenschoten and removed request for a team March 28, 2021 03:46
Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all :lgtm:. Is your intention to backport this to the v21.1 release branch?

Reviewed 1 of 1 files at r1, 5 of 5 files at r2, 15 of 15 files at r3.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15)


pkg/sql/relocate.go, line 138 at r1 (raw file):

	} else {
		if err := params.p.ExecCfg().DB.AdminRelocateRange(
			params.ctx, rowKey, relocationTargets, rangeDesc.Replicas().NonVoters().ReplicationTargets(),

nit: consider pulling this into a variable like existingNonVoters := ... to make the intention more clear.

Same thing with existingVoters later on in this PR.


pkg/sql/parser/sql.y, line 1697 at r2 (raw file):

    }
  }
| ALTER TABLE table_name relocate_kw VOTERS select_stmt

Should we add a voters_kw target that includes VOTERS or | /* EMPTY */, to avoid needing two targets here and below?


pkg/sql/sem/tree/split.go, line 89 at r2 (raw file):

	}
	ctx.FormatNode(&node.TableOrIndex)
	ctx.WriteString(" EXPERIMENTAL_RELOCATE ")

nit: Should we keep this, since it's still common among all variants?

@aayushshah15
Copy link
Contributor Author

aayushshah15 commented Mar 29, 2021

Is your intention to backport this to the v21.1 release branch?

Anecdotally speaking, EXPERIMENTAL_RELOCATE seems to have come in handy in a bunch of support escalations in order to, say, quickly move a bunch of replicas off of a "bad" node for various reasons. Since the blast radius of this change is fairly small (and it's an experimental statement to begin with), I was thinking it'd be nice to have it in 21.1.

Do you have any concerns? I'll defer to you for the final call.

`EXPERIMENTAL_RELOCATE`

Release note (sql change): `EXPERIMENTAL_RELOCATE VOTERS` is now an
alias of `EXPERIMENTAL_RELOCATE`.
This commit adds a new variant of the `EXPERIMENTAL_RELOCATE` statement
which enables relocation of non-voting replicas via SQL.

Release note (sql change): `EXPERIMENTAL_RELOCATE` now has a new
variant: `EXPERIMENTAL_RELOCATE NON_VOTERS` which allows users to
relocate non-voting replicas via SQL.
@aayushshah15 aayushshah15 force-pushed the 20210327_updateExperimentalRelocate branch from 2769e6f to c49b853 Compare March 29, 2021 18:12
@aayushshah15
Copy link
Contributor Author

TFTR!

bors r+

@aayushshah15
Copy link
Contributor Author

bors r-

@craig
Copy link
Contributor

craig bot commented Mar 29, 2021

Canceled.

Copy link
Contributor Author

@aayushshah15 aayushshah15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)


pkg/sql/relocate.go, line 138 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: consider pulling this into a variable like existingNonVoters := ... to make the intention more clear.

Same thing with existingVoters later on in this PR.

Done.


pkg/sql/parser/sql.y, line 1697 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Should we add a voters_kw target that includes VOTERS or | /* EMPTY */, to avoid needing two targets here and below?

Done.


pkg/sql/sem/tree/split.go, line 89 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: Should we keep this, since it's still common among all variants?

Done.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the blast radius of this change is fairly small (and it's an experimental statement to begin with), I was thinking it'd be nice to have it in 21.1.

This all checks out for me, so I'm ok backporting it into the release. Thanks for getting it out so quickly.

Reviewed 15 of 15 files at r5.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aayushshah15)

@craig
Copy link
Contributor

craig bot commented Mar 29, 2021

Build succeeded:

@craig craig bot merged commit e07a1d4 into cockroachdb:master Mar 29, 2021
aayushshah15 added a commit to aayushshah15/cockroach that referenced this pull request Mar 31, 2021
A preceeding change (cockroachdb#62696) introduced a flakey update to this test.
Prior to that change, this test was using 2 voting replicas but that
change tried to make it use 1 voter and 1 non-voter instead (as a litmus
test for the new syntax added in cockroachdb#62696).

The test rebalances a replica away from a node and ensures that a
historical read sent immediately afterwards gets re-routed to the
leaseholder replica, since the receiving store had its replica
destroyed. However, when we're using a non-voter in this test, that
non-voter may not have learned about this replication change by the time
it receives this historical query and that fails the assertion.

This commit re-organizes the test and fixes the flake.

Release note: None
aayushshah15 added a commit to aayushshah15/cockroach that referenced this pull request Mar 31, 2021
A preceeding change (cockroachdb#62696) introduced a flakey update to this test.
Prior to that change, this test was using 2 voting replicas but that
change tried to make it use 1 voter and 1 non-voter instead (as a litmus
test for the new syntax added in cockroachdb#62696).

The test rebalances a replica away from a node and ensures that a
historical read sent immediately afterwards gets re-routed to the
leaseholder replica, since the receiving store had its replica
destroyed. However, when we're using a non-voter in this test, that
non-voter may not have learned about this replication change by the time
it receives this historical query and that fails the assertion.

This commit re-organizes the test and fixes the flake.

Release note: None
craig bot pushed a commit that referenced this pull request Mar 31, 2021
62498: utilccl,kvccl: improve performance when checking enterprise features r=tbg a=erikgrinaker

**utilccl: cache license decoding**

Previously, the `utilccl` package would decode the license from the the
base64-encoded Protobuf representation in settings every time it was
needed, which was sufficient for its uses. However, recently there's
been a need to check whether enterprise features are enabled in hot
paths (e.g. with follower reads as seen in #62447), making the decoding
cost too great.

This patch adds `cluster.Settings.Cache` as a shared cache, and uses it
to cache decoded licenses with a private key type.

**utilccl,kvccl: add IsEnterpriseEnabled for faster license checks**

`utilccl.CheckEnterpriseEnabled()` is used to check whether a valid
enterprise license exists for a given feature. If no valid license is
found, it returns an error with specific details.

However, `kvccl` used this function in follower read hot paths, and
instantiating an error when follower reads are unavailable could have
significant overhead -- see e.g. #62447.

This patch adds `IsEnterpriseEnabled()`, which has the same behavior as
`CheckEnterpriseEnabled()` but returns a boolean instead. This is
significantly faster since we can avoid instantiating a custom error
each time. `kvccl` is also updated to use this in hot paths.

Resolves #62489.

Release note: None

62642: colserde: fix the edge case with nulls handling r=yuzefovich a=yuzefovich

When serializing the data of Bool, Bytes, Int, and Float types when they
don't have any nulls in the vector, we don't explicit specify the null
bitmap. Previously, when deserializing such vectors with no nulls we
would simply call `UnsetNulls` on the `coldata.Nulls` object that is
currently present. However, it is possible that already present nulls
object cannot support the desired batch length. This could lead to index
out of bounds accesses. Note that in the vast majority of cases this
likely doesn't happen in practice because we check `MaybeHasNulls`, and
that would return `false` making us omit the null checking code.

Fixes: #62636.

Release note (bug fix): Previously, CockroachDB could encounter an
internal error in rare circumstances when executing queries via the
vectorized engine that operate on columns of BOOL, BYTES, INT, and FLOAT
types that have a mix of NULL and non-NULL values.

62740: workload: add idle-conns flag for adding idle connections to tpcc r=rafiss a=RichardJCai

workload: add idle-conns flag for adding idle connections to tpcc

Release note: None

#62526

62814: tenantrate: add "test" that reports IOPS estimations r=RaduBerinde a=RaduBerinde

This change adds a "test" facility which takes the description of a
uniform workload (read percentage, read size, write size) and prints
out an estimation of the sustained IOPS and burst IO. This will allow
a better understanding of how changes to the settings or the mechanism
translate into IOPS changes.

Release note: None

62833: kvserver: deflake TestFollowerReadsWithStaleDescriptor r=aayushshah15 a=aayushshah15

A preceding change (#62696) introduced a flakey update to this test.
Prior to that change, this test was using 2 voting replicas but that
change tried to make it use 1 voter and 1 non-voter instead (as a litmus
test for the new syntax added in #62696).

The test rebalances a replica away from a node and ensures that a
historical read sent immediately afterwards gets re-routed to the
leaseholder replica, since the receiving store had its replica
destroyed. However, when we're using a non-voter in this test, that
non-voter may not have learned about this replication change by the time
it receives this historical query and that fails the assertion.

This commit re-organizes the test and fixes the flake.

Release note: None

62862: testutils: add skip.UnderBazelWithIssue r=rickystewart a=stevendanna

This is to skip individual tests under bazel. This seems a bit more
fine-grained than the broken_in_bazel tag in the bazel configuration
but also allows us to avoid skipping tests that work outside of bazel
in our main test suite.

Release note: None

62877: Added CACHE to SEQUENCE syntax diagrams r=ericharmeling a=ericharmeling

Follow-up of #56954.

Release justification: non-production code changes

Release note: None

62889: colexecerror: catch panics from packages in sql/sem folder r=yuzefovich a=yuzefovich

Previously, we would only catch panics from `sql/sem/tree` package.
Recently sqlsmith encountered a crash because of a panic in
`sql/sem/builtins` package, and I believe it is reasonable to catch
panics from that package as well as from `sql/sem/transform`, so we will
now be catching based on `sql/sem` prefix.

Addresses: #62846.

Release note: None

62898: build: install essential build tools in teamcity build agents r=jlinder a=rickystewart

In #62815, we migrated from an alternative way of installing golang, the
`longsleep/golang-backports` deb repo, to the currently recommended
install method found at https://golang.org/doc/install -- namely, we
download a tarball and then just unzip it in the right spot. This
works perfectly, *except* that the deb package had a dependency on build
tools like `gcc` and `make`, and certain build configurations had come
to depend on their global installation (namely, all those that don't use
`builder.sh` to run a build). This resulted in a couple of failures
being reported:

* https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_ExampleORMs/2834741
* https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_UnitTests_Acceptance/2834732

We just install [`build-essential`](https://packages.ubuntu.com/xenial/build-essential)
here, which is the easiest way to get all of that stuff.

Release note: None

Co-authored-by: Erik Grinaker <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: richardjcai <[email protected]>
Co-authored-by: Radu Berinde <[email protected]>
Co-authored-by: Aayush Shah <[email protected]>
Co-authored-by: Steven Danna <[email protected]>
Co-authored-by: Eric Harmeling <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
aayushshah15 added a commit to aayushshah15/cockroach that referenced this pull request Mar 31, 2021
A preceeding change (cockroachdb#62696) introduced a flakey update to this test.
Prior to that change, this test was using 2 voting replicas but that
change tried to make it use 1 voter and 1 non-voter instead (as a litmus
test for the new syntax added in cockroachdb#62696).

The test rebalances a replica away from a node and ensures that a
historical read sent immediately afterwards gets re-routed to the
leaseholder replica, since the receiving store had its replica
destroyed. However, when we're using a non-voter in this test, that
non-voter may not have learned about this replication change by the time
it receives this historical query and that fails the assertion.

This commit re-organizes the test and fixes the flake.

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants