Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: sep-raft-log: verify ReplicaID consistency at boot time #93310

Open
tbg opened this issue Dec 9, 2022 · 2 comments
Open

kvserver: sep-raft-log: verify ReplicaID consistency at boot time #93310

tbg opened this issue Dec 9, 2022 · 2 comments
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team

Comments

@tbg
Copy link
Member

tbg commented Dec 9, 2022

We need to check that all replicas (incl. uninitialized ones) have a persisted ReplicaID and that it matches that in the descriptor (if initialized).

See #93244; that check will live in a similar location.

Epic: CRDB-220

Jira issue: CRDB-22267

@tbg tbg added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv-replication labels Dec 9, 2022
@blathers-crl
Copy link

blathers-crl bot commented Dec 9, 2022

cc @cockroachdb/replication

tbg added a commit to tbg/cockroach that referenced this issue Dec 9, 2022
These are the beginnings of cockroachdb#93310 and cockroachdb#93244.
We need to inspect and possibly reconcile the persisted state at startup
time. This PR refactors what we have into a shape where new code can be
added more easily.

Release note: None
tbg added a commit to tbg/cockroach that referenced this issue Dec 16, 2022
These are the beginnings of cockroachdb#93310 and cockroachdb#93244.
We need to inspect and possibly reconcile the persisted state at startup
time. This PR refactors what we have into a shape where new code can be
added more easily.

Release note: None
tbg added a commit to tbg/cockroach that referenced this issue Dec 19, 2022
These are the beginnings of cockroachdb#93310 and cockroachdb#93244.
We need to inspect and possibly reconcile the persisted state at startup
time. This PR refactors what we have into a shape where new code can be
added more easily.

Release note: None
craig bot pushed a commit that referenced this issue Dec 19, 2022
93311: kvserver: introduce loadAndReconcileReplicas r=pavelkalinnikov a=tbg

These are the beginnings of #93310 and #93244.
We need to inspect and possibly reconcile the persisted state at startup
time. This PR refactors what we have into a shape where new code can be
added more easily.

Epic: CRDB-220
Release note: None

Co-authored-by: Tobias Grieger <[email protected]>
@tbg
Copy link
Member Author

tbg commented Dec 22, 2022

There can be uninitialized replicas that were created before we introduced RaftReplicaID. We can "just" remove these when we encounter them.

tbg added a commit to tbg/cockroach that referenced this issue Jan 18, 2023
I'm working on a datadriven test for `kvstorage` and I need to be able
to initialize the engine there.

There's more that should move, but I'm taking it one step at a time.
As we build up more and more sophisticated datadriven tests in
`kvstorage` we can move whatever we need when we need it.

PS: I looked at the unit test being modified by the move and it can't
yet be moved to kvstorage - it also bootstraps the initial ranges, etc,
requiring a lot more code movement until we can move it.

Touches cockroachdb#93310.

Epic: CRDB-220
Release note: None
craig bot pushed a commit that referenced this issue Jan 18, 2023
94670: metrics: add tenant name to _status/vars output r=knz a=dhartunian

This commit adds a `tenant` prometheus label to each metrics output via the
`_status/vars` HTTP endpoint. The label is populated with either "system" or
the name of the tenant generating the metric. When queried from the system
tenant's HTTP port or handler, the result will now also contain metrics from
all in-process tenants on the same node.

When initializing a tenant, an optional "parent" metrics recorder is passed
down allowing the tenant's recorder to be registered with the parent. When we
query the parent it iterates over all child recorders (much like we already do
for stores) and outputs all of their metrics as well.

Example:
```
sql_txn_commit_count{tenant="system"} 0
sql_txn_commit_count{tenant="demo-tenant"} 0
```

Resolves: #94663
Epic: CRDB-18798

Release note (ops change): Metrics output via `_status/vars` now contain
`tenant` labels allowing the user to distinguish between metrics emitted by
the system tenant vs other app tenants identified by their IDs.

Co-authored-by: Alex Barganier <[email protected]>
Co-authored-by: Aaditya Sondhi <[email protected]>

95234: sql: evaluate correlated subqueries as routines r=mgartner a=mgartner

#### opt: create tree.Routine planning closure in helper function

The logic for creating a planning closure for a `tree.Routine` has been
moved to a helper function so that it can be reused in future commits.

Release note: None

#### sql: evaluate correlated subqueries as routines

Previously, the optimizer would error in rare cases when it was unable
to hoist correlated subqueries into apply-joins. Now, scalar, correlated
subqueries that aren't hoisted are executed successfully. There is
remaining work to apply the same method in this commit to `EXISTS` and
`<op> ANY` subqueries.

Hoisting correlated subqueries is not possible when a conditional
expression, like a `CASE`, wraps a subquery that is not leak-proof. One
of the effects of hoisting a subquery is that the subquery will be
unconditionally evaluated. For leak-proof subqueries, the worst case is
that unnecessary computation is performed. For non-leak-proof
subqueries, errors could originate from the subquery when it should have
never been evaluated because the corresponding conditional expression
was never true. So, in order to support these cases, we must be able to
execute a correlated subquery.

A correlated subquery can be thought of as a relational expression with
parameters that need to be filled in with constant value arguments for
each invocation. It is essentially a user-defined function with a single
statement in the function body. So, the `tree.RoutineExpr` machinery
that powers UDFs is easily repurposed to facilitate evaluation of
correlated subqueries.

Fixes #71908
Fixes #73573
Fixes #80169

Release note (sql change): Some queries which previously resulted in the
error "could not decorrelate subquery" now succeed.


95432: kvserver,kvstorage: move InitEngine r=pavelkalinnikov a=tbg

I'm working on a datadriven test for `kvstorage` and I need to be able
to initialize the engine there.

There's more that should move, but I'm taking it one step at a time.
As we build up more and more sophisticated datadriven tests in
`kvstorage` we can move whatever we need when we need it.

PS: I looked at the unit test being modified by the move and it can't
yet be moved to kvstorage - it also bootstraps the initial ranges, etc,
requiring a lot more code movement until we can move it.

100% mechanical movement via Goland.

Touches #93310.

Epic: CRDB-220
Release note: None


Co-authored-by: David Hartunian <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Tobias Grieger <[email protected]>
tbg added a commit to tbg/cockroach that referenced this issue Jan 20, 2023
Same rationale as cockroachdb#95432 - they belong in `kvstorage` and I need
them there now as I'm working on a datadriven test.

Touches cockroachdb#93310.

Epic: CRDB-220
Release note: None
tbg added a commit to tbg/cockroach that referenced this issue Jan 23, 2023
See cockroachdb#93310. This is
also the beginning of
cockroachdb#93247.

Epic: CRDB-220
Release note: None
tbg added a commit to tbg/cockroach that referenced this issue Feb 2, 2023
Same rationale as cockroachdb#95432 - they belong in `kvstorage` and I need
them there now as I'm working on a datadriven test.

Touches cockroachdb#93310.

Epic: CRDB-220
Release note: None
tbg added a commit to tbg/cockroach that referenced this issue Feb 2, 2023
Same rationale as cockroachdb#95432 - they belong in `kvstorage` and I need
them there now as I'm working on a datadriven test.

Touches cockroachdb#93310.

Epic: CRDB-220
Release note: None
tbg added a commit to tbg/cockroach that referenced this issue Feb 2, 2023
See cockroachdb#93310. This is
also the beginning of
cockroachdb#93247.

Epic: CRDB-220
Release note: None
craig bot pushed a commit that referenced this issue Feb 3, 2023
95710: sql: support `*` in udf bodies r=rharding6373 a=rharding6373

This change allows `*` usage in UDF bodies. We rewrite UDF ASTs in place
to expand `*`s into the columns they reference.

Informs: #90080

Epic: CRDB-19496
Release note (sql change): Allow `*` expressions in UDFs.

96029: pkg/util/metric: option to use legacy hdrhistogram model, increase bucket fidelity r=tbg,aadityasondhi a=dhartunian

This patch reeintroduces the old HdrHistogram model to optionally be
enabled in favor of the new Prometheus model, gated behind
an environment variable called `COCKROACH_ENABLE_HDR_HISTOGRAMS`,
allowing users a means to "fall back" to the old model in the
event that the new model does not adequately serve their needs
(think of this as an "insurance policy" to protect against
this from happening again with no real mitigation - ideally,
this environment variable should never have to be used).

It also updates the pre-defined bucket boundaries used by the Prometheus
backed histograms with more buckets. This aims to improve precision,
especially for latency histograms, when calculating quantiles (the low precision
being the core cause of the issue at hand).

Note: some histograms were introduced *after* the new
Prometheus histograms were added to CockroachDB. In this
case, we use the `ForceUsePrometheus` option in the
`HistogramOptions` struct to ignore the value of the env
var, since there never was a time where these specific
histograms used the HdrHistogram model.

Release note (ops change): Histogram metrics can now optionally
use the legacy HdrHistogram model by setting the environment var
`COCKROACH_ENABLE_HDR_HISTOGRAMS=true` on CockroachDB nodes.
**Note that this is not recommended** unless users are having
difficulties with the newer Prometheus-backed histogram model.
Enabling can cause performance issues with timeseries databases
like Prometheus, as processing and storing the increased number
of buckets is taxing on both CPU and storage. Note that the
HdrHistogram model is slated for full deprecation in upcoming
releases.

Fixes: #95833

96269: sql/physicalplan: allow passing replica oracle to planning r=dt a=dt

This PR pulls out a subset of the changes in #93810, namely allowing a replica-choice oracle to be passed on a per-plan basis, but without changing which oracle is actually used in any existing planning, and only introducing the API for future users. 

This refactors span resolver and the physical planning methods to allow passing a replica-choice oracle per planning instead of using a fixed oracle (either bin-packing or closest depending on server type). The existing API that uses the default oracle is kept as is for all existing callers, while a new WithOracle variant is added for future callers who wish to pass their own oracle.

An new oracle that prefers followers over leaseholders is also added here, however it is not used at this time.

Release note: none.
Epic: none.

96349: logictest: skip another flaky test in ranges r=mgartner a=mgartner

This commit skips a flaky test that I missed in #96271.

Informs #96136

Epic: None

Release note: None

96433: storage: add string repr for FullReplicaID r=pavelkalinnikov a=tbg

Helpful for subsequent unit testing involving this type.

NB: this type is in the wrong place. Now's not the time
to move it.

Epic: none
Release note: None


96434: kvserver,kvstorage: move read/write methods for cluster versions r=pavelkalinnikov a=tbg

Same rationale as #95432 - they belong in `kvstorage` and I need
them there now as I'm working on a datadriven test.

Touches #93310.

Epic: CRDB-220
Release note: None


96445: backupccl: add 'aws-weekly' tag to restore/tpce/32tb test r=healthy-pod,renatolabs a=msbutler

Epic: none

Release note: None

Co-authored-by: rharding6373 <[email protected]>
Co-authored-by: David Hartunian <[email protected]>
Co-authored-by: Alex Barganier <[email protected]>
Co-authored-by: Rui Hu <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Tobias Grieger <[email protected]>
Co-authored-by: Michael Butler <[email protected]>
tbg added a commit to tbg/cockroach that referenced this issue Feb 3, 2023
See cockroachdb#93310. This is
also the beginning of
cockroachdb#93247.

Epic: CRDB-220
Release note: None
@exalate-issue-sync exalate-issue-sync bot added T-kv KV Team and removed T-kv-replication labels Jun 28, 2024
@github-project-automation github-project-automation bot moved this to Incoming in KV Aug 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team
Projects
No open projects
Status: Incoming
Development

No branches or pull requests

1 participant