Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: add shed lease target to repl queue #94023

Merged
merged 1 commit into from
Jan 3, 2023

Conversation

kvoli
Copy link
Collaborator

@kvoli kvoli commented Dec 21, 2022

Previously, a call to shedLease was made within the process loop and outside of planning, in the replicate queue. This patch moves the shed lease into consider rebalance, where it originally was and converts it into a plannable action.

Part of #90141

Release note: None

@kvoli kvoli self-assigned this Dec 21, 2022
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@kvoli kvoli force-pushed the 221220.rq-lhtransfer branch from 6c76e28 to 15e82a3 Compare December 21, 2022 13:53
@kvoli kvoli marked this pull request as ready for review December 21, 2022 15:11
@kvoli kvoli requested a review from a team as a code owner December 21, 2022 15:11
Copy link
Collaborator

@andrewbaptist andrewbaptist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @AlexTalks)

@kvoli
Copy link
Collaborator Author

kvoli commented Jan 3, 2023

TYFTR

bors r=andrewbaptist

@craig
Copy link
Contributor

craig bot commented Jan 3, 2023

Build failed (retrying...):

@kvoli
Copy link
Collaborator Author

kvoli commented Jan 3, 2023

bors r-

@craig
Copy link
Contributor

craig bot commented Jan 3, 2023

Canceled.

@kvoli kvoli force-pushed the 221220.rq-lhtransfer branch from 15e82a3 to 18d462b Compare January 3, 2023 17:42
Previously, a call to `shedLease` was made within the process loop and
outside of planning, in the replicate queue. This patch moves the shed
lease into consider rebalance, where it originally was and converts it
into a plannable action.

Release note: None
@kvoli kvoli force-pushed the 221220.rq-lhtransfer branch from 18d462b to f2c3e12 Compare January 3, 2023 20:22
@kvoli
Copy link
Collaborator Author

kvoli commented Jan 3, 2023

bors r=andrewbaptist

TYFTR

@craig
Copy link
Contributor

craig bot commented Jan 3, 2023

Build succeeded:

@craig craig bot merged commit c533c2b into cockroachdb:master Jan 3, 2023
kvoli added a commit to kvoli/cockroach that referenced this pull request Aug 11, 2023
Previously, `TestLeasePreferenceDuringOutage` would force replication
queue processing of the test range, then assert that the range
up-replicated and lease transferred to a preferred locality.

This test was skipped, and two of the assumptions it relied on to pass
were no longer true.

After cockroachdb#85219, the replicate queue no longer
re-processes replicas. Instead, the queue requeues replicas after
processing, at the appropriate priority. This broke the test due to the
replicate queue being disabled, making the re-queue a no-op.

After cockroachdb#94023, the replicate queue no longer looked for lease transfers,
after processing a replication action. Combined with cockroachdb#85219, the queue
would now be guaranteed to not process both up-replication and lease
transfers from a single enqueue.

Update the test to not require a manual process, instead using a queue
range filter, which allows tests which disable automatic replication, to
still process filtered ranges via the various replica queues. Also,
ensure that the non-stopped stores are considered live targets, after
simulating an outage (bumping manual clocks, stopping servers) -- so
that the expected up-replication, then lease transfer can proceed.

Fixes: cockroachdb#88769
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this pull request Aug 29, 2023
Previously, `TestLeasePreferenceDuringOutage` would force replication
queue processing of the test range, then assert that the range
up-replicated and lease transferred to a preferred locality.

This test was skipped, and two of the assumptions it relied on to pass
were no longer true.

After cockroachdb#85219, the replicate queue no longer
re-processes replicas. Instead, the queue requeues replicas after
processing, at the appropriate priority. This broke the test due to the
replicate queue being disabled, making the re-queue a no-op.

After cockroachdb#94023, the replicate queue no longer looked for lease transfers,
after processing a replication action. Combined with cockroachdb#85219, the queue
would now be guaranteed to not process both up-replication and lease
transfers from a single enqueue.

Update the test to not require a manual process, instead using a queue
range filter, which allows tests which disable automatic replication, to
still process filtered ranges via the various replica queues. Also,
ensure that the non-stopped stores are considered live targets, after
simulating an outage (bumping manual clocks, stopping servers) -- so
that the expected up-replication, then lease transfer can proceed.

Fixes: cockroachdb#88769
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this pull request Sep 7, 2023
Previously, `TestLeasePreferenceDuringOutage` would force replication
queue processing of the test range, then assert that the range
up-replicated and lease transferred to a preferred locality.

This test was skipped, and two of the assumptions it relied on to pass
were no longer true.

After cockroachdb#85219, the replicate queue no longer
re-processes replicas. Instead, the queue requeues replicas after
processing, at the appropriate priority. This broke the test due to the
replicate queue being disabled, making the re-queue a no-op.

After cockroachdb#94023, the replicate queue no longer looked for lease transfers,
after processing a replication action. Combined with cockroachdb#85219, the queue
would now be guaranteed to not process both up-replication and lease
transfers from a single enqueue.

Update the test to not require a manual process, instead using a queue
range filter, which allows tests which disable automatic replication, to
still process filtered ranges via the various replica queues. Also,
ensure that the non-stopped stores are considered live targets, after
simulating an outage (bumping manual clocks, stopping servers) -- so
that the expected up-replication, then lease transfer can proceed.

Fixes: cockroachdb#88769
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this pull request Sep 11, 2023
Previously, `TestLeasePreferenceDuringOutage` would force replication
queue processing of the test range, then assert that the range
up-replicated and lease transferred to a preferred locality.

This test was skipped, and two of the assumptions it relied on to pass
were no longer true.

After cockroachdb#85219, the replicate queue no longer
re-processes replicas. Instead, the queue requeues replicas after
processing, at the appropriate priority. This broke the test due to the
replicate queue being disabled, making the re-queue a no-op.

After cockroachdb#94023, the replicate queue no longer looked for lease transfers,
after processing a replication action. Combined with cockroachdb#85219, the queue
would now be guaranteed to not process both up-replication and lease
transfers from a single enqueue.

Update the test to not require a manual process, instead using a queue
range filter, which allows tests which disable automatic replication, to
still process filtered ranges via the various replica queues. Also,
ensure that the non-stopped stores are considered live targets, after
simulating an outage (bumping manual clocks, stopping servers) -- so
that the expected up-replication, then lease transfer can proceed.

Fixes: cockroachdb#88769
Release note: None
craig bot pushed a commit that referenced this pull request Sep 11, 2023
108175: kvserver: unskip lease preferences during outage  r=andrewbaptist a=kvoli

Previously, `TestLeasePreferenceDuringOutage` would force replication
queue processing of the test range, then assert that the range
up-replicated and lease transferred to a preferred locality.

This test was skipped, and two of the assumptions it relied on to pass
were no longer true.

After #85219, the replicate queue no longer
re-processes replicas. Instead, the queue requeues replicas after
processing, at the appropriate priority. This broke the test due to the
replicate queue being disabled, making the re-queue a no-op.

After #94023, the replicate queue no longer looked for lease transfers,
after processing a replication action. Combined with #85219, the queue
would now be guaranteed to not process both up-replication and lease
transfers from a single enqueue.

Update the test to not require a manual process, instead using a queue
range filter, which allows tests which disable automatic replication, to
still process filtered ranges via the various replica queues. Also,
ensure that the non-stopped stores are considered live targets, after
simulating an outage (bumping manual clocks, stopping servers) -- so
that the expected up-replication, then lease transfer can proceed.

Fixes: #88769
Release note: None

109432: cluster-ui: handle partial response errors on the database details page r=THardy98 a=THardy98

Part of: #102386

**Demos** (Note: these demos show this same logic applied to both the databases and database table pages as well):
DB-Console
- https://www.loom.com/share/5108dd655ad342f28323e72eaf68219c
- https://www.loom.com/share/1973383dacd7494a84e10bf39e5b85a3

This change applies the same error handling ideas from #109245 to the
database details page, enabling non-admin users to use the database
details page and providing better transparency to data fetching issues.

Errors encountered while fetching table details can be viewed via the
tooltip provided by the `Caution` icon at the table's name.
`unavailable` cells also provide a tooltip that displays the error
impacting that exact cell.

Release note (ui change): Non-admin users are able to use the database
details page.

110292: c2c: use seperate spanConfigEventStreamSpec in the span config event stream r=stevendanna a=msbutler

Previously, the spanConfigEventStream used a streamPartitionSpec, which contained a bunch of fields unecessary for span config streaming. This patch creates a new spanConfigEventStreamSpec which contains the fields only necessary for span config event streaming.

Informs #109059

Release note: None

110309: teamcity-trigger: ensure that `race` tag is only passed once r=healthy-pod a=healthy-pod

By running under `-race`, the go command defines the `race` build tag for us [1].

Previously, we defined it under `TAGS` to let the issue poster know that this is a failure under `race` and indicate that in the issue.

At the time, defining the tag twice didn't cause issues but after #109773, it led to build failures [2].

To reproduce locally:
```
bazel test -s --config=race pkg/util/ctxgroup:all --test_env=GOTRACEBACK=all --define gotags=bazel,gss,race
```

As a follow-up, we should find another way to let the issue poster know that a failure was running under `race`.

[1] https://go.dev/doc/articles/race_detector#Excluding_Tests
[2] #109994 (comment)

Epic: none
Release note: None

Co-authored-by: Austen McClernon <[email protected]>
Co-authored-by: Thomas Hardy <[email protected]>
Co-authored-by: Michael Butler <[email protected]>
Co-authored-by: healthy-pod <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants