decommission: improvements in 22.2 #85445

Closed
10 tasks done
AlexTalks opened this issue Aug 2, 2022 · 3 comments
Labels: A-kv, C-enhancement, T-kv

Comments

@AlexTalks
Contributor

AlexTalks commented Aug 2, 2022

This is a tracking issue for decommissioning improvements in 22.2.

Benchmarking

  • Create benchmarks for decommissions in relevant cases
  • Evaluate benchmarks against optimal time estimates

Observability

  • Export queue lengths and snapshots in progress via metrics
  • Add tracing for errors during decommission
  • Track error counts for a decommission
  • Snapshot Dashboard showing: Snapshots in progress, queue lengths, purgatory counts, aggregate error counts.
  • Avoid displaying retired nodes in Admin UI

Operational Improvements

Jira issue: CRDB-18247

Epic CRDB-14621

@AlexTalks AlexTalks added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Aug 2, 2022
@AlexTalks AlexTalks self-assigned this Aug 2, 2022
@AlexTalks AlexTalks added the A-kv Anything in KV that doesn't belong in a more specific category. label Aug 2, 2022
@blathers-crl blathers-crl bot added the T-kv KV Team label Aug 2, 2022
@kvoli
Collaborator

kvoli commented Aug 3, 2022

related: #85528

craig bot pushed a commit that referenced this issue Aug 22, 2022
86190: backupccl: make more attributes of backup schedules alterable. r=benbardin a=benbardin

Release note (enterprise change): ALTER BACKUP now supports additional commands like
SET WITH, SET SCHEDULE OPTION, SET LABEL, and SET INTO

Release justification: low risk, medium benefit

Informs #84033

86252: ui, server: prevent decommissioned nodes from displaying in ui as live r=Santamaura a=Santamaura

This change resolves an issue where decommissioned nodes would
in rare cases display as live in the admin ui. The changes
are on both the frontend and backend:
- The ui has been changed so that the way live nodes are
filtered is now based on MembershipStatus which matches
the way decommissioned nodes are filtered.
- The server now uses GetLivenessesFromKV instead of
GetLivenesses because the latter retrieves livenesses
known to gossip which could be stale reads from in-memory cache
while the former is read directly from the KV layer.

Part of #85445

Release justification: low risk, high benefit changes to existing functionality.

Release note (ui change, api change): Filter live nodes based on
MembershipStatus and retrieve livenesses directly from kv.
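The filtering idea described for 86252 can be illustrated with a minimal Go sketch. It is not the code from that PR: the `Liveness` struct, the constant names, and `liveNodeIDs` are hypothetical, and the sketch assumes a node should be shown as live unless its membership status is decommissioned.

```go
package main

import "fmt"

// MembershipStatus models the membership states a node can be in.
// The type and constant names are illustrative, not CockroachDB's own.
type MembershipStatus int

const (
	Active MembershipStatus = iota
	Decommissioning
	Decommissioned
)

// Liveness is a hypothetical, simplified liveness record for one node.
type Liveness struct {
	NodeID     int
	Membership MembershipStatus
}

// liveNodeIDs returns the IDs of all nodes whose membership status is
// not Decommissioned. Assumption: the records passed in were read from
// an authoritative store rather than a possibly stale in-memory cache.
func liveNodeIDs(records []Liveness) []int {
	ids := []int{}
	for _, r := range records {
		if r.Membership != Decommissioned {
			ids = append(ids, r.NodeID)
		}
	}
	return ids
}

func main() {
	records := []Liveness{
		{NodeID: 1, Membership: Active},
		{NodeID: 2, Membership: Decommissioning},
		{NodeID: 3, Membership: Decommissioned},
	}
	fmt.Println(liveNodeIDs(records)) // prints [1 2]
}
```

Filtering on a single field for both the live and the decommissioned views keeps the two lists from disagreeing about the same node.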

86446: ui: style update on active execution r=maryliag a=maryliag

Previously we were not limiting the size of the
query column, making it hard to read with large values.
This commit limits the size and adds a tooltip to
allow the user to see the full query if they want to.
This commit also adds a space at the end of the details
page.

Before
<img width="1585" alt="Screen Shot 2022-08-18 at 10 02 55 PM" src="https://user-images.githubusercontent.com/1017486/185622228-bff14c91-1d5d-44fc-b495-5f110001b712.png">

After
<img width="1564" alt="Screen Shot 2022-08-18 at 10 07 21 PM" src="https://user-images.githubusercontent.com/1017486/185622247-2c7cd090-9f6b-47b3-9efb-4c339364cfc0.png">

Release justification: low risk styling changes
Release note: None

86486: ui: remove next run from jobs that don't have next run r=maryliag a=maryliag

Several jobs don't have a future run, so the value
being displayed on the Jobs Details page would show
the last time it was scheduled.
This commit hides the next run field from the page
if it is not in the future.

Fixes #86359

Release justification: low risk, high benefit change
Release note (ui change): Remove the Next Planned Execution
Time label, when the job doesn't have a next planned execution
scheduled.

86554: sql: attempt to deflake TestRaceWithIndexBackfillMerge r=stevendanna a=ajwerner

See [here](https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_UnitTests_BazelUnitTests/6186518?buildTab=overview&showRootCauses=false&expandBuildProblemsSection=true&expandBuildTestsSection=true&expandBuildChangesSection=true&expandBuildDeploymentsSection=true#%2Fbazel-testlogs%2Fpkg%2Fsql%2Fsql_test%2Fshard_11_of_16)

Release justification: testing only change

Release note: None

Co-authored-by: Ben Bardin
Co-authored-by: Santamaura
Co-authored-by: Marylia Gutierrez
Co-authored-by: Andrew Werner
@AlexTalks
Contributor Author

Deferring to 23.1

Observability

  • Export traces to Observability Service

Operational Improvements

  • Preflight checks to fail fast if decommissioning cannot (currently) complete

@AlexTalks
Contributor Author

All 22.2 improvements listed above have merged.
