Tablet throttler: multi-metric support #15988

Merged · 221 commits · Jul 11, 2024

Commits:
b40945f
CheckThrottlerResponse: introduce Metric map
shlomi-noach Mar 31, 2024
f100aea
Introduce MetricResult and CheckResult.Metrics. Populate CheckThrottl…
shlomi-noach Mar 31, 2024
3fa8b23
Tablet throttler: wip support for multi-metrics
shlomi-noach Apr 4, 2024
187963c
validate 'default' metric assumes 'custom' value when custom query is…
shlomi-noach Apr 4, 2024
70726f8
support for v19 replicas
shlomi-noach Apr 9, 2024
5f7b978
compatibility: replica reporting to a v19 primary
shlomi-noach Apr 10, 2024
106efab
remove race condition in tests
shlomi-noach Apr 11, 2024
bc3ae15
add 'serialFuncChan' to remove race condition in tests
shlomi-noach Apr 11, 2024
94d0f93
more idiomatic waiting code
shlomi-noach Apr 11, 2024
eb84e80
more information in ThrottlerStatus
shlomi-noach Apr 11, 2024
2f61a42
validate metric result threshold value
shlomi-noach Apr 11, 2024
2aa3647
resolved conflict
shlomi-noach Apr 11, 2024
98df99b
throttler client: use metric names. Cache throttling result based on …
shlomi-noach Apr 16, 2024
0d49400
Skip sleeping in ThrottleCheckOKOrWaitAppName if context expires
shlomi-noach Apr 17, 2024
f749a39
can remove check for ctx.Err() given that ThrottleCheckOKOrWaitAppNam…
shlomi-noach Apr 17, 2024
e8b9df8
remove remoteAddr. Wasn't really used
shlomi-noach Apr 17, 2024
848fcf7
Client does not provide metric names. Throttler holds a map app->metr…
shlomi-noach Apr 17, 2024
fb27a3a
remove debugging info
shlomi-noach May 1, 2024
c6b164c
allow grace period in tests
shlomi-noach May 1, 2024
aca248c
Merge branch 'main' into throttler-multi-metrics
shlomi-noach May 1, 2024
afe407a
Merge branch 'main' into throttler-multi-metrics
shlomi-noach May 6, 2024
e284d2d
Merge branch 'main' into throttler-multi-metrics
shlomi-noach May 7, 2024
d783bca
remove remoteAddr
shlomi-noach May 7, 2024
ac1982e
use appCheckedMetrics to get metric names if unspecified
shlomi-noach May 7, 2024
294b39f
test use of appCheckedMetrics, of explicit metric names, and of lack …
shlomi-noach May 7, 2024
073e6f4
client: fail if any metric is not OK
shlomi-noach May 7, 2024
a7e0e68
validate new client behavior: any single non-OK metric returns with o…
shlomi-noach May 7, 2024
4946477
isolate behavior t ostrictly backwards compatible required situation …
shlomi-noach May 7, 2024
0c34e16
resolved conflict
shlomi-noach May 9, 2024
dec1881
use MetricNames
shlomi-noach May 9, 2024
044b818
Introduce MetricNames and AppMetrics in ThrottlerConfig
shlomi-noach May 9, 2024
a7901e3
change name
shlomi-noach May 9, 2024
b6c5367
apply throttlerConfig.AppCheckedMetrics onto throttler.appCheckedMetr…
shlomi-noach May 9, 2024
becd655
simplify test
shlomi-noach May 9, 2024
bff2ed8
Introduce 'all' app
shlomi-noach May 9, 2024
f9f5f8e
Merge remote-tracking branch 'origin/main' into throttler-multi-metrics
shlomi-noach May 12, 2024
8548c88
vtctl proto: add CheckThrottler, CheckThrottlerRequest, CheckThrottle…
shlomi-noach May 12, 2024
81ddd5c
vtctl proto: add CheckThrottler, CheckThrottlerRequest, CheckThrottle…
shlomi-noach May 12, 2024
84bdc68
add tablet alias
shlomi-noach May 12, 2024
bb9713c
implement vtctldclient CheckThrottler command
shlomi-noach May 12, 2024
1d96ae1
add low_priority, skip_request_heartbeats, ok_if_not_exists to CheckR…
shlomi-noach May 12, 2024
747d151
add low_priority, skip_request_heartbeats, ok_if_not_exists to CheckR…
shlomi-noach May 12, 2024
65f5529
move CheckType into flags
shlomi-noach May 12, 2024
1d37b05
typo
shlomi-noach May 12, 2024
9e2ae21
get rid of CheckType (self/primary-write): it's essentially a map for…
shlomi-noach May 13, 2024
e31047d
rename Store as Scope
shlomi-noach May 13, 2024
4763bec
Shard scope metrics include primary's own metrics
shlomi-noach May 13, 2024
7e514b7
add scope in CheckThrottlerRequest
shlomi-noach May 13, 2024
4e1012a
support Scope in CheckThrottlerRequest
shlomi-noach May 13, 2024
9f237f6
populate scope
shlomi-noach May 13, 2024
0b0811f
populate --app-name and --scope flags
shlomi-noach May 13, 2024
6827f4a
metric names have default scopes; a check can explciitly indicate a s…
shlomi-noach May 13, 2024
803c73a
Merge branch 'main' into throttler-multi-metrics
shlomi-noach May 13, 2024
e370ca5
resolved conflict
shlomi-noach May 15, 2024
869943a
Client: use base.UndefinedScope
shlomi-noach May 15, 2024
4185c53
when we note that a non-low-priority app was throttled, we now do not…
shlomi-noach May 15, 2024
c299b42
metricNames may contain scope, which then overrides their default one…
shlomi-noach May 15, 2024
9835f9c
fix test code comments
shlomi-noach May 15, 2024
fa28d9c
Add 'Scope' in CheckThrottlerResponse
shlomi-noach May 15, 2024
7a6fe75
Add 'Scope' in CheckThrottlerResponse
shlomi-noach May 15, 2024
7de790f
populate CheckThrottlerResponse_Metric's Scope field with actual scop…
shlomi-noach May 15, 2024
ef48a2d
fix tests
shlomi-noach May 15, 2024
97e07ea
validate scope value in CheckThrottler
shlomi-noach May 15, 2024
26b8825
test exempt all
shlomi-noach May 15, 2024
347dd56
when unthrottling app via vtctldclient --unthrottle-app, the app entr…
shlomi-noach May 15, 2024
d1921e4
Merge branch 'main' into throttler-multi-metrics
shlomi-noach May 16, 2024
880c14a
remove commented TODO code
shlomi-noach May 16, 2024
30612fe
Adding MetricThresholds in ThrottlerConfig, adding MetricName in Upda…
shlomi-noach May 16, 2024
d3142ad
Read --metric-name into updateThrottlerConfigOptions.MetricName
shlomi-noach May 16, 2024
c802f7b
when --metric-name is given, apply threshold to that metric in the th…
shlomi-noach May 16, 2024
28e1120
applyThrottlerConfig: apply metric-specific threshold, override inven…
shlomi-noach May 16, 2024
b41efa2
validate: --metric-name requires --threshold
shlomi-noach May 16, 2024
b793ecc
Add AppName, AppMetrics in UpdateThrottlerConfigRequest proto
shlomi-noach May 16, 2024
b01d081
rename AppMetrics to AppCheckedMetrics
shlomi-noach May 16, 2024
a506ffa
validate --app-name in conjunction with --app-metrics. Validate apps …
shlomi-noach May 16, 2024
0ffd206
Merge branch 'main' into throttler-multi-metrics
shlomi-noach May 16, 2024
159b573
fix error message on unknown metric name
shlomi-noach May 16, 2024
17fa059
MetricNames.String()
shlomi-noach May 16, 2024
cb16a8b
remove app checked metrics entry if list is empty. Report app checked…
shlomi-noach May 16, 2024
93825b0
Merge branch 'main' into throttler-multi-metrics
shlomi-noach May 19, 2024
ebaf173
tests: UpdateThrottlerTopoConfigRaw and UpdateThrottlerTopoConfigRaw …
shlomi-noach May 19, 2024
a3dc70b
CheckThrottler does not specify any metrics
shlomi-noach May 19, 2024
d037924
on error of any metric, also report the metric's value and threshold
shlomi-noach May 19, 2024
d95ec66
do not override values upon error
shlomi-noach May 19, 2024
f718dd8
Unit tests: assign metric names to app. Check self and shard scopes, …
shlomi-noach May 19, 2024
1224a92
adapt tests
shlomi-noach May 19, 2024
64ef430
endtoend: UpdateThrottlerTopoConfig[Raw] accepts appCheckedMetrics
shlomi-noach May 19, 2024
5003b0c
vitess app always checks all known metrics (this is how a throttler p…
shlomi-noach May 19, 2024
b3a8e2f
adapt unit tests
shlomi-noach May 19, 2024
88f7431
endtoend: TestUpdateAppCheckedMetrics
shlomi-noach May 19, 2024
0e02a22
more tests for vitess app
shlomi-noach May 19, 2024
24239f2
Assigning metrics to 'all' app applies those metrics to all apps who …
shlomi-noach May 19, 2024
2746661
fix metric threshold iteration
shlomi-noach May 20, 2024
0509de4
TestApplyThrottlerConfigAppCheckedMetrics: reuse ThrottlerConfig
shlomi-noach May 20, 2024
cbbdfd4
proto: Adding MultiMetricsEnabled to CheckThrottlerRequest. This flag…
shlomi-noach May 20, 2024
c47bc5f
populate MultiMetricsEnabled
shlomi-noach May 20, 2024
c9f0a24
populate MultiMetricsEnabled
shlomi-noach May 20, 2024
ef0650c
only propagate metric errors to the general CheckResult if the origin…
shlomi-noach May 20, 2024
6b7accc
Merge branch 'main' into throttler-multi-metrics
shlomi-noach May 20, 2024
dbf3b93
formalize default metric thresholds
shlomi-noach May 20, 2024
cc5c31b
adapt debug/metrics to multimetrics.
shlomi-noach May 20, 2024
1e0d70b
UpdateThrottlerConfig: remove AppCheckedMetrics entry from the map if…
shlomi-noach May 20, 2024
e45ac0c
ThrottlerStatusRequest and ThrottlerStatusResponse
shlomi-noach May 21, 2024
7d12a94
rpc ThrottlerStatus
shlomi-noach May 21, 2024
c3b90e8
ThrottlerStatusRequest, ThrottlerStatusResponse, and ThrottlerStatus …
shlomi-noach May 21, 2024
fd6ef65
ThrottlerStatusRequest, ThrottlerStatusResponse, and ThrottlerStatus …
shlomi-noach May 21, 2024
48b08fa
rename field as TabletAlias
shlomi-noach May 21, 2024
4e58ada
ThrottlerStatus->GetThrottlerStatus
shlomi-noach May 21, 2024
17b1a2e
ThrottlerStatus(Request|Response)->GetThrottlerStatus(Request|Response)
shlomi-noach May 21, 2024
da6ecf5
int32->int64
shlomi-noach May 21, 2024
841d2a6
implement GetThrottlerStatus
shlomi-noach May 21, 2024
bae3b90
Merge branch 'main' into throttler-multi-metrics
shlomi-noach May 21, 2024
39171b4
isDormat() fix when throttler is disabled
shlomi-noach May 21, 2024
cc8806f
tablet server: /check's app name default 'vitess'
shlomi-noach May 21, 2024
663dcd3
vtctldclient cli commands help/test
shlomi-noach May 22, 2024
b0f08de
MultiMetricsEnabled in tabletserver api /throttler/check*
shlomi-noach May 22, 2024
a46e6ce
do not apply zero threshold value in applyThrottlerConfig() unless cu…
shlomi-noach May 22, 2024
d08a04c
apply default metric if available (v19 backwards compatibility)
shlomi-noach May 22, 2024
b5eca7c
endtoend: support GetThrottlerStatus. Keep default lag threshold (5s)…
shlomi-noach May 22, 2024
07611ab
even more throttler threshold in OnlineDDL tests
shlomi-noach May 22, 2024
1cee554
undo extending threshold
shlomi-noach May 22, 2024
f799e8c
Adding RecentlyChecked in throttler status
shlomi-noach May 22, 2024
1ce84f5
correctly populate RecentlyChecked in check result. Reuse in throttle…
shlomi-noach May 22, 2024
af7d347
either options to set checkResult.RecentlyChecked
shlomi-noach May 22, 2024
e922d3c
WaitForCheckThrottlerResult
shlomi-noach May 22, 2024
d825c1f
parameterize recentCheckDiff. Add testing
shlomi-noach May 23, 2024
a63cdef
Merge branch 'main' into throttler-multi-metrics
shlomi-noach May 23, 2024
ed8a74b
MultiMetricsEnabled in throttler.Client
shlomi-noach May 23, 2024
d94de0f
endtoend: WaitForThrottlerStatusEnabled() uses vtctldclient GetThrott…
shlomi-noach May 23, 2024
80f0f37
WaitForThrottlerStatusEnabled: when enabled, wait for metric health t…
shlomi-noach May 23, 2024
d75c404
RecentApps in GetThrottlerStatusResponse
shlomi-noach May 23, 2024
269c69d
formalize RecentApp, report in throttler status
shlomi-noach May 23, 2024
9ba160d
mark status code
shlomi-noach May 23, 2024
8bae917
fix bug: missing negation. Vitess app should not affect non-low-prior…
shlomi-noach May 23, 2024
616154e
better messaging
shlomi-noach May 23, 2024
c940e0c
remove LowPriority altogether from throttler check flags and proto
shlomi-noach May 23, 2024
4de5550
remove debug message
shlomi-noach May 23, 2024
4e15dd3
return
shlomi-noach May 23, 2024
27514e1
increase context timeout beyond lock wait timeout
shlomi-noach May 23, 2024
9ae7caa
onlineddl_scheduler / force_cutover test: extend timeout, inject hear…
shlomi-noach May 23, 2024
1a09d25
more time for force_cutover test
shlomi-noach May 23, 2024
14af07d
throttler debug info
shlomi-noach May 25, 2024
d72b1e3
fail after debug message
shlomi-noach May 25, 2024
76d3f38
set --migration_check_interval
shlomi-noach May 26, 2024
a499d7a
wait for 'ready_to_complete' to avoid race condition with vreplication
shlomi-noach May 26, 2024
5b1561d
Merge branch 'main' into throttler-multi-metrics
shlomi-noach May 26, 2024
14e5ee8
comments
shlomi-noach May 26, 2024
e4aa405
Merge branch 'main' into throttler-multi-metrics
shlomi-noach May 29, 2024
0633dd8
resolved conflicts
shlomi-noach Jun 2, 2024
de5443b
resolved conflicts
shlomi-noach Jun 2, 2024
c6ce707
similarly to 347dd56692bff4b11d27a2cd8345f0423d4a7ebb, when unthrottl…
shlomi-noach Jun 2, 2024
9b40469
Merge branch 'main' into throttler-multi-metrics
shlomi-noach Jun 3, 2024
978354d
resolved conflicts
shlomi-noach Jun 4, 2024
8f96886
Merge branch 'main' into throttler-multi-metrics
shlomi-noach Jun 5, 2024
318da0c
fix method argument
shlomi-noach Jun 5, 2024
9a5cfec
golang 1.22.4
shlomi-noach Jun 5, 2024
da56591
backwards compatibility: fallback to /throttler/status for v20 tablets
shlomi-noach Jun 5, 2024
bb60e89
fixed message
shlomi-noach Jun 5, 2024
03238e0
Merge branch 'main' into throttler-multi-metrics
shlomi-noach Jun 13, 2024
2d981cb
throttle and unthrottle via query, not via '/throttler/throttle-app'
shlomi-noach Jun 13, 2024
aeb5b65
validate online-ddl isn't throttled before we begin
shlomi-noach Jun 13, 2024
f98fb83
use vtctldclient
shlomi-noach Jun 13, 2024
8fea8bf
quote
shlomi-noach Jun 13, 2024
b01fb81
check Keyspace configuration
shlomi-noach Jun 13, 2024
3c838a0
revert to vtgate-based query; print throttler status on error
shlomi-noach Jun 13, 2024
1c76866
another attempt at printing status
shlomi-noach Jun 13, 2024
8e9d0aa
temporarily disabling n-1/n test
shlomi-noach Jun 16, 2024
4c00b91
resolved conflicts
shlomi-noach Jun 16, 2024
231efcc
restoring n-1/n tests; printing SHA of last & next releases
shlomi-noach Jun 16, 2024
e1364b0
indicate this SHA
shlomi-noach Jun 16, 2024
28dd1d2
For backwards compatibility, unthrottling an app will nto remove the …
shlomi-noach Jun 16, 2024
997b7d6
resolved conflict
shlomi-noach Jun 17, 2024
71e8a59
fmt.Printf -> t.Logf
shlomi-noach Jun 19, 2024
ac556d8
assert.Len
shlomi-noach Jun 19, 2024
36b1f53
MetricNames.Unique()
shlomi-noach Jun 19, 2024
ca44839
test comments
shlomi-noach Jun 19, 2024
6853576
pass request
shlomi-noach Jun 19, 2024
c617c75
Removed empty check_test.go
shlomi-noach Jun 19, 2024
3cbfb9e
preallocate map size
shlomi-noach Jun 19, 2024
c8bf456
simplify Split()
shlomi-noach Jun 19, 2024
3057dc4
remove debugging info
shlomi-noach Jun 19, 2024
b4d9f6c
preallocate map
shlomi-noach Jun 19, 2024
615781a
preallocate map
shlomi-noach Jun 19, 2024
3e71054
simplify MetricNames.Contains()
shlomi-noach Jun 19, 2024
9f0e712
typo
shlomi-noach Jun 19, 2024
199ca16
typo
shlomi-noach Jun 19, 2024
b845f2c
clarify comment
shlomi-noach Jun 19, 2024
2ab1924
wording
shlomi-noach Jun 19, 2024
39abd35
else if
shlomi-noach Jun 19, 2024
ee4f831
simplify stats variables
shlomi-noach Jun 19, 2024
36bba11
assert.Fail
shlomi-noach Jun 19, 2024
1d6f0bc
use throttlerapp.TestingName
shlomi-noach Jun 19, 2024
029c443
v19 -> v20
shlomi-noach Jun 19, 2024
d0972d3
assert.Fail
shlomi-noach Jun 19, 2024
2880182
v19 -> v20
shlomi-noach Jun 19, 2024
9ac893e
Merge branch 'main' into throttler-multi-metrics
shlomi-noach Jun 19, 2024
2499169
v19 -> v20
shlomi-noach Jun 19, 2024
2f587ce
formalizing the use of throttlerapp.TestingName
shlomi-noach Jun 19, 2024
886a6d0
vtctldclient UpdateThrottlerConfig: better validation via Cobra
shlomi-noach Jun 19, 2024
56bae93
remvoe manual mutual exclusivity test
shlomi-noach Jun 19, 2024
ca3dc8e
CheckThrottler command: --request-heartbeats (default 'false') replac…
shlomi-noach Jun 19, 2024
9624a29
fixing goroutine leak
shlomi-noach Jun 19, 2024
d487993
ensure to wait as very last step
shlomi-noach Jun 19, 2024
8e798aa
use LeakCheckContext
shlomi-noach Jun 19, 2024
ff365e9
ignore Rates goroutine leak
shlomi-noach Jun 19, 2024
2ba62c8
fix goroutine leak
shlomi-noach Jun 19, 2024
8e8271c
fix leak
shlomi-noach Jun 19, 2024
63dd98e
revert to normal context
shlomi-noach Jun 20, 2024
77aa2d9
Merge branch 'main' into throttler-multi-metrics
shlomi-noach Jun 23, 2024
7025dbf
consolidate vtctldata.GetThrottlerStatusResponse with tabletmanagerda…
shlomi-noach Jun 23, 2024
18141ac
consolidate vtctldata.CheckThrottlerResponse with tabletmanagerdata.C…
shlomi-noach Jun 23, 2024
6d8091e
resolved conflict
shlomi-noach Jul 3, 2024
c9bde7d
Merge branch 'main' into throttler-multi-metrics
shlomi-noach Jul 4, 2024
cd04223
fix function comment
shlomi-noach Jul 10, 2024
2b8f780
Merge branch 'main' into throttler-multi-metrics
shlomi-noach Jul 10, 2024
ef5ea1f
fix function comment
shlomi-noach Jul 10, 2024
f9fc8df
report ThrottlerProbesError
shlomi-noach Jul 10, 2024
74b7fdc
small rewrite per review
shlomi-noach Jul 10, 2024
f901615
typo
shlomi-noach Jul 10, 2024
8d05c97
remove ThrottlerRecentlyChecked metric
shlomi-noach Jul 10, 2024
8d69d4e
release notes
shlomi-noach Jul 11, 2024
3 changes: 3 additions & 0 deletions .github/workflows/upgrade_downgrade_test_onlineddl_flow.yml
@@ -140,6 +140,7 @@ jobs:
if: steps.skip-workflow.outputs.skip-workflow == 'false' && steps.changes.outputs.end_to_end == 'true'
timeout-minutes: 10
run: |
echo "building last release: $(git rev-parse HEAD)"
source build.env
make build
mkdir -p /tmp/vitess-build-last/
@@ -162,6 +163,7 @@ jobs:
if: steps.skip-workflow.outputs.skip-workflow == 'false' && steps.changes.outputs.end_to_end == 'true'
timeout-minutes: 10
run: |
echo "building next release: $(git rev-parse HEAD)"
source build.env
NOVTADMINBUILD=1 make build
mkdir -p /tmp/vitess-build-next/
@@ -182,6 +184,7 @@ jobs:
if: steps.skip-workflow.outputs.skip-workflow == 'false' && steps.changes.outputs.end_to_end == 'true'
timeout-minutes: 10
run: |
echo "building this SHA: $(git rev-parse HEAD)"
source build.env
make build
mkdir -p /tmp/vitess-build-current/
22 changes: 21 additions & 1 deletion changelog/21.0/21.0.0/summary.md
@@ -9,6 +9,7 @@
- [VTTablet Flags](#vttablet-flags)
- **[Traffic Mirroring](#traffic-mirroring)**
- **[New VTGate Shutdown Behavior](#new-vtgate-shutdown-behavior)**
- **[Tablet Throttler: Multi-Metric support](#tablet-throttler)**

## <a id="major-changes"/>Major Changes

@@ -60,4 +61,23 @@ without getting a `Server shutdown in progress` error.

This new behavior can be enabled by specifying the new `--mysql-server-drain-onterm` flag to VTGate.

See more information about this change by [reading its RFC](https://github.com/vitessio/vitess/issues/15971).
See more information about this change by [reading its RFC](https://github.com/vitessio/vitess/issues/15971).

### <a id="tablet-throttler"/>Tablet Throttler: Multi-Metric support

Through `v20`, the tablet throttler monitored and used only a single metric: replication lag by default, or the result of a custom query. `v21` introduces a major redesign in which the throttler monitors and uses multiple metrics at the same time, including those two.

For backwards compatibility with `v20`, the default behavior in `v21` is to monitor all metrics but only use `lag` (if the custom query is undefined) or the `custom` metric (if the custom query is defined). A `v20` `PRIMARY` is compatible with a `v21` `REPLICA`, and a `v21` `PRIMARY` is compatible with a `v20` `REPLICA`.

However, with `v21` it is possible to assign any combination of one or more metrics to a given app. The throttler then accepts or rejects the app's requests based on the health of _all_ assigned metrics. `v21` comes with a preset list of metrics, which is expected to expand:

- `lag`: replication lag based on heartbeat injection.
- `threads_running`: concurrent active threads on the MySQL server.
- `loadavg`: per core load average measured on the tablet instance/pod.
- `custom`: the result of a custom query executed on the MySQL server.

Each metric has a factory threshold which can be overridden by the `UpdateThrottlerConfig` command.

The throttler also supports the catch-all `"all"` app name, and it is thus possible to assign metrics to _all_ apps. Explicit app to metric assignments will override the catch-all configuration.
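As a rough illustration of the assignment rules described so far (this is not the Vitess implementation), the Go sketch below resolves the metrics for an app, falling back first to the catch-all `all` entry and then to a default list, and rejects the check if any assigned metric is above its threshold. The names `metricsForApp` and `checkApp`, the `>` comparison, and all threshold values other than the 5s lag default are assumptions made for the example.

```go
package main

import "fmt"

// Hypothetical types for illustration only; these are not the Vitess types.
type metricName string

const allApp = "all"

// metricsForApp returns the metrics explicitly assigned to app, else the
// metrics assigned to the catch-all "all" app, else the given defaults.
func metricsForApp(assigned map[string][]metricName, app string, defaults []metricName) []metricName {
	if m, ok := assigned[app]; ok && len(m) > 0 {
		return m
	}
	if m, ok := assigned[allApp]; ok && len(m) > 0 {
		return m
	}
	return defaults
}

// checkApp reports whether every metric in names is at or below its threshold.
// On failure it also returns the first offending metric.
func checkApp(names []metricName, values, thresholds map[metricName]float64) (bool, metricName) {
	for _, name := range names {
		if values[name] > thresholds[name] {
			return false, name
		}
	}
	return true, ""
}

func main() {
	assigned := map[string][]metricName{
		"all":        {"lag", "loadavg"},
		"online-ddl": {"lag", "threads_running"},
	}
	defaults := []metricName{"lag"}
	// Assumed thresholds, for the example only.
	thresholds := map[metricName]float64{"lag": 5, "threads_running": 100, "loadavg": 1}
	values := map[metricName]float64{"lag": 1.2, "threads_running": 140, "loadavg": 0.4}

	for _, app := range []string{"online-ddl", "vreplication"} {
		names := metricsForApp(assigned, app, defaults)
		ok, failed := checkApp(names, values, thresholds)
		fmt.Printf("app=%s metrics=%v ok=%t failed=%q\n", app, names, ok, failed)
	}
}
```

In this sketch an explicit assignment for `online-ddl` wins over the `all` entry, while an app with no assignment of its own inherits the `all` metrics.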

Metrics are assigned a default _scope_, which could be `self` (isolated to the tablet) or `shard` (max, aka _worst_ value among shard tablets). It is further possible to require a different scope for each metric.
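The two scopes can be pictured with a small Go sketch. It assumes, per the description above and the commit note that shard-scope metrics include the primary's own metrics, that `shard` means the maximum (worst) value across the shard's tablets, the checked tablet included. The `scope` type and `metricValue` function are hypothetical names, not Vitess APIs.

```go
package main

import "fmt"

// scope is a hypothetical stand-in for the throttler's scope concept.
type scope string

const (
	selfScope  scope = "self"
	shardScope scope = "shard"
)

// metricValue picks the value a check would compare against its threshold.
// self:  the tablet's own reading.
// shard: the max (worst) reading across the shard's tablets, own included.
func metricValue(s scope, own float64, others []float64) float64 {
	if s == selfScope {
		return own
	}
	worst := own
	for _, v := range others {
		if v > worst {
			worst = v
		}
	}
	return worst
}

func main() {
	own := 0.5
	replicas := []float64{2.0, 7.5}
	fmt.Println("self: ", metricValue(selfScope, own, replicas))
	fmt.Println("shard:", metricValue(shardScope, own, replicas))
}
```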
3 changes: 2 additions & 1 deletion go/cmd/vtctldclient/command/onlineddl.go
@@ -33,6 +33,7 @@ import (

topodatapb "vitess.io/vitess/go/vt/proto/topodata"
vtctldatapb "vitess.io/vitess/go/vt/proto/vtctldata"
"vitess.io/vitess/go/vt/proto/vttime"
)

const (
@@ -307,7 +308,7 @@ func throttleCommandHelper(cmd *cobra.Command, throttleType bool) error {
rule.ExpiresAt = protoutil.TimeToProto(time.Now().Add(throttle.DefaultAppThrottleDuration))
} else {
rule.Ratio = 0
rule.ExpiresAt = protoutil.TimeToProto(time.Now())
rule.ExpiresAt = &vttime.Time{} // zero
}

if strings.ToLower(uuid) == AllMigrationsIndicator {
113 changes: 108 additions & 5 deletions go/cmd/vtctldclient/command/throttler.go
@@ -27,7 +27,11 @@ import (

topodatapb "vitess.io/vitess/go/vt/proto/topodata"
vtctldatapb "vitess.io/vitess/go/vt/proto/vtctldata"
"vitess.io/vitess/go/vt/proto/vttime"
"vitess.io/vitess/go/vt/topo/topoproto"
"vitess.io/vitess/go/vt/vttablet/tabletserver/throttle"
"vitess.io/vitess/go/vt/vttablet/tabletserver/throttle/base"
"vitess.io/vitess/go/vt/vttablet/tabletserver/throttle/throttlerapp"
)

var (
@@ -37,33 +41,61 @@ var (
Short: "Update the tablet throttler configuration for all tablets in the given keyspace (across all cells)",
DisableFlagsInUseLine: true,
Args: cobra.ExactArgs(1),
PreRunE: validateUpdateThrottlerConfig,
RunE: commandUpdateThrottlerConfig,
}
CheckThrottler = &cobra.Command{
Use: "CheckThrottler [--app-name <name>] <tablet alias>",
Short: "Issue a throttler check on the given tablet.",
Example: "CheckThrottler --app-name online-ddl zone1-0000000101",
DisableFlagsInUseLine: true,
Args: cobra.ExactArgs(1),
RunE: commandCheckThrottler,
}

GetThrottlerStatus = &cobra.Command{
Use: "GetThrottlerStatus <tablet alias>",
Short: "Get the throttler status for the given tablet.",
Example: "GetThrottlerStatus zone1-0000000101",
DisableFlagsInUseLine: true,
Args: cobra.ExactArgs(1),
RunE: commandGetThrottlerStatus,
}
)

var (
updateThrottlerConfigOptions vtctldatapb.UpdateThrottlerConfigRequest
throttledAppRule topodatapb.ThrottledAppRule
unthrottledAppRule topodatapb.ThrottledAppRule
throttledAppDuration time.Duration

checkThrottlerOptions vtctldatapb.CheckThrottlerRequest
requestHeartbeats bool
)

func validateUpdateThrottlerConfig(cmd *cobra.Command, args []string) error {
if updateThrottlerConfigOptions.MetricName != "" && !cmd.Flags().Changed("threshold") {
return fmt.Errorf("--metric-name flag requires --threshold flag. Set threshold to 0 to disable the metric threshold configuration")
}
if cmd.Flags().Changed("app-name") && updateThrottlerConfigOptions.AppName == "" {
return fmt.Errorf("--app-name must not be empty")
}

Review comment (Contributor): Nit and somewhat a personal preference, I think, but IMO it's worth moving all of these checks to the PreRunE hook for the command. I think that makes it clearer that this is part of the pre-check validations done for the command before we actually execute it.

Reply (Contributor Author): Ah, I wasn't aware of PreRunE. Looking into.

Reply (Contributor Author): Done!

return nil
}
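The two pre-run rules above depend on knowing whether a flag was explicitly passed, not just on its value (a threshold of 0 is a legitimate explicit choice). The diff uses cobra's `cmd.Flags().Changed(...)` for that. The sketch below reproduces the same idea with only the standard library `flag` package, where `FlagSet.Visit` walks the flags that were actually set. The `changed` and `validate` helpers are hypothetical names for this example, and the error text is abbreviated.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// changed reports whether the named flag was explicitly set on the command
// line; a stdlib stand-in for cobra's cmd.Flags().Changed(name).
func changed(fs *flag.FlagSet, name string) bool {
	found := false
	fs.Visit(func(f *flag.Flag) {
		if f.Name == name {
			found = true
		}
	})
	return found
}

// validate parses args and applies the two pre-run rules from the diff.
func validate(args []string) error {
	fs := flag.NewFlagSet("UpdateThrottlerConfig", flag.ContinueOnError)
	metricName := fs.String("metric-name", "", "metric to apply the threshold to")
	fs.Float64("threshold", 0, "threshold value")
	appName := fs.String("app-name", "", "app to assign metrics to")
	if err := fs.Parse(args); err != nil {
		return err
	}
	if *metricName != "" && !changed(fs, "threshold") {
		return errors.New("--metric-name flag requires --threshold flag")
	}
	if changed(fs, "app-name") && *appName == "" {
		return errors.New("--app-name must not be empty")
	}
	return nil
}

func main() {
	fmt.Println(validate([]string{"--metric-name", "lag"}))
	fmt.Println(validate([]string{"--metric-name", "lag", "--threshold", "0"}))
	fmt.Println(validate([]string{"--app-name="}))
}
```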

func commandUpdateThrottlerConfig(cmd *cobra.Command, args []string) error {
keyspace := cmd.Flags().Arg(0)
cli.FinishedParsing(cmd)

if throttledAppRule.Name != "" && unthrottledAppRule.Name != "" {
return fmt.Errorf("throttle-app and unthrottle-app are mutually exclusive")
}

updateThrottlerConfigOptions.CustomQuerySet = cmd.Flags().Changed("custom-query")
updateThrottlerConfigOptions.Keyspace = keyspace

if throttledAppRule.Name != "" {
throttledAppRule.ExpiresAt = protoutil.TimeToProto(time.Now().Add(throttledAppDuration))
updateThrottlerConfigOptions.ThrottledApp = &throttledAppRule
} else if unthrottledAppRule.Name != "" {
unthrottledAppRule.ExpiresAt = protoutil.TimeToProto(time.Now())
unthrottledAppRule.ExpiresAt = &vttime.Time{} // zero
updateThrottlerConfigOptions.ThrottledApp = &unthrottledAppRule
}

@@ -74,9 +106,67 @@ func commandUpdateThrottlerConfig(cmd *cobra.Command, args []string) error {
return nil
}

func commandCheckThrottler(cmd *cobra.Command, args []string) error {
alias, err := topoproto.ParseTabletAlias(cmd.Flags().Arg(0))
if err != nil {
return err
}

cli.FinishedParsing(cmd)
if _, err := base.ScopeFromString(checkThrottlerOptions.Scope); err != nil {
return err
}
Review comment on lines +116 to +118 (Contributor): Minor, but this feels like something we would do before we mark parsing as finished as we're ensuring the passed value is valid right?

Reply (Contributor Author): The flags won't be assigned before FinishedParsing(). I think this validation is in line with the general practice in vtctldclient.

resp, err := client.CheckThrottler(commandCtx, &vtctldatapb.CheckThrottlerRequest{
TabletAlias: alias,
AppName: checkThrottlerOptions.AppName,
Scope: checkThrottlerOptions.Scope,
SkipRequestHeartbeats: !requestHeartbeats,
OkIfNotExists: checkThrottlerOptions.OkIfNotExists,
})
if err != nil {
return err
}

data, err := cli.MarshalJSON(resp)
if err != nil {
return err
}

fmt.Printf("%s\n", data)

return nil
}
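Two details of `commandCheckThrottler` are easy to miss: the scope string is validated before any RPC is made, and the user-facing `--request-heartbeats` flag (default `false`) is inverted into the request's `SkipRequestHeartbeats` field. The sketch below mirrors that shape with hypothetical stand-ins. `checkRequest`, `validateScope` and `buildRequest` are not Vitess APIs, and the set of accepted scope strings (empty, `self`, `shard`) is an assumption taken from the `--scope` flag's help text.

```go
package main

import "fmt"

// checkRequest is a hypothetical, trimmed-down stand-in for the real request.
type checkRequest struct {
	AppName               string
	Scope                 string
	SkipRequestHeartbeats bool
}

// validateScope accepts an empty scope (per-metric defaults), "self" or "shard".
// The accepted values are assumed from the --scope help text.
func validateScope(s string) error {
	switch s {
	case "", "self", "shard":
		return nil
	}
	return fmt.Errorf("unknown scope: %q", s)
}

// buildRequest validates the scope, then maps CLI options to request fields.
func buildRequest(appName, scope string, requestHeartbeats bool) (checkRequest, error) {
	if err := validateScope(scope); err != nil {
		return checkRequest{}, err
	}
	return checkRequest{
		AppName: appName,
		Scope:   scope,
		// The CLI flag is opt-in, the request field is opt-out.
		SkipRequestHeartbeats: !requestHeartbeats,
	}, nil
}

func main() {
	req, err := buildRequest("online-ddl", "shard", false)
	fmt.Printf("%+v err=%v\n", req, err)

	_, err = buildRequest("online-ddl", "cluster", false)
	fmt.Println("err:", err)
}
```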

func commandGetThrottlerStatus(cmd *cobra.Command, args []string) error {
alias, err := topoproto.ParseTabletAlias(cmd.Flags().Arg(0))
if err != nil {
return err
}

cli.FinishedParsing(cmd)

resp, err := client.GetThrottlerStatus(commandCtx, &vtctldatapb.GetThrottlerStatusRequest{
TabletAlias: alias,
})
if err != nil {
return err
}

data, err := cli.MarshalJSON(resp)
if err != nil {
return err
}

fmt.Printf("%s\n", data)

return nil
}

func init() {
// UpdateThrottlerConfig
UpdateThrottlerConfig.Flags().BoolVar(&updateThrottlerConfigOptions.Enable, "enable", false, "Enable the throttler")
UpdateThrottlerConfig.Flags().BoolVar(&updateThrottlerConfigOptions.Disable, "disable", false, "Disable the throttler")
UpdateThrottlerConfig.Flags().StringVar(&updateThrottlerConfigOptions.MetricName, "metric-name", "", "name of the metric for which we apply a new threshold (requires --threshold). If empty, the default (either 'lag' or 'custom') metric is used.")
UpdateThrottlerConfig.Flags().Float64Var(&updateThrottlerConfigOptions.Threshold, "threshold", 0, "threshold for the either default check (replication lag seconds) or custom check")
UpdateThrottlerConfig.Flags().StringVar(&updateThrottlerConfigOptions.CustomQuery, "custom-query", "", "custom throttler check query")
UpdateThrottlerConfig.Flags().BoolVar(&updateThrottlerConfigOptions.CheckAsCheckSelf, "check-as-check-self", false, "/throttler/check requests behave as is /throttler/check-self was called")
@@ -87,6 +177,19 @@ func init() {
UpdateThrottlerConfig.Flags().Float64Var(&throttledAppRule.Ratio, "throttle-app-ratio", throttle.DefaultThrottleRatio, "ratio to throttle app (app specififed in --throttled-app)")
UpdateThrottlerConfig.Flags().DurationVar(&throttledAppDuration, "throttle-app-duration", throttle.DefaultAppThrottleDuration, "duration after which throttled app rule expires (app specififed in --throttled-app)")
UpdateThrottlerConfig.Flags().BoolVar(&throttledAppRule.Exempt, "throttle-app-exempt", throttledAppRule.Exempt, "exempt this app from being at all throttled. WARNING: use with extreme care, as this is likely to push metrics beyond the throttler's threshold, and starve other apps")
UpdateThrottlerConfig.Flags().StringVar(&updateThrottlerConfigOptions.AppName, "app-name", "", "app name for which to assign metrics (requires --app-metrics)")
Review comment (Contributor): Just FYI, you can create flag groups so that cobra enforces mutual inclusion/exclusion etc. https://github.com/spf13/cobra/blob/main/site/content/user_guide.md#flag-groups

Reply (Contributor Author): Done!

UpdateThrottlerConfig.Flags().StringSliceVar(&updateThrottlerConfigOptions.AppCheckedMetrics, "app-metrics", nil, "metrics to be used when checking the throttler for the app (requires --app-name). Empty to restore to default metrics")
Review comment (Contributor): Nit, but noting the inconsistent use of sentence capitalization here and for metric-name.

Reply (Contributor Author): I don't see the inconsistency?

Reply (Contributor), suggested change:
- UpdateThrottlerConfig.Flags().StringSliceVar(&updateThrottlerConfigOptions.AppCheckedMetrics, "app-metrics", nil, "metrics to be used when checking the throttler for the app (requires --app-name). Empty to restore to default metrics")
+ UpdateThrottlerConfig.Flags().StringSliceVar(&updateThrottlerConfigOptions.AppCheckedMetrics, "app-metrics", nil, "metrics to be used when checking the throttler for the app (requires --app-name). empty to restore to default metrics")

I think it's the Empty

UpdateThrottlerConfig.MarkFlagsMutuallyExclusive("unthrottle-app", "throttle-app")
UpdateThrottlerConfig.MarkFlagsRequiredTogether("app-name", "app-metrics")

Root.AddCommand(UpdateThrottlerConfig)
// Check Throttler
CheckThrottler.Flags().StringVar(&checkThrottlerOptions.AppName, "app-name", throttlerapp.VitessName.String(), "app name to check")
CheckThrottler.Flags().StringVar(&checkThrottlerOptions.Scope, "scope", base.UndefinedScope.String(), "check scope ('shard', 'self' or leave empty for per-metric defaults)")
CheckThrottler.Flags().BoolVar(&requestHeartbeats, "request-heartbeats", false, "request heartbeat lease")
CheckThrottler.Flags().BoolVar(&checkThrottlerOptions.OkIfNotExists, "ok-if-not-exists", false, "return OK even if metric does not exist")
Root.AddCommand(CheckThrottler)

// GetThrottlerStatus
Root.AddCommand(GetThrottlerStatus)
}
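The `MarkFlagsMutuallyExclusive` and `MarkFlagsRequiredTogether` calls in `init()` let cobra enforce flag-group rules instead of hand-written checks. To show what those two rules amount to, here is a minimal sketch using only the standard library `flag` package. The helper names `mutuallyExclusive`, `requiredTogether` and `check` are hypothetical, and the error messages are not cobra's.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// mutuallyExclusive fails if more than one of names was set.
func mutuallyExclusive(set map[string]bool, names ...string) error {
	var present []string
	for _, n := range names {
		if set[n] {
			present = append(present, n)
		}
	}
	if len(present) > 1 {
		return fmt.Errorf("flags [%s] are mutually exclusive", strings.Join(present, " "))
	}
	return nil
}

// requiredTogether fails if some, but not all, of names were set.
func requiredTogether(set map[string]bool, names ...string) error {
	var missing []string
	for _, n := range names {
		if !set[n] {
			missing = append(missing, n)
		}
	}
	if len(missing) > 0 && len(missing) < len(names) {
		return fmt.Errorf("flags [%s] must be set together, missing [%s]",
			strings.Join(names, " "), strings.Join(missing, " "))
	}
	return nil
}

// check parses args and applies the two group rules from the diff.
func check(args []string) error {
	fs := flag.NewFlagSet("UpdateThrottlerConfig", flag.ContinueOnError)
	for _, name := range []string{"throttle-app", "unthrottle-app", "app-name", "app-metrics"} {
		fs.String(name, "", "")
	}
	if err := fs.Parse(args); err != nil {
		return err
	}
	set := map[string]bool{}
	fs.Visit(func(f *flag.Flag) { set[f.Name] = true }) // only explicitly set flags
	if err := mutuallyExclusive(set, "unthrottle-app", "throttle-app"); err != nil {
		return err
	}
	return requiredTogether(set, "app-name", "app-metrics")
}

func main() {
	fmt.Println(check([]string{"--throttle-app=a", "--unthrottle-app=b"}))
	fmt.Println(check([]string{"--app-name=online-ddl"}))
	fmt.Println(check([]string{"--app-name=online-ddl", "--app-metrics=lag,loadavg"}))
}
```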
2 changes: 2 additions & 0 deletions go/flags/endtoend/vtctldclient.txt
@@ -19,6 +19,7 @@ Available Commands:
Backup Uses the BackupStorage service on the given tablet to create and store a new backup.
BackupShard Finds the most up-to-date REPLICA, RDONLY, or SPARE tablet in the given shard and uses the BackupStorage service on that tablet to create and store a new backup.
ChangeTabletType Changes the db type for the specified tablet, if possible.
CheckThrottler Issue a throttler check on the given tablet.
CreateKeyspace Creates the specified keyspace in the topology.
CreateShard Creates the specified shard in the topology.
DeleteCellInfo Deletes the CellInfo for the provided cell.
@@ -56,6 +57,7 @@ Available Commands:
GetTablet Outputs a JSON structure that contains information about the tablet.
GetTabletVersion Print the version of a tablet from its debug vars.
GetTablets Looks up tablets according to filter criteria.
GetThrottlerStatus Get the throttler status for the given tablet.
GetTopologyPath Gets the value associated with the particular path (key) in the topology server.
GetVSchema Prints a JSON representation of a keyspace's topo record.
GetWorkflows Gets all vreplication workflows (Reshard, MoveTables, etc) in the given keyspace.
4 changes: 2 additions & 2 deletions go/test/endtoend/cluster/vtctldclient_process.go
@@ -66,13 +66,13 @@ func (vtctldclient *VtctldClientProcess) ExecuteCommandWithOutput(args ...string
pArgs = append(pArgs, "--test.coverprofile="+getCoveragePath("vtctldclient-"+args[0]+".out"), "--test.v")
}
pArgs = append(pArgs, args...)
for i := 1; i <= retries; i++ {
for i := range retries {
tmpProcess := exec.Command(
vtctldclient.Binary,
filterDoubleDashArgs(pArgs, vtctldclient.VtctldClientMajorVersion)...,
)
msg := binlogplayer.LimitString(strings.Join(tmpProcess.Args, " "), 256) // limit log line length
log.Infof("Executing vtctldclient with command: %v (attempt %d of %d)", msg, i, retries)
log.Infof("Executing vtctldclient with command: %v (attempt %d of %d)", msg, i+1, retries)
resultByte, err = tmpProcess.CombinedOutput()
resultStr = string(resultByte)
if err == nil || !shouldRetry(resultStr) {
30 changes: 12 additions & 18 deletions go/test/endtoend/onlineddl/flow/onlineddl_flow_test.go
@@ -222,7 +222,7 @@ func TestSchemaChange(t *testing.T) {
shards = clusterInstance.Keyspaces[0].Shards
require.Equal(t, 1, len(shards))

throttler.EnableLagThrottlerAndWaitForStatus(t, clusterInstance, time.Second)
throttler.EnableLagThrottlerAndWaitForStatus(t, clusterInstance)

t.Run("flow", func(t *testing.T) {
t.Run("create schema", func(t *testing.T) {
@@ -283,24 +283,18 @@ func TestSchemaChange(t *testing.T) {
onlineddl.CheckMigrationStatus(t, &vtParams, shards, uuid, schema.OnlineDDLStatusRunning)
})
t.Run("throttle online-ddl", func(t *testing.T) {
onlineddl.CheckThrottledApps(t, &vtParams, throttlerapp.OnlineDDLName, false)
onlineddl.ThrottleAllMigrations(t, &vtParams)
onlineddl.CheckThrottledApps(t, &vtParams, throttlerapp.OnlineDDLName, true)

for _, tab := range tablets {
body, err := throttleApp(tab.VttabletProcess, throttlerapp.OnlineDDLName)
assert.NoError(t, err)
assert.Contains(t, body, throttlerapp.OnlineDDLName)
}
waitForThrottleCheckStatus(t, throttlerapp.OnlineDDLName, primaryTablet, http.StatusExpectationFailed)
})
t.Run("unthrottle online-ddl", func(t *testing.T) {
onlineddl.UnthrottleAllMigrations(t, &vtParams)
onlineddl.CheckThrottledApps(t, &vtParams, throttlerapp.OnlineDDLName, false)

for _, tab := range tablets {
body, err := unthrottleApp(tab.VttabletProcess, throttlerapp.OnlineDDLName)
if !onlineddl.CheckThrottledApps(t, &vtParams, throttlerapp.OnlineDDLName, false) {
status, err := throttler.GetThrottlerStatus(&clusterInstance.VtctldClientProcess, primaryTablet)
assert.NoError(t, err)
assert.Contains(t, body, throttlerapp.OnlineDDLName)

t.Logf("Throttler status: %+v", status)
}
waitForThrottleCheckStatus(t, throttlerapp.OnlineDDLName, primaryTablet, http.StatusOK)
})
@@ -341,7 +335,7 @@ func TestSchemaChange(t *testing.T) {
t.Run("optimistic wait for migration completion", func(t *testing.T) {
status := onlineddl.WaitForMigrationStatus(t, &vtParams, shards, uuid, migrationWaitTimeout, schema.OnlineDDLStatusComplete)
isComplete = (status == schema.OnlineDDLStatusComplete)
fmt.Printf("# Migration status (for debug purposes): <%s>\n", status)
t.Logf("# Migration status (for debug purposes): <%s>", status)
})
if !isComplete {
t.Run("force complete cut-over", func(t *testing.T) {
@@ -350,7 +344,7 @@ func TestSchemaChange(t *testing.T) {
t.Run("another optimistic wait for migration completion", func(t *testing.T) {
status := onlineddl.WaitForMigrationStatus(t, &vtParams, shards, uuid, migrationWaitTimeout, schema.OnlineDDLStatusComplete)
isComplete = (status == schema.OnlineDDLStatusComplete)
fmt.Printf("# Migration status (for debug purposes): <%s>\n", status)
t.Logf("# Migration status (for debug purposes): <%s>", status)
})
}
if !isComplete {
@@ -364,7 +358,7 @@ func TestSchemaChange(t *testing.T) {
}
t.Run("wait for migration completion", func(t *testing.T) {
status := onlineddl.WaitForMigrationStatus(t, &vtParams, shards, uuid, migrationWaitTimeout, schema.OnlineDDLStatusComplete)
fmt.Printf("# Migration status (for debug purposes): <%s>\n", status)
t.Logf("# Migration status (for debug purposes): <%s>", status)
onlineddl.CheckMigrationStatus(t, &vtParams, shards, uuid, schema.OnlineDDLStatusComplete)
})
t.Run("validate table schema", func(t *testing.T) {
@@ -394,15 +388,15 @@ func testOnlineDDLStatement(t *testing.T, alterStatement string, ddlStrategy str
uuid = row.AsString("uuid", "")
uuid = strings.TrimSpace(uuid)
require.NotEmpty(t, uuid)
fmt.Println("# Generated UUID (for debug purposes):")
fmt.Printf("<%s>\n", uuid)
t.Logf("# Generated UUID (for debug purposes):")
t.Logf("<%s>", uuid)

strategySetting, err := schema.ParseDDLStrategy(ddlStrategy)
assert.NoError(t, err)

if !strategySetting.Strategy.IsDirect() && !skipWait && uuid != "" {
status := onlineddl.WaitForMigrationStatus(t, &vtParams, shards, uuid, migrationWaitTimeout, schema.OnlineDDLStatusComplete, schema.OnlineDDLStatusFailed)
fmt.Printf("# Migration status (for debug purposes): <%s>\n", status)
t.Logf("# Migration status (for debug purposes): <%s>", status)
}

if expectHint != "" {