resource_control: support dynamic calibrate resource #43098

CabinfeverB · 2023-04-17T09:53:17Z

What problem does this PR solve?

Issue Number: ref #38825

Problem Summary:
This PR estimates the maximum RU capacity from the user's actual running workload. The user can specify the time window to sample.

Similar to #42165, we only consider TiDB CPU or TiKV CPU as the bottleneck, and we assume that resource consumption is linearly correlated across components.
For each metrics sampling point, the PR calculates the RU quota from the RU statistics, TiDB CPU statistics, and TiKV CPU statistics at that point in time. It then removes the highest 10% and the lowest 10% of the values and averages the rest. In addition, points in time at which CPU utilization is low are excluded from the calculation.
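The estimation described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the `sample` struct, `trimmedMean`, `estimateQuota`, and the 0.2 cut-off used in `main` are hypothetical, and the sketch assumes that each sample's capacity is extrapolated linearly from the busier of the two CPU usage ratios.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// sample is a hypothetical per-timestamp measurement.
type sample struct {
	ru       float64 // RU consumed per second at this point
	tidbUtil float64 // TiDB CPU used / TiDB CPU quota
	tikvUtil float64 // TiKV CPU used / TiKV CPU quota
}

// trimmedMean drops the lowest and highest `frac` share of the values
// and averages what remains. The input slice is not modified.
func trimmedMean(values []float64, frac float64) float64 {
	if len(values) == 0 {
		return 0
	}
	sorted := append([]float64(nil), values...)
	sort.Float64s(sorted)
	cut := int(float64(len(sorted)) * frac)
	if cut < 0 || 2*cut >= len(sorted) {
		cut = 0 // nothing sensible to trim; fall back to a plain mean
	}
	kept := sorted[cut : len(sorted)-cut]
	sum := 0.0
	for _, v := range kept {
		sum += v
	}
	return sum / float64(len(kept))
}

// estimateQuota extrapolates an RU capacity from every sample whose busier
// component reaches minUtil, then takes the 10% trimmed mean.
func estimateQuota(samples []sample, minUtil float64) float64 {
	var quotas []float64
	for _, s := range samples {
		util := math.Max(s.tidbUtil, s.tikvUtil)
		if util <= 0 || util < minUtil {
			continue // too idle to extrapolate from
		}
		quotas = append(quotas, s.ru/util) // assumed linear scaling
	}
	return trimmedMean(quotas, 0.1)
}

func main() {
	samples := []sample{
		{ru: 900, tidbUtil: 0.30, tikvUtil: 0.45},
		{ru: 1000, tidbUtil: 0.35, tikvUtil: 0.50},
		{ru: 20, tidbUtil: 0.01, tikvUtil: 0.02}, // idle point, skipped
	}
	fmt.Printf("estimated RU capacity: %.0f\n", estimateQuota(samples, 0.2))
}
```

Trimming both tails keeps a few outlier samples (for example a short burst or a metrics gap) from dominating the estimate.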

This PR also updates the PD client; ref tikv/pd#6298.

What is changed and how it works?

Check List

Tests

Unit test
Manual test (add detailed scripts or steps below)

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility

Documentation

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

CALIBRATE RESOURCE supports dynamic calibration based on the user's actual workload within a specified time window.

Signed-off-by: Cabinfever_B <[email protected]>

ti-chi-bot · 2023-04-17T09:53:19Z

[REVIEW NOTIFICATION]

This pull request has been approved by:

glorv
nolouch

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

Signed-off-by: Cabinfever_B <[email protected]>

glorv · 2023-04-17T09:58:45Z

parser/parser.y

@@ -14827,13 +14836,54 @@ PlanReplayerStmt:
 * CALIBRATE RESOURCE
 *******************************************************************/
 CalibrateResourceStmt:
-	"CALIBRATE" "RESOURCE" CalibrateResourceWorkloadOption
+	"CALIBRATE" "RESOURCE" CalibrateResourceWorkloadOption DynamicCalibrateOptionListOpt


Should only provide CalibrateResourceWorkloadOption or DynamicCalibrateOptionListOpt, but not both.

They are optional, but I don't know how to check it in yacc. Maybe I can check it in calibrateResourceExec.

parser/ast/misc.go

executor/calibrate_resource.go

glorv · 2023-04-17T10:11:38Z

executor/calibrate_resource.go

+		}
+		return nil
+	}
+	if len(e.optionList) == 2 {


I'd like to replace this if-else with something like the following:

	var start, end, dur *ptr
	for _, op := range e.optionList[0] { ... }
	if dur == nil { ...default for duration }
	if start == nil { ...default-for-start }
	if end == nil { ...default-for-end }
	validate_start_end_duration()
	...rest of the logic

Also, since the static and dynamic branches share little logic, please wrap each of them in a separate function to avoid an overly long if..else block.
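A self-contained sketch of the "collect, default, validate" shape suggested in this comment might look like the following. Everything here is hypothetical: `optionKind`, `option`, `resolveWindow`, and `mustParse` are stand-ins rather than the AST types in this PR, and the defaulting policy (START_TIME required, the window ends at `now` unless END_TIME or DURATION is given) is an assumption made for the example.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// optionKind and option are hypothetical stand-ins for the AST option types.
type optionKind int

const (
	optStartTime optionKind = iota
	optEndTime
	optDuration
)

type option struct {
	kind optionKind
	ts   time.Time     // set for optStartTime and optEndTime
	dur  time.Duration // set for optDuration
}

// mustParse is a helper for the example; it panics on malformed input.
func mustParse(s string) time.Time {
	t, err := time.Parse("2006-01-02 15:04:05", s)
	if err != nil {
		panic(err)
	}
	return t
}

// resolveWindow collects the options in one pass, applies defaults,
// and validates the resulting window.
func resolveWindow(opts []option, now time.Time) (start, end time.Time, err error) {
	var startP, endP *time.Time
	var durP *time.Duration
	for i := range opts {
		switch op := &opts[i]; op.kind {
		case optStartTime:
			startP = &op.ts
		case optEndTime:
			endP = &op.ts
		case optDuration:
			durP = &op.dur
		}
	}
	if startP == nil {
		return start, end, errors.New("START_TIME is required")
	}
	if endP != nil && durP != nil {
		return start, end, errors.New("END_TIME and DURATION are mutually exclusive")
	}
	start, end = *startP, now // assumed default: the window ends now
	if endP != nil {
		end = *endP
	} else if durP != nil {
		end = start.Add(*durP)
	}
	if !start.Before(end) {
		return start, end, errors.New("START_TIME must be earlier than the end of the window")
	}
	return start, end, nil
}

func main() {
	now := mustParse("2023-04-19 12:00:00")
	opts := []option{
		{kind: optStartTime, ts: mustParse("2023-04-19 11:00:00")},
		{kind: optDuration, dur: 10 * time.Minute},
	}
	start, end, err := resolveWindow(opts, now)
	fmt.Println(start, end, err)
}
```

Because the loop no longer depends on the position or count of the options, the `len(e.optionList) == 2` style of branching is not needed in this shape.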

executor/calibrate_resource.go

glorv · 2023-04-17T10:19:12Z

executor/calibrate_resource.go

+			if idx >= len(tikvCPUs) || idx >= len(tidbCPUs) {
+				break
+			}
+			tikvQuota := totalKVCPUQuota / tikvCPUs[idx]


What if tikvCPUs[idx] == 0 here? I think checking tikvCPUs[idx]/totalKVCPUQuota as the CPU usage percentage is a more ergonomic way.
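To illustrate the point, here is a small sketch comparing the two forms. The helper names `usageRatio` and `inverseRatio` and the 8-core quota are hypothetical. With float64 operands, Go does not panic on division by zero; the quota/used form silently produces +Inf for an idle sample, whereas used/quota stays a bounded fraction that is easy to compare against a threshold.

```go
package main

import "fmt"

// usageRatio returns used/quota as a fraction of capacity. The second
// result is false when quota is not positive, so the caller can skip it.
func usageRatio(used, quota float64) (float64, bool) {
	if quota <= 0 {
		return 0, false
	}
	return used / quota, true
}

// inverseRatio mirrors the quota/used form from the diff hunk.
// A zero `used` yields +Inf here rather than a runtime panic.
func inverseRatio(used, quota float64) float64 {
	return quota / used
}

func main() {
	const totalKVCPUQuota = 8.0 // hypothetical: 8 cores of TiKV CPU
	for _, used := range []float64{0, 0.4, 4} {
		ratio, ok := usageRatio(used, totalKVCPUQuota)
		fmt.Printf("used=%.1f ratio=%.2f ok=%v inverse=%v\n",
			used, ratio, ok, inverseRatio(used, totalKVCPUQuota))
	}
}
```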

glorv · 2023-04-17T10:23:37Z

executor/calibrate_resource.go

+			if tikvQuota > lowUsageThreshold {
+				lowCount++
+				tikvCPULowCOunt++
+				if tidbQuota > lowUsageThreshold {


I think if either of the two CPU usages is greater than the lowUsageThreshold, we should keep the sample. There may be cluster topologies where the TiDB CPU quota >> the TiKV CPU quota, or vice versa, and then no samples would be valid here.

How about adding valuableUsageThreshold? If either of the two CPU usages is greater than valuableUsageThreshold, we can accept the sample.
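The difference between requiring both usages and accepting either one can be shown with a small sketch. The constant name and the value 0.2 appear in a later diff hunk of this PR; the functions `keepIfBoth` and `keepIfEither` and the example topology are hypothetical, and how the PR finally combines its thresholds is not shown here.

```go
package main

import "fmt"

// valuableUsageThreshold uses the name and value from the diff; the way it
// is applied below is an assumption made for illustration.
const valuableUsageThreshold = 0.2

// keepIfBoth accepts a sample only when both components are busy enough.
func keepIfBoth(tidbUsage, tikvUsage float64) bool {
	return tidbUsage > valuableUsageThreshold && tikvUsage > valuableUsageThreshold
}

// keepIfEither accepts a sample when at least one component is busy enough.
func keepIfEither(tidbUsage, tikvUsage float64) bool {
	return tidbUsage > valuableUsageThreshold || tikvUsage > valuableUsageThreshold
}

func main() {
	// Hypothetical topology: TiDB has far more CPU quota than TiKV, so TiDB
	// stays nearly idle even while TiKV is close to saturation.
	tidbUsage, tikvUsage := 0.05, 0.9
	fmt.Println("both:", keepIfBoth(tidbUsage, tikvUsage))
	fmt.Println("either:", keepIfEither(tidbUsage, tikvUsage))
}
```

Under the "both" rule every sample from such a cluster would be discarded, which is the failure mode raised in the comment.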

executor/calibrate_resource.go

HuSharp · 2023-04-17T10:01:33Z

executor/calibrate_resource.go

+func getRUPerSec(ctx context.Context, exec sqlexec.RestrictedSQLExecutor, startTime, endTime string) ([]float64, error) {
+	query := fmt.Sprintf("SELECT value FROM METRICS_SCHEMA.resource_manager_resource_unit where time >= '%s' and time <= '%s' ORDER BY time desc", startTime, endTime)
+	logutil.BgLogger().Info("getRUPerSec", zap.String("query", query))
+	return getValuesFromMetrics(ctx, exec, query, "resource_manager_resource_unit")
+}
+
+func getTiDBCPUUsagePerSec(ctx context.Context, exec sqlexec.RestrictedSQLExecutor, startTime, endTime string) ([]float64, error) {
+	query := fmt.Sprintf("SELECT sum(value) FROM METRICS_SCHEMA.process_cpu_usage where time >= '%s' and time <= '%s' and job like '%%tidb' GROUP BY time ORDER BY time desc", startTime, endTime)
+	logutil.BgLogger().Info("getTiDBCPUUsagePerSec", zap.String("getTiDBCPUUsagePerSec", query))
+	return getValuesFromMetrics(ctx, exec, query, "process_cpu_usage")
+}
+
+func getTiKVCPUUsagePerSec(ctx context.Context, exec sqlexec.RestrictedSQLExecutor, startTime, endTime string) ([]float64, error) {
+	query := fmt.Sprintf("SELECT sum(value) FROM METRICS_SCHEMA.process_cpu_usage where time >= '%s' and time <= '%s' and job like '%%tikv' GROUP BY time ORDER BY time desc", startTime, endTime)
+	logutil.BgLogger().Info("getTiKVCPUUsagePerSec", zap.String("getTiKVCPUUsagePerSec", query))
+	return getValuesFromMetrics(ctx, exec, query, "process_cpu_usage")
+}
+
 func getNumberFromMetrics(ctx context.Context, exec sqlexec.RestrictedSQLExecutor, query, metrics string) (float64, error) {


Maybe we can use a uniform keyword case in each statement.

executor/calibrate_resource.go

HuSharp · 2023-04-17T10:19:29Z

executor/calibrate_resource.go

+		return nil
+	}
+	if len(e.optionList) == 2 {
+		if e.optionList[0].Tp != ast.CalibrateStartTime || (e.optionList[1].Tp != ast.CalibrateEndTime && e.optionList[1].Tp != ast.CalibrateDuration) {


e.optionList[1].Tp != ast.CalibrateEndTime && e.optionList[1].Tp != ast.CalibrateDuration
I'm not sure why the same field is compared twice with && here.

Because this condition detects the invalid case. The valid case is e.optionList[1].Tp == ast.CalibrateEndTime || e.optionList[1].Tp == ast.CalibrateDuration, and negating it yields the && form.
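The equivalence in this exchange is De Morgan's law: not (A or B) equals (not A) and (not B). A tiny check, using a hypothetical `optionType` enum in place of the AST constants, is sketched below.

```go
package main

import "fmt"

// optionType is a hypothetical stand-in for the AST option type.
type optionType int

const (
	calibrateStartTime optionType = iota
	calibrateEndTime
	calibrateDuration
)

// isValidSecond states the accepted values for the second option directly.
func isValidSecond(tp optionType) bool {
	return tp == calibrateEndTime || tp == calibrateDuration
}

// isInvalidSecond is the negated form used in the error check.
func isInvalidSecond(tp optionType) bool {
	return tp != calibrateEndTime && tp != calibrateDuration
}

func main() {
	for _, tp := range []optionType{calibrateStartTime, calibrateEndTime, calibrateDuration} {
		fmt.Println(tp, isInvalidSecond(tp) == !isValidSecond(tp))
	}
}
```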

HuSharp · 2023-04-17T10:26:37Z

parser/ast/misc.go

+	case CalibrateEndTime:
+		ctx.WriteKeyWord("END_TIME ")
+		if err := n.Ts.Restore(ctx); err != nil {
+			return errors.Annotate(err, "An error occurred while splicing DynamicCalibrateResourceOption EndTime")


Maybe we can return the error directly, because the caller already performs the same check at if err := option.Restore(ctx); err != nil {

parser/misc.go

parser/parser.y

Signed-off-by: Cabinfever_B <[email protected]>

CabinfeverB · 2023-04-17T14:00:05Z

/test unit-test

CabinfeverB · 2023-04-17T15:16:31Z

/test unit-test

Signed-off-by: Cabinfever_B <[email protected]>

HuSharp · 2023-04-18T03:23:05Z

executor/calibrate_resource.go

@@ -99,7 +177,95 @@ func (e *calibrateResourceExec) Next(ctx context.Context, req *chunk.Chunk) erro

 	exec := e.ctx.(sqlexec.RestrictedSQLExecutor)
 	ctx = kv.WithInternalSourceType(ctx, kv.InternalTxnOthers)
+	if len(e.optionList) > 0 && e.workloadType != ast.WorkloadNone {


Maybe we can put this check in dynamicCalibrate?

HuSharp · 2023-04-18T03:29:22Z

parser/parser.y

@@ -715,6 +717,7 @@ import (
 	s3                    "S3"
 	schedule              "SCHEDULE"
 	staleness             "STALENESS"
+	startTime             "START_TIME"


I noticed there is a startTS token below. Should we use startTS instead?

IMO, START_TIME is better

I prefer start_time too. StartTs represents the PD transaction timestamp in this context, whereas start_time is a real datetime, so it is better not to mix them.

Makes sense

Signed-off-by: Cabinfever_B <[email protected]>

CabinfeverB · 2023-04-19T05:37:00Z

/test unit-test

HuSharp · 2023-04-19T06:05:21Z

/retest

glorv · 2023-04-19T06:23:05Z

executor/calibrate_resource.go

@@ -83,13 +91,72 @@ type baseResourceCost struct {
 	writeReqCount uint64
 }

+const (
+	valuableUsageThreshold = 0.2


Please add comments for these constants

Signed-off-by: Cabinfever_B <[email protected]>

glorv

LGTM

HuSharp

LGTM

ti-chi-bot · 2023-04-19T07:52:12Z

@HuSharp: Thanks for your review. The bot only counts approvals from reviewers and higher roles in list, but you're still welcome to leave your comments.

In response to this:

LGTM

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

HuSharp · 2023-04-19T07:52:41Z

/retest

nolouch · 2023-04-19T08:12:25Z

go.mod

@@ -281,5 +281,6 @@ replace (
 	// fix potential security issue(CVE-2020-26160) introduced by indirect dependency.
 	github.com/dgrijalva/jwt-go => github.com/form3tech-oss/jwt-go v3.2.6-0.20210809144907-32ab6a8243d7+incompatible
 	github.com/pingcap/tidb/parser => ./parser
+	github.com/tikv/pd/client => github.com/CabinfeverB/pd/client v0.0.0-20230418121422-fb8aaee248a8


Why replace it?

Two PRs are about to be merged into pd/client, and I want to wait until they are merged.

Better to use a separate PR to update it.

Signed-off-by: Cabinfever_B <[email protected]>

nolouch · 2023-04-19T11:13:24Z

/test unit-test

Signed-off-by: Cabinfever_B <[email protected]>

glorv · 2023-04-19T13:57:28Z

/merge

ti-chi-bot · 2023-04-19T13:57:32Z

This pull request has been accepted and is ready to merge.

Commit hash: 8172174

CabinfeverB added 5 commits April 17, 2023 14:10

dynamic calibrate

a143c3b

Signed-off-by: Cabinfever_B <[email protected]>

merge master

31a0b8b

Signed-off-by: Cabinfever_B <[email protected]>

merge master

8f37d16

Signed-off-by: Cabinfever_B <[email protected]>

merge master

11969fb

Signed-off-by: Cabinfever_B <[email protected]>

add test case

ffd3052

Signed-off-by: Cabinfever_B <[email protected]>

ti-chi-bot added do-not-merge/needs-linked-issue release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Apr 17, 2023

remove unnecessary log

1f4579b

Signed-off-by: Cabinfever_B <[email protected]>

ti-chi-bot removed the do-not-merge/needs-linked-issue label Apr 17, 2023

fix lint

d0f1555

Signed-off-by: Cabinfever_B <[email protected]>

glorv reviewed Apr 17, 2023

HuSharp reviewed Apr 17, 2023

HuSharp mentioned this pull request Apr 17, 2023

sql-statement: calibrate resource support workload pingcap/docs-cn#13709

Merged

2 tasks

CabinfeverB added 3 commits April 17, 2023 20:15

address comment

910bde5

Signed-off-by: Cabinfever_B <[email protected]>

address comment

1c15caa

Signed-off-by: Cabinfever_B <[email protected]>

address comment

0ffaab1

Signed-off-by: Cabinfever_B <[email protected]>

ti-chi-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 17, 2023

hfxsd assigned hfxsd and unassigned hfxsd Apr 18, 2023

merge master

b33ec77

Signed-off-by: Cabinfever_B <[email protected]>

HuSharp reviewed Apr 18, 2023

CabinfeverB added 3 commits April 18, 2023 12:04

address comment

3fd699a

Signed-off-by: Cabinfever_B <[email protected]>

address comment

4680799

Signed-off-by: Cabinfever_B <[email protected]>

address comment

16d9330

Signed-off-by: Cabinfever_B <[email protected]>

ti-chi-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 18, 2023

address comment

a8b6b8f

Signed-off-by: Cabinfever_B <[email protected]>

ti-chi-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 18, 2023

merge master

262d4ae

Signed-off-by: Cabinfever_B <[email protected]>

ti-chi-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 19, 2023

glorv reviewed Apr 19, 2023

HuSharp mentioned this pull request Apr 19, 2023

pkg: add resource manager api pingcap/tidb-dashboard#1511

Merged

CabinfeverB added 2 commits April 19, 2023 15:31

address comment

5acc62c

Signed-off-by: Cabinfever_B <[email protected]>

address comment

9a661fd

Signed-off-by: Cabinfever_B <[email protected]>

glorv approved these changes Apr 19, 2023

ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Apr 19, 2023

HuSharp approved these changes Apr 19, 2023

nolouch reviewed Apr 19, 2023

address comment

95cbac8

Signed-off-by: Cabinfever_B <[email protected]>

nolouch approved these changes Apr 19, 2023

ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Apr 19, 2023

CabinfeverB added 2 commits April 19, 2023 20:55

fix typo

ce45939

Signed-off-by: Cabinfever_B <[email protected]>

merge master

8172174

Signed-off-by: Cabinfever_B <[email protected]>

ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Apr 19, 2023

ti-chi-bot merged commit 268901f into pingcap:master Apr 19, 2023

nolouch mentioned this pull request Apr 19, 2023

resource_control supports calibrate resource #43212

Closed

3 tasks

CabinfeverB deleted the resource_manager/dynamic_calibrate branch April 20, 2023 07:11
