*: support global memory control for tidb #37794

Merged
merged 33 commits into from
Sep 26, 2022

Conversation

wshwsh12
Contributor

@wshwsh12 wshwsh12 commented Sep 14, 2022

What problem does this PR solve?

Issue Number: ref #37816

Problem Summary:

What is changed and how it works?

  1. Add the system variables tidb_server_memory_limit and tidb_server_memory_limit_sess_min_size.
  2. When TiDB's memory usage exceeds the limit, kill the SQL statement with the highest memory usage (the top-1 SQL).
  3. After the top-1 SQL is killed, try to run GC immediately.
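
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `session` type and the `pickTop1` and `enforceLimit` functions are hypothetical names, and the reading of tidb_server_memory_limit_sess_min_size as "statements below this size are never killed" is an assumption based on the variable name.

```go
package main

import "runtime"

// session is a hypothetical stand-in for a per-statement memory tracker.
type session struct {
	id       uint64
	memBytes uint64
	killed   bool
}

// pickTop1 returns the live session with the highest memory usage among
// those using at least minSize bytes, or nil if none qualifies.
func pickTop1(sessions []*session, minSize uint64) *session {
	var top *session
	for _, s := range sessions {
		if s.killed || s.memBytes < minSize {
			continue
		}
		if top == nil || s.memBytes > top.memBytes {
			top = s
		}
	}
	return top
}

// enforceLimit marks the top-1 session as killed when usage exceeds limit,
// then requests a garbage collection. A limit of 0 disables the check.
// It returns the killed session, or nil if nothing was killed.
func enforceLimit(usage, limit, minSize uint64, sessions []*session) *session {
	if limit == 0 || usage <= limit {
		return nil
	}
	top := pickTop1(sessions, minSize)
	if top == nil {
		return nil
	}
	top.killed = true // assumption: the real kill is an asynchronous cancel
	runtime.GC()      // reclaim the killed statement's memory promptly
	return top
}
```

Calling `enforceLimit` repeatedly kills one statement per call, which matches the "kill the top-1 SQL, then GC" order described in the list.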

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

Support global memory control

@ti-chi-bot
Member

ti-chi-bot commented Sep 14, 2022

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • XuHuaiyu
  • tiancaiamao

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 14, 2022
@wshwsh12 wshwsh12 marked this pull request as ready for review September 14, 2022 08:54
@wshwsh12 wshwsh12 requested a review from a team as a code owner September 14, 2022 08:54
@ti-chi-bot ti-chi-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 14, 2022
@ti-chi-bot ti-chi-bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 15, 2022
@ti-chi-bot ti-chi-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 17, 2022
@@ -57,6 +58,7 @@ func (eqh *Handle) Run() {
defer ticker.Stop()
sm := eqh.sm.Load().(util.SessionManager)
record := &memoryUsageAlarm{}
serverMemoryQuota := &serverMemoryQuota{}
Contributor

Should we use a separate goroutine to monitor the global memory status? The expensive_query worker already handles too many things.
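
A dedicated monitoring goroutine of the kind suggested here might look like the sketch below. It is an illustration only: `runMonitor` and `startMonitor` are hypothetical names, and the tick source is passed in as a plain channel (an assumption made so the loop can be driven deterministically) rather than being tied to a specific timer.

```go
package main

import "sync"

// runMonitor calls check once for every tick received, until stop is closed.
func runMonitor(ticks <-chan struct{}, stop <-chan struct{}, check func()) {
	for {
		select {
		case <-stop:
			return
		case <-ticks:
			check()
		}
	}
}

// startMonitor launches runMonitor in its own goroutine and returns a
// function that stops the goroutine and waits for it to exit.
// The returned function must be called at most once.
func startMonitor(ticks <-chan struct{}, check func()) (stopAndWait func()) {
	stop := make(chan struct{})
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		runMonitor(ticks, stop, check)
	}()
	return func() {
		close(stop)
		wg.Wait()
	}
}
```

In a real server the ticks would typically be forwarded from a `time.Ticker`, and `check` would compare current memory usage against the configured limit.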

@wshwsh12 wshwsh12 requested a review from XuHuaiyu September 21, 2022 08:22
@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Sep 22, 2022
tracker3.Consume(300 << 20) // 300 MB

test := make([]int, 128<<20) // Keep 1GB HeapInUse
time.Sleep(500 * time.Millisecond)
Contributor

What's the purpose of this Sleep?

Contributor Author

The check goroutine checks the memory usage every 100ms. The Sleep() makes sure that the Top1Tracker has time to be canceled.

Contributor

It would be better to add a comment explaining it.
Sleep slows down the test, and 0.5s is long enough for a UT.
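
One common way to avoid a fixed sleep in a test is to poll for the expected condition with a short interval and an upper bound. The sketch below shows that pattern; `waitUntil` is a hypothetical helper, not something this PR adds, and the condition a test would poll (for example, whether the top-1 tracker has been canceled) is an assumption.

```go
package main

import "time"

// waitUntil polls cond every interval until it returns true or until
// timeout has elapsed. It reports whether cond became true in time.
func waitUntil(cond func() bool, interval, timeout time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for {
		if cond() {
			return true
		}
		if time.Now().After(deadline) {
			return false
		}
		time.Sleep(interval)
	}
}
```

With this shape the test returns as soon as the cancel is observed instead of always paying the full 500ms. The testify package already used in these tests offers `require.Eventually` for a similar purpose.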

tracker2.Consume(200 << 20) // 200 MB
tracker3.Consume(300 << 20) // 300 MB

test := make([]int, 128<<20) // Keep 1GB HeapInUse
Contributor

Until you touch the data, this is only virtual memory rather than a real allocation.
The OS handles the logical-to-physical address mapping and allocates the physical pages.

Contributor Author

This only simulates HeapInUse growth.
Normally, TiDB uses the memory immediately after allocating it.
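
If a test did need the allocation to be backed by physical pages, one option is to write to each page once after allocating. The following is a minimal sketch of that idea; `touchPages` is a hypothetical helper and the page size is passed in as a parameter (in practice `os.Getpagesize()` could supply it).

```go
package main

// touchPages writes one byte in every pageSize-sized chunk of buf so that
// the OS has to back each of those pages with physical memory.
// It returns the number of pages written to.
func touchPages(buf []byte, pageSize int) int {
	if pageSize <= 0 {
		return 0
	}
	touched := 0
	for i := 0; i < len(buf); i += pageSize {
		buf[i] = 1
		touched++
	}
	return touched
}
```

For the purpose described in this thread (making the Go runtime's HeapInUse grow) the plain `make` call is already sufficient, so this step is only relevant when resident memory matters.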

require.True(t, strings.Contains(r.(string), "Out Of Memory Quota!"))
})
tracker2.Consume(300 << 20) // Sum 500MB, Not Panic, Waiting t3 cancel finish.
time.Sleep(500 * time.Millisecond)
Contributor

Does the cancel take time?

Contributor Author

This waits for the cancel to finish.

Comment on lines +723 to +732
Validation: func(s *SessionVars, normalizedValue string, originalValue string, scope ScopeFlag) (string, error) {
	intVal, err := strconv.ParseUint(normalizedValue, 10, 64)
	if err != nil {
		return "", err
	}
	if intVal > 0 && intVal < (512<<20) { // 512 MB
		s.StmtCtx.AppendWarning(ErrTruncatedWrongValue.GenWithStackByArgs(TiDBServerMemoryLimit, originalValue))
		intVal = 512 << 20
	}
	return strconv.FormatUint(intVal, 10), nil
Contributor

Why not set MinValue at line 719?

Contributor Author

0 means the feature is disabled.

Contributor

OK... I see. This is kind of confusing:
0 means disabled, while any other value smaller than 512MB is adjusted to 512MB (the actual minimum value).
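
The rule discussed in this thread can be isolated into a small pure function, which makes the special case for 0 easier to see. This is a sketch for illustration: `normalizeServerMemoryLimit` and `minServerMemoryLimit` are hypothetical names, and the warning that the real validation appends is reduced here to a boolean.

```go
package main

// minServerMemoryLimit is the smallest non-zero limit accepted (512 MB).
const minServerMemoryLimit uint64 = 512 << 20

// normalizeServerMemoryLimit applies the rule from the validation function:
// 0 is kept as-is (feature disabled), any other value below 512 MB is raised
// to 512 MB, and larger values pass through unchanged.
// adjusted reports whether the value was raised.
func normalizeServerMemoryLimit(v uint64) (normalized uint64, adjusted bool) {
	if v > 0 && v < minServerMemoryLimit {
		return minServerMemoryLimit, true
	}
	return v, false
}
```

Because 0 has to remain a valid value, a plain MinValue of 512 MB on the variable would not express this rule, which is the point the author makes above.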

@wshwsh12 wshwsh12 mentioned this pull request Sep 23, 2022
9 tasks
@ti-chi-bot ti-chi-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 25, 2022
@ti-chi-bot ti-chi-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 25, 2022
@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Sep 26, 2022
@tiancaiamao
Contributor

/merge

@ti-chi-bot
Member

This pull request has been accepted and is ready to merge.

Commit hash: 8fea589

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Sep 26, 2022
@ti-chi-bot ti-chi-bot merged commit 4e4169b into pingcap:master Sep 26, 2022
@sre-bot
Contributor

sre-bot commented Sep 26, 2022

TiDB MergeCI notify

🔴 Bad News! [1] CI still failing after this pr merged.
These failed integration tests don't seem to be introduced by the current PR.

| CI Name | Result | Duration | Compare with Parent commit |
| --- | --- | --- | --- |
| idc-jenkins-ci-tidb/integration-ddl-test | 🔴 failed 2, success 4, total 6 | 29 min | Existing failure |
| idc-jenkins-ci/integration-cdc-test | 🟢 all 37 tests passed | 27 min | Existing passed |
| idc-jenkins-ci-tidb/integration-common-test | 🟢 all 17 tests passed | 10 min | Existing passed |
| idc-jenkins-ci-tidb/common-test | 🟢 all 11 tests passed | 8 min 41 sec | Existing passed |
| idc-jenkins-ci-tidb/tics-test | 🟢 all 1 tests passed | 7 min 22 sec | Existing passed |
| idc-jenkins-ci-tidb/sqllogic-test-2 | 🟢 all 28 tests passed | 5 min 42 sec | Existing passed |
| idc-jenkins-ci-tidb/sqllogic-test-1 | 🟢 all 26 tests passed | 4 min 43 sec | Existing passed |
| idc-jenkins-ci-tidb/mybatis-test | 🟢 all 1 tests passed | 3 min 5 sec | Existing passed |
| idc-jenkins-ci-tidb/integration-compatibility-test | 🟢 all 1 tests passed | 2 min 41 sec | Existing passed |
| idc-jenkins-ci-tidb/plugin-test | 🟢 build success, plugin test success | 4min | Existing passed |
