util: optimize the memory usage of the read path for listInDisk #34778

wshwsh12 · 2022-05-18T07:40:49Z

What problem does this PR solve?

Issue Number: ref #33877
close #35631
Problem Summary:

Optimize the memory usage of the read path for list in disk.

What is changed and how it works?

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
Workload: A big table with 50 million (5kw) rows, a small table with 1 row.
SQL: desc analyze select * from small_table where exists (select a from v5kw big_table where small_table.b=big_table.b);
Plan: With memory quota 1GB

[email protected]:test_xhy> desc    select * from small_table where exists (select a from v5kw big_table where small_table.b=big_table.b);
+----------------------------------+-------------+-----------+-------------------+----------------------------------------------------------------------+
| id                               | estRows     | task      | access object     | operator info                                                        |
+----------------------------------+-------------+-----------+-------------------+----------------------------------------------------------------------+
| HashJoin_15                      | 0.80        | root      |                   | semi join, equal:[eq(test_xhy.small_table.b, test_xhy.base_table.b)] |
| ├─Selection_19(Build)            | 40000000.00 | root      |                   | not(isnull(test_xhy.base_table.b))                                   |
| │ └─Limit_20                     | 50000000.00 | root      |                   | offset:0, count:50000000                                             |
| │   └─TableReader_24             | 50000000.00 | root      |                   | data:Limit_23                                                        |
| │     └─Limit_23                 | 50000000.00 | cop[tikv] |                   | offset:0, count:50000000                                             |
| │       └─TableFullScan_22       | 50000000.00 | cop[tikv] | table:base_table  | keep order:false                                                     |
| └─TableReader_18(Probe)          | 1.00        | root      |                   | data:Selection_17                                                    |
|   └─Selection_17                 | 1.00        | cop[tikv] |                   | not(isnull(test_xhy.small_table.b))                                  |
|     └─TableFullScan_16           | 1.00        | cop[tikv] | table:small_table | keep order:false, stats:pseudo                                       |
+----------------------------------+-------------+-----------+-------------------+----------------------------------------------------------------------+
9 rows in set (0.00 sec)

Before this PR: TiDB uses 60GB+ of memory and hits OOM.
After this PR: The GC line is at 5.6GB, so the memory usage after each GC is about 2.8GB.

No code
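
The step from the 5.6GB GC line to 2.8GB of memory after each GC in the manual test can be reproduced with a small calculation. This is a sketch under two assumptions that the description does not state: that the "GC line" is the Go runtime's next-GC target, and that GOGC is at its default of 100, so the target is roughly twice the heap that survived the previous collection. The function name `liveHeapAfterGC` is made up for this illustration.

```go
package main

import "fmt"

// liveHeapAfterGC estimates the heap that survived the last GC from the
// next-GC target, using the approximation
//   target = live * (1 + GOGC/100).
// Assumption: the "GC line" in the PR description is that target.
func liveHeapAfterGC(nextGCTargetGB float64, gogc int) float64 {
	return nextGCTargetGB / (1 + float64(gogc)/100)
}

func main() {
	// With the default GOGC=100 the target is about 2x the live heap.
	fmt.Printf("%.1f GB\n", liveHeapAfterGC(5.6, 100)) // prints "2.8 GB"
}
```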

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility

Documentation

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

ti-chi-bot · 2022-05-18T07:40:50Z

[REVIEW NOTIFICATION]

This pull request has been approved by:

XuHuaiyu
guo-shaoge

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

sre-bot · 2022-05-18T07:54:55Z

Code Coverage Details: https://codecov.io/github/pingcap/tidb/commit/efa178aa521463c29cd6036b5b09e3029aca0076

XuHuaiyu · 2022-05-18T08:25:15Z

util/chunk/disk.go

@@ -39,6 +39,8 @@ type ListInDisk struct {

 	dataFile   diskFileReaderWriter
 	offsetFile diskFileReaderWriter
+
+	chk *Chunk


Add a comment for this attribute.

guo-shaoge · 2022-05-23T06:11:36Z

util/chunk/disk.go

-	chk := &Chunk{columns: make([]*Column, 0, len(format.sizesOfColumns))}
+// toRow deserializes diskFormatRow to Row.
+func (format *diskFormatRow) toRow(fields []*types.FieldType, chk *Chunk) (Row, *Chunk) {
+	if chk == nil || chk.IsFull() {


Just a question: is the chk == nil check maybe unnecessary? sync.Pool will always create a new chunk.

Checking chk == nil is safer. For example, the code in the test cases doesn't use the sync.Pool and doesn't need to reuse chunks.
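
The pattern discussed in this thread (reuse the caller's chunk, but fall back to a fresh one when it is nil or full) can be sketched as follows. This is a minimal illustration, not the code from util/chunk/disk.go: the `chunk`, `row`, and `appendRow` names are hypothetical stand-ins, and rows are reduced to a single int64 so the example stays self-contained.

```go
package main

import "fmt"

// chunk is a hypothetical, simplified stand-in for a fixed-capacity
// batch of rows. It is not TiDB's chunk.Chunk.
type chunk struct {
	rows     []int64
	capacity int
}

func newChunk(capacity int) *chunk {
	return &chunk{rows: make([]int64, 0, capacity), capacity: capacity}
}

func (c *chunk) isFull() bool { return len(c.rows) >= c.capacity }

// row refers to one entry stored inside a chunk.
type row struct {
	chk *chunk
	idx int
}

func (r row) value() int64 { return r.chk.rows[r.idx] }

// appendRow stores v in chk and returns a row pointing at it, together
// with the chunk the caller should pass to the next call. A nil or full
// chunk is replaced by a fresh one, so callers that never set up a
// reusable chunk (for example tests) still work.
func appendRow(v int64, chk *chunk, capacity int) (row, *chunk) {
	if chk == nil || chk.isFull() {
		chk = newChunk(capacity)
	}
	chk.rows = append(chk.rows, v)
	return row{chk: chk, idx: len(chk.rows) - 1}, chk
}

func main() {
	var chk *chunk // nil on first use
	allocs := 0
	for v := int64(0); v < 10; v++ {
		prev := chk
		_, chk = appendRow(v, chk, 4)
		if chk != prev {
			allocs++
		}
	}
	// 10 rows with capacity 4 need 3 chunks instead of 10.
	fmt.Println("chunks allocated:", allocs)
}
```

Because a full chunk is replaced rather than reset, rows returned earlier keep pointing at valid data, while the number of allocations drops from one per row to one per batch.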

XuHuaiyu · 2022-06-06T09:54:25Z

/merge

ti-chi-bot · 2022-06-06T09:54:28Z

This pull request has been accepted and is ready to merge.

Commit hash: 821f656

ti-chi-bot · 2022-06-06T09:54:41Z

@wshwsh12: Your PR was out of date, I have automatically updated it for you.

At the same time I will also trigger all tests for you:

/run-all-tests

If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

toMutRow

64cfab4

ti-chi-bot added the release-note-none (PR doesn't merit a release note) and size/M (PR changes 30-99 lines, ignoring generated files) labels May 18, 2022

wshwsh12 force-pushed the toMutRow branch from f224064 to 0a50654 Compare May 18, 2022 07:41

wshwsh12 mentioned this pull request May 18, 2022

Add more memory tracking places to improve memory management #33877

Closed

10 tasks

fix

aaa3f4e

wshwsh12 force-pushed the toMutRow branch from 0a50654 to aaa3f4e Compare May 18, 2022 08:04

XuHuaiyu reviewed May 18, 2022

add comments

b98dcbe

wshwsh12 requested a review from XuHuaiyu May 18, 2022 08:34

wshwsh12 changed the title ~~util: optimize the memory usage of the read path for listInDIsk~~ util: optimize the memory usage of the read path for listInDisk May 18, 2022

fix

1d3fd6b

ti-chi-bot added the size/L label (PR changes 100-499 lines, ignoring generated files) and removed the size/M label May 23, 2022

add ut

42abe62

wshwsh12 force-pushed the toMutRow branch from 733fd9e to 42abe62 Compare May 23, 2022 02:32

fix lint

821f656

guo-shaoge reviewed May 23, 2022

guo-shaoge approved these changes May 23, 2022

ti-chi-bot added the status/LGT1 label (PR has 1 LGTM) May 23, 2022

XuHuaiyu approved these changes Jun 6, 2022

ti-chi-bot added the status/LGT2 label (PR has 2 LGTMs) and removed the status/LGT1 label Jun 6, 2022

ti-chi-bot added the status/can-merge label (PR has been approved by a committer) Jun 6, 2022

Merge branch 'master' into toMutRow

efa178a

ti-chi-bot merged commit 0e278b9 into pingcap:master Jun 6, 2022

hawkingrei mentioned this pull request Jun 6, 2022

DATA RACE in the chunk.Column #35191

Closed

wshwsh12 mentioned this pull request Jun 22, 2022

Optimize the memory usage of the read path in hashRowContainer #35631

Closed
