This repository has been archived by the owner on Aug 21, 2023. It is now read-only.

dump: always split TiDB v4.* tables through tidb rowid to save TiDB's memory #273

Merged: 11 commits merged on Apr 30, 2021


@lichunzhu (Contributor) commented Apr 23, 2021

What problem does this PR solve?

Fixes #104 and closes #278.

What is changed and how it works?

Split chunks by PKIsHandle for TiDB v4.0.*, and always split tables by row ID so that TiDB does not run out of memory (OOM) during the dump.
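The chunking idea can be illustrated with a minimal Go sketch. It assumes the split boundaries (for example `_tidb_rowid` values derived from region start keys) have already been fetched; `buildRowIDWhereClauses` is a hypothetical name rather than Dumpling's API, and the real query construction may differ.

```go
package main

import "fmt"

// buildRowIDWhereClauses is a hypothetical helper. Given ascending boundary
// values of the split column, it returns one WHERE condition per chunk so
// that each chunk is read by a separate, bounded SELECT instead of one
// full-table scan that TiDB would have to hold in memory.
func buildRowIDWhereClauses(field string, bounds []string) []string {
	if len(bounds) == 0 {
		// No boundaries: the table is dumped as a single chunk.
		return []string{""}
	}
	clauses := make([]string, 0, len(bounds)+1)
	// First chunk: everything below the first boundary.
	clauses = append(clauses, fmt.Sprintf("`%s` < %s", field, bounds[0]))
	// Middle chunks: half-open ranges [bounds[i-1], bounds[i]).
	for i := 1; i < len(bounds); i++ {
		clauses = append(clauses, fmt.Sprintf("`%s` >= %s AND `%s` < %s",
			field, bounds[i-1], field, bounds[i]))
	}
	// Last chunk: everything at or above the last boundary.
	clauses = append(clauses, fmt.Sprintf("`%s` >= %s", field, bounds[len(bounds)-1]))
	return clauses
}

func main() {
	for _, c := range buildRowIDWhereClauses("_tidb_rowid", []string{"1000", "2000"}) {
		fmt.Println("SELECT * FROM `t` WHERE " + c)
	}
}
```

Half-open ranges guarantee that every row falls into exactly one chunk, so no row is dumped twice or skipped.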

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
    Tested the dump with v4.0.0. The result looks as expected.

Related changes

  • Need to cherry-pick to the release branch

Release note

  • Always split TiDB v4.* tables through tidb rowid to save TiDB's memory.

@ti-chi-bot ti-chi-bot requested review from 3pointer and kennytm April 23, 2021 11:09
@lichunzhu lichunzhu requested review from lance6716 and removed request for 3pointer April 23, 2021 11:11
v4/export/sql_type.go (resolved)
v4/export/sql.go (resolved)

dataTypeNumArr := []string{
"INTEGER", "BIGINT", "TINYINT", "SMALLINT", "MEDIUMINT",
"INT", "INT1", "INT2", "INT3", "INT8",
Collaborator:

Here we forgot an int symbol: INT4.

https://github.com/mysql/mysql-server/blob/3e90d07c3578e4da39dc1bce73559bbdf655c28c/sql/lex.h#L330

So I doubt whether we have listed them all 🤔 We may ask the parser team for help.
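A minimal sketch of how such a type-name list might be turned into a case-insensitive lookup set with INT4 added. The names `numericTypeNames` and `isNumericType` are hypothetical, and the list is only the snippet above plus INT4, so it is still not claimed to be exhaustive.

```go
package main

import (
	"fmt"
	"strings"
)

// numericTypeNames is the integer type list from the review snippet with
// INT4 added. It is illustrative, not a verified complete list of aliases.
var numericTypeNames = []string{
	"INTEGER", "BIGINT", "TINYINT", "SMALLINT", "MEDIUMINT",
	"INT", "INT1", "INT2", "INT3", "INT4", "INT8",
}

// numericTypes is built once from the slice for constant-time lookups.
var numericTypes = func() map[string]struct{} {
	m := make(map[string]struct{}, len(numericTypeNames))
	for _, s := range numericTypeNames {
		m[s] = struct{}{}
	}
	return m
}()

// isNumericType reports whether a column type name, compared
// case-insensitively, is one of the listed integer types.
func isNumericType(typeName string) bool {
	_, ok := numericTypes[strings.ToUpper(typeName)]
	return ok
}

func main() {
	fmt.Println(isNumericType("int4"), isNumericType("varchar"))
}
```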

Contributor Author:

Updated the comments in 69c1e7c.

colTypeRowReceiverMap[s] = SQLTypeNumberMaker
}
for _, s := range dataTypeBin {
for _, s := range dataTypeBinArr {
dataTypeBin[s] = struct{}{}
colTypeRowReceiverMap[s] = SQLTypeBytesMaker
Collaborator:

Not sure whether WriteToBuffer of SQLTypeBytes needs to escape quotes.

Contributor Author:

SQLTypeBytes converts binary data to hexadecimal, so I don't think we have this problem.
https://github.com/pingcap/dumpling/blob/bff60f8/v4/export/sql_type.go#L284

v4/export/dump.go (resolved)
}

func extractTiDBRowIDFromDecodedKey(indexField, key string) (string, error) {
if p := strings.Index(key, indexField); p != -1 {
Collaborator:

Do we have a system test that will report when TiDB changes this behaviour? 😵

Contributor Author:

I have tested TiDB v4.0.0 and v4.0.12, and it works well with both v4.x versions. I can add a test to make sure TiDB doesn't change this behaviour.
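A sketch of the parsing idea behind `extractTiDBRowIDFromDecodedKey`: locate `indexField` in the decoded key and return the integer that follows it. The decoded-key shape used in `main`, `{"_tidb_rowid":1958897,"table_id":"59"}`, is an assumption rather than something stated in the PR, and `extractRowID` is a hypothetical name. Running a helper like this in a unit test against real decoded keys from each TiDB version is one way to get the early warning the reviewer asked for.

```go
package main

import (
	"fmt"
	"strings"
)

// extractRowID is a hypothetical re-implementation of the idea: find
// indexField in key and return the (possibly negative) integer that
// immediately follows it.
func extractRowID(indexField, key string) (string, error) {
	p := strings.Index(key, indexField)
	if p == -1 {
		return "", fmt.Errorf("decoded key %q has no field %q", key, indexField)
	}
	start := p + len(indexField)
	end := start
	// Row IDs are signed 64-bit integers, so allow a leading minus sign.
	if end < len(key) && key[end] == '-' {
		end++
	}
	for end < len(key) && key[end] >= '0' && key[end] <= '9' {
		end++
	}
	// Reject an empty match or a lone minus sign.
	if end == start || (end == start+1 && key[start] == '-') {
		return "", fmt.Errorf("no integer after %q in %q", indexField, key)
	}
	return key[start:end], nil
}

func main() {
	key := `{"_tidb_rowid":1958897,"table_id":"59"}`
	id, err := extractRowID(`"_tidb_rowid":`, key)
	fmt.Println(id, err) // prints 1958897 <nil>
}
```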

v4/export/dump.go (outdated, resolved)
v4/export/sql.go (resolved)
"TIMESTAMP", "DATETIME", "DATE", "TIME", "YEAR", "SQL_TSI_YEAR",
"TEXT", "TINYTEXT", "MEDIUMTEXT", "LONGTEXT",
"ENUM", "SET", "JSON", "NULL", "VAR_STRING",
"GEOMETRY", // TODO: support GEOMETRY later
Collaborator:

Move this to dataTypeBinArr.

v4/export/sql.go (outdated)
@@ -458,7 +470,8 @@ func GetSpecifiedColumnValue(rows *sql.Rows, columnName string) ([]string, error
strs = append(strs, oneRow[fieldIndex].String)
}
}
return strs, nil
rows.Close()
Collaborator:

With that defer, you're closing rows twice 🤔
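For reference, a sketch of the close-once pattern the reviewer is pointing at: a single deferred `Close`, plus an `Err` check after the loop. `rowIterator` and `fakeRows` are hypothetical types that mirror the `*sql.Rows` methods used here so the example runs without a database driver. Since `(*sql.Rows).Close` is documented as idempotent, the extra close is redundant rather than an error.

```go
package main

import (
	"errors"
	"fmt"
)

// rowIterator is a hypothetical subset of the *sql.Rows method set.
type rowIterator interface {
	Next() bool
	Scan(dest ...any) error
	Err() error
	Close() error
}

// collectColumn reads one string column, closing rows exactly once via defer.
func collectColumn(rows rowIterator) ([]string, error) {
	defer rows.Close()
	var out []string
	for rows.Next() {
		var s string
		if err := rows.Scan(&s); err != nil {
			return nil, err
		}
		out = append(out, s)
	}
	// Next can return false because of an error, so surface it.
	if err := rows.Err(); err != nil {
		return nil, err
	}
	return out, nil
}

// fakeRows is an in-memory rowIterator that counts Close calls.
type fakeRows struct {
	vals   []string
	i      int
	closes int
}

func (f *fakeRows) Next() bool { f.i++; return f.i <= len(f.vals) }

func (f *fakeRows) Scan(dest ...any) error {
	if len(dest) != 1 {
		return errors.New("expected exactly one destination")
	}
	p, ok := dest[0].(*string)
	if !ok {
		return errors.New("unsupported destination type")
	}
	*p = f.vals[f.i-1]
	return nil
}

func (f *fakeRows) Err() error   { return nil }
func (f *fakeRows) Close() error { f.closes++; return nil }

func main() {
	r := &fakeRows{vals: []string{"a", "b"}}
	vals, err := collectColumn(r)
	fmt.Println(vals, err, r.closes) // prints [a b] <nil> 1
}
```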

@lance6716 (Collaborator) left a comment:

The rest LGTM.

if len(partitions) == 0 {
handleColNames, handleVals, err = selectTiDBTableRegion(tctx, conn, db, tbl)
} else {
return d.concurrentDumpTiDBPartitionTables(tctx, conn, meta, taskChan, partitions)
@lance6716 (Collaborator) commented Apr 30, 2021:

Weak review: instead of returning concurrentDumpTiDBPartitionTables here, could we return enough data and reuse sendConcurrentDumpTiDBTasks below?

Contributor Author:

Maybe we can do that later.

@lance6716 (Collaborator):

/lgtm

@ti-chi-bot (Member):

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • lance6716

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by writing /lgtm in a comment.
Reviewer can cancel approval by writing /lgtm cancel in a comment.

@ti-chi-bot ti-chi-bot added the status/LGT1 One reviewer approved (LGTM1) label Apr 30, 2021
@lichunzhu (Contributor Author):

/merge

@ti-chi-bot (Member):

This pull request has been accepted and is ready to merge.

Commit hash: 55f9278

@ti-chi-bot ti-chi-bot merged commit d6e5fb4 into pingcap:master Apr 30, 2021
@lichunzhu lichunzhu deleted the mergeSplitRegion branch April 30, 2021 03:23
@lichunzhu (Contributor Author):

/cherrypick release-5.0

@ti-chi-bot (Member):

@lichunzhu: new pull request created: #280.

In response to this:

/cherrypick release-5.0

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

tisonkun pushed a commit to tisonkun/dumpling that referenced this pull request Oct 20, 2021
tisonkun pushed a commit to tisonkun/dumpling that referenced this pull request Oct 20, 2021
tisonkun pushed a commit to tisonkun/dumpling that referenced this pull request Oct 20, 2021
tisonkun pushed a commit to tisonkun/dumpling that referenced this pull request Oct 20, 2021
tisonkun pushed a commit to tisonkun/tidb that referenced this pull request Oct 20, 2021
Labels
size/XXL status/can-merge status/LGT1 One reviewer approved (LGTM1)
Projects
None yet
Development

Successfully merging this pull request may close these issues:

  • Dumpling can't dump geometry type data correctly
  • tidb oom while dumpling massive data
4 participants