relay/meta(dm): fix potential data races after saving GTID #4455

Merged
merged 3 commits into from
Jan 26, 2022

Conversation

dsdashun
Contributor

What problem does this PR solve?

Issue Number: close #4166

What is changed and how it works?

Before a GTID set is saved into LocalMeta, it is now cloned, and LocalMeta stores the clone.
Previously, after a GTID set was saved into LocalMeta, the input GTID set and the one inside LocalMeta referenced the same object. That could cause a data race when the input GTID set object was modified while the GTID set inside LocalMeta was being read concurrently.
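
The aliasing described above can be illustrated with a minimal sketch. The `gtidSet` and `meta` types below are hypothetical stand-ins, not the real `gtid.Set` or `LocalMeta`; they only show how storing the input pointer differs from storing a clone.

```go
package main

import "fmt"

// gtidSet is a hypothetical stand-in for gtid.Set: a mutable set of GTID strings.
type gtidSet struct {
	ids map[string]struct{}
}

func newGTIDSet(ids ...string) *gtidSet {
	s := &gtidSet{ids: make(map[string]struct{})}
	for _, id := range ids {
		s.ids[id] = struct{}{}
	}
	return s
}

// Add mutates the set in place.
func (s *gtidSet) Add(id string) { s.ids[id] = struct{}{} }

// Len reports the number of entries.
func (s *gtidSet) Len() int { return len(s.ids) }

// Clone returns an independent deep copy.
func (s *gtidSet) Clone() *gtidSet {
	c := newGTIDSet()
	for id := range s.ids {
		c.ids[id] = struct{}{}
	}
	return c
}

// meta is a hypothetical stand-in for LocalMeta.
type meta struct {
	gset *gtidSet
}

// saveShared stores the caller's object directly (the old behavior).
func (m *meta) saveShared(g *gtidSet) { m.gset = g }

// saveCloned stores an independent copy (the behavior after this PR).
func (m *meta) saveCloned(g *gtidSet) { m.gset = g.Clone() }

func main() {
	input := newGTIDSet("uuid:1")
	shared, cloned := &meta{}, &meta{}
	shared.saveShared(input)
	cloned.saveCloned(input)

	// The caller keeps mutating its own set after saving.
	input.Add("uuid:2")

	// The shared copy observes the mutation; the cloned copy does not.
	fmt.Println(shared.gset.Len(), cloned.gset.Len()) // prints "2 1"
}
```

With the shared variant, any concurrent read of the stored set would touch the same map the caller is writing to, which is the race this PR removes.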

Check List

Tests

  • Unit test

Code changes

N/A

Side effects

N/A

Related changes

N/A

Release note

None

@ti-chi-bot
Member

ti-chi-bot commented Jan 24, 2022

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • Ehco1996
  • lance6716

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added first-time-contributor Indicates that the PR was contributed by an external member and is a first-time contributor. release-note-none Denotes a PR that doesn't merit a release note. labels Jan 24, 2022
@CLAassistant

CLAassistant commented Jan 24, 2022

CLA assistant check
All committers have signed the CLA.

@ti-chi-bot ti-chi-bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jan 24, 2022
@dsdashun
Contributor Author

/cc @glorv @lance6716

@ti-chi-bot ti-chi-bot requested review from glorv and lance6716 January 24, 2022 07:27
@lance6716 lance6716 added the area/dm Issues or PRs related to DM. label Jan 24, 2022
* clone GTID before saving into meta to make meta's GTID independent
* add a UT to verify

close pingcap#4166
@@ -212,7 +212,7 @@ func (lm *LocalMeta) Save(pos mysql.Position, gset gtid.Set) error {
 		lm.BinlogGTID = ""
 	} else {
 		lm.BinlogGTID = gset.String()
-		lm.gset = gset
+		lm.gset = gset.Clone() // need to clone and set, in order to avoid the local meta's gset and the input gset referencing the same object, causing contentions later
Contributor

Maybe just adding a lock in func (r *Relay) SaveMeta(pos mysql.Position, gset gtid.Set) would be more intuitive 😬

Contributor Author

Adding a lock on (*Relay).SaveMeta() won't work, because the data race happens when (gtid.Set).Set() and (gtid.Set).Clone() are called on the same object. Even if (*Relay).SaveMeta() is executed serially, the data race will still happen as long as the input gtid.Set object is also referenced by LocalMeta afterwards.

Actually, there are two solutions to the problem. The first is to add a locking mechanism to the corresponding methods in the gtid.Set implementation. The second is to ensure that the gtid.Set object inside LocalMeta is independent.

The first method involves many modifications and may carry a performance penalty. What's more, existing mechanisms in LocalMeta already favor the second method. For example, (*LocalMeta).GTID() clones the gtid.Set before returning it, which isolates the returned gtid.Set object from the one inside LocalMeta.

So, considering that only the gtid.Set object inside LocalMeta is accessed frequently, and that currently nothing updates the gtid.Set inside LocalMeta in place (that is, (*LocalMeta).gset is always updated by assigning a new object rather than by calling (*LocalMeta).gset.Set()), I used the second method to fix this problem.
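
The second method can be sketched roughly as follows. This is a simplified illustration, not the actual DM code: `gtidSet` and `localMeta` are hypothetical stand-ins, and the `sync.RWMutex` guarding the field is an assumption of the sketch. The point is that cloning on both Save and GTID() keeps the stored set private, so the caller can keep mutating its own set without touching what readers see.

```go
package main

import (
	"fmt"
	"sync"
)

// gtidSet is a hypothetical stand-in for gtid.Set; it has no internal locking.
type gtidSet struct {
	ids map[string]struct{}
}

func newGTIDSet(ids ...string) *gtidSet {
	s := &gtidSet{ids: make(map[string]struct{})}
	for _, id := range ids {
		s.ids[id] = struct{}{}
	}
	return s
}

func (s *gtidSet) Add(id string) { s.ids[id] = struct{}{} }
func (s *gtidSet) Len() int      { return len(s.ids) }

func (s *gtidSet) Clone() *gtidSet {
	c := newGTIDSet()
	for id := range s.ids {
		c.ids[id] = struct{}{}
	}
	return c
}

// localMeta is a hypothetical stand-in for LocalMeta.
// The mutex protects the gset pointer only (an assumption of this sketch).
type localMeta struct {
	mu   sync.RWMutex
	gset *gtidSet
}

// Save stores an independent copy of the input set.
func (m *localMeta) Save(g *gtidSet) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.gset = g.Clone()
}

// GTID returns an independent copy of the stored set.
func (m *localMeta) GTID() *gtidSet {
	m.mu.RLock()
	defer m.mu.RUnlock()
	if m.gset == nil {
		return nil
	}
	return m.gset.Clone()
}

func main() {
	input := newGTIDSet("uuid:1")
	m := &localMeta{}
	m.Save(input)

	var wg sync.WaitGroup
	wg.Add(2)
	// One goroutine keeps mutating the caller's set after it was saved.
	go func() {
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			input.Add(fmt.Sprintf("uuid:%d", i+2))
		}
	}()
	// Another goroutine keeps reading the meta's set.
	go func() {
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			_ = m.GTID()
		}
	}()
	wg.Wait()

	fmt.Println(m.GTID().Len(), input.Len()) // prints "1 1001"
}
```

If Save stored `g` directly instead of `g.Clone()`, the two goroutines would access the same map, and a lock around Save alone would not prevent that, since the mutation happens after Save returns.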

Contributor

Thanks for explaining the reason. I will read it more carefully later.

Contributor

These comments help. I understand the cause of the data race now ~

@dsdashun
Contributor Author

/run-verify

@codecov-commenter

Codecov Report

Merging #4455 (9607554) into master (e469fb3) will decrease coverage by 0.1675%.
The diff coverage is 54.3026%.

Flag Coverage Δ
cdc 59.9222% <60.5381%> (-0.2652%) ⬇️
dm 52.0288% <42.1052%> (-0.0747%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

@@               Coverage Diff                @@
##             master      #4455        +/-   ##
================================================
- Coverage   55.8077%   55.6402%   -0.1675%     
================================================
  Files           495        494         -1     
  Lines         61246      61283        +37     
================================================
- Hits          34180      34098        -82     
- Misses        23614      23750       +136     
+ Partials       3452       3435        -17     

Contributor

@Ehco1996 Ehco1996 left a comment

Rest LGTM

@@ -234,3 +235,65 @@ func (r *testMetaSuite) TestLocalMeta(c *C) {
currentDir := lm.Dir()
c.Assert(strings.HasSuffix(currentDir, cs.uuidWithSuffix), IsTrue)
}

func (r *testMetaSuite) TestLocalMetaPotentialDataRace(c *C) {
Contributor

Could you please use testify to rewrite this test? See #2164.

Contributor Author

OK, I'll submit another PR to rewrite the test.

@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Jan 25, 2022
@lance6716
Contributor

Don't forget to sign the CLA: #4455 (comment)

Contributor

@Ehco1996 Ehco1996 left a comment

LGTM

@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Jan 26, 2022
@Ehco1996
Contributor

/merge

@ti-chi-bot
Member

This pull request has been accepted and is ready to merge.

Commit hash: 9607554

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Jan 26, 2022
@ti-chi-bot
Member

@dsdashun: Your PR was out of date, I have automatically updated it for you.

At the same time I will also trigger all tests for you:

/run-all-tests

If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot ti-chi-bot merged commit 72c5fab into pingcap:master Jan 26, 2022
Labels
  • area/dm: Issues or PRs related to DM.
  • first-time-contributor: Indicates that the PR was contributed by an external member and is a first-time contributor.
  • release-note-none: Denotes a PR that doesn't merit a release note.
  • size/M: Denotes a PR that changes 30-99 lines, ignoring generated files.
  • status/can-merge: Indicates a PR has been approved by a committer.
  • status/LGT2: Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

data race of R/W to relay status
6 participants