*: make Backoff perceive the Killed flag to fix MAX_EXECUTION_TIME #14552

Merged: 4 commits, Feb 3, 2020

Conversation

tiancaiamao
Contributor

@tiancaiamao tiancaiamao commented Jan 20, 2020

What problem does this PR solve?

Previously, the killed flag was checked only in each executor.Next() call, so a query that was executing Backoff() at the time could not be terminated.

In some cases, such as when TiKV has crashed or is recovering, the backoff can last a long time.
Killing the query has no effect during that period because the killed flag is not checked during backoff.

Similar issue: #12852

What is changed and how it works?

Pass the killed flag to the backoffer and check its value during backoff.

Check List

Tests

  • Unit test

Release note

  • Fix a bug where MAX_EXECUTION_TIME doesn't take effect in some cases

@tiancaiamao added the type/enhancement, component/server, and component/tikv labels and removed the component/tikv label on Jan 20, 2020
@tiancaiamao
Contributor Author

PTAL @jackysp @lonng

@tiancaiamao
Contributor Author

[2020-01-20T07:25:32.212Z] WARNING: DATA RACE
[2020-01-20T07:25:32.212Z] Read at 0x00c0021be740 by goroutine 130:
[2020-01-20T07:25:32.212Z]   github.com/pingcap/tidb/store/tikv/gcworker.(*mergeLockScanner).Start.func1()
[2020-01-20T07:25:32.212Z]       /home/jenkins/agent/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/store/tikv/gcworker/gc_worker.go:1813 +0x208
[2020-01-20T07:25:32.212Z] 
[2020-01-20T07:25:32.212Z] Previous write at 0x00c0021be740 by goroutine 107:
[2020-01-20T07:25:32.212Z]   github.com/pingcap/tidb/store/tikv/gcworker.(*mergeLockScanner).Start()
[2020-01-20T07:25:32.212Z]       /home/jenkins/agent/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/store/tikv/gcworker/gc_worker.go:1803 +0x202
[2020-01-20T07:25:32.212Z]   github.com/pingcap/tidb/store/tikv/gcworker.(*testGCWorkerSuite).makeMergedMockClient.func2()
[2020-01-20T07:25:32.212Z]       /home/jenkins/agent/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/store/tikv/gcworker/gc_worker_test.go:925 +0x78

CI fixed in #14554

@tiancaiamao
Contributor Author

/run-all-tests

Member

@jackysp jackysp left a comment

LGTM

@jackysp jackysp added the status/LGT1 Indicates that a PR has LGTM 1. label Jan 20, 2020
@imtbkcat

There is a data race in TestKillFlagInBackoff, @tiancaiamao.

@bb7133 bb7133 added this to the v3.1.0-rc milestone Jan 20, 2020
@bb7133
Member

bb7133 commented Jan 20, 2020

Hi @tiancaiamao, would you please update the release note in your PR description? Thanks.

@bb7133 bb7133 modified the milestones: v3.1.0-rc, v3.0.10 Jan 20, 2020
@tiancaiamao
Contributor Author

/run-all-tests

@tiancaiamao added the type/bugfix label and removed the type/enhancement label on Feb 3, 2020
@tiancaiamao changed the title from "*: make Backoff perceive the Killed flag" to "*: make Backoff perceive the Killed flag to fix MAX_EXECUTION_TIME" on Feb 3, 2020
@tiancaiamao
Contributor Author

I've fixed the data race in the unit test and updated the "Release note" description.
PTAL @imtbkcat @bb7133


@imtbkcat imtbkcat left a comment

LGTM

@jackysp
Member

jackysp commented Feb 3, 2020

/merge

@sre-bot sre-bot added the status/can-merge Indicates a PR has been approved by a committer. label Feb 3, 2020
@sre-bot
Contributor

sre-bot commented Feb 3, 2020

/run-all-tests

@sre-bot sre-bot merged commit 97a049a into pingcap:master Feb 3, 2020
@sre-bot
Contributor

sre-bot commented Feb 3, 2020

cherry pick to release-3.0 failed

@sre-bot
Contributor

sre-bot commented Apr 7, 2020

It seems, though we are not certain, that we failed to cherry-pick this commit to release-3.0. If the cherry-pick did fail, please comment '/run-cherry-picker' to trigger the cherry-picker again. @tiancaiamao PTAL.

Labels
  • component/server
  • status/can-merge: Indicates a PR has been approved by a committer.
  • status/LGT1: Indicates that a PR has LGTM 1.
  • type/bugfix: This PR fixes a bug.
5 participants