Skip to content
This repository has been archived by the owner on Nov 24, 2023. It is now read-only.

worker, ha: For errors that can only be attempted a limited number of times, the number of attempts is limited #1396

Merged
merged 9 commits into from
Feb 4, 2021

Conversation

zeminzhou
Copy link
Contributor

@zeminzhou zeminzhou commented Jan 21, 2021

What problem does this PR solve?

worker recieved a bound watch, but failed to read bound information in etcd.[part of #1388] (#1388)

What is changed and how it works?

Add reading bound information failed error to RetryableError type, but limit the number of retry.

Check List

Tests

  • Unit test [WIP]
  • Integration test [WIP]

Code changes

  • Has exported function/method change

Side effects

Related changes

@CLAassistant
Copy link

CLAassistant commented Jan 21, 2021

CLA assistant check
All committers have signed the CLA.

@lance6716 lance6716 added needs-cherry-pick-release-2.0 This PR should be cherry-picked to release-2.0. Remove this label after cherry-picked to release-2.0 needs-update-release-note This PR should be added into release notes. Remove this label once the release notes are updated status/PTAL This PR is ready for review. Add this label back after committing new changes type/bug-fix Bug fix status/WIP This PR is still work in progress and removed status/PTAL This PR is ready for review. Add this label back after committing new changes labels Jan 22, 2021
Copy link
Collaborator

@lance6716 lance6716 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will review later

dm/worker/server.go Outdated Show resolved Hide resolved
dm/worker/server.go Outdated Show resolved Hide resolved
dm/worker/server.go Outdated Show resolved Hide resolved
dm/worker/server.go Outdated Show resolved Hide resolved
dm/worker/server.go Outdated Show resolved Hide resolved
dm/worker/server.go Outdated Show resolved Hide resolved
dm/worker/server.go Outdated Show resolved Hide resolved
select {
case <-s.ctx.Done():
return
case <-time.After(keepaliveTimeout):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sleep time should be related to s.cfg.KeepAliveTTL, for example, 2 * s.cfg.KeepAliveTTL

dm/worker/server.go Outdated Show resolved Hide resolved
@lance6716 lance6716 removed the status/WIP This PR is still work in progress label Jan 31, 2021
@lance6716 lance6716 added this to the v2.0.2 milestone Jan 31, 2021
Copy link
Collaborator

@lance6716 lance6716 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rest LGTM. good job

dm/worker/server.go Outdated Show resolved Hide resolved
dm/worker/server.go Outdated Show resolved Hide resolved
dm/worker/server.go Outdated Show resolved Hide resolved
dm/worker/server_test.go Show resolved Hide resolved
dm/worker/server_test.go Outdated Show resolved Hide resolved
dm/worker/server_test.go Outdated Show resolved Hide resolved
dm/worker/server_test.go Outdated Show resolved Hide resolved
@lance6716 lance6716 changed the title For errors that can only be attempted a limited number of times, the number of attempts is limited worker, ha: For errors that can only be attempted a limited number of times, the number of attempts is limited Jan 31, 2021
@lance6716
Copy link
Collaborator

/lgtm

@ti-srebot ti-srebot added the status/LGT1 One reviewer already commented LGTM label Jan 31, 2021
@lance6716
Copy link
Collaborator

and please help check other TODO: handle fatal error from need this mechanism

@lance6716
Copy link
Collaborator

PTAL @3pointer @lichunzhu

@@ -110,7 +110,7 @@ func (s *Server) KeepAlive() {
return // return if failed to stop the worker.
}
select {
case <-s.ctx.Done():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest using another variable to save s.ctx here to avoid data race.

select {
case <-s.ctx.Done():
return
case <-time.After(time.Duration(s.cfg.KeepAliveTTL) * time.Second * 2):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we start another keepalive only when we assure the last one is quitted? For example, using another waitgroup.

@lichunzhu
Copy link
Contributor

/lgtm

@ti-srebot ti-srebot removed the status/LGT1 One reviewer already commented LGTM label Feb 4, 2021
@ti-srebot ti-srebot added the status/LGT2 Two reviewers already commented LGTM, ready for merge label Feb 4, 2021
@ti-srebot
Copy link

cherry pick to release-2.0 in PR #1425

@ti-srebot ti-srebot added already-cherry-pick-2.0 The related PR is already cherry-picked to release-2.0. Add this label once the PR is cherry-picked and removed needs-cherry-pick-release-2.0 This PR should be cherry-picked to release-2.0. Remove this label after cherry-picked to release-2.0 labels Feb 4, 2021
lance6716 pushed a commit that referenced this pull request Feb 4, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
already-cherry-pick-2.0 The related PR is already cherry-picked to release-2.0. Add this label once the PR is cherry-picked first-time-contributor needs-update-release-note This PR should be added into release notes. Remove this label once the release notes are updated status/LGT2 Two reviewers already commented LGTM, ready for merge type/bug-fix Bug fix
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants