capture(ticdc): fix the problem that openapi is blocked when pd is abnormal #4788

Merged
merged 19 commits into from
Apr 2, 2022

Conversation

CharlesCheung96
Contributor

@CharlesCheung96 CharlesCheung96 commented Mar 7, 2022

What problem does this PR solve?

Issue Number: close #4778

What is changed and how it works?

  1. Avoid holding a mutex lock in a blocking operation.
  2. Add timeout control for EtcdClient get and put operations to prevent possible goroutine leak in openapi.
  3. Remove redundant retry operations in (*openApi).forwardToOwner

Check List

Tests

  • Unit test

Related changes

  • Need to cherry-pick to the release branch

Release note

`Fix a bug where openapi may get stuck when PD is abnormal`

@ti-chi-bot
Member

ti-chi-bot commented Mar 7, 2022

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • amyangfei
  • overvenus

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added the release-note, do-not-merge/needs-triage-completed, and size/S labels Mar 7, 2022
@CharlesCheung96 CharlesCheung96 force-pushed the fix_4778_openapi_stuck branch from 9ec5aba to 095421b Compare March 7, 2022 06:41
@CharlesCheung96
Contributor Author

/run-all-tests

@codecov-commenter

codecov-commenter commented Mar 7, 2022

Codecov Report

Merging #4788 (660b247) into master (9607554) will increase coverage by 0.3747%.
The diff coverage is 69.3430%.

Flag Coverage Δ
cdc 60.2913% <69.3430%> (+0.3691%) ⬆️
dm 52.4556% <ø> (+0.4267%) ⬆️

@@               Coverage Diff                @@
##             master      #4788        +/-   ##
================================================
+ Coverage   55.6402%   56.0149%   +0.3746%     
================================================
  Files           494        524        +30     
  Lines         61283      65886      +4603     
================================================
+ Hits          34098      36906      +2808     
- Misses        23750      25379      +1629     
- Partials       3435       3601       +166     

@@ -118,9 +118,17 @@ func NewCapture4Test(o owner.Owner) *Capture {
}

func (c *Capture) reset(ctx context.Context) error {
conf := config.GetGlobalServerConfig()
Contributor

do we need ut cover the change?

Contributor Author

Done, PTAL

@ti-chi-bot ti-chi-bot added the needs-cherry-pick-release-4.0, needs-cherry-pick-release-5.0, needs-cherry-pick-release-5.1, needs-cherry-pick-release-5.2, needs-cherry-pick-release-5.3, and needs-cherry-pick-release-5.4 labels Mar 8, 2022
@CharlesCheung96 CharlesCheung96 force-pushed the fix_4778_openapi_stuck branch from d2a4eda to 206852d Compare March 9, 2022 09:07
@ti-chi-bot ti-chi-bot added the size/L label and removed the size/S label Mar 9, 2022
@CharlesCheung96 CharlesCheung96 force-pushed the fix_4778_openapi_stuck branch from 206852d to ac9775c Compare March 9, 2022 09:07
@CharlesCheung96
Contributor Author

/run-all-tests

@CharlesCheung96 CharlesCheung96 force-pushed the fix_4778_openapi_stuck branch from a776844 to 128e768 Compare March 11, 2022 03:56
@CharlesCheung96
Contributor Author

/run-all-tests

@CharlesCheung96 CharlesCheung96 force-pushed the fix_4778_openapi_stuck branch from 933f9a0 to cec8e43 Compare March 11, 2022 04:15
@@ -149,8 +158,10 @@ func (c *Client) Txn(ctx context.Context, cmps []clientv3.Cmp, opsThen, opsElse
// Grant delegates request to clientv3.Lease.Grant
func (c *Client) Grant(ctx context.Context, ttl int64) (resp *clientv3.LeaseGrantResponse, err error) {
err = retryRPC(EtcdGrant, c.metrics[EtcdGrant], func() error {
grantCtx, cancel := context.WithTimeout(ctx, etcdClientTimeoutWithoutRetry)
Contributor

Should we use etcdClientTimeoutWithRetry here?

Contributor Author

We need to ensure that it returns in a reasonable amount of time, otherwise the operation may never retry due to blocking.

@@ -181,8 +192,10 @@ func isRetryableError(rpcName string) retry.IsRetryable {
// Revoke delegates request to clientv3.Lease.Revoke
func (c *Client) Revoke(ctx context.Context, id clientv3.LeaseID) (resp *clientv3.LeaseRevokeResponse, err error) {
err = retryRPC(EtcdRevoke, c.metrics[EtcdRevoke], func() error {
revokeCtx, cancel := context.WithTimeout(ctx, etcdClientTimeoutWithoutRetry)
Contributor

ditto

@@ -191,8 +204,10 @@ func (c *Client) Revoke(ctx context.Context, id clientv3.LeaseID) (resp *clientv
// TimeToLive delegates request to clientv3.Lease.TimeToLive
func (c *Client) TimeToLive(ctx context.Context, lease clientv3.LeaseID, opts ...clientv3.LeaseOption) (resp *clientv3.LeaseTimeToLiveResponse, err error) {
err = retryRPC(EtcdRevoke, c.metrics[EtcdRevoke], func() error {
timeToLiveCtx, cancel := context.WithTimeout(ctx, etcdClientTimeoutWithoutRetry)
Contributor

ditto

Comment on lines 26 to 27
goleak.IgnoreTopFunction("google.golang.org/grpc.(*ccBalancerWrapper).watcher"),
goleak.IgnoreTopFunction("google.golang.org/grpc.(*addrConn).resetTransport"),
Contributor

Why didn't the gRPC goroutines exit in test case?

cdc/capture/capture_test.go (outdated, resolved)
@@ -605,6 +606,8 @@ func (c *Capture) WriteDebugInfo(ctx context.Context, w io.Writer) {
if c.processorManager != nil {
fmt.Fprintf(w, "\n\n*** processors info ***:\n\n")
c.processorManager.WriteDebugInfo(ctx, w, doneM)
} else {
close(doneM)
Contributor

Is this an existing bug, how can it be reproduced?
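
For context on the hunk above, here is a minimal sketch of the general pattern, assuming the caller later waits on the done channel (the quoted lines do not show that part). `worker`, `report`, and `writeInfo` are hypothetical names. If the channel is closed only by the component that may be nil, a waiter would block forever on the nil path, so the `else` branch closes it directly.

```go
package main

import (
	"fmt"
	"time"
)

// worker is a hypothetical component that reports asynchronously.
type worker struct{}

// report does its work in the background and closes done when finished.
func (w *worker) report(done chan<- struct{}) {
	go func() {
		defer close(done)
		// ... write debug info here ...
	}()
}

// writeInfo waits for the report to finish. When there is no worker,
// nobody else would close the channel, so it is closed on that path too.
func writeInfo(w *worker) bool {
	done := make(chan struct{})
	if w != nil {
		w.report(done)
	} else {
		close(done)
	}
	select {
	case <-done:
		return true
	case <-time.After(time.Second):
		return false // the wait timed out: done was never closed
	}
}

func main() {
	fmt.Println(writeInfo(nil), writeInfo(&worker{})) // prints "true true"
}
```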

@ti-chi-bot ti-chi-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 11, 2022
@CharlesCheung96 CharlesCheung96 force-pushed the fix_4778_openapi_stuck branch from 43b8403 to cb3035e Compare March 14, 2022 01:33
@ti-chi-bot ti-chi-bot merged commit 7fb7097 into pingcap:master Apr 2, 2022
ti-chi-bot pushed a commit to ti-chi-bot/tiflow that referenced this pull request Apr 2, 2022
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created: #5109.

@ti-chi-bot
Member

In response to a cherrypick label: new pull request created: #5110.

@ti-chi-bot
Member

In response to a cherrypick label: new pull request created: #5111.

@ti-chi-bot
Member

In response to a cherrypick label: new pull request created: #5112.

@ti-chi-bot
Member

In response to a cherrypick label: new pull request created: #5113.

@ti-chi-bot
Member

In response to a cherrypick label: new pull request created: #5114.

CharlesCheung96 added a commit to ti-chi-bot/tiflow that referenced this pull request Apr 14, 2022
CharlesCheung96 added a commit to ti-chi-bot/tiflow that referenced this pull request Apr 14, 2022
CharlesCheung96 added a commit to ti-chi-bot/tiflow that referenced this pull request Apr 14, 2022
CharlesCheung96 added a commit to ti-chi-bot/tiflow that referenced this pull request Apr 14, 2022
CharlesCheung96 added a commit to ti-chi-bot/tiflow that referenced this pull request Apr 14, 2022
CharlesCheung96 added a commit to ti-chi-bot/tiflow that referenced this pull request Apr 22, 2022
CharlesCheung96 added a commit to ti-chi-bot/tiflow that referenced this pull request Apr 28, 2022
CharlesCheung96 pushed a commit to ti-chi-bot/tiflow that referenced this pull request Jun 16, 2022
Labels
component/api, needs-cherry-pick-release-4.0, needs-cherry-pick-release-5.0, needs-cherry-pick-release-5.1, needs-cherry-pick-release-5.2, needs-cherry-pick-release-5.3, needs-cherry-pick-release-5.4, release-note, size/L, status/can-merge, status/LGT2
Development

Successfully merging this pull request may close these issues.

Querying status via openapi may be blocked when pd fails
6 participants