copr: allow checksum request to be rescheduled (#9094) #9098

Merged 4 commits on Nov 26, 2020

Conversation

ti-srebot
Contributor

cherry-pick #9094 to release-4.0


What problem does this PR solve?

Issue Number: close pingcap/br#611

Problem Summary:

Running the checksum coprocessor, even with concurrency = 1, degrades cluster performance because it never yields control.

What is changed and how it works?

What's Changed:

Copied the "reschedule" mechanism from the "analyze" coprocessor to the checksum coprocessor.
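The idea of yielding control can be illustrated with a minimal, std-only sketch. This is not the code from this PR: the names `ChecksumTask`, `Progress`, `run_slice`, `toy_hash`, and `run_to_completion` are hypothetical, the hash is a toy stand-in for the real checksum, and the real coprocessor is assumed to hand control back to its thread pool rather than to a plain loop. The sketch only shows the shape of the technique: process a batch of rows, check how long the task has been running, and return control once a time slice is used up.

```rust
use std::time::{Duration, Instant};

/// Outcome of one scheduling slice (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
enum Progress {
    /// The time slice was used up; the caller may run other work first.
    Yielded,
    /// All rows were processed; carries the final checksum.
    Done(u64),
}

/// Toy FNV-1a style hash over a key/value pair (assumption: stands in for the real checksum).
fn toy_hash(key: &[u8], value: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in key.iter().chain(value.iter()) {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// A checksum job that can be paused and resumed between batches.
struct ChecksumTask<'a> {
    rows: &'a [(Vec<u8>, Vec<u8>)],
    pos: usize,
    checksum: u64,
    batch: usize,
    time_slice: Duration,
}

impl<'a> ChecksumTask<'a> {
    fn new(rows: &'a [(Vec<u8>, Vec<u8>)], batch: usize, time_slice: Duration) -> Self {
        // A batch size of 0 would never make progress, so clamp it to 1.
        ChecksumTask { rows, pos: 0, checksum: 0, batch: batch.max(1), time_slice }
    }

    /// Processes batches until the time slice is exhausted or the rows run out.
    fn run_slice(&mut self) -> Progress {
        let started = Instant::now();
        while self.pos < self.rows.len() {
            let end = (self.pos + self.batch).min(self.rows.len());
            for (key, value) in &self.rows[self.pos..end] {
                // XOR makes the result independent of where the task was paused.
                self.checksum ^= toy_hash(key, value);
            }
            self.pos = end;
            // Yield only if work remains and the slice has been used up.
            if self.pos < self.rows.len() && started.elapsed() >= self.time_slice {
                return Progress::Yielded;
            }
        }
        Progress::Done(self.checksum)
    }
}

/// Drives a task to completion and reports (checksum, number of yields).
/// A real scheduler would run other queued tasks at each yield point.
fn run_to_completion(
    rows: &[(Vec<u8>, Vec<u8>)],
    batch: usize,
    time_slice: Duration,
) -> (u64, usize) {
    let mut task = ChecksumTask::new(rows, batch, time_slice);
    let mut yields = 0;
    loop {
        match task.run_slice() {
            Progress::Yielded => yields += 1,
            Progress::Done(sum) => return (sum, yields),
        }
    }
}
```

Calling `run_to_completion(&rows, 2, Duration::from_secs(0))` forces a yield after every batch, while a very long time slice lets the task run straight through. Both produce the same checksum, which is the property that makes rescheduling safe for this kind of job.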

Related changes

  • Need to cherry-pick to the release branch
    • 3.0, 4.0

Check List

Tests

  • Manual test (add detailed scripts or steps below)
    • Execute a BR backup while a SELECT is running in the background. The following shows the TiDB QPS measurement before and after applying this PR:
      • Green = (negation of) the number of keys processed by the "checksum" coprocessor; non-zero means a checksum is running
      • Yellow = QPS; higher means the background SELECT is less affected
      • Left 3 = before this PR was applied; QPS drops to 30% of the original
      • Right 1 = after this PR was applied; QPS is almost unaffected

Side effects

Release note

  • Running checksum in BR and Lightning should have less impact on cluster performance.

@ti-srebot
Contributor Author

/run-all-tests

@ti-srebot added the sig/coprocessor (SIG: Coprocessor), status/PTAL (Status: Waiting for reviewing), type/bugfix (This PR fixes a bug.), and type/cherry-pick (Type: PR - Cherry pick) labels on Nov 24, 2020
@ti-srebot added this to the v4.0.9 milestone on Nov 24, 2020
@ti-srebot
Contributor Author

@kennytm you're already a collaborator in bot's repo.

@kennytm
Contributor

kennytm commented Nov 24, 2020

/test

https://internal.pingcap.net/idc-jenkins/blue/rest/organizations/jenkins/pipelines/tikv_ghpr_test/runs/36024/nodes/212/log/?start=0

[2020-11-24T09:56:29.341Z] test raftstore::test_region_heartbeat::test_server_pending_peers ... thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Other("[components/test_raftstore/src/cluster.rs:331]: can\'t get leader of region 1")', src/libcore/result.rs:1188:5
[2020-11-24T09:56:29.341Z] stack backtrace:
[2020-11-24T09:56:29.341Z]   14: core::result::Result<T,E>::unwrap
[2020-11-24T09:56:29.341Z]              at /rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/libcore/result.rs:956
[2020-11-24T09:56:29.341Z]   15: test_raftstore::cluster::Cluster<T>::request
[2020-11-24T09:56:29.341Z]              at ./components/test_raftstore/src/cluster.rs:671
[2020-11-24T09:56:29.341Z]   16: test_raftstore::cluster::Cluster<T>::must_put_cf
[2020-11-24T09:56:29.341Z]              at ./components/test_raftstore/src/cluster.rs:816
[2020-11-24T09:56:29.341Z]   17: test_raftstore::cluster::Cluster<T>::must_put
[2020-11-24T09:56:29.341Z]              at ./components/test_raftstore/src/cluster.rs:812
[2020-11-24T09:56:29.341Z]   18: integrations::raftstore::test_region_heartbeat::test_pending_peers
[2020-11-24T09:56:29.341Z]              at tests/integrations/raftstore/test_region_heartbeat.rs:128
[2020-11-24T09:56:29.341Z]   19: integrations::raftstore::test_region_heartbeat::test_server_pending_peers
[2020-11-24T09:56:29.341Z]              at tests/integrations/raftstore/test_region_heartbeat.rs:157
[2020-11-24T09:56:29.341Z]   20: integrations::raftstore::test_region_heartbeat::test_server_pending_peers::{{closure}}
[2020-11-24T09:56:29.341Z]              at tests/integrations/raftstore/test_region_heartbeat.rs:155
[2020-11-24T09:56:29.342Z] note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
[2020-11-24T09:56:29.342Z] FAILED

@zhouqiang-cl
Contributor

/run-all-tests

@Little-Wallace
Contributor

/run-all-tests

@sticnarf
Contributor

LGTM

@ti-srebot added the status/LGT1 (Indicates that a PR has LGTM 1.) label on Nov 26, 2020
Contributor

@Little-Wallace left a comment

LGTM

@ti-srebot
Contributor Author

@Little-Wallace, Thanks for your review. The bot only counts LGTMs from Reviewers and higher roles, but you're still welcome to leave your comments. See the corresponding SIG page for more information. Related SIG: coprocessor(slack).

@ti-srebot added the status/LGT2 (Indicates that a PR has LGTM 2.) label and removed the status/LGT1 (Indicates that a PR has LGTM 1.) label on Nov 26, 2020
@BusyJay added the status/can-merge (Indicates a PR has been approved by a committer.) label on Nov 26, 2020
@ti-srebot
Contributor Author

Your auto merge job has been accepted, waiting for:

  • 9125

@ti-srebot
Contributor Author

/run-all-tests

@ti-srebot merged commit 9dcfe32 into tikv:release-4.0 on Nov 26, 2020
6 participants