Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

br: rawkv backup failed #32607

Closed
pingyu opened this issue Feb 24, 2022 · 1 comment · Fixed by #32612
Closed

br: rawkv backup failed #32607

pingyu opened this issue Feb 24, 2022 · 1 comment · Fixed by #32612
Assignees
Labels
affects-4.0 This bug affects 4.0.x versions. affects-5.0 This bug affects 5.0.x versions. affects-5.1 This bug affects 5.1.x versions. affects-5.2 This bug affects 5.2.x versions. affects-5.3 This bug affects 5.3.x versions. affects-5.4 This bug affects the 5.4.x(LTS) versions. component/br This issue is related to BR of TiDB. severity/major type/bug The issue is confirmed as a bug.

Comments

@pingyu
Copy link
Contributor

pingyu commented Feb 24, 2022

Bug Report

Background

An user reported a failure of BR backup rawkv (ONCALL-4497) as following:

Backup rawkv by executing:

./br backup raw --pd xxx -s "local:///data/backup" --ratelimit 128 --cf default

It failed and printed:

["Raw backup failed summary"] ... [errorVerbose="
...
github.com/pingcap/br/pkg/backup.(*Client).fineGrainedBackup
    github.com/pingcap/br@/pkg/backup/client.go:738
...

TiKV panic with log:

[FATAL] ... ["called `Result::unwrap()` on an `Err` value: KeyPadding"] [backtrace="
...
core::result::Result<T,E>::unwrap
tikv/components/backup/src/endpoint.rs:714
...

TiKV panic at https://github.com/tikv/tikv/blob/dbb3d3a7ea092bb3c7442c54bed19987caa26bd1/components/backup/src/endpoint.rs#L714, a branch should enter with is_raw_kv = false.

BR error printed at https://github.com/pingcap/br/blob/v5.0.4/pkg/backup/client.go#L738. In which the procedure fineGrainedBackup entered when part of the range is not completed in push.pushBackup.

It would be inevitable for push.pushBackup to not complete the whole range if some errors (e.g. region error) happen during backup.
So I try to reproduce the failure by ingest such an error in push.pushBackup.

1. Minimal reproduce step (Required)

  1. Ingest a region error in push.pushBackup (see the code), to enter fineGrainedBackup.
  2. Setup TiKV cluster: tiup playground nightly --db 0 --tiflash 0 --kv 3 --pd 1
  3. Import 4000000 x 1KB records by raw_put of client-go. There are about 40 regions after import.
  4. Run BR: tidb/bin/br backup raw --pd $PD -s "local:///disk1/home/pingyu/workspace/brdata" --cf default

2. What did you expect to see? (Required)

Backup successfully without error.

3. What did you see instead (Required)

BR stuck at 78.57%.

$ tidb/bin/br backup raw --pd $PD -s "local:///xxx" --cf default
Detail BR log in /tmp/br.log.2022-02-24T14.52.31+0800
Raw backup <------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------/.........................................................................> 78.57%

All TiKV panic with log:

[2022/02/24 14:52:34.687 +08:00] [FATAL] [lib.rs:466] ["called `Result::unwrap()` on an `Err` value: KeyPadding"] [backtrace="   0: tikv_util::set_panic_hook::{{closure}}
             at /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tikv/components/tikv_util/src/lib.rs:465:18
   1: std::panicking::rust_panic_with_hook
             at /rustc/1e12aef3fab243407f9d71ba9956cb2a1bf105d5/library/std/src/panicking.rs:702:17
   2: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/1e12aef3fab243407f9d71ba9956cb2a1bf105d5/library/std/src/panicking.rs:588:13
   3: std::sys_common::backtrace::__rust_end_short_backtrace
             at /rustc/1e12aef3fab243407f9d71ba9956cb2a1bf105d5/library/std/src/sys_common/backtrace.rs:138:18
   4: rust_begin_unwind
             at /rustc/1e12aef3fab243407f9d71ba9956cb2a1bf105d5/library/std/src/panicking.rs:584:5
   5: core::panicking::panic_fmt
             at /rustc/1e12aef3fab243407f9d71ba9956cb2a1bf105d5/library/core/src/panicking.rs:143:14
   6: core::result::unwrap_failed
             at /rustc/1e12aef3fab243407f9d71ba9956cb2a1bf105d5/library/core/src/result.rs:1749:5
   7: core::result::Result<T,E>::unwrap
             at /rustc/1e12aef3fab243407f9d71ba9956cb2a1bf105d5/library/core/src/result.rs:1065:23
      backup::endpoint::BackupRange::backup::{{closure}}::{{closure}}
             at /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tikv/components/backup/src/endpoint.rs:405:44
      core::option::Option<T>::map_or_else
             at /rustc/1e12aef3fab243407f9d71ba9956cb2a1bf105d5/library/core/src/option.rs:999:24
      backup::endpoint::BackupRange::backup::{{closure}}
             at /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tikv/components/backup/src/endpoint.rs:402:22
   8: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/1e12aef3fab243407f9d71ba9956cb2a1bf105d5/library/core/src/future/mod.rs:84:19
      backup::endpoint::Endpoint<E,R>::spawn_backup_worker::{{closure}}
             at /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tikv/components/backup/src/endpoint.rs:888:30
   9: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/1e12aef3fab243407f9d71ba9956cb2a1bf105d5/library/core/src/future/mod.rs:84:19
      backup::utils::DaemonRuntime::spawn::{{closure}}
             at /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tikv/components/backup/src/utils.rs:20:14
      <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/1e12aef3fab243407f9d71ba9956cb2a1bf105d5/library/core/src/future/mod.rs:84:19
      tokio::runtime::task::core::CoreStage<T>::poll::{{closure}}
             at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-1.12.0/src/runtime/task/core.rs:161:17
      tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
             at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-1.12.0/src/loom/std/unsafe_cell.rs:14:9
      tokio::runtime::task::core::CoreStage<T>::poll
             at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-1.12.0/src/runtime/task/core.rs:151:13
      tokio::runtime::task::harness::poll_future::{{closure}}
             at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-1.12.0/src/runtime/task/harness.rs:461:19
      <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/1e12aef3fab243407f9d71ba9956cb2a1bf105d5/library/core/src/panic/unwind_safe.rs:271:9
      std::panicking::try::do_call
             at /rustc/1e12aef3fab243407f9d71ba9956cb2a1bf105d5/library/std/src/panicking.rs:492:40
      std::panicking::try
             at /rustc/1e12aef3fab243407f9d71ba9956cb2a1bf105d5/library/std/src/panicking.rs:456:19
      std::panic::catch_unwind
             at /rustc/1e12aef3fab243407f9d71ba9956cb2a1bf105d5/library/std/src/panic.rs:137:14
      tokio::runtime::task::harness::poll_future
             at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-1.12.0/src/runtime/task/harness.rs:449:18
  10: tokio::runtime::task::harness::Harness<T,S>::poll_inner
             at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-1.12.0/src/runtime/task/harness.rs:98:27
      tokio::runtime::task::harness::Harness<T,S>::poll
             at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-1.12.0/src/runtime/task/harness.rs:53:15
      tokio::runtime::task::raw::poll
             at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-1.12.0/src/runtime/task/raw.rs:113:5
  11: tokio::runtime::task::raw::RawTask::poll
             at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-1.12.0/src/runtime/task/raw.rs:70:18
      tokio::runtime::task::LocalNotified<S>::run
             at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-1.12.0/src/runtime/task/mod.rs:343:9
      tokio::runtime::thread_pool::worker::Context::run_task::{{closure}}
             at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-1.12.0/src/runtime/thread_pool/worker.rs:420:13
      tokio::coop::with_budget::{{closure}}
             at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-1.12.0/src/coop.rs:106:9
      std::thread::local::LocalKey<T>::try_with
             at /rustc/1e12aef3fab243407f9d71ba9956cb2a1bf105d5/library/std/src/thread/local.rs:413:16
      std::thread::local::LocalKey<T>::with
             at /rustc/1e12aef3fab243407f9d71ba9956cb2a1bf105d5/library/std/src/thread/local.rs:389:9
      tokio::coop::with_budget
             at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-1.12.0/src/coop.rs:99:5
      tokio::coop::budget
             at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-1.12.0/src/coop.rs:76:5
      tokio::runtime::thread_pool::worker::Context::run_task
             at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-1.12.0/src/runtime/thread_pool/worker.rs:419:9
  12: tokio::runtime::thread_pool::worker::Context::run
             at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-1.12.0/src/runtime/thread_pool/worker.rs:386:24
      tokio::runtime::thread_pool::worker::run::{{closure}}
             at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-1.12.0/src/runtime/thread_pool/worker.rs:371:17
      tokio::macros::scoped_tls::ScopedKey<T>::set
             at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-1.12.0/src/macros/scoped_tls.rs:61:9
      tokio::runtime::thread_pool::worker::run
             at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-1.12.0/src/runtime/thread_pool/worker.rs:368:5
      tokio::runtime::thread_pool::worker::Launch::launch::{{closure}}
             at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-1.12.0/src/runtime/thread_pool/worker.rs:347:45
      <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
             at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-1.12.0/src/runtime/blocking/task.rs:42:21
      tokio::runtime::task::core::CoreStage<T>::poll::{{closure}}
             at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-1.12.0/src/runtime/task/core.rs:161:17
      tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
             at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-1.12.0/src/loom/std/unsafe_cell.rs:14:9
      tokio::runtime::task::core::CoreStage<T>::poll
             at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-1.12.0/src/runtime/task/core.rs:151:13
      tokio::runtime::task::harness::poll_future::{{closure}}
             at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-1.12.0/src/runtime/task/harness.rs:461:19
      <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/1e12aef3fab243407f9d71ba9956cb2a1bf105d5/library/core/src/panic/unwind_safe.rs:271:9
      std::panicking::try::do_call
             at /rustc/1e12aef3fab243407f9d71ba9956cb2a1bf105d5/library/std/src/panicking.rs:492:40
      std::panicking::try
             at /rustc/1e12aef3fab243407f9d71ba9956cb2a1bf105d5/library/std/src/panicking.rs:456:19
      std::panic::catch_unwind
             at /rustc/1e12aef3fab243407f9d71ba9956cb2a1bf105d5/library/std/src/panic.rs:137:14
      tokio::runtime::task::harness::poll_future
             at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-1.12.0/src/runtime/task/harness.rs:449:18
      tokio::runtime::task::harness::Harness<T,S>::poll_inner
             at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-1.12.0/src/runtime/task/harness.rs:98:27
      tokio::runtime::task::harness::Harness<T,S>::poll
             at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-1.12.0/src/runtime/task/harness.rs:53:15
      tokio::runtime::task::raw::poll
             at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-1.12.0/src/runtime/task/raw.rs:113:5
  13: tokio::runtime::task::raw::RawTask::poll
             at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-1.12.0/src/runtime/task/raw.rs:70:18
      tokio::runtime::task::UnownedTask<S>::run
             at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-1.12.0/src/runtime/task/mod.rs:379:9
      tokio::runtime::blocking::pool::Inner::run
             at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-1.12.0/src/runtime/blocking/pool.rs:265:17
      tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}}
             at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-1.12.0/src/runtime/blocking/pool.rs:245:17
      std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rustc/1e12aef3fab243407f9d71ba9956cb2a1bf105d5/library/std/src/sys_common/backtrace.rs:122:18
  14: std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}}
             at /rustc/1e12aef3fab243407f9d71ba9956cb2a1bf105d5/library/std/src/thread/mod.rs:498:17
      <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/1e12aef3fab243407f9d71ba9956cb2a1bf105d5/library/core/src/panic/unwind_safe.rs:271:9
      std::panicking::try::do_call
             at /rustc/1e12aef3fab243407f9d71ba9956cb2a1bf105d5/library/std/src/panicking.rs:492:40
      std::panicking::try
             at /rustc/1e12aef3fab243407f9d71ba9956cb2a1bf105d5/library/std/src/panicking.rs:456:19
      std::panic::catch_unwind
             at /rustc/1e12aef3fab243407f9d71ba9956cb2a1bf105d5/library/std/src/panic.rs:137:14
      std::thread::Builder::spawn_unchecked_::{{closure}}
             at /rustc/1e12aef3fab243407f9d71ba9956cb2a1bf105d5/library/std/src/thread/mod.rs:497:30
      core::ops::function::FnOnce::call_once{{vtable.shim}}
             at /rustc/1e12aef3fab243407f9d71ba9956cb2a1bf105d5/library/core/src/ops/function.rs:227:5
  15: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/1e12aef3fab243407f9d71ba9956cb2a1bf105d5/library/alloc/src/boxed.rs:1854:9
      <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/1e12aef3fab243407f9d71ba9956cb2a1bf105d5/library/alloc/src/boxed.rs:1854:9
      std::sys::unix::thread::Thread::new::thread_start
             at /rustc/1e12aef3fab243407f9d71ba9956cb2a1bf105d5/library/std/src/sys/unix/thread.rs:108:17
  16: start_thread
  17: clone
"] [location=/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tikv/components/backup/src/endpoint.rs:405] [thread_name=bkwkr]

4. What is your TiDB version? (Required)

TiDB: N/A
BR:

$ tidb/bin/br -V
Release Version: v5.5.0-alpha-325-g5bbd70991-dirty
Git Commit Hash: 5bbd7099143e840636486ea6940c576086942409
Git Branch: master
Go Version: go1.16.10
UTC Build Time: 2022-02-24 06:41:51
Race Enabled: false

TiKV:

$ ~/.tiup/components/tikv/v5.5.0-nightly-20220220/tikv-server --version
TiKV
Release Version:   5.4.0-alpha
Edition:           Community
Git Commit Hash:   b1b470f9baa0e0b4acee10229676ce73167bf9a2
Git Commit Branch: heads/refs/tags/v5.5.0-alpha
UTC Build Time:    2022-02-18 14:53:57
Rust Version:      rustc 1.60.0-nightly (1e12aef3f 2022-02-13)
Enable Features:   jemalloc mem-profiling portable sse test-engines-rocksdb cloud-aws cloud-gcp cloud-azure
Profile:           dist_release
@pingyu pingyu added the type/bug The issue is confirmed as a bug. label Feb 24, 2022
@pingyu
Copy link
Contributor Author

pingyu commented Feb 24, 2022

/assign

pingyu added a commit to pingyu/tidb that referenced this issue Feb 24, 2022
pingyu added a commit to pingyu/tidb that referenced this issue Feb 24, 2022
Issue Number: close pingcap#32607

Signed-off-by: pingyu <[email protected]>
@fubinzh fubinzh added component/br This issue is related to BR of TiDB. severity/major labels Feb 24, 2022
@ti-chi-bot ti-chi-bot added may-affects-4.0 This bug maybe affects 4.0.x versions. may-affects-5.0 This bug maybe affects 5.0.x versions. may-affects-5.1 This bug maybe affects 5.1.x versions. may-affects-5.2 This bug maybe affects 5.2.x versions. may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. labels Feb 24, 2022
@3pointer 3pointer added affects-4.0 This bug affects 4.0.x versions. affects-5.0 This bug affects 5.0.x versions. affects-5.1 This bug affects 5.1.x versions. affects-5.2 This bug affects 5.2.x versions. affects-5.3 This bug affects 5.3.x versions. affects-5.4 This bug affects the 5.4.x(LTS) versions. and removed may-affects-4.0 This bug maybe affects 4.0.x versions. may-affects-5.1 This bug maybe affects 5.1.x versions. may-affects-5.2 This bug maybe affects 5.2.x versions. may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-5.0 This bug maybe affects 5.0.x versions. labels Feb 28, 2022
pingyu added a commit to pingyu/tidb that referenced this issue Mar 1, 2022
Issue Number: close pingcap#32607

Signed-off-by: pingyu <[email protected]>
ti-chi-bot pushed a commit that referenced this issue Mar 3, 2022
pingyu added a commit to ti-srebot/tidb that referenced this issue Apr 12, 2022
Issue Number: ref pingcap#32607

Signed-off-by: pingyu <[email protected]>
pingyu added a commit to ti-srebot/tidb that referenced this issue Apr 12, 2022
pingyu added a commit to ti-srebot/tidb that referenced this issue Apr 13, 2022
ti-chi-bot pushed a commit that referenced this issue Apr 15, 2022
ti-chi-bot pushed a commit that referenced this issue Apr 28, 2022
ti-chi-bot pushed a commit that referenced this issue Jun 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects-4.0 This bug affects 4.0.x versions. affects-5.0 This bug affects 5.0.x versions. affects-5.1 This bug affects 5.1.x versions. affects-5.2 This bug affects 5.2.x versions. affects-5.3 This bug affects 5.3.x versions. affects-5.4 This bug affects the 5.4.x(LTS) versions. component/br This issue is related to BR of TiDB. severity/major type/bug The issue is confirmed as a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants