open/ read stuck after warmup #1854

Closed
YunhuiChen opened this issue Aug 23, 2022 · 5 comments

@YunhuiChen
Contributor

YunhuiChen commented Aug 23, 2022

Describe the bug
open/read hangs after warmup.

To Reproduce
Mount the fs on test5 and test6, then:
1. fio --name=big-file-multi-write-2 --directory=/home/nbs/failover/test5 --rw=write --refill_buffers --bs=1M --size=50G --numjobs=1 --end_fsync=1 --fallocate=none
2. mv big-file-multi-write-2.0.0
3. sudo curve fs warmup add /home/nbs/failover/test6/warmuptest/
4. touch renametest
5. echo 111 > renametest
6. Reads hang on test6.

image

Expected behavior

Versions
OS:
Compiler:
branch:
commit id:

Additional context/screenshots

@YunhuiChen YunhuiChen added the bug Something isn't working label Aug 23, 2022
@YunhuiChen YunhuiChen added this to the Curve-2.4.0-beta milestone Aug 23, 2022
@YunhuiChen YunhuiChen changed the title from "sdk read stuck after warmup" to "read stuck after warmup" Aug 23, 2022
@YunhuiChen YunhuiChen changed the title from "read stuck after warmup" to "open/ read stuck after warmup" Aug 23, 2022
@cw123
Contributor

cw123 commented Aug 24, 2022

image
image
If ReadFromS3 fails to read data from S3 and the error code says the S3 object does not exist, it retries forever and the mount point hangs.
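
One way to avoid the endless retry described above is to retry only transient failures and give up at once when the object is missing. The following is a minimal sketch under that assumption, not the actual ReadFromS3 code: `S3Status` and `ReadWithBoundedRetry` are hypothetical names, and whether "object does not exist" should be terminal depends on the client's consistency model.

```cpp
#include <cassert>
#include <functional>

// Hypothetical result codes; the real client has its own error types.
enum class S3Status { kOk, kNotExist, kTransient };

// Calls readOnce, then retries at most maxRetries times while the failure
// is transient. A missing object is returned immediately, because retrying
// cannot make it appear.
inline S3Status ReadWithBoundedRetry(const std::function<S3Status()>& readOnce,
                                     int maxRetries) {
    assert(maxRetries >= 0);
    S3Status st = readOnce();
    for (int i = 0; i < maxRetries && st == S3Status::kTransient; ++i) {
        st = readOnce();
    }
    return st;
}
```

With this shape the caller can surface an error to the application instead of blocking the mount point.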

@h0hmj
Contributor

h0hmj commented Aug 29, 2022

The main loop of the S3 SDK is blocked by AsyncRequestInflightBytesThrottle.

main loop:

  1. make_request: bytes = base + size1 < throttle
  2. the request fails (maybe 404) and its callback runs
  3. callback phase 1: bytes = base + size1 - size1 = base, then notify all
    (another blocked request gets the chance: bytes = base + size2)
  4. callback phase 2: the callback retries the request, which would make bytes = base + size1 + size2 > throttle
  5. the retried request blocks inside the callback, and that blocks the main loop
Thread 26 (Thread 0x7f79f55ff700 (LWP 1972183)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f7a0ba7e50c in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2  0x000055c2c9bdd26b in curve::common::S3Adapter::AsyncRequestInflightBytesThrottle::OnStart (this=0x7f79fc8d5a90, len=4194304)
    at src/common/s3_adapter.cpp:675
#3  0x000055c2c9be2638 in curve::common::S3Adapter::GetObjectAsync (this=0x7f7a06a0d590,
    context=std::shared_ptr (count 6, weak 0) 0x7f79e48420b0) at src/common/s3_adapter.cpp:476
#4  0x000055c2c9ade7ea in curvefs::client::S3ClientImpl::DownloadAsync (this=0x7f79fc874910,
    context=std::shared_ptr (count 6, weak 0) 0x7f79e48420b0) at curvefs/src/client/s3/client_s3.cpp:83
#5  0x000055c2c9aa871e in curvefs::client::FuseS3Client::<lambda(const curve::common::S3Adapter*, const std::shared_ptr<curve::common::GetObjectAsyncContext>&)>::operator() (adapter=<optimized out>, context=std::shared_ptr (count 6, weak 0) 0x7f79e48420b0,
    __closure=0x7f79e49c8920) at curvefs/src/client/fuse_s3_client.cpp:283
#6  std::_Function_handler<void(const curve::common::S3Adapter*, const std::shared_ptr<curve::common::GetObjectAsyncContext>&), curvefs::client::FuseS3Client::WarmUpAllObjs(const std::__cxx11::list<std::pair<std::__cxx11::basic_string<char>, long unsigned int> >&)::<lambda(const curve::common::S3Adapter*, const std::shared_ptr<curve::common::GetObjectAsyncContext>&)> >::_M_invoke(const std::_Any_data &, <unknown type in /curvefs/client/sbin/curve-fuse.dbg-sym, CU 0x318495, DIE 0x3c74d5>, const std::shared_ptr<curve::common::GetObjectAsyncContext> &) (__functor=..., __args#0=<optimized out>, __args#1=std::shared_ptr (count 6, weak 0) 0x7f79e48420b0)
    at /usr/include/c++/6/functional:1731
#7  0x000055c2c9bde622 in std::function<void (curve::common::S3Adapter const*, std::shared_ptr<curve::common::GetObjectAsyncContext> const&)>::operator()(curve::common::S3Adapter const*, std::shared_ptr<curve::common::GetObjectAsyncContext> const&) const (
    __args#1=std::shared_ptr (count 6, weak 0) 0x7f79e48420b0, __args#0=<optimized out>, this=<optimized out>)
    at /usr/include/c++/6/functional:2127
#8  curve::common::S3Adapter::<lambda(const curve::common::S3Adapter*, const std::shared_ptr<curve::common::GetObjectAsyncContext>&)>::operator() (ctx=std::shared_ptr (count 6, weak 0) 0x7f79e48420b0, __closure=0x7f79e48af6f0) at src/common/s3_adapter.cpp:449
#9  std::_Function_handler<void(const curve::common::S3Adapter*, const std::shared_ptr<curve::common::GetObjectAsyncContext>&), curve::common::S3Adapter::GetObjectAsync(std::shared_ptr<curve::common::GetObjectAsyncContext>)::<lambda(const curve::common::S3Adapter*, const std::shared_ptr<curve::common::GetObjectAsyncContext>&)> >::_M_invoke(const std::_Any_data &, <unknown type in /curvefs/client/sbin/curve-fuse.dbg-sym, CU 0x18b8e99, DIE 0x190a2d5>, const std::shared_ptr<curve::common::GetObjectAsyncContext> &) (__functor=...,
    __args#0=<optimized out>, __args#1=std::shared_ptr (count 6, weak 0) 0x7f79e48420b0) at /usr/include/c++/6/functional:1731
#10 0x000055c2c9bdeb1c in std::function<void (curve::common::S3Adapter const*, std::shared_ptr<curve::common::GetObjectAsyncContext> const&)>::operator()(curve::common::S3Adapter const*, std::shared_ptr<curve::common::GetObjectAsyncContext> const&) const (
    __args#1=std::shared_ptr (count 6, weak 0) 0x7f79e48420b0, __args#0=<optimized out>, this=<optimized out>)
    at /usr/include/c++/6/functional:2127
#11 curve::common::S3Adapter::<lambda(const Aws::S3Crt::S3CrtClient*, const Aws::S3Crt::Model::GetObjectRequest&, const GetObjectOutcome&, const std::shared_ptr<const Aws::Client::AsyncCallerContext>&)>::operator() (awsCtx=..., response=..., __closure=0x7f79e4842148)
    at src/common/s3_adapter.cpp:469
#12 std::_Function_handler<void(const Aws::S3Crt::S3CrtClient*, const Aws::S3Crt::Model::GetObjectRequest&, Aws::Utils::Outcome<Aws::S3Crt::Model::GetObjectResult, Aws::S3Crt::S3CrtError>, const std::shared_ptr<const Aws::Client::AsyncCallerContext>&), curve::common::S3Adapter::GetObjectAsync(std::shared_ptr<curve::common::GetObjectAsyncContext>)::<lambda(const Aws::S3Crt::S3CrtClient*, const Aws::S3Crt::Model::GetObjectRequest&, const GetObjectOutcome&, const std::shared_ptr<const Aws::Client::AsyncCallerContext>&)> >::_M_invoke(const std::_Any_data &, <unknown type in /curvefs/client/sbin/curve-fuse.dbg-sym, CU 0x18b8e99, DIE 0x1901aaf>, const Aws::S3Crt::Model::GetObjectRequest &, <unknown type in /curvefs/client/sbin/curve-fuse.dbg-sym, CU 0x18b8e99, DIE 0x1901ace>, const std::shared_ptr<Aws::Client::AsyncCallerContext const> &) (__functor=..., __args#0=<optimized out>, __args#1=...,
    __args#2=<unknown type in /curvefs/client/sbin/curve-fuse.dbg-sym, CU 0x18b8e99, DIE 0x1901ace>, __args#3=...)
    at /usr/include/c++/6/functional:1731
#13 0x00007f7a0d57d80e in GetObjectRequestShutdownCallback(void*) () from /usr/lib/libaws-cpp-sdk-s3-crt.so
#14 0x00007f7a0d5c73bd in s_s3_meta_request_destroy () from /usr/lib/libaws-cpp-sdk-s3-crt.so
#15 0x00007f7a0dc33e7a in aws_ref_count_release () from /usr/lib/libaws-cpp-sdk-core.so
#16 0x00007f7a0d5c2857 in aws_s3_client_update_meta_requests_threaded () from /usr/lib/libaws-cpp-sdk-s3-crt.so
#17 0x00007f7a0d5c31c6 in s_s3_client_process_work_default () from /usr/lib/libaws-cpp-sdk-s3-crt.so
#18 0x00007f7a0dc345fe in s_run_all () from /usr/lib/libaws-cpp-sdk-core.so
#19 0x00007f7a0da40df7 in s_main_loop () from /usr/lib/libaws-cpp-sdk-core.so
#20 0x00007f7a0dc38573 in thread_fn () from /usr/lib/libaws-cpp-sdk-core.so
#21 0x00007f7a0cd854a4 in start_thread (arg=0x7f79f55ff700) at pthread_create.c:456
#22 0x00007f7a0b4fcd0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
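
The accounting in steps 1 to 5 above can be replayed with a small single-threaded model. This is only a sketch: `InflightBytesModel` and `RetryInCallbackWouldBlock` are hypothetical names, and the class does not reproduce the real AsyncRequestInflightBytesThrottle, which waits on a condition variable as frame #2 of the trace shows. The model reports where a blocking OnStart would have to wait.

```cpp
#include <cassert>
#include <cstdint>

// Simplified, single-threaded model of an in-flight byte throttle.
// Hypothetical; not the S3Adapter implementation.
class InflightBytesModel {
 public:
    explicit InflightBytesModel(uint64_t limit) : limit_(limit) {}

    // True when a blocking OnStart(len) would have to wait.
    bool WouldBlock(uint64_t len) const { return inflight_ + len > limit_; }

    void Start(uint64_t len) {
        assert(!WouldBlock(len));
        inflight_ += len;
    }

    void Complete(uint64_t len) {
        assert(inflight_ >= len);
        inflight_ -= len;
    }

 private:
    uint64_t limit_;
    uint64_t inflight_ = 0;
};

// Replays steps 1-5 and reports whether the retry issued from the failure
// callback would block the thread that runs the callback.
inline bool RetryInCallbackWouldBlock(uint64_t limit, uint64_t base,
                                      uint64_t size1, uint64_t size2) {
    InflightBytesModel t(limit);
    t.Start(base);      // requests already in flight
    t.Start(size1);     // step 1: base + size1 fits under the limit
    t.Complete(size1);  // steps 2-3: the failure callback releases size1
    if (!t.WouldBlock(size2)) {
        t.Start(size2);  // step 3: a waiting request takes the freed room
    }
    return t.WouldBlock(size1);  // steps 4-5: the retry no longer fits
}
```

For example, with a limit of 8 units, base = 2 and size1 = size2 = 4, the retry no longer fits once the waiting request has been admitted. Because the callback runs on the SDK's own loop thread, nothing can complete and release bytes while it waits.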

@h0hmj
Contributor

h0hmj commented Aug 29, 2022

We need to create a queue in s3adapter to handle these retry requests, and move the throttle check outside the main loop to avoid the deadlock.
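
A minimal sketch of such a queue follows, assuming a hypothetical `RetryRequest` record and `RetryQueue` class (neither exists in s3adapter as far as this thread shows). The SDK callback only enqueues and returns; a separate worker thread drains the queue and performs the throttle check there, where waiting does not stall the SDK main loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical retry record; a real request context carries more state.
struct RetryRequest {
    std::string key;
    uint64_t len;
};

class RetryQueue {
 public:
    // Called from the SDK callback: enqueues only, never waits on the throttle.
    void Push(RetryRequest req) {
        std::lock_guard<std::mutex> lk(mtx_);
        pending_.push_back(std::move(req));
    }

    // Called from a dedicated worker thread, where blocking is acceptable.
    // Hands every queued request to resubmit in FIFO order and returns
    // how many were handed over.
    std::size_t Drain(const std::function<void(const RetryRequest&)>& resubmit) {
        assert(resubmit);
        std::deque<RetryRequest> batch;
        {
            std::lock_guard<std::mutex> lk(mtx_);
            batch.swap(pending_);
        }
        // resubmit runs without the lock held, so Push never waits on it.
        for (const auto& req : batch) {
            resubmit(req);
        }
        return batch.size();
    }

 private:
    std::mutex mtx_;
    std::deque<RetryRequest> pending_;
};
```

In this design the `resubmit` callable would be the place to call the blocking throttle and reissue the request.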

@h0hmj
Contributor

h0hmj commented Sep 20, 2022

Calling the throttle's OnStart from a callback may deadlock. The client code should be rewritten to follow the principles of async programming, so that callbacks never block.
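
One non-blocking alternative is a throttle whose acquire either succeeds at once or reports failure, leaving it to the caller to defer the request. The sketch below is an assumption about what such a rewrite could look like, not the project's fix: `NonBlockingBytesThrottle`, `TryStart` and `OnComplete` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical throttle that never waits.
class NonBlockingBytesThrottle {
 public:
    explicit NonBlockingBytesThrottle(uint64_t limit) : limit_(limit) {}

    // Reserves len bytes if they fit under the limit. On false the caller
    // should defer the request instead of blocking the current thread.
    bool TryStart(uint64_t len) {
        uint64_t cur = inflight_.load(std::memory_order_relaxed);
        while (cur + len <= limit_) {
            // On failure compare_exchange_weak reloads cur, and the loop
            // condition re-checks the limit against the fresh value.
            if (inflight_.compare_exchange_weak(cur, cur + len,
                                                std::memory_order_acq_rel,
                                                std::memory_order_relaxed)) {
                return true;
            }
        }
        return false;
    }

    // Releases bytes reserved by a successful TryStart.
    void OnComplete(uint64_t len) {
        uint64_t prev = inflight_.fetch_sub(len, std::memory_order_acq_rel);
        assert(prev >= len);
        (void)prev;
    }

 private:
    const uint64_t limit_;
    std::atomic<uint64_t> inflight_{0};
};
```

A callback that gets `false` from `TryStart` can hand the request to another thread or schedule it for later, so the SDK loop thread is never parked.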

@Cyber-SiKu
Contributor

Cyber-SiKu commented Jan 12, 2023

image
image
