-
-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
monero-wallet-cli: hang on exit in readline code #2117
Comments
Have you tried with #2112 ? Reason I ask is this has a process lock. |
No, I have just what's on master right now. It may well be the reason, thanks. |
Yes I'm sure thats it. #2112 fixes. |
I'm seeing a similar non fatal hang with 2112 applied. It's pretty often, and happens when I'm being asked for a password (when starting, when doing any command which requires one). The hang is only temporary though, but can last a minute. It looks like if I go to another shell to run pstack to see where it's at, this will get it unstuck and it displays its password prompt, but not always (and this is when I can get the stack trace below). Thread 4 (Thread 0x7f7a7bd99700 (LWP 5362)): |
It also happens at random places, for instance, a long pause on new wallet creation between the display of languages and the "Enter the number corresponding to the language of your choice: " prompt. |
1:20 pause here, from the log timestamps. I'll just disable it for now, it's infuriating when debugging stuff. |
Do you have your |
You mean the one to avoid the busy loop ? Yes, it's in master, and it's in the tree I was testing. |
Yes #2111. Just checking before I did into it. It's tough as I cannot replicate. Will try some other systems. |
Very odd. It's pretty much all the time here. Fedora on Xen, x86_64. |
Reason I thought maybe #2111 fixed it was because of seeing this in your stack:
But now I'm thinking maybe some kind of race. I've been testing on OS X 10.11.6 and Ubuntu 16 (Under VirtualBox). |
So a password prompt is correctly suspending readline processing:
So why is thread 3 at
What I see from this is that |
But then looking at your two stacktraces, is the first issue of exit, enter, fixed by #2122? |
I did not see the exit hang again, but then I had seen it only once before. |
Looking at the code right now, it looks like it could be that the stop thread never gets the lock, since the caller for select is:
So the thread never gets to enter any kind of wait while outside the lock. To reliably get rid of this, you need a condition variable in addition to the mutex. |
And by never, I mean for a long time. It'd also neatly explain the random seeming delays, and also why it often got "unstuck" when I ran pstack on it to see where it was... |
But in your backtrace |
When |
Ah ha. But yes the converse makes sense! The stop/exit thread can be blocked/waiting for a suspended readline_buffer. |
But then this state can only ever be present if readline is suspended, which is only when password prompting or in a multiline output message command. int rdln::readline_buffer::process()
{
std::unique_lock<std::mutex> lock(process_mutex);
if(m_cout_buf == NULL)
return 0;
return process_input();
} to: int rdln::readline_buffer::process()
{
if(!process_mutex.try_lock())
return 0;
if(m_cout_buf == NULL)
return 0;
int count = process_input();
process_mutex.unlock();
return count;
} This way, process() never blocks when RL is suspended. |
Did not help (I also added an unlock on the early return so it's not a deadlock due to that). |
If I add a yield() at the end of process, it's a lot better, though it can still wait a bit, so not 100% good. |
Hmm. Reading the code, rdln::readline_buffer::sync uses m_cout_buf, but does not seem to lock process_mutex, which protects m_cout_buf in start/stop. |
It's annoying. With a 1 millisecond sleep at the end of process (after unlock), it works reliably. Not sure why a yield isn't always enough, since it should let the start/stop lock the lock before the thread is scheduled again. Using a cond var does not work since it'd need to be locked already. Would you be fine with a sleep like this ? |
Or maybe a better solution is to remove that process_lock and find a more granular way to fix the previous problem ? |
This is the patch which fixes the bug here:
|
Also a bit of cleanup I found while looking at it:
They're based on a branch with your process_lock patch, so don't apply to master. |
Great. Got there in the end it seems! |
I guess. That'd lose my GPG signature, but those are small patches so should be fine to re-review once merged. |
@moneromooo-monero OK I added to #2112 and tested on OS X and Ubuntu quickly. |
readline_buffer: fix start/stop threads being starved by process process could run for quite some time re-acquiring the process lock, leaving start/stop starving. Yielding after unlock in process is much better but doesn't seem to be enough to reliably yield, so we sleep for a millisecond, which should be transparent for user input anyway. Signed-off-by: Jethro Grassie <[email protected]>
readline_buffer: move a local to local scope Also limit the select fd limit to what we use Signed-off-by: Jethro Grassie <[email protected]>
This now seems all fixed with the recent patches, closing. |
This has now happened again: Thread 6 (Thread 0x7f0577488700 (LWP 20037)): A quick look around shows line_mutex is used at only one location (to lock get_line), which looks very suspect. |
It is notified from code in |
I'm assuming you mean have_line, not line_mutex. line_mutex is used only in get_line (but it might be there to ensure only one thread can read a line at once ?) |
Thats correct. The mutex is there for thread safety and the condition_variable is used to block till we have a line of input. |
@moneromooo-monero is this also confirmed to be readline related? Asking because I have seen these exit hangs in the wallet without readline. |
The stack trace shows a thread waiting in readline::get_line. I also never had that happen before the readline patches (though it did happen in windows in the past). |
Can you try this patch please before I create a PR? It should deal with a waiting get_line while trying to stop. diff --git a/contrib/epee/src/readline_buffer.cpp b/contrib/epee/src/readline_buffer.cpp
index c846641b..53261132 100644
--- a/contrib/epee/src/readline_buffer.cpp
+++ b/contrib/epee/src/readline_buffer.cpp
@@ -59,6 +59,7 @@ void rdln::readline_buffer::start()
void rdln::readline_buffer::stop()
{
std::unique_lock<std::mutex> lock(process_mutex);
+ have_line.notify_all();
if(m_cout_buf == NULL)
return;
std::cout.rdbuf(m_cout_buf); |
I will, but at first glance, this is racy since get_line does not lock process_mutex, therefore might start just after that notify_all. |
get_line doesn't need to lock process_mutex though as it serves a separate purpose. |
I have tried the patch, and, unsurprisingly (since the hang occured just once after your previous patch), exiting works fine. |
* upstream/master: (33 commits) mlog: make MONERO_LOGS env var understand N,logs type spec daemon: fix status in command line mode if using restricted rpc miner: fix background mining options parsing Don't hardcode /tmp Fix monero-project#2164 histogram output Add various readline related fixes Add notification on stop mlog: add msgwriter:INFO to log 0 and 1 defaults tx_pool: initialize padding in txpool meta structure cryptonote_core: initialize checkpoint flag epee: Remove unused variable Fix issue monero-project#2119 SEGV simplewallet: lock idle scope when sweeping monero-wallet-cli: hang on exit in readline code (monero-project#2117) monero-wallet-cli: hang on exit in readline code (monero-project#2117) abstract_tcp_server2: guard against accessing lock on a destroyed object Remove typeid use in network_address enable monero build on ppc64le architecture Fix monero-project#2071: remove declaration of unused variable "it" in epee Minor cleanup: tab vs. space + logs messages ...
I've just had it again with latest master. One thread waiting for readline_buffer::get_line to end, and it doesn't, |
I've not seen this since hyc's threading simplification. +resolved |
After typing exit+enter. It is rare:
Thread 4 (Thread 0x7fc825a5a700 (LWP 5206)):
#0 0x00007fc829e9f590 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000000000d22b79 in boost::condition_variable::wait(boost::unique_lockboost::mutex&) ()
#2 0x0000000000d3ab07 in epee::async_stdin_reader::wait_read() ()
#3 0x0000000000d3ad03 in epee::async_stdin_reader::reader_thread_func() ()
#4 0x0000000000e50947 in void std::_Mem_fn<void (epee::async_stdin_reader::)()>::operator()<, void>(epee::async_stdin_reader) const ()
#5 0x0000000000e4faa2 in void std::_Bind<std::_Mem_fn<void (epee::async_stdin_reader::)()> (epee::async_stdin_reader)>::__call<void, , 0
#6 0x0000000000e4de16 in void std::_Bind<std::_Mem_fn<void (epee::async_stdin_reader::)()> (epee::async_stdin_reader)>::operator()<, voi
#7 0x0000000000e48d24 in boost::detail::thread_data<std::_Bind<std::_Mem_fn<void (epee::async_stdin_reader::)()> (epee::async_stdin_reade
#8 0x00007fc82bd979da in thread_proxy () from /home/user/boost_1_59_install/lib/libboost_thread.so.1.59.0
#9 0x00007fc829e9a52a in start_thread () from /lib64/libpthread.so.0
#10 0x00007fc829bd622d in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7fc81ece6700 (LWP 5207)):
#0 0x00007fc829bccae3 in select () from /lib64/libc.so.6
#1 0x00000000011b040c in process_input() ()
#2 0x00000000011b025f in rdln::readline_buffer::process() ()
#3 0x0000000000d3acce in epee::async_stdin_reader::readline_thread_func() ()
#4 0x0000000000e50947 in void std::_Mem_fn<void (epee::async_stdin_reader::)()>::operator()<, void>(epee::async_stdin_reader*) const ()
#5 0x0000000000e4faa2 in void std::_Bind<std::_Mem_fn<void (epee::async_stdin_reader::)()> (epee::async_stdin_reader)>::__call<void, , 0
#6 0x0000000000e4de16 in void std::_Bind<std::_Mem_fn<void (epee::async_stdin_reader::)()> (epee::async_stdin_reader)>::operator()<, voi
#7 0x0000000000e48d24 in boost::detail::thread_data<std::_Bind<std::_Mem_fn<void (epee::async_stdin_reader::*)()> (epee::async_stdin_reade
#8 0x00007fc82bd979da in thread_proxy () from /home/user/boost_1_59_install/lib/libboost_thread.so.1.59.0
#9 0x00007fc829e9a52a in start_thread () from /lib64/libpthread.so.0
#10 0x00007fc829bd622d in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7fc81e0d7700 (LWP 5208)):
#0 0x00007fc829e9f939 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000000000d22c87 in boost::condition_variable::do_wait_until(boost::unique_lockboost::mutex&, timespec const&) ()
#2 0x0000000000d22607 in boost::condition_variable::wait_until(boost::unique_lockboost::mutex&, boost::chrono::time_point<boost::chrono:
#3 0x0000000000d501a7 in boost::cv_status boost::condition_variable::wait_for<long, boost::ratio<1l, 1l> >(boost::unique_lock<boost::mutex
#4 0x0000000000d0fd67 in cryptonote::simple_wallet::wallet_idle_thread() ()
#5 0x0000000000d0fdd3 in cryptonote::simple_wallet::run()::{lambda()#1}::operator()() const ()
#6 0x0000000000d1ed92 in boost::detail::thread_datacryptonote::simple_wallet::run()::{lambda()#1}::run() ()
#7 0x00007fc82bd979da in thread_proxy () from /home/user/boost_1_59_install/lib/libboost_thread.so.1.59.0
#8 0x00007fc829e9a52a in start_thread () from /lib64/libpthread.so.0
#9 0x00007fc829bd622d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7fc82ce89840 (LWP 5205)):
#0 0x00007fc82c655480 in wcwidth@plt () from /lib64/libreadline.so.6
#1 0x00007fc82c676053 in _rl_find_next_mbchar () from /lib64/libreadline.so.6
#2 0x00007fc82c665635 in expand_prompt () from /lib64/libreadline.so.6
#3 0x00007fc82c6658f6 in rl_expand_prompt () from /lib64/libreadline.so.6
#4 0x00007fc82c6560ba in rl_set_prompt () from /lib64/libreadline.so.6
#5 0x00000000011b0232 in rdln::readline_buffer::set_prompt(std::string const&) ()
#6 0x0000000000d3af9e in epee::async_console_handler::print_prompt() ()
#7 0x0000000000d6038a in bool epee::async_console_handler::run<bool epee::async_console_handler::run<boost::_bi::bind_t<bool, boost::_mfi:
#8 0x0000000000d4a215 in bool epee::async_console_handler::run<boost::_bi::bind_t<bool, boost::_mfi::mf1<bool, epee::command_handler, std:
#9 0x0000000000d3b4c4 in epee::console_handlers_binder::run_handling(std::string const&, std::string const&, std::function<void ()>) ()
#10 0x0000000000d1007b in cryptonote::simple_wallet::run() ()
#11 0x0000000000d161e4 in main ()
The text was updated successfully, but these errors were encountered: