Inconsistent async_read behaviour after 2.4.3 #17

willxinc · 2017-07-01T13:21:56Z

On versions 2.4.2 and before async_read would always call the callback with read_result.sucess == false after the read is finished.

After 2.4.2, this no longer appears to always be the case, in fact, after the client has finished sending all its info and disconnected, some of the time the callback won't be triggered at all.

Tested on Visual Studio 2017.

Cylix · 2017-07-02T07:51:35Z

Hi,

Thanks for reporting this issue.
This is an unexpected behavior that might indeed have appeared with the change submitted in v2.4.3!

The change done on v2.4.3 aimed to reduce CPU consumption on unix, but I still did a change on windows just in case.

void
io_service::process_rd_event(const fd_t& fd, tracked_socket& socket) {
  __TACOPIE_LOG(debug, "processing read event");

  auto rd_callback = socket.rd_callback;

  socket.is_executing_rd_callback = true;

  m_callback_workers << [=] {
    __TACOPIE_LOG(debug, "execute read callback");
    rd_callback(fd);

    std::lock_guard<std::mutex> lock(m_tracked_sockets_mtx);
    auto it = m_tracked_sockets.find(fd);

    if (it == m_tracked_sockets.end()) { return; }

    auto& socket                    = it->second;
    socket.is_executing_rd_callback = false;

    if (socket.marked_for_untrack && !socket.is_executing_wr_callback) {
      __TACOPIE_LOG(debug, "untrack socket");
      m_tracked_sockets.erase(it);
      m_wait_for_removal_condvar.notify_all();
+      m_notifier.notify();
    }
-      m_notifier.notify();
  };
}

I made a similar change for the process_wr_event.

At first sight, I thought it would have no impact.
Unfortunately, it has!
Indeed, the read_callback will start to execute. But if the read callback is executing, the socket is not set in the fd sets watched by select in the next select iteration.
Thus, select will sleep undefinitely.
Before, the call to m_notifier.notify() ensured that select would be waken up after the execution of any read/write callback, leading to reseting the fd sets and detect new event.
Now, it fails to wake up select that keep sleeping and do not detect new events.

Thanks a lot for reporting this issue which is indeed a bug, I truly appreciate!

I will submit a fix by tomorrow night (simply reverting the unnecessary buggy change).

Best!

willxinc · 2017-07-02T17:34:29Z

The odd thing is that even though it fails to wake, sometimes I'm seeing the event still fire.

Regardless, thanks for the prompt response.

Cylix · 2017-07-02T18:59:10Z

Yeah, this is normal: if the callback finishes to execute before the next setup/call to select, it will work.

Using this assumption, I was able to reproduce the bug really easily by using tacopie itself without cpp_redis:

#include <tacopie/tacopie>

#include <condition_variable>
#include <iostream>
#include <mutex>
#include <signal.h>

#ifdef _WIN32
#include <Winsock2.h>
#endif /* _WIN32 */

std::condition_variable cv;

void
signint_handler(int) {
  cv.notify_all();
}

void
on_new_message(tacopie::tcp_client& client, const tacopie::tcp_client::read_result& res) {
  if (res.success) {
    std::cout << "Client recv data" << std::endl;
    client.async_write({res.buffer, nullptr});
    client.async_read({1024, std::bind(&on_new_message, std::ref(client), std::placeholders::_1)});
    std::this_thread::sleep_for(std::chrono::seconds(2));
  }
  else {
    std::cout << "Client disconnected" << std::endl;
    client.disconnect();
  }
}

int
main(void) {
#ifdef _WIN32
  //! Windows netword DLL init
  WORD version = MAKEWORD(2, 2);
  WSADATA data;

  if (WSAStartup(version, &data) != 0) {
    std::cerr << "WSAStartup() failure" << std::endl;
    return -1;
  }
#endif /* _WIN32 */

  tacopie::tcp_client client;
  client.connect("127.0.0.1", 3001);
  client.async_read({1024, std::bind(&on_new_message, std::ref(client), std::placeholders::_1)});

  signal(SIGINT, &signint_handler);

  std::mutex mtx;
  std::unique_lock<std::mutex> lock(mtx);
  cv.wait(lock);

#ifdef _WIN32
  WSACleanup();
#endif /* _WIN32 */

  return 0;
}

This is just a slightly modified version of the tacopie example where I introduced a 2 second sleep in the read callback.

I can't reproduce it with cpp_redis though. My guess is that all the publish responses arrive too quick and at the same time on my computer, resulting in only one async_read necessary.

Cylix · 2017-07-02T19:31:55Z

I published a fix in v2.4.4.
I closed this issue, but feel free to re-open it if the bug is still there or if you have any other issues to report!

Best

LazyPlanet · 2017-11-19T15:16:11Z

Multi thread get
_client.sync_commit(std::chrono::milliseconds(1000));

#0 0x00007fbc25bcb6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007fbc259699ec in std::condition_variable::wait(std::unique_lockstd::mutex&) () from /lib64/libstdc++.so.6
#2 0x000000000053c20d in std::condition_variable::waitstd::__future_base::_State_base::wait()::{lambda()#1}(std::unique_lockstd::mutex&, std::__future_base::_State_base::wait()::{lambda()#1}) (this=0x7fbc00afb070, __lock=..., __p=...) at /usr/include/c++/4.8.2/condition_variable:93
#3 0x0000000000533500 in std::__future_base::_State_base::wait (this=0x7fbc00afb038) at /usr/include/c++/4.8.2/future:327
#4 0x00000000005449b9 in std::__basic_future<cpp_redis::reply>::_M_get_result (this=0x7fbc1ddb6770) at /usr/include/c++/4.8.2/future:596
#5 0x000000000053c3de in std::future<cpp_redis::reply>::get (this=0x7fbc1ddb6770) at /usr/include/c++/4.8.2/future:675
#6 0x00000000005347e1 in Adoter::Redis::Get (this=0x7fbc00afacb0, key="user:guest_1179", value=..., async=false) at ../Include/RedisManager.h:184

Cylix · 2017-11-20T01:36:39Z

Hi @LazyPlanet ,

Can you detail the issue you are experiencing?

Best

Cylix added the bug label Jul 2, 2017

Cylix mentioned this issue Jul 2, 2017

Seeing inconsistent test results in TEST(RedisSubscriber, SubscribeMultiplePublished) Cylix/cpp_redis#85

Closed

Cylix closed this as completed in ae0f2e9 Jul 2, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inconsistent async_read behaviour after 2.4.3 #17

Inconsistent async_read behaviour after 2.4.3 #17

willxinc commented Jul 1, 2017

Cylix commented Jul 2, 2017

willxinc commented Jul 2, 2017

Cylix commented Jul 2, 2017

Cylix commented Jul 2, 2017

LazyPlanet commented Nov 19, 2017

Cylix commented Nov 20, 2017

Inconsistent async_read behaviour after 2.4.3 #17

Inconsistent async_read behaviour after 2.4.3 #17

Comments

willxinc commented Jul 1, 2017

Cylix commented Jul 2, 2017

willxinc commented Jul 2, 2017

Cylix commented Jul 2, 2017

Cylix commented Jul 2, 2017

LazyPlanet commented Nov 19, 2017

Cylix commented Nov 20, 2017