[Bug] Segmentation fault when sending messages after receiving an error #325
Comments
Here is the LLDB debug result:
The underlying implementation of
465 template <typename AsyncWriteStream>
466 class initiate_async_write
467 {
468 public:
481 template <typename WriteHandler, typename ConstBufferSequence,
482 typename CompletionCondition>
483 void operator()(BOOST_ASIO_MOVE_ARG(WriteHandler) handler,
484 const ConstBufferSequence& buffers,
485 BOOST_ASIO_MOVE_ARG(CompletionCondition) completion_cond) const
486 {
/* ... */
493 start_write_op(stream_, buffers, // [2], stream_ is the socket
494 boost::asio::buffer_sequence_begin(buffers),
495 completion_cond2.value, handler2.value);
496 }
542 void (boost::system::error_code, std::size_t))
543 async_write(AsyncWriteStream& s, const ConstBufferSequence& buffers,
/* ... */
555 {
556 return async_initiate<WriteToken,
557 void (boost::system::error_code, std::size_t)>(
558 detail::initiate_async_write<AsyncWriteStream>(s), // [1]
559 token, buffers,
560 BOOST_ASIO_MOVE_CAST(CompletionCondition)(completion_condition));
561 }
// frame 9, boost/asio/impl/write.hpp (Boost 1.82.0)
342 {
343 {
344 BOOST_ASIO_HANDLER_LOCATION((__FILE__, __LINE__, "async_write"));
-> 345 stream_.async_write_some(buffers_.prepare(max_size),
346 BOOST_ASIO_MOVE_CAST(write_op)(*this));
347 }
348 return; default:
As we can see, the application crashed at
// boost/asio/impl/write.hpp
355 if (this->cancelled() != cancellation_type::none)
356 {
357 ec = error::operation_aborted;
358 break;
359 }
In However, the
Fixes apache#325

### Motivation

apache#317 introduces a bug that might cause a segmentation fault when sending messages after receiving an error; see apache#325 (comment) for the detailed explanation.

### Modifications

When calling `asyncWrite`, capture the `shared_ptr` instead of the `weak_ptr` to extend the lifetime of the `socket_` or `tlsSocket_` field in `ClientConnection`. Since the lifetime is extended, some callbacks now check `isClosed()` before running any other logic.

Add a `ChunkDedupTest` to reproduce this issue based on Pulsar 3.1.0. Run the test 10 times to ensure it does not crash after this patch.
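The lifetime idea behind this fix can be illustrated with a minimal sketch that uses only the standard library. It is not the actual `ClientConnection` code: `FakeSocket`, `FakeExecutor`, `Connection`, and `runDemo` are hypothetical names introduced here, and the executor only mimics a handler that Asio would run later. The sketch assumes the pending write captures a `shared_ptr` to the connection, so the `socket_` member stays valid until the handler has finished, and that the handler checks `isClosed()` first.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a socket: records what was "written".
struct FakeSocket {
    explicit FakeSocket(std::vector<std::string>& wire) : wire_(wire) {}
    void write(const std::string& data) { wire_.push_back(data); }
    std::vector<std::string>& wire_;
};

// Hypothetical stand-in for an event loop: runs queued handlers later.
struct FakeExecutor {
    void post(std::function<void()> task) { pending_.push_back(std::move(task)); }
    void run() {
        std::vector<std::function<void()>> tasks;
        tasks.swap(pending_);
        for (auto& task : tasks) task();
        // `tasks` is destroyed here, releasing everything the handlers captured.
    }
    std::vector<std::function<void()>> pending_;
};

// Simplified connection (an assumption, not the real ClientConnection).
class Connection : public std::enable_shared_from_this<Connection> {
public:
    Connection(FakeExecutor& executor, std::vector<std::string>& wire)
        : executor_(executor), socket_(wire) {}

    void close() { closed_ = true; }
    bool isClosed() const { return closed_; }

    void asyncWrite(std::string data) {
        // Capture a shared_ptr (not a weak_ptr): the connection, and with it
        // socket_, stays alive for as long as the handler is pending.
        auto self = shared_from_this();
        executor_.post([self, data = std::move(data)] {
            // The object may outlive close(), so check the state first.
            if (self->isClosed()) return;
            self->socket_.write(data);
        });
    }

private:
    FakeExecutor& executor_;
    FakeSocket socket_;
    bool closed_ = false;
};

struct DemoResult {
    bool aliveWhilePending;
    bool destroyedAfterRun;
    std::vector<std::string> wire;
};

// Queues one write, drops the owner's reference, then runs the handler.
DemoResult runDemo(bool closeBeforeRun) {
    FakeExecutor executor;
    std::vector<std::string> wire;
    auto conn = std::make_shared<Connection>(executor, wire);
    std::weak_ptr<Connection> observer = conn;

    conn->asyncWrite("SEND");
    if (closeBeforeRun) conn->close();
    conn.reset();  // the owner lets go while the write is still pending

    DemoResult result;
    result.aliveWhilePending = !observer.expired();
    executor.run();
    result.destroyedAfterRun = observer.expired();
    result.wire = wire;
    return result;
}
```

Calling `runDemo(false)` exercises the normal path, where the handler still finds a live connection even though the owner has already released it. Calling `runDemo(true)` exercises the `isClosed()` guard, where the handler runs on a live object but skips the write.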
Search before asking
Version
Additional broker configs:
Minimal reproduce step
To simulate a send error being returned, run the following code against a Pulsar release that does not include apache/pulsar#20948:
What did you expect to see?
The application exits normally.
What did you see instead?
You can also see the failure reproduced in this workflow:
Anything else?
The root cause might be #317. After I reverted that commit in my local environment, the crash no longer occurred.
Here is also a similar crash report: #324
Are you willing to submit a PR?