Deadlock in streaming_pull_manager and ResumableBidiRpc after receiving 504 deadline exceeded error #74
Comments
Btw this issue can be resolved by downgrading google-api-core to 1.16.0. |
@sweatybridge I am just trying to see if I can reproduce it. Interesting fact about the API core workaround: the latest version (1.17.0) was released just yesterday, so it might be a regression in that release. |
Hi, we had a similar problem yesterday in our production environment, but I don't know whether it is fully related.
Environment details
Steps to reproduce
We missed something like
We solved it by pinning the grpcio version to 1.27.2. |
I can confirm the reported behavior using After switching the WiFi back on, the only thing that's left in the DEBUG logs is the leaser thread activity:
On the other hand, using
P.S.: I used the following log config for somewhat better readability (IMHO):
import logging

log_format = (
    "%(levelname)-8s [%(asctime)s] %(threadName)-33s "
    "[%(name)s] [%(filename)s:%(lineno)d][%(funcName)s] %(message)s"
)
# Send DEBUG and above to the root handler using the format above.
logging.basicConfig(level=logging.DEBUG, format=log_format) |
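If full DEBUG output turns out to be too noisy, the same format can be kept while enabling DEBUG only for the loggers of interest. This is a minimal sketch, assuming both libraries name their loggers after their module paths; the two logger names below are that assumption and are not confirmed anywhere in this thread.

```python
import logging

LOG_FORMAT = (
    "%(levelname)-8s [%(asctime)s] %(threadName)-33s "
    "[%(name)s] [%(filename)s:%(lineno)d][%(funcName)s] %(message)s"
)

# Assumed logger names (module paths); adjust them if the libraries differ.
DEBUG_LOGGERS = (
    "google.api_core.bidi",
    "google.cloud.pubsub_v1.subscriber",
)


def configure_logging(debug_loggers=DEBUG_LOGGERS):
    """Keep the root logger at WARNING and enable DEBUG selectively.

    Note: logging.basicConfig() does nothing if the root logger already
    has handlers, so call this before any other logging setup.
    """
    logging.basicConfig(level=logging.WARNING, format=LOG_FORMAT)
    for name in debug_loggers:
        # Child loggers (e.g. "...subscriber._protocol.leaser") inherit this level.
        logging.getLogger(name).setLevel(logging.DEBUG)


configure_logging()
```

Setting the level on a parent logger is enough here, because loggers without an explicit level fall back to the nearest ancestor that has one.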
@artefactop I tried reproducing the issue @sweatybridge originally reported with If you actually solved your problem just by pinning a different |
@plamut Yes, it can be reproduced every time a subscriber starts. It went on for 12 hours until we realized the problem and pinned the version of
I just found more logs related to this problem:
Here the subscriber stops getting messages. |
I'm unclear on the status of this bug. It's been marked as closed by pinning a dependency, but: (1) Does that fix this for clients in general? What happens for clients that have something else pulling in a fresher version of api-core? |
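@zunger-humu Yes, the fix was released a few days ago (version 1.4.3). If some other library could pull a more recent version of api-core, pip will still install 1.16.0 (as the pubsub client now imposes such a restriction), unless that other library explicitly pins the api-core version to 1.17.0. In that case, a version conflict will occur and one of the libraries will not be installed. I cannot say for certain how other client libraries will behave with api-core 1.17.0, though, as I am primarily familiar with BigQuery and PubSub. You mention that you have a breakage in production - is that even after trying to install the newest PubSub client? |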
OK, I am apparently dealing with gnomes; if I build an image with pubsub
1.4.3, it pulls in api-core 1.17.0, even though I just checked both github
and the raw pip image and verified that the fixes are in both places. So
whatever is happening is presumably elsewhere. Thanks!
On Mon, Apr 20, 2020 at 3:52 PM Peter Lamut wrote:
@zunger-humu Yes, the fix has been released
(https://github.com/googleapis/python-pubsub/releases/tag/v1.4.3)
a few days ago (version 1.4.3).
If some other library could pull a more recent version of api-core, pip
will still install 1.16.0 (as the pubsub client now imposes such
restriction), unless that other library explicitly pins the api-core
version to 1.17.0. In that case, a version conflict will occur and one of
the libraries will not be installed.
I cannot say for certain how other client libraries will behave with
api-core 1.17.0, though, as I am primarily familiar with BigQuery and
PubSub.
You mention that you have a breakage in production - is that even after
trying to install the newest PubSub client?
|
@zunger-humu That's indeed weird, hope you get to the bottom of it. Just in case I tried installing PubSub 1.4.2 and (used |
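When it is unclear which api-core version actually ended up in an environment or image, it can be checked at runtime. The following is a minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+). The helper names are hypothetical, and the only version treated as affected is 1.17.0, since that is the only one reported in this thread.

```python
from importlib import metadata

# The only version reported as problematic in this thread.
AFFECTED_API_CORE = (1, 17, 0)


def parse_version(text):
    """Turn '1.17.0' into (1, 17, 0); non-numeric suffixes are ignored."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_affected(version_text):
    """True only for the exact release flagged in this thread."""
    return parse_version(version_text) == AFFECTED_API_CORE


def installed_version(dist_name="google-api-core"):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("google-api-core is not installed")
    elif is_affected(found):
        print("google-api-core %s is the version reported here" % found)
    else:
        print("google-api-core %s is installed" % found)
```

Running this inside the built image shows what pip actually resolved, independent of what the requirements files say.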
Environment details
google-cloud-pubsub version: 1.4.2
Steps to reproduce
Here are the debug logs just after 10 mins.
Looking at the logs, I expected to see
_LOGGER.debug("Call to retryable %r caused %s.", method, exc)
since it should be printed on error (https://github.com/googleapis/python-api-core/blob/5e5559202891f7e5b6c22c2cbc549e1ec26eb857/google/api_core/bidi.py#L508). However, the message was not printed, so the thread was stuck acquiring self._operational_lock.
I've sampled the top functions running on all threads after getting into the deadlock state (attached as stacktrace below). The heartbeater's background thread seems to be frequently holding the lock as it performs the is_active check (https://github.com/googleapis/python-pubsub/blob/master/google/cloud/pubsub_v1/subscriber/_protocol/streaming_pull_manager.py#L425). I'm not sure why it's not being released.
Code example
Stack trace
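The thread does not say which tool was used to sample the threads. As a rough stand-in, a snapshot of every thread's stack can be taken from inside the process with the standard library. This is a minimal sketch: `dump_all_thread_stacks` is a hypothetical helper, and `sys._current_frames()` is a CPython-specific function.

```python
import sys
import threading
import traceback


def dump_all_thread_stacks():
    """Return a text snapshot of the current stack of every live thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    for ident, frame in sys._current_frames().items():
        chunks.append("Thread %s (%s):" % (names.get(ident, "?"), ident))
        chunks.append("".join(traceback.format_stack(frame)))
    return "\n".join(chunks)


if __name__ == "__main__":
    lock = threading.Lock()
    lock.acquire()  # held by the main thread, so the worker blocks on it
    started = threading.Event()

    def blocked_worker():
        started.set()
        with lock:  # blocks here until the main thread releases the lock
            pass

    worker = threading.Thread(
        target=blocked_worker, name="blocked-worker", daemon=True
    )
    worker.start()
    started.wait()
    print(dump_all_thread_stacks())
    lock.release()
    worker.join()
```

A thread that shows the same lock-acquiring frame across several snapshots is a likely candidate for the stuck one. `faulthandler.dump_traceback(all_threads=True)` from the standard library gives similar output with less code.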