Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing/mishandled heartbeats after a few messages #344

Open
sobjerke opened this issue Jan 21, 2021 · 1 comment
Open

Missing/mishandled heartbeats after a few messages #344

sobjerke opened this issue Jan 21, 2021 · 1 comment

Comments

@sobjerke
Copy link

consumer.zip

Server: ActiveMQ (5.16.0) with stomp+nio+ssl transport connector
Producer: stomp.py 6.1.0 sending timestamped message every 10 seconds
Consumer: stomp.py (Tested both 6.1.0 and Dev)

Typical output:
2021-01-21 14:02:04: received message: "test message at 2021-01-21 14:02:04"
2021-01-21 14:02:10: received heartbeat
2021-01-21 14:02:14: received heartbeat
2021-01-21 14:02:15: received message: "test message at 2021-01-21 14:02:14"
Heartbeat timeout: diff_receive=6.000141425989568, time=5618406.807629671, lastrec=5618400.807488245
[Errno 0] Error
2021-01-21 14:02:21: stomp disconnected, try to reconnect
2021-01-21 14:02:21: receiver loop ended
2021-01-21 14:02:21: heartbeat timeout
2021-01-21 14:02:25: received message: "test message at 2021-01-21 14:02:25"
2021-01-21 14:02:29: received heartbeat
2021-01-21 14:02:33: received heartbeat
2021-01-21 14:02:43: receiver loop ended
2021-01-21 14:02:43: stomp disconnected, try to reconnect
2021-01-21 14:02:45: received message: "test message at 2021-01-21 14:02:45"
2021-01-21 14:02:51: received heartbeat
2021-01-21 14:02:55: received message: "test message at 2021-01-21 14:02:55"
2021-01-21 14:03:09: receiver loop ended
2021-01-21 14:03:09: stomp disconnected, try to reconnect

Observations:

  • until first message is received, heartbeats are received and handled as expected
  • after a few messages (typically 1-3), a heartbeat gets missed and on_heartbeat_timeout() is called
  • error message "Heartbeat timeout: diff_receive..." is printed only first time a heartbeat is missed
  • if a new message is received within the heartbeat interval, the connection in stable and there are no errors
  • after the first reconnect, heartbeats are missed (on_heartbeat() not called) and without a call to on_heartbeat_timout()
  • on_receiver_loop_completed() is called when a heartbeat is missed
  • tcpdump on client shows incoming traffic from MQ server at least every heartbeat interval.
@sobjerke
Copy link
Author

To eliminate an incorrectly configured ActiveMQ server, I replaced the configuration with the one supplied in the downloaded tar-file and changed the python code to not use SSL. When analyzing the tcp traffic, I was able to see that the first packet after a message, was not a heartbeat as I believed when I looked at the SSL traffic.

The first heartbeat from ActiveMQ after a message can take up to twice the heartbeat interval to arrive. It seems there is a bug in ActiveMQ that was reported in 2013 and still not resolved even if several patches has been proposed: https://issues.apache.org/jira/browse/AMQ-4710

I worked around the ActiveMQ issue by setting heart_beat_receive_scale=3 in stomp.Connection.

However it still seems there is an issue with stomp.py. After a reconnect, the consumer's receiver loop ends every 2 to 3 heartbeats.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant