c8y mapper not starting up reliably #3279
From the log, it looks like mosquitto can't establish a bridge between the local broker and the cloud:
Please also share
But then I should be able to see these published messages when subscribing on the local broker. I've attached the requested files. The mapper log file below is from the same timeframe in which I grabbed the mosquitto log.
So actually I'm able to see traffic on the
and also the JWT is correctly retrieved beforehand:
But nothing further afterwards.
After replicating your child devices, I am able to reproduce the issue.
So it is indeed most likely a bug with thin-edge, and not the bridge. Trying to find the cause now.
A cause was identified: due to an accidental deadlock, the outgoing MQTT message queue wasn't getting drained.
Because the C8yMapperActor synchronously first receives a message from its input channel and then produces output by sending to its output channel, if either of these channels blocks, the other one is not processed either. In this case there was a loop: C8yMapperActor sent `MqttMessage`s to AvailabilityActor, and AvailabilityActor sent `PublishMessage`s to C8yMapperActor for it to relay them to MqttActor. When the channel for `PublishMessage`s was full while AvailabilityActor was trying to send another message, that send would block, so until the pending message was processed AvailabilityActor wasn't processing new input. If, before that output was consumed, C8yMapperActor had another message to relay to AvailabilityActor and the inbound channel was also full, the result was a deadlock in which neither AvailabilityActor's input nor its output could get through.
Signed-off-by: Marcel Guzik <[email protected]>
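To make the failure mode concrete, here is a minimal sketch of that cycle using plain tokio channels rather than the tedge_actors message boxes of the real mapper. All names, channel sizes, and the flooding producer are made up for illustration; the point is only the shape of the blocking: the mapper's single inbox carries both external input and the relayed `PublishMessage`s, so once both bounded channels fill up, each side waits on the other forever.

```rust
// Illustrative only: toy versions of the actors involved, not thin-edge code.
use tokio::sync::mpsc;
use tokio::time::{timeout, Duration};

#[derive(Debug)]
struct MqttMessage(u32);

enum MapperInput {
    Mqtt(MqttMessage),    // incoming MQTT traffic the mapper must handle
    Publish(MqttMessage), // a message another actor asks the mapper to relay
}

#[tokio::main]
async fn main() {
    // Deliberately tiny bounded channels so the cycle fills up quickly.
    let (mapper_tx, mut mapper_rx) = mpsc::channel::<MapperInput>(4);
    let (avail_tx, mut avail_rx) = mpsc::channel::<MqttMessage>(4);

    // "AvailabilityActor": each consumed MqttMessage produces a Publish
    // request that has to travel back through the mapper's single inbox.
    let back_to_mapper = mapper_tx.clone();
    tokio::spawn(async move {
        while let Some(msg) = avail_rx.recv().await {
            // Blocks when the mapper's inbox is full -> stops consuming input.
            if back_to_mapper.send(MapperInput::Publish(msg)).await.is_err() {
                break;
            }
        }
    });

    // External MQTT traffic flooding the mapper's inbox, e.g. many child
    // devices all publishing at startup.
    tokio::spawn(async move {
        for n in 0.. {
            if mapper_tx.send(MapperInput::Mqtt(MqttMessage(n))).await.is_err() {
                break;
            }
        }
    });

    // "C8yMapperActor": strictly one message at a time, in arrival order.
    let mapper = async move {
        let mut relayed = 0u32;
        while let Some(input) = mapper_rx.recv().await {
            match input {
                // Handling input may require notifying the availability actor;
                // this send blocks while that actor's inbox is full.
                MapperInput::Mqtt(msg) => avail_tx.send(msg).await.unwrap(),
                // Relays only happen when this message reaches the front of
                // the shared inbox -- which it can't once both channels are full.
                MapperInput::Publish(_) => relayed += 1,
            }
            if relayed >= 10_000 {
                break;
            }
        }
        relayed
    };

    match timeout(Duration::from_secs(2), mapper).await {
        Ok(n) => println!("relayed {n} messages without deadlocking"),
        Err(_) => println!("stalled: most likely each side is blocked sending to the other"),
    }
}
```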
This test is meant to verify the fix to thin-edge#3279. Another test like this will probably have to be added to cover the original behaviour that motivated connecting AvailabilityActor to the C8yMapperActor instead of MqttActor directly, i.e. AvailabilityActor sending a SmartREST 117 message before C8yMapperActor sends the 101 message that registers a child device (which is invalid and should not happen).
Signed-off-by: Marcel Guzik <[email protected]>
Revert "…actor"
This reverts commit 47ddac9.
Solve the deadlock that can arise when C8yMapperActor can't complete sending an MqttMessage to some other actor because that other actor can't complete sending a `PublishMessage` back to C8yMapperActor, by creating a fast path that drains the `PublishMessage` buffer as soon as possible. Previously, when `PublishMessage`s were part of the main message box, we could not prioritize them because we had to process messages in order; separating out the receiver for `PublishMessage`s means no other messages can block `PublishMessage`s, and allows us to use `select!` to process them concurrently when processing other events blocks.
Signed-off-by: Marcel Guzik <[email protected]>
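As a sketch of the idea (not the actual thin-edge implementation), the toy mapper from the previous example could be restructured as follows: `PublishMessage`s get their own receiver instead of sharing the main message box, and `select!` keeps that receiver drained even while an outgoing send is pending. The names and wiring are again illustrative only.

```rust
// Illustrative only: a drop-in replacement for the toy mapper loop above.
use tokio::sync::mpsc::{Receiver, Sender};

struct MqttMessage(u32);

async fn run_mapper(
    mut input_rx: Receiver<MqttMessage>,   // main message box
    mut publish_rx: Receiver<MqttMessage>, // dedicated PublishMessage channel
    avail_tx: Sender<MqttMessage>,         // towards the availability actor
    mqtt_out: Sender<MqttMessage>,         // towards the MQTT actor
) {
    loop {
        tokio::select! {
            biased; // check the fast path first on every iteration
            // Fast path: relay pending PublishMessages as soon as they arrive.
            Some(publish) = publish_rx.recv() => {
                let _ = mqtt_out.send(publish).await;
            }
            // Normal path: handle the next input from the main message box.
            Some(msg) = input_rx.recv() => {
                // Handling an input involves a send that may block for a while;
                // keep draining the fast path while that send is pending, so an
                // actor waiting to hand us a PublishMessage can never wedge us.
                let send = avail_tx.send(msg);
                tokio::pin!(send);
                loop {
                    tokio::select! {
                        res = &mut send => {
                            if res.is_err() { return; } // availability actor gone
                            break;
                        }
                        Some(publish) = publish_rx.recv() => {
                            let _ = mqtt_out.send(publish).await;
                        }
                    }
                }
            }
            else => break, // both inputs closed: shut down
        }
    }
}
```

The `biased;` keyword only makes the fast path get polled first on each iteration; the essential part is the inner `select!`, which keeps relaying `PublishMessage`s while the send towards the availability actor is still blocked, breaking the cycle described in the commit message above.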
fix #3279: Create a fast path for sending `PublishMessage`s
@reey We've merged a fix for this. We'll include it in a new official release next week, but it might be worth verifying it yourself by installing thin-edge.io from our main release channel. You can easily do this by executing the following one-liner:
wget -O - thin-edge.io/install.sh | sh -s -- --channel main
Or, if you are a curl user:
curl -fsSL https://thin-edge.io/install.sh | sh -s -- --channel main
I gave it a try and it seems to be working fine 👍
Describe the bug
Whenever I restart my machine or restart the c8y mapper service (sudo systemctl restart tedge-mapper-c8y.service), it does not reliably start up again. The MQTT messages on the te/# topics are not processed, so no messages are actually published to the c8y/# topic. I've kept the mapper in this non-functioning state for ~8 hours to verify whether it comes back online at some point, but it does not.
About 1 out of 5 startup attempts of the service succeeds for me. I had the feeling that with debug logging enabled for the mapper it was starting up a bit more reliably. Maybe some race condition? I've also got 82 child devices connected right now. I'm not sure what the mapper might be doing on startup, but maybe the number of devices is triggering a race condition?
To Reproduce
1. Restart the mapper service: sudo systemctl restart tedge-mapper-c8y.service
2. Observe that no messages are published to the c8y/# topic
Expected behavior
Restarting the tedge-mapper-c8y service should not cause any issues and the mapping functionality should resume afterwards.
Environment (please complete the following information):
Debian GNU/Linux 12 (bookworm)
Raspberry Pi 4 Model B Rev 1.2
Linux raspberrypi 6.6.62+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.6.62-1+rpt1 (2024-11-25) aarch64 GNU/Linux
tedge 1.3.1
2.0.11
Additional context
I've got about 82 child devices connected to my thin-edge instance.
I've attached an encrypted debug log file: thin-edge-c8y-mapper.age.log