The connection to a device in Eclipse Hono Sandbox is lost if a cloud application fails to process a message #3393

Closed
ognian-baruh opened this issue Aug 25, 2022 · 6 comments

@ognian-baruh

I have created a tenant and a device and am explicitly setting the messaging type of the tenant to AMQP using the following credentials:

export HONO_EP=hono.eclipseprojects.io
export TENANT=example_tenant
export DEVICE_ID=device_1
export AUTH_ID=auth
export PWD=secret

The tenant and device are being provisioned using the following three commands:

curl -i -X POST http://$HONO_EP:28080/v1/tenants/$TENANT -H  "content-type: application/json"  --data-binary '{"ext": {"messaging-type": "amqp"}}'
curl -i -X POST http://$HONO_EP:28080/v1/devices/$TENANT/$DEVICE_ID -H  "content-type: application/json" --data-binary '{"authorities":["auto-provisioning-enabled"]}'
curl -i -X PUT -H "content-type: application/json" --data-binary '[{
  "type": "hashed-password",
  "auth-id": "'$AUTH_ID'",
  "secrets": [{
      "pwd-plain": "'$PWD'"
  }]
}]' http://$HONO_EP:28080/v1/credentials/$TENANT/$DEVICE_ID

I have created a simple Python script which creates a Command Response handler using the Qpid Proton library:

import json
import time
import os
import signal
import threading

from proton.handlers import MessagingHandler
from proton.reactor import Container


class CommandResponsesHandler(MessagingHandler):
    def __init__(self, server, address):
        super(CommandResponsesHandler, self).__init__()
        self.server = server
        self.address = address

    def on_start(self, event):
        conn = event.container.connect(self.server, user="consumer@HONO", password="verysecret")
        event.container.create_receiver(conn, self.address)
        print('[connected]')

    def on_message(self, event):
        print('[got message]')
        response = json.loads(event.message.body)
        print(json.dumps(response, indent=2))
        # In this state the device does not get disconnected
        # Uncomment any of the lines below to achieve the described disconnect behavior
        # os.kill(os.getpid(), signal.SIGINT)
        # time.sleep(1) # Gets into an infinite disconnect/receive cycle
        # raise Exception("Exception")



# Tenant and device info
tenant_id = "example_tenant"
device_id = "device_1"

# AMQP global configurations
uri = 'amqp://hono.eclipseprojects.io:15672'
reply_to_address = 'command_response/{}/replies'.format(tenant_id)

response_handler = Container(CommandResponsesHandler(uri, reply_to_address))
thread = threading.Thread(target=lambda: response_handler.run(), daemon=True)
thread.start()


def handler(signum, frame):
    response_handler.stop()
    thread.join(timeout=5)
    exit(0)


signal.signal(signal.SIGINT, handler)
while True:
    # Sleep instead of busy-waiting; SIGINT still interrupts the sleep and runs the handler
    time.sleep(1)

My aim is to exit the application as soon as a message is received. I am using MQTT Explorer to publish the response to the connected device. The message details are the following:
Topic: command///res/024cc17e6bb-86d7-40a0-b1a8-8f741d750a01replies/200
Payload (JSON formatted): {"example": "value"}
QoS: 1
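
For reference, the topic above appears to follow the pattern that the Hono MQTT adapter user guide gives for command responses from an authenticated device, `command///res/${req-id}/${status}`, with the tenant and device segments left empty. The following is only a minimal sketch of splitting such a topic under that assumption; the function name `parse_command_response_topic` is made up for illustration and is not part of Hono.

```python
def parse_command_response_topic(topic):
    """Split a command response topic into its parts.

    Assumes the layout command/<tenant>/<device>/res/<req-id>/<status>,
    where tenant and device may be empty for an authenticated device.
    """
    parts = topic.split("/")
    if len(parts) != 6 or parts[0] != "command" or parts[3] != "res":
        raise ValueError("not a command response topic: " + topic)
    tenant, device, req_id, status = parts[1], parts[2], parts[4], parts[5]
    return {
        "tenant": tenant or None,   # empty segment means "taken from the connection"
        "device": device or None,
        "req_id": req_id,
        "status": int(status),      # HTTP-like status code of the response
    }


topic = "command///res/024cc17e6bb-86d7-40a0-b1a8-8f741d750a01replies/200"
print(parse_command_response_topic(topic))
```
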

When I publish the message, it gets received by the Python script and the script gets terminated as expected, but the MQTT Explorer gets disconnected from the device and it cannot connect again through its reconnect functionality. I am only able to reconnect if I disconnect manually beforehand.

Is it expected for the Eclipse Hono Sandbox to terminate the connection to the device when a cloud application has failed to process a message?
If yes, then why does the Eclipse Hono Sandbox accept the connection from the device afterwards?

@sophokles73
Contributor

> My aim is to exit the application as soon as a message is received. I am using MQTT Explorer to publish the response to the connected device. The message details are the following:
> Topic: command///res/024cc17e6bb-86d7-40a0-b1a8-8f741d750a01replies/200
> Payload (JSON formatted): {"example": "value"}
> QoS: 1

I am not sure if I understand correctly what you are trying to do. Usually, a back end application sends a command to a device and a device sends its response via the MQTT adapter to the back end application.
My understanding of your code is that you are not sending a command at all but simply want to simulate a device sending its command response to the back end application. You are using MQTT explorer to simulate the device, connecting to the Hono Sandbox's MQTT adapter. Is that correct?

Based on these assumptions, I would suspect that the AMQP message received by your Python script is not properly settling the message with the accepted outcome before it terminates. Since you are using QoS 1 when publishing the command response to the MQTT adapter, the adapter will try to forward the message to the downstream app (your script) before sending the PUBACK to the device. If the adapter runs into any problems when processing a message, its default behavior (according to the MQTT 3.1.1 spec) is to close the connection to the client (your device). However, you can influence this behavior as described in the MQTT adapter user guide.

You might want to try to use QoS 0 instead of QoS 1 to publish the command response. In that case, the MQTT adapter simply uses fire-and-forget semantics when sending the AMQP message to the downstream application.

Another op

@sophokles73
Contributor

@ognian-baruh any news on this? Can we close this issue?

@ognian-baruh
Author

Hello @sophokles73, sorry for the late reply, but I have been away for the last week.

> My understanding of your code is that you are not sending a command at all but simply want to simulate a device sending its command response to the back end application. You are using MQTT explorer to simulate the device, connecting to the Hono Sandbox's MQTT adapter. Is that correct?

Yes, that is correct. The idea behind the simulation of the command response is to isolate it in order to be sure that this is exactly what is causing the issue.

> Based on these assumptions, I would suspect that the AMQP message received by your Python script is not properly settling the message with the accepted outcome before it terminates. Since you are using QoS 1 when publishing the command response to the MQTT adapter, the adapter will try to forward the message to the downstream app (your script) before sending the PUBACK to the device. If the adapter runs into any problems when processing a message, its default behavior (according to the MQTT 3.1.1 spec) is to close the connection to the client (your device). However, you can influence this behavior as described in the MQTT adapter user guide.

After further tests, I came to the same conclusion and modified the script a little bit - when receiving the response I manually close the connection and I then kill the process in the on_connection_closed callback method, which works as expected - the application gets terminated but the connection to the client remains intact.
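
A minimal sketch of what such a handler could look like is shown below. It is not the exact script used in this thread. It assumes Qpid Proton's `MessagingHandler` with `auto_accept=False`, so that the accepted outcome is set explicitly before the connection is closed. The names `decode_body`, `run_once` and `AcceptThenClose` are hypothetical, and the assumption that the payload may arrive as `bytes` or `memoryview` depends on the Proton version in use.

```python
import json


def decode_body(body):
    """Decode a JSON payload that may arrive as str, bytes or memoryview."""
    if isinstance(body, memoryview):
        body = body.tobytes()
    if isinstance(body, (bytes, bytearray)):
        body = body.decode("utf-8")
    return json.loads(body)


def run_once(server, address, user, password):
    """Receive one command response, accept it, then shut down cleanly."""
    # Imported here so that decode_body() stays usable without Qpid Proton.
    from proton.handlers import MessagingHandler
    from proton.reactor import Container

    class AcceptThenClose(MessagingHandler):
        def __init__(self):
            # Disable auto-accept so the outcome is set explicitly below.
            super().__init__(auto_accept=False)

        def on_start(self, event):
            conn = event.container.connect(server, user=user, password=password)
            event.container.create_receiver(conn, address)

        def on_message(self, event):
            print(json.dumps(decode_body(event.message.body), indent=2))
            # Settle the delivery with the "accepted" outcome before going away.
            self.accept(event.delivery)
            # Close gracefully instead of killing the process from this callback.
            event.connection.close()

        def on_connection_closed(self, event):
            # The peer has confirmed the close, so the container can stop now.
            event.container.stop()

    Container(AcceptThenClose()).run()
```

Calling `run_once('amqp://hono.eclipseprojects.io:15672', 'command_response/example_tenant/replies', 'consumer@HONO', 'verysecret')` would mirror the values from the original script.
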

> You might want to try to use QoS 0 instead of QoS 1 to publish the command response. In that case, the MQTT adapter simply uses fire-and-forget semantics when sending the AMQP message to the downstream application.

I tried out QoS 0 when publishing the command response and the client gets disconnected again, but it is able to reestablish the connection right away.

Also, after some observations I found that this behavior only occurs when publishing command responses, not when publishing events. With no change to the script other than subscribing to the events topic, I am able to receive the event and kill the process from within the on_message callback without losing the connection to the adapter. What could be the reason behind the different behaviors for the different topics?

@sophokles73
Contributor

> What could be the reason behind the different behaviors for the different topics?

That is because the command response messages are being routed to the downstream consumer whereas the events are being brokered. The former means that the messages are forwarded by the Dispatch Router to the consumer directly. The latter means that the Dispatch Router forwards the event message to an Artemis message broker which persists the message to a file. From the sender's point of view, the event has then been accepted by Hono (but not yet delivered to the consumer). The downstream consumer will then receive the event message from the Artemis broker, either immediately (if it is connected) or at any later time when it connects ...
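
The difference can be illustrated with a toy model. This is only a sketch and not Hono, Dispatch Router or Artemis code: the classes `Consumer` and `Broker`, the function `send_routed`, and the choice of `released` as the outcome for a consumer that goes away are all assumptions made for illustration.

```python
from collections import deque

ACCEPTED = "accepted"
RELEASED = "released"  # assumed outcome when the consumer does not settle


class Consumer:
    """Hypothetical downstream application."""

    def __init__(self, crashes=False):
        self.crashes = crashes
        self.received = []

    def deliver(self, message):
        if self.crashes:
            # The app terminates before settling the delivery.
            return RELEASED
        self.received.append(message)
        return ACCEPTED


def send_routed(message, consumer):
    """Routed: the sender's outcome is whatever the consumer decides."""
    return consumer.deliver(message)


class Broker:
    """Brokered: the broker stores the message and accepts it itself."""

    def __init__(self):
        self.queue = deque()

    def send(self, message):
        self.queue.append(message)
        return ACCEPTED  # the sender is done, whatever the consumer does later

    def drain(self, consumer):
        # Deliver stored messages; keep any that the consumer does not accept.
        while self.queue:
            if consumer.deliver(self.queue[0]) != ACCEPTED:
                break
            self.queue.popleft()


crashing = Consumer(crashes=True)
broker = Broker()
print("routed command response:", send_routed("response", crashing))
print("brokered event:", broker.send("event"))
```

In this model the sender of a routed message sees the consumer's failure directly, while the sender of a brokered message gets `accepted` from the broker and never learns what the consumer does afterwards.
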

@ognian-baruh
Author

Thank you very much for the response, I think we can close the issue now, as it seems like everything is working as expected.

@sophokles73
Contributor

Great, then please do so :-)
