-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
After mqtt client is reconnected, it is unable to continue publishing message #1727
Comments
Can you provide a script that reproduces the issue? |
Hey @robertsLando , we are having same issue in our react application in the fronend this time. i think now reactjs and nodejs both has the same issue. |
@jaketakula I need to see more details about the error and how to reproduce it. Also what version are you using? |
|
you mean 1/2 weeks with the browser opened or what? |
The browser never sends data to broker but keeps open and only receive msg
from broker.
…On Wed, 10 Jan 2024 at 7:02 pm, Daniel Lando ***@***.***> wrote:
this issue happens randomly in chrome broswer - roughly every1 or 2 weeks
seen once.
you mean 1/2 weeks with the browser opened or what?
—
Reply to this email directly, view it on GitHub
<#1727 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/BCFQK2J5BDWLWTKXWGOTA33YNZDIDAVCNFSM6AAAAAA6UXDO66VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQOBUGM2TQOJTG4>
.
You are receiving this because you were mentioned.Message ID: <mqttjs/MQTT
.***@***.***>
|
Could be fixed by #1779 , someone could give a try to 5.3.5? |
Thank you very much.
I will bump the version now.
Will update you later on.
…On Wed, 24 Jan 2024 at 1:31 am, Daniel Lando ***@***.***> wrote:
Could be fixed by #1779 <#1779> ,
someone could give a try to 5.3.5?
—
Reply to this email directly, view it on GitHub
<#1727 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/BCFQK2MTWWRAHR2OIE2T65TYP7CVXAVCNFSM6AAAAAA6UXDO66VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMBWGE3TONZYGQ>
.
You are receiving this because you were mentioned.Message ID: <mqttjs/MQTT
.***@***.***>
|
@jaketakula Thanks! Any news? |
It usually take 1 or 2 weeks to see the issue.
So pls keep patient. Thx.
…On Wed, 24 Jan 2024 at 6:26 pm, Daniel Lando ***@***.***> wrote:
@jaketakula <https://github.com/jaketakula> Thanks! Any news?
—
Reply to this email directly, view it on GitHub
<#1727 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/BCFQK2OEEP6NCUB5RICMHEDYQCZTTAVCNFSM6AAAAAA6UXDO66VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMBXGUZTOMJZGM>
.
You are receiving this because you were mentioned.Message ID: <mqttjs/MQTT
.***@***.***>
|
any news on this? we have a similar issue so far, we did upgrade to the latest version but want to make sure before we deploy (iot devices) it won't happen again. |
I didn't checked my own as I never faced this issue, dunno if @jaketakula has news (but I think he would have write if a bug happended). Recent changes fixed a very old bug in reconnect/keep alive that could have caused it BTW |
we deployed 10 IoT devices and slowly they're getting disconnected one by one without reconnecting. So the issue still persists. Here's the config I'm using: this.#conn = mqtt.connect(url, {
cert: CERT_PATH,
key: KEY_PATH,
ca: CA_PATH,
protocolId: "MQTT",
protocolVersion: 5,
encoding: "binary",
clean: false,
clientId: 'specific-IoT-id',
keepalive: 60,
reconnectPeriod: 1000,
connectTimeout: 30000,
reschedulePings: false,
queueQoSZero: true,
resubscribe: true,
manualConnect: false,
}); I also tried to add timeout on publish (since when sending QoS 1 messages, it waits until delivery happens) but fails miserably. pseudo code:
but after I'm using EDIT: can you suggest what shall I do? workaround will also do cause we're in rush right now.
Yup, same thing. |
@overflowz Please open a new bug issue and follow the steps in order to also attach DEBUG logs |
The problem is, it's hard to reproduce and you gotta wait for days or even weeks for it to trigger (this is for the reconnect). As for the reconnect() -- will do later today. |
I understand that but it's hard for me to know what's going on here without more info,,, what you could do is to also patch the log function in client and print logs to a file so you don't loose them when this happens |
I'm not sure if it's related but I'm facing a similar problem. Context
what happens is that every X hours the 5G modem goes down for a short period of time. When this happens I can see in the manager's logs that when it tries to communicate via HTTP it receives a The problem is that when the connection is back the My sender function looks something like this: async function sendToBroker(topic, message) {
if (!this.client.connected) {
console.warn("Client is not connected, storing message");
this.storage.store({ topic, message });
return;
}
console.debug("Sending message to broker", message);
await this.client.publishAsync(topic, message);
} that is called like this: async function send(ctx) {
ctx.call("mqtt.sendToBroker", {
topic: "topic",
message: "message",
},
{
timeout: 10000, // Throw an error if it takes more than 10 seconds
});
} In the logs, I can see Here's how I create my client: this.client = mqtt.connect(url, {
username: token,
cert: fs.readFileSync(certPath, "utf8"),
rejectUnauthorized: false,
}); Question Might be vaguely related to: #1825 |
@AndreMaz Could you create a full script that I can use to reproduce the issue? By checking the other issues it seems this happens only when working with tls? |
@robertsLando It will be difficult as it's a proprietary code but I'll try to create a repro example
I've been using TLS since the beginning but can't say for sure if it's the source of the problem |
Can confirm, we're also using with tls, haven't tried otherwise. |
I don't need your source code but a scripts that reproduces the issue. An easy one that connects to a broker with TLS (you can use hivemq public one https://www.hivemq.com/mqtt/public-mqtt-broker/) and then try to reproduce the disconnect and see if the problem happens. I tried last time without success and if I cannot reproduce it on my side it's hard to fix |
Yep yep, I know that you don't want it. I've created a simple script that mimics the data-flow but I can't reproduce the issue at home. I've been switching my laptop between WiFi, Ethernet, and 5G hotspot and so far no luck, ie, I did not see the timeout that I've mentioned. |
if it helps, since we updated to version 5.3.5, we're facing this issue less frequently, but it's still there. I believe these were the the relevant chages that could've affected it #1779 but in the latest release, there are fixes for the possible race condition again: #1848 but it's hard for us to keep updating many iot devices since we have to do a npm install where connection is very unstable. these issues are really hard for us to test in a production environment and it also costs us a lot. is there any version that is considered stable so we can pick it for now until these issues gets fixed? thank you! |
I also think that this is a regression that was introduced in the latest releases (don't know which one tho) I'll have to check my previous releases and test them out. Will keep you guys updated |
I'm sorry for the issues guys and I would like to help you but it's hard to guess what could be the root cause here, we should firstly try to find out an easy way to reproduce the issue somehow |
No worries at all! We're devs too and we understand the frustration :-) Just to be clear, I didn't mean it as a "fix it asap", I do appreciate the work the maintainers are doing, truly. I was asking if there are any old version(s) I could try so I won't get pressured by the company to fix a problem that is hard to explain why it does not work sometimes xD Regardless, I really do wish to help somehow too, but it's really hard to reproduce :( We're currently running the code on 50 devices and it might happen once a week, two weeks or even months per one device, it's really unpredictable and random. |
I'm the only maintainer here unfortunately and I started because I use this package in almost all my projects and I wanted to help keeping it maintained (as it was almost died) Based on the first comment of this issue seems this was happening also with older versions so I dunno, I'm sorry :( |
Hey @robertsLando no need to apologize. Huge kudos to you for what you're doing 💪
Just checked, I went from I'm rolling back to |
@AndreMaz Thanks! |
if it helps, we were using 5.3.3 previously and the issue was appearing more frequently than now (about 3-4 times a week). |
Is there a way for you guys to override the log function in client constructor and store logs somewhere? It would be very helpful to see the difference in logs between a normal reconnect and one that causes the stop of publishing |
Yep, I can. I'm going to try to collect the logs |
Better if using latest version |
Yep yep. No problem |
sadly not for me :( we're using remote iot devices on 3g, which also are limited on network bandwidth. is there a way to turn on logging only for connect/reconnects (or anything related to connections only)? |
Nope unfortunately |
Hey @robertsLando Is this enough info to locate the issue? [2024-04-29T11:55:51.052Z] DEBUG MQTT-CLIENT: publish :: message `%s` to topic `%s` {"ts":1714391751051,"values":{"cpuCores":8}} v3/telemetry
[2024-04-29T11:55:51.052Z] DEBUG MQTT-CLIENT: publish :: qos 0
[2024-04-29T11:55:51.052Z] DEBUG MQTT-CLIENT: MqttClient:publish: packet cmd: %s publish
[2024-04-29T11:55:51.052Z] DEBUG MQTT-CLIENT: _sendPacket :: (%s) :: start mqttjs_1d7221de
[2024-04-29T11:55:51.052Z] DEBUG MQTT-CLIENT: _writePacket :: packet: %O { cmd: 'publish', topic: 'v3/telemetry', payload: '{"ts":1714391751051,"values":{"cpuCores":8}}', qos: 0, retain: false, messageId: 0, dup: false }
[2024-04-29T11:55:51.053Z] DEBUG MQTT-CLIENT: _writePacket :: emitting `packetsend`
[2024-04-29T11:55:51.053Z] DEBUG MQTT-CLIENT: _writePacket :: writing to stream
[2024-04-29T11:55:51.053Z] DEBUG MQTT-CLIENT: _writePacket :: writeToStream result %s false
[2024-04-29T11:55:51.053Z] DEBUG MQTT-CLIENT: _writePacket :: handle events on `drain` once through callback.
[2024-04-29T11:55:51.053Z] DEBUG MQTT-CLIENT: _sendPacket :: (%s) :: end mqttjs_1d7221de
[2024-04-29T11:55:51.053Z] DEBUG MQTT-CLIENT: publish :: message `%s` to topic `%s` {"cpuCores":8} v3/attributes
[2024-04-29T11:55:51.053Z] DEBUG MQTT-CLIENT: publish :: qos 0
[2024-04-29T11:55:51.054Z] DEBUG MQTT-CLIENT: MqttClient:publish: packet cmd: %s publish
[2024-04-29T11:55:51.054Z] DEBUG MQTT-CLIENT: _sendPacket :: (%s) :: start mqttjs_1d7221de
[2024-04-29T11:55:51.054Z] DEBUG MQTT-CLIENT: _writePacket :: packet: %O { cmd: 'publish', topic: 'v3/attributes', payload: '{"cpuCores":8}', qos: 0, retain: false, messageId: 0, dup: false }
[2024-04-29T11:55:51.054Z] DEBUG MQTT-CLIENT: _writePacket :: emitting `packetsend`
[2024-04-29T11:55:51.054Z] DEBUG MQTT-CLIENT: _writePacket :: writing to stream
[2024-04-29T11:55:51.054Z] DEBUG MQTT-CLIENT: _writePacket :: writeToStream result %s false
[2024-04-29T11:55:51.054Z] DEBUG MQTT-CLIENT: _writePacket :: handle events on `drain` once through callback.
[2024-04-29T11:55:51.054Z] DEBUG MQTT-CLIENT: _sendPacket :: (%s) :: end mqttjs_1d7221de
[2024-04-29T11:55:51.285Z] WARN BROKER: Request 'mqtt.telemetry' timed out. { requestID: 'd0c3564e-d865-44f5-ad6e-efac7eecfd66', nodeID: 'misty-violet-6717', timeout: 10000 }
[2024-04-29T11:55:51.286Z] WARN BROKER: Request 'mqtt.sendAttributes' timed out. { requestID: '25d2dd35-f844-4399-b87a-48d02e09e9de', nodeID: 'misty-violet-6717', timeout: 10000 } The last 2 lines mean that my async function telemetry(topic, message) {
if (!this.client.connected) {
this.storage.store({ topic, message });
return;
}
await this.client.publishAsync(topic, message);
} did not resolve in 10sec |
@robertsLando this is interesting. The logs that I've shared previously were from 11:55:51 but the re-connection logic only appears at 12:01:22 [2024-04-29T12:01:22.745Z] DEBUG MQTT-CLIENT: streamErrorHandler :: error read ENETUNREACH
[2024-04-29T12:01:22.745Z] DEBUG MQTT-CLIENT: streamErrorHandler :: emitting error
[2024-04-29T12:01:22.746Z] ERROR MQTT-SERVICE: [mqtt] Client error. Error: read ENETUNREACH
at TLSWrap.onStreamRead (node:internal/stream_base_commons:217:20)
at TLSWrap.callbackTrampoline (node:internal/async_hooks:130:17) {
errno: -101,
code: 'ENETUNREACH',
syscall: 'read'
}
[2024-04-29T12:01:22.748Z] DEBUG MQTT-CLIENT: (%s)stream :: on close mqttjs_1d7221de
[2024-04-29T12:01:22.749Z] DEBUG MQTT-CLIENT: _flushVolatile :: deleting volatile messages from the queue and setting their callbacks as error function
[2024-04-29T12:01:22.749Z] DEBUG MQTT-CLIENT: stream: emit close to MqttClient
[2024-04-29T12:01:22.749Z] DEBUG MQTT-CLIENT: close :: connected set to `false`
[2024-04-29T12:01:22.749Z] DEBUG MQTT-CLIENT: close :: clearing connackTimer
[2024-04-29T12:01:22.749Z] DEBUG MQTT-CLIENT: close :: destroy ping timer
[2024-04-29T12:01:22.750Z] DEBUG MQTT-CLIENT: close :: calling _setupReconnect
[2024-04-29T12:01:22.750Z] DEBUG MQTT-CLIENT: _setupReconnect :: emit `offline` state
[2024-04-29T12:01:22.750Z] INFO MQTT-SERVICE: MQTT client is offline.
[2024-04-29T12:01:22.750Z] DEBUG MQTT-CLIENT: _setupReconnect :: set `reconnecting` to `true`
[2024-04-29T12:01:22.750Z] DEBUG MQTT-CLIENT: _setupReconnect :: setting reconnectTimer for %d ms 1000
[2024-04-29T12:01:22.750Z] INFO MQTT-SERVICE: Disconnected from MQTT endpoint.
[2024-04-29T12:01:23.377Z] WARN BROKER: Request 'mqtt.sendAttributes' is timed out. { requestID: '852d8692-9108-4ff1-b417-1ff969f0f43c', nodeID: ', timeout: 10000 } and then again at 12:16:01 [2024-04-29T12:16:01.374Z] DEBUG MQTT-CLIENT: _flushVolatile :: deleting volatile messages from the queue and setting their callbacks as error function
[2024-04-29T12:16:01.374Z] DEBUG MQTT-CLIENT: stream: emit close to MqttClient
[2024-04-29T12:16:01.374Z] DEBUG MQTT-CLIENT: close :: connected set to `false`
[2024-04-29T12:16:01.374Z] DEBUG MQTT-CLIENT: close :: clearing connackTimer
[2024-04-29T12:16:01.374Z] DEBUG MQTT-CLIENT: close :: destroy ping timer
[2024-04-29T12:16:01.374Z] DEBUG MQTT-CLIENT: close :: calling _setupReconnect
[2024-04-29T12:16:01.374Z] DEBUG MQTT-CLIENT: _setupReconnect :: emit `offline` state
[2024-04-29T12:16:01.374Z] INFO THINGSBOARD: MQTT client is offline.
[2024-04-29T12:16:01.374Z] DEBUG MQTT-CLIENT: _setupReconnect :: set `reconnecting` to `true`
[2024-04-29T12:16:01.374Z] DEBUG MQTT-CLIENT: _setupReconnect :: setting reconnectTimer for %d ms 1000
[2024-04-29T12:16:01.374Z] INFO THINGSBOARD: Disconnected from MQTT endpoint.
[2024-04-29T12:16:02.375Z] DEBUG MQTT-CLIENT: reconnectTimer :: reconnect triggered!
[2024-04-29T12:16:02.376Z] DEBUG MQTT-CLIENT: _reconnect: emitting reconnect to client
[2024-04-29T12:16:02.376Z] INFO THINGSBOARD: Reconnecting to MQTT endpoint...
[2024-04-29T12:16:02.376Z] DEBUG MQTT-CLIENT: _reconnect: calling connect
[2024-04-29T12:16:02.378Z] DEBUG MQTT-CLIENT: connect :: calling method to clear reconnect
[2024-04-29T12:16:02.379Z] DEBUG MQTT-CLIENT: _clearReconnect : clearing reconnect timer
[2024-04-29T12:16:02.379Z] DEBUG MQTT-CLIENT: connect :: using streamBuilder provided to client to create stream
[2024-04-29T12:16:02.381Z] DEBUG MQTT-CLIENT: connect :: pipe stream to writable stream
[2024-04-29T12:16:02.381Z] DEBUG MQTT-CLIENT: connect: sending packet `connect` |
It could be that the connection was already broken but not closed yet so the write to stream happened but you didn't received any response back. Are you using a MQTT packet store? What is your keepalive in options? |
No
Everything is default Here's how I create my client: this.client = mqtt.connect(url, {
username: token,
cert: fs.readFileSync(certPath, "utf8"),
rejectUnauthorized: false,
}); |
@AndreMaz I don't see anything strange on your logs. The logic is correct:
Then it would be interesting to see what happens next when packets are not sent anymore |
But then why at 11:55 it tried to send data, which means that |
I think because the connection was not broken yet? What I don't understand is that you have a keepalive of 60s but the disconnect happens much later 🤔 I think I just have discovered a possible bug here as I'm shifting the keepalive check every time we send a packet but we should do it every time we successfully write it to stream instead. Dunno if this could solve the issue but it's worth a try |
Ok yes I think to have found the root cause of the problem guys 🎉 Could someone give a try to the PR #1855? |
If that's gonna fix it, I'm gonna cri |
Waiting for your feedback so! It works on my side and it also makes sense, I wasn't able to reproduce it on my side because I was publishing a message every time I received one, in your case you keep publish independently and when I did this I noticed the keepalive timeout was never happening and that's the error here, we was shifting the keepalive check on every message sent (so even on publish) causing it to never trigger even if the connection was broken |
@robertsLando thank you! Just deployed. Will leave it running for a couple of days. I'll let you know the result |
I just pushed a little change to avoid possible race conditions, please redeploy with latest commit. Now I will write some tests to better cover this things |
@robertsLando yep yep. Re-deploying it now |
FYI it's been 2 days and so far so good |
@AndreMaz Thanks for the feedback! I also made a release in the meanwhile so you can use that once you end the tests :) How often did it used to happen before? |
@robertsLando sorry for delay, I had to handle some other stuff. Checked the logs and everything seems ok. However, I'm seeing the new v5.5.6 release and a keepalive PR which means that there are still some edge cases. Should I stick with 5.5.5 or bump to the 5.5.6? |
@AndreMaz The issue there is that the fix to this introduced a bug that happens when the client only receives packets (subscribes), in that case the keepalive ping is always skipped as in order to fix this I moved the ping reschedule on every packet received from server so the ping was never sent in that case causing the broker to force close the connection. 5.5.6 fixes this issue and #1865 will be hopefully the last keepalive related PR I will do, that refactors keepalive a lot in order to make it more respectful of specs :) Anyway 5.5.6 is good as it will work for all cases, 5.6.0 will be the next release once #1865 lands |
Super! Thank you for your hard work |
Of course stronger tests have been added in order to cover such cases in future 🤞🏼 |
Hi there,
I am using the latest version of mqttjs (5.1.3) and this issue happens in old versions as well.
In my app, I received the offline event even when the mosquitto MQTT broker is up and running. After the mqttjs client goes through the process of offline -> closed -> reconnect -> connected. The mqttjs client can no longer publish messages again.
Is there a workaround?
app log:
mqtt broker log:
The text was updated successfully, but these errors were encountered: