[mqtt.homie] Improve Homie discovery time #7760
Conversation
Travis tests were successful.
Hey @bodiroga,
 *
 * @param t An object
 */
@Override
-public void accept(T t) {
+synchronized public void accept(T t) {
Do you expect this method to be called concurrently?
Yes, sir, I have seen more than one scheduler being created with a single Homie device, and I can't synchronize on the future field because it is nullable 😉
if (scheduledFuture != null && !scheduledFuture.isDone()) {
    scheduledFuture.cancel(true);
}
I can see a lot of potential concurrency issues around reassigning the future, mostly in how it interacts with the other methods that cancel the future. I'm going to suggest a couple of changes which will make all of this thread-safe.
First, create a utility method for cancelling the future:
private static void cancel(@Nullable Future<?> future) {
    if (future != null) {
        future.cancel(false);
    }
}
You will notice I don't bother checking if it is done or not since nothing bad will happen if you try to cancel an already finished future.
Then you will want to change
protected @Nullable ScheduledFuture<?> future;
to
protected final AtomicReference<@Nullable Future<?>> futureRef = new AtomicReference<>();
Then, when you just want to cancel the future, you call cancel(futureRef.getAndSet(null));
And when you want to schedule the future while cancelling the old one, you call cancel(futureRef.getAndSet(executor.schedule(this::run, delay, TimeUnit.MILLISECONDS)));
A few of the other methods would also change slightly.
/**
 * Return true if delayed processing is going on.
 */
public boolean isArmed() {
    Future<?> future = this.futureRef.get();
    return future != null && !future.isDone();
}
Doing all this will allow you to safely remove the synchronized you added to accept, as well as making sure all of your other code interacts with your future in an atomic way. I think using an AtomicReference here is more performant and less error-prone than trying to basically synchronize every method in this class.
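For reference, here is a minimal sketch of how those pieces could fit together. This is not the actual DelayedBatchProcessing code; the class and field names are only illustrative, and the run() body is left empty.

import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class FutureHolderSketch {
    private final ScheduledExecutorService executor;
    private final long delay;
    // Single mutable reference to the pending future, swapped atomically.
    private final AtomicReference<Future<?>> futureRef = new AtomicReference<>();

    FutureHolderSketch(ScheduledExecutorService executor, long delay) {
        this.executor = executor;
        this.delay = delay;
    }

    private static void cancel(Future<?> future) {
        // Cancelling an already finished future is harmless, so no isDone() check.
        if (future != null) {
            future.cancel(false);
        }
    }

    public void accept(Object t) {
        // Atomically replace the pending future with a new one and cancel the old one.
        cancel(futureRef.getAndSet(executor.schedule(this::run, delay, TimeUnit.MILLISECONDS)));
    }

    public void stop() {
        cancel(futureRef.getAndSet(null));
    }

    public boolean isArmed() {
        Future<?> future = futureRef.get();
        return future != null && !future.isDone();
    }

    private void run() {
        // Process the collected batch here.
    }
}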
Wow, thanks for the detailed instructions @cpmeister, I will add them 👍 @jochen314, do you have anything to add? Do you agree with the proposed solution? I don't want to touch the DelayedBatchProcessing class without your approval 😉
I am 100% with @cpmeister here.
Personally I would even try a method like
private boolean setNewFuture(ScheduledFuture<?> newFuture) {
    Future<?> oldFuture = this.futureRef.getAndSet(newFuture);
    boolean result = true;
    if (oldFuture != null) {
        result = oldFuture.isDone();
        oldFuture.cancel(false);
    }
    return result;
}
And just one more thing. There is the possibility that:
- the future times out, and then
- we receive a new call to accept().
Does your code then schedule another timeout future? I think this should not be done. That's why my proposed method returns whether the old future was already done, so we can use that to avoid triggering a new timeout once a timeout has occurred.
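Here is a hedged sketch of how that return value could be wired into accept(). It assumes the same hypothetical fields and cancel() helper as the sketch further up, plus a List<T> queue field, and it interprets "no previous future" as "no timeout has fired yet" (a slightly different default than in the snippet above); none of this is the binding's actual code.

// Sketch only: prevent a late accept() from re-arming the batch after the timeout has fired.
public void accept(T t) {
    queue.add(t);
    ScheduledFuture<?> next = executor.schedule(this::run, delay, TimeUnit.MILLISECONDS);
    if (setNewFuture(next)) {
        // run() was already triggered for the previous batch, so this element
        // arrived late; do not open a fresh delay window for it.
        cancel(futureRef.getAndSet(null));
    }
}

private boolean setNewFuture(ScheduledFuture<?> newFuture) {
    Future<?> oldFuture = futureRef.getAndSet(newFuture);
    if (oldFuture == null) {
        // No previous future means no timeout has fired yet.
        return false;
    }
    boolean alreadyDone = oldFuture.isDone();
    oldFuture.cancel(false);
    return alreadyDone;
}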
Hi @jochen314!
First of all, many thanks for your message, it's good to see you here 😉 But... I don't understand what you mean by your last concerns.
- If the future times out, that means no new messages have arrived in X seconds, so the run method is called.
- A new call to accept() just reschedules the future, resetting the process.
To make things clearer, the current DelayedBatchProcessing class behavior is:
- The first time the accept() method is called, the future timer is set, and new objects are collected whenever accept() is triggered. After the delay time has passed, the run method is called.
With the proposed changes, the new behavior will be:
- The first time the accept() method is called, the future timer is set, but it is rescheduled every time a new object is accepted. The run method will be called only after delay seconds have passed since the last message.
The new logic ensures that discovery processes that take longer than initially anticipated can be completed successfully, as it was suggested in #5963 (comment). Perhaps I'm explaining something that is obvious, but I just wanted to avoid any misunderstanding.
Anyway, tomorrow I will update the PR with the changes proposed by @cpmeister. Stay tuned!
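To make the behaviour difference concrete, here is a small hypothetical usage example. The DelayedBatchProcessing constructor signature shown here is assumed for illustration (delay in milliseconds, a consumer for the collected batch, and a scheduler) and may not match the real class.

import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

public class DebounceDemo {
    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Hypothetical constructor: delay in ms, batch consumer, executor.
        DelayedBatchProcessing<String> batch = new DelayedBatchProcessing<>(500,
                (List<String> values) -> System.out.println("run(): " + values), scheduler);

        for (int i = 0; i < 5; i++) {
            batch.accept("value" + i); // each call pushes the deadline out by another 500 ms
            Thread.sleep(200);         // messages keep arriving faster than the delay
        }
        // Old behavior: run() fires 500 ms after the *first* accept(), possibly before all values arrive.
        // New behavior: run() fires once, roughly 500 ms after the *last* accept(), with all five values.
        Thread.sleep(1000);
        scheduler.shutdown();
    }
}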
But then we are misusing the DelayedBatchProcessing class. My normal understanding would have been that the delay is the time the caller wants to wait for the results, every time called. So, what was the problem with the short timeout in the first place?
Ummmmm... I wouldn't be so "hard" on it, IMO the usage is the same; the "delay" time strategy is what has changed. It is a BatchProcessor that delays the return of the elements X seconds since the last one was added, not since the processor was created.
What do you mean by "every time called"? The DelayedBatchProcessing returns asynchronously; the caller just defines how much time it wants to wait until the results are returned. But with the new implementation, the caller still knows when the result will be returned, since it is the one that adds new elements to the batch.
Ok, you are right here, I suppose that the HomieThingHandler implementation could be changed to allow discovering the devices even when the information is not received at the same time, but I didn't consider it after reading @davidgraeff and @J-N-K's comments in #5963. @cpmeister, you didn't read my previous comment #7760 (comment), did you? 😆
@bodiroga happy to test - OpenHabian is failing to install the 2.5.6 testing release.
Is there a test binary to try?
In terms of MQTT messages, timing, and Homie, I have a little more than 11,000 retained messages that get sent over at startup. The Homie convention does not describe how long a controller should wait for all messages to be sent, but it does specify that $state->ready is used to signal when the device is ready. Perhaps that should be the only time that OH complains about not receiving required messages in time?
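As a rough illustration of that idea, a controller could watch the $state attribute instead of relying only on a fixed timeout. The sketch below uses the Eclipse Paho MQTT client; the broker URL, client id, and what to do once a device reports "ready" are all placeholders, and this is not how the openHAB binding is actually structured.

import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttException;

public class HomieStateWatcher {
    public static void main(String[] args) throws MqttException {
        MqttClient client = new MqttClient("tcp://broker.local:1883", "homie-state-watcher");
        client.connect();
        // Watch every device's $state attribute. Per the Homie convention,
        // "ready" means the device has finished publishing its attributes.
        client.subscribe("homie/+/$state", (topic, message) -> {
            String state = new String(message.getPayload());
            String deviceId = topic.split("/")[1];
            if ("ready".equals(state)) {
                System.out.println(deviceId + " reports ready; only now check for missing required messages.");
            }
        });
    }
}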
@mjcumming - But you can't use $state as a signal because there is no specified order for topics to be delivered. Also, the homie device can be "ready" at all times without the controller connected.
Actually, there is ordering of MQTT messages if the QoS is the same. See section 4.6 of the MQTT specification.
My understanding is that this applies to messages within a topic (which would obviously be necessary), but not to the order in which different topics are published.
@boc-tothefuture - I have tried to clarify exactly how message ordering works. Is it the order messages were received by the broker, order for a topic and its sublevels, or order only for an exact topic match? My guess after reading this is that you are correct, but the verbiage is still not completely clear...
Hi guys! Thanks for all your comments, but regarding the message order, I always thought that the broker doesn't guarantee that retained messages are delivered in the same order they were sent. Although, to be honest, I haven't read the protocol specification. About the possibility of doing some tests, would the homie jar be enough? Now that the PR is merged, what about using a snapshot docker build? Or are snapshot builds broken as stated in #7760 (comment)? Best regards!
@bodiroga 2.5.6~S131 seems to have solved the problem. 122 Homie devices were discovered and added to the inbox in about 0.6 s on a Pi 4. Great work, thank you!
Fantastic news @mjcumming, many thanks for the feedback! 👍
* 2.5.x: (174 commits)
  [hpprinter] Add additional data points and refactoring (openhab#7805)
  [neohub] new/legacy API; null annotations; enhancements; bugs; logging (openhab#7767)
  [meteoalerte] Initial contribution (openhab#7200)
  [lgwebos] Console command to show the access key (openhab#7801)
  [hue] Refactored state handling and fix polling after command (openhab#7518)
  [telegram] add attachment URL (openhab#7816)
  [siemensrds] readme adjusted to match openhab#7814 (openhab#7819)
  [lametrictime] correctly parse response (openhab#7818)
  [Seneye] Bug fix for using Pond or Home sensors. (openhab#7797)
  [siemensrds] apply UoM quantityType percent for relative humidity (openhab#7814)
  [alarmdecoder] Add vzone thing for virtual zone control (openhab#7800)
  [hue] Channel alert added for groups (openhab#7810)
  [hue] Keep compatibility with hue emulation for groups (openhab#7809)
  [dscalarm] Bridge/things management refactored (openhab#7748)
  [avmfritz] Add link to Fensterkontakt (magnetisch) to docs (openhab#7806)
  [deconz] add light/blind support and additional sensors (openhab#7608)
  [homekit] add support for min/max values for temperature (openhab#7782)
  [tesla] Use CXF JAX-RS client builder, if available (openhab#7804)
  [mqtt.homie] Improve Homie discovery time (openhab#7760)
  [siemensrds] null annotations; JUnit; UoM; enhancements; bug; refactoring; logging (openhab#7769)
  ...
Is the updated homie jar available to download for testing? I am not running OpenHab in a container.
@boc-tothefuture the latest snapshot has it - I installed from Openhabian
@bodiroga would you have any interest in working on developing the binding to improve Homie device autodiscovery?
* Improve Homie discovery time Signed-off-by: Aitor Iturrioz <[email protected]>
* Improve Homie discovery time Signed-off-by: Aitor Iturrioz <[email protected]> Signed-off-by: CSchlipp <[email protected]>
* Improve Homie discovery time Signed-off-by: Aitor Iturrioz <[email protected]> Signed-off-by: Daan Meijer <[email protected]>
As expressed in #5963 (comment), the best way to improve the Homie discovery time was to decrease the timeout values while also resetting the timer every time a new value is received. These are the changes done:
- The run function will only be called after the last message is received, and it doesn't matter how long the discovery takes, as long as there is a constant flow of messages.
- The accept function is now synchronized, so that no new schedules are created when multiple threads are involved.
As the DelayedBatchProcessing class is also used in the Home Assistant extension, I would really like to hear @jochen314's opinion about the proposed changes 👍 A review is also requested for @cpmeister, @J-N-K and @cweitkamp.
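A condensed sketch of the scheduling change described above (not the exact patch; the queue, executor, delay and scheduledFuture field names are assumed from the snippets quoted earlier in this conversation):

// Sketch of the rescheduling idea from this PR (not the exact patch).
public synchronized void accept(T t) {
    queue.add(t);
    // Cancel the previous deadline and start a new one, so run() only fires
    // once no new value has arrived for `delay` milliseconds.
    if (scheduledFuture != null && !scheduledFuture.isDone()) {
        scheduledFuture.cancel(true);
    }
    scheduledFuture = executor.schedule(this::run, delay, TimeUnit.MILLISECONDS);
}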
Closes #5958
Signed-off-by: Aitor Iturrioz <[email protected]>