Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[mqtt.homie] Improve Homie discovery time #7760

Merged
merged 4 commits into from
May 26, 2020

Conversation

bodiroga
Copy link

As expressed in #5963 (comment), the best way to improve the Homie discovery time was to decrease the timeout values, while also resetting the timer everytime a new value is received. This are the changes done:

  • Reschedule the batch processing service when a new value is added. This makes the service more reliable, because the run function will only be called after the last message is received, and it doesn't matter how long it takes, as long as there is a constant flow of messages.
  • Make the accept function synchronized, so that no new schedules are created when multiple threads are involved.
  • Decrease Homie device timeout values to smaller ones. I am 'ok' if you want to increase them, but in my tests they have work great, because MQTT messages are delivered within milliseconds. Having a one second delay between two MQTT messages seems safe 😉

As the DelayedBatchProccesing class is also used in the Home Assistant extension, I would really like to hear @jochen314's opinion about the proposed changes 👍 A review is also requested for @cpmeister, @J-N-K and @cweitkamp.

Closes #5958

Signed-off-by: Aitor Iturrioz [email protected]

Signed-off-by: Aitor Iturrioz <[email protected]>
@bodiroga bodiroga requested a review from davidgraeff as a code owner May 24, 2020 01:31
Signed-off-by: Aitor Iturrioz <[email protected]>
@TravisBuddy
Copy link

Travis tests were successful

Hey @bodiroga,
we found no major flaws with your code. Still you might want to look at this logfile, as we usually suggest some optional improvements.

*
* @param t An object
*/
@Override
public void accept(T t) {
synchronized public void accept(T t) {
Copy link
Contributor

@cpmeister cpmeister May 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you expect this method to be called concurrently?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sir, I have seen more than on scheduler been created with a single Homie device and I can't synchronize on the future field because it is nullable 😉

Comment on lines 68 to 70
if (scheduledFuture != null && !scheduledFuture.isDone()) {
scheduledFuture.cancel(true);
}
Copy link
Contributor

@cpmeister cpmeister May 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see a lot of potential concurrency issues around reassigning the future which mostly has to do with how this interacts with the other methods that cancel the future. I'm going to suggest a couple of changes which will make all of this thread-safe.

First, create a utility method for cancelling the future:

private static void cancel(@Nullable Future<?> future){
    Future<?> future = this.future;
    if(future != null){
        future.cancel(false);
    }
}

You will notice I don't bother checking if it is done or not since nothing bad will happen if you try to cancel an already finished future.

Then you will want to change
protected @Nullable ScheduledFuture<?> future;
to
protected final AtomicReference<@Nullable Future<?>> futureRef = new AtomicReference<>().

Then when you just want to cancel the future you call cancel(futureRef.getAndSet(null))
And when you want to schedule the future while canceling the old one you call cancel(futureRef.getAndSet(executor.schedule(this::run, delay, TimeUnit.MILLISECONDS)));

A few of the other methods would also change slightly.

    /**
     * Return true if there is a delayed processing going on.
     */
    public boolean isArmed() {
        ScheduledFuture<?> scheduledFuture = this.futureRef.get();
        return scheduledFuture != null && !scheduledFuture.isDone();
    }

Doing all this will allow you to safely remove the synchronized you added to accept as well as making all sure all of your other code is interacting with your future in an atomic way. I think using an AtomicReference here is more performant and less error prone than trying to basically synchronize every method in this class.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, thanks for the detailed instructions @cpmeister, I will add them 👍 @jochen314, do you have anything to add? Do you agree with the proposed solution? I don't want to touch the DelayedBatchProccesing class without your approval 😉

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am 100% with @cpmeister here.
Personally I would even try a method like

private boolean setNewFuture(ScheduledFuture<?> newFuture) {
    Future<?> oldFuture = this.future.getAndSet(newFuture);
    boolean result = true;
    if(future != null){
        result = future.isDone();
        future.cancel(false);
    }
    return result;
}

And just one more thing:
There is the possibility that:

  1. The future times out
  2. We receive a new call to accept()

Does your caode then schedule another timeout-future?
I think, this should not be done. That's why my proposed method returns, if the old future was already done, so we could use that, not to trigger a new timeout, once a timeout occured.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jochen314!

First of all, many thanks for your message, it's good to see you here 😉 But... I don't understand what you mean with your last concerns.

  1. If the future times out, that means that no new messages have arrived in X seconds, so the run method is called.
  2. A new call to accept() just reschedules the future, resetting the process.

To make things more clear, the current DelayedBatchProccesing class behavior is:

  • The first time the accept() method is called, the future timer is set and new objects are collected when accept() is triggered. After the delay time has passed, the run method is called.

With the proposed changes, the new behavior will be:

  • The first time the accept() method is called, the future timer is set, but it is rescheduled when a new object is accepted. The run method will be called only after delay seconds have passed since the last message.

The new logic ensures that discovery processes that take longer than initially anticipated can be completed successfully, as it was suggested in #5963 (comment). Perhaps I'm explaining something that is obvious, but I just wanted to avoid any misunderstanding.

Anyway, tomorrow I will update the PR with the changes proposed by @cpmeister. Stay tuned!

@cpmeister cpmeister merged commit 632c947 into openhab:2.5.x May 26, 2020
@cpmeister cpmeister added the enhancement An enhancement or new feature for an existing add-on label May 26, 2020
@cpmeister cpmeister added this to the 2.5.6 milestone May 26, 2020
@jochen314
Copy link
Contributor

But then we are missusing the DelayedBatchProcessing for a scheduler.
The receiving Runnable does not care about the Objects received. (in the Homie case)

My normal understanding would have been, that the Runnable behind the DelayedBatchProcessing would use the collected elements and update every time called.

So, what was the problem with the short time out in the first place?

  • Some messages where recevied (not all) and HomieThingHandler.accept() was called.
  • Then more messages where recevied and HomieThingHandler.accept() should be called again updating the thing to the new state.

@bodiroga
Copy link
Author

But then we are missusing the DelayedBatchProcessing for a scheduler.

Ummmmm... I wouldn't be so "hard" on it, IMO the usage is the same, the "delay" time strategy is what have changed. It is a BatchProcessor that delays the return of the elements X seconds since the last one was added, not since the processor was created.

My normal understanding would have been, that the Runnable behind the DelayedBatchProcessing would use the collected elements and update every time called.

What do you mean by "every time called"? The DelayedBatchProcessor returns asynchronously, the caller just defines how much time it wants to wait until the results are returned. But with the new implementation, the caller still knows when the result will be returned, as he is the one that adds a new element to the batch.

So, what was the problem with the short time out in the first place?

Ok, you are right here, I suppose that the HomieThingHandler implementation could be changed to allow discovering the devices even when the information is not received at the same time, but I didn't consider it after reading @davidgraeff and @J-N-K's comments in #5963.

@cpmeister, you didn't read my previous comment #7760 (comment), did you? 😆

@mjcumming
Copy link

@cpmeister, just in case, don't merged the PR yet, I think that @J-N-K will want to increase the timeout values a little bit. It would also be awesome if @boc-tothefuture or @mjcumming could test the changes in their installations, as they use many Homie devices with openHAB 👍

@bodiroga happy to test - OpenHabian is failing to install the 2.5.6 testing release.

@boc-tothefuture
Copy link
Contributor

Is there a test binary to try?

@mjcumming
Copy link

In terms of MQTT messages, timing, Homie, I have a little more than 11000 retained messages that get sent over at start up. The Homie convention does not describe how long a controller should wait for all messages to be sent but does specify that the $state->ready is used to signal when the device is ready. Perhaps that should be the only time that OH complains about not receiving required messages in time?

@boc-tothefuture
Copy link
Contributor

@mjcumming - But you can't use $state as a signal because there is no specified order for topics to be delivered. Also, the homie device can be "ready" at all times without the controller connected.

@mjcumming
Copy link

Actually there is ordering of MQTT messages if QOS is the same. See 4.6

@boc-tothefuture
Copy link
Contributor

My understand is that applies to messages within a topic (which would obviously be necessary) but not the order of which to publish topics.

@mjcumming
Copy link

@boc-tothefuture - I have tried to clarify exactly how message ordering works. Is it the order messages were received by the broker, order for a topic and its sublevels, or order only for an exact topic match? My guess after reading this is that you are correct but the verbiage is still not completely clear...

@bodiroga
Copy link
Author

Hi guys! Thanks for all your comments, but regarding the message order, I always thought that the broker doesn't guarantee that retained messages are delivered in the same order they were sent. Although, to be honest, I haven't read the protocol specification.

About the possibility of doing some tests, would the homie jar be enough? As the PR is merged, what about using a snapshot docker build? Or are snapshot builds broken as stated in #7760 (comment)?

Best regards!

@mjcumming
Copy link

mjcumming commented May 30, 2020

@bodiroga 2.5.6~S131 seems to have solve the problem. 122 Homie devices discovered and added to the in box in about .6s on a PI 4

great work! thank you.

@bodiroga
Copy link
Author

Fantastic news @mjcumming, many thanks for the feedback! 👍

Dries-Vandenneucker added a commit to Dries-Vandenneucker/openhab-addons that referenced this pull request May 31, 2020
* 2.5.x: (174 commits)
  [hpprinter] Add additional data points and refactoring (openhab#7805)
  [neohub] new/legacy API; null annotations; enhancements; bugs; logging (openhab#7767)
  [meteoalerte] Initial contribution (openhab#7200)
  [lgwebos] Console command to show the access key (openhab#7801)
  [hue] Refactored state handling and fix polling after command (openhab#7518)
  [telegram] add attachment URL (openhab#7816)
  [siemensrds] readme adjusted to match openhab#7814 (openhab#7819)
  [lametrictime] correctly parse response (openhab#7818)
  [Seneye] Bug fix for using Pond or Home sensors. (openhab#7797)
  [siemensrds] apply UoM quantityType percent for relative humidity (openhab#7814)
  [alarmdecoder] Add vzone thing for virtual zone control (openhab#7800)
  [hue] Channel alert added for groups (openhab#7810)
  [hue] Keep compatibility with hue emulation for groups (openhab#7809)
  [dscalarm] Bridge/things management refactored (openhab#7748)
  [avmfritz] Add link to Fensterkontakt (magnetisch) to docs (openhab#7806)
  [deconz] add light/blind support and additional sensors (openhab#7608)
  [homekit] add support for min/max values for temperature (openhab#7782)
  [tesla] Use CXF JAX-RS client builder, if available (openhab#7804)
  [mqtt.homie] Improve Homie discovery time (openhab#7760)
  [siemensrds] null annotations; JUnit; UoM; enhancements; bug; refactoring; logging (openhab#7769)
  ...
@boc-tothefuture
Copy link
Contributor

About the possibility of doing some tests, would the homie jar be enough? As the PR is merged, what about using a snapshot docker build? Or are snapshot builds broken as stated in #7760 (comment)?

Best regards!

Is the updated homie jar available to download for testing? I am not running OpenHab in a container.

@mjcumming
Copy link

@boc-tothefuture the latest snapshot has it - I installed from Openhabian

@mjcumming
Copy link

@bodiroga would you have any interest in working developing the binding improvehomie device autodiscovery

LoungeFlyZ pushed a commit to LoungeFlyZ/openhab2-addons that referenced this pull request Jun 8, 2020
* Improve Homie discovery time

Signed-off-by: Aitor Iturrioz <[email protected]>
@bodiroga bodiroga deleted the fix-5958-timeout branch June 25, 2020 18:17
J-N-K pushed a commit to J-N-K/openhab-addons that referenced this pull request Jul 14, 2020
* Improve Homie discovery time

Signed-off-by: Aitor Iturrioz <[email protected]>
CSchlipp pushed a commit to CSchlipp/openhab-addons that referenced this pull request Jul 26, 2020
* Improve Homie discovery time

Signed-off-by: Aitor Iturrioz <[email protected]>
Signed-off-by: CSchlipp <[email protected]>
andrewfg pushed a commit to andrewfg/openhab-addons that referenced this pull request Aug 31, 2020
* Improve Homie discovery time

Signed-off-by: Aitor Iturrioz <[email protected]>
andrewfg pushed a commit to andrewfg/openhab-addons that referenced this pull request Aug 31, 2020
* Improve Homie discovery time

Signed-off-by: Aitor Iturrioz <[email protected]>
andrewfg pushed a commit to andrewfg/openhab-addons that referenced this pull request Aug 31, 2020
* Improve Homie discovery time

Signed-off-by: Aitor Iturrioz <[email protected]>
andrewfg pushed a commit to andrewfg/openhab-addons that referenced this pull request Aug 31, 2020
* Improve Homie discovery time

Signed-off-by: Aitor Iturrioz <[email protected]>
DaanMeijer pushed a commit to DaanMeijer/openhab-addons that referenced this pull request Sep 1, 2020
* Improve Homie discovery time

Signed-off-by: Aitor Iturrioz <[email protected]>
Signed-off-by: Daan Meijer <[email protected]>
markus7017 pushed a commit to markus7017/openhab-addons that referenced this pull request Sep 19, 2020
* Improve Homie discovery time

Signed-off-by: Aitor Iturrioz <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement An enhancement or new feature for an existing add-on
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[mqtt.homie] Decrease subscribeTimeout time for Homie devices
7 participants