Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Radiotherm thermostats keep losing connectivity #8404

Closed
andyat opened this issue Jul 8, 2017 · 22 comments · Fixed by #10437
Closed

Radiotherm thermostats keep losing connectivity #8404

andyat opened this issue Jul 8, 2017 · 22 comments · Fixed by #10437

Comments

@andyat
Copy link
Contributor

andyat commented Jul 8, 2017

Make sure you are running the latest version of Home Assistant before reporting an issue.

You should only file an issue if you found a bug. Feature and enhancement requests should go in the Feature Requests section of our community forum:

Home Assistant release (hass --version): 0.48.0

Python release (python3 --version): Python 3.4.2

Component/platform: Radiotherm

Description of problem: I have 2 radiotherm thermostats at home (Radiotherm CT50 and Filtrete 3M50). Both of them lose connectivity with hass intermittently every single day. Usually one of them loses connectivity, while the other keeps working, and the next day it could be vice versa. When it happens, temperature graph does not update and shows straight line; changing temperature actually works on thermostat itself, but not reflected in hass UI. The log is flooded with messages like this every minute:

2017-07-07 21:14:12 WARNING (MainThread) [homeassistant.components.climate] Updating radiotherm climate took longer than the scheduled update interval 0:01:00

and occasionally

2017-07-07 22:00:12 WARNING (MainThread) [homeassistant.helpers.entity] Update for climate.livingroom is already in progress

I logged in to hass machine (raspberry pi) and checked that both thermostats are pingable and work fine via their http interface via curl. Yet home assistant still thinks one of them is not working. Rebooting raspberry pi fixes the problem for few hours, and then it happens again. I do reboots every night, and still by the next evening one of the thermostats usually gets into this state.

Any idea how to debug and fix this? This has been happening for few weeks if not months with various frequency. At some point nightly reboots mostly helped, but still the problem is there and it would be great to understand it.

Expected: Home assistant should not "lose connectivity" with radiotherm thermostats. If an update takes long time (which is possible since thermostats are single-threaded and slow), it should wait some time and then retry the update.

Problem-relevant configuration.yaml entries and steps to reproduce:

climate:
  - platform: radiotherm
    hold_temp: true
    host:
      - 192.168.56.7
      - 192.168.56.8
  1. Restart machine with home assistant. Thermostats work fine.
  2. Wait few hours - repros mostly every day within 24 hours
  3. Look at the temperature graphs - temperature stops updating

Traceback (if applicable):

pi@hass:/home/homeassistant/.homeassistant $ curl http://192.168.56.8/tstat
{"temp":75.50,"tmode":1,"fmode":0,"override":1,"hold":1,"t_heat":56.00,"tstate":0,"fstate":0,"time":{"day":4,"hour":22,"minute":27},"t_type_post":0}
pi@hass:/home/homeassistant/.homeassistant $ curl http://192.168.56.7/tstat
{"temp":76.00,"tmode":1,"fmode":0,"override":0,"hold":1,"t_heat":61.00,"tstate":0,"fstate":0,"time":{"day":4,"hour":22,"minute":27},"t_type_post":0}

Additional info:

@andyat andyat changed the title Radiotherm thermostats keeps losing connectivity with home assistant Radiotherm thermostats keep losing connectivity with home assistant Jul 8, 2017
@aneisch
Copy link
Contributor

aneisch commented Jul 9, 2017

I've been seeing a good bit of "taking longer than 10 seconds" but no "longer than the scheduled update interval 0:01:00." What model/firmware?

@andyat
Copy link
Contributor Author

andyat commented Jul 9, 2017

Yes, there are some "taking longer than 10 seconds" messages right after reboot, but thermostats work during that time. They stop working when "longer than 0:01:00" messages start to flood the log. Models are Radiotherm CT50 and Filtrete 3M50, firmware version 1.04.84 on both, api_version 113, wlan_fw_version v10.105576.

@fabaff fabaff changed the title Radiotherm thermostats keep losing connectivity with home assistant Radiotherm thermostats keep losing connectivity Jul 11, 2017
@andyat
Copy link
Contributor Author

andyat commented Jul 12, 2017

I tried setting "scan_interval" to 2, then 4 minutes, and now I'm getting "Updating radiotherm climate took longer than the scheduled update interval 0:04:00" every 4 minutes. When I try to change temperature in home assistant, I get "Update for climate.livingroom is already in progress".

Could it be that some network request got stuck somehow and hass is still waiting for it to complete, meanwhile rejecting all the other requests to this device? How can something like this be diagnosed?

@aneisch
Copy link
Contributor

aneisch commented Jul 12, 2017

That's what it seems like, maybe. When you restart HASS, does the errant behavior begin immediately?

@andyat
Copy link
Contributor Author

andyat commented Jul 13, 2017

No, after few hours, usually <24. Btw, if I just restart HASS (by calling service homeassistant -> restart) it never comes back, so I need to reboot raspberry pi, only this way it comes back. Not sure if this is related, but also somewhat odd.

@aneisch
Copy link
Contributor

aneisch commented Jul 24, 2017

FWIW: I had to make a static arp entry on my machine for the thermostat. I was seeing timeouts as my thermostat doesn't seem to talk much. That might help you.

@andyat
Copy link
Contributor Author

andyat commented Jul 27, 2017

All my thermostats have static arp entries too, that does not help.

There is another interesting data point. There are 11 lifx bulbs in my network, so I tried disabling lifx component in HA for a week (did not turn off the bulbs, but controlled them via app instead of HA). During that week connectivity loses were less frequent, thermostats worked for 3 or 4 days before one of them lost connectivity again. So looks like this might be dependent on the amount of network traffic, I guess.

Is there any way to change default settings for polling, such as those 10 seconds it complains about in the log? Maybe I could reduce the amount of traffic this way.

@TD22057
Copy link
Contributor

TD22057 commented Sep 6, 2017

FYI - I'm pretty sure I know what the problem here is. The HA thermostat call makes 6-7 requests to the underlying radiotherm for data during update(). radiotherm does no caching, so each of those calls is a round trip to the thermostat. These thermostats are REALLY slow, and if the receive too many requests too quickly, they just drop a lot of them. The good news is that this is fixable - all the data that update() needs can be made in one request to the thermostat.

IMO, the best solution would be for radiothem to implement a caching system or an update() method to set the attributes. But the author seems to be against that model. So the solutions seem to be to a) use a different library, b) write a new library (not that hard - there is very little to the API), c) update the HA radio thermostat code to decode the lower level data itself.

I'll probably try and implement c) since it's the smallest delta to the current code some time this/next week.

@TD22057
Copy link
Contributor

TD22057 commented Sep 7, 2017

In looking at this problem, I think there are multiple issues in the current radiotherm code. The basic rule is to avoid all comm except in update() and keep that to a minimum.

  1. the setup_platform() will fail completely if it can't talk to the thermstat at start up. The fix for this is to not use the radiotherm class and store the thermostat host and attempt to talk to it at each update() call. This way if the thermostat comes back on line after HASS starts up, it will start working.

  2. update() is making a HTTP call to the thermostat 6-7 times per update() call. This can be reduced to 2 times in the first call (1 to get the name, 1 to get the state) and 1 call every time after that (to get the state). Each call is multi-seconds long because the thermostat is so slow.

  3. There is a call in the constructor to set_time() which talks to the thermostat and slows down the initial set up. This should be removed. In the long run, I'm going to add a config option for how often (probably in days) to update the time in the thermostat with 0 meaning to not update it at all.

  4. fan mode and hold mode are missing and can't be accessed through the HASS web interface.

1&2 are bugs which I'll work on making a PR for this weekend. 3&4 are features - I'll open a new issue and start working on those as well.

The real issue is to fix 2). I have 2 radio thermostats set to update every 120 seconds. I ran a ~20 hour test with 3 different implementations. The first is the current code using the radiotherm library and I got 380 of the 10 second error messages averaging about 8-10/hour. The second test was changing the code to use the radiotherm library tstat raw accessor (1 call per update()) and I got 74 of the 10 second errors - about 10 for the first 13 hours, then a lot of them after that. The final test was changing the code to call the thermostat directly using the Python requests packages w/ a valid timeout and I got 1 timeout for the whole test.

That makes it pretty clear to me that I should rewrite this class to just call the thermostat directly. It's only about a few more lines of code than the current class and seems to work much more efficiently.

@MartinHjelmare
Copy link
Member

Not using a library for device communication is against the ground philosophy for device integration in home assistant.

https://home-assistant.io/developers/code_review_platform/#6-communication-with-devicesservices

@TD22057
Copy link
Contributor

TD22057 commented Sep 8, 2017

@MartinHjelmare: Are you saying I'm forced to use that library just because it exists even though that library is what's causing the problem in first place? In this case, the "API" is basically a single URL get with json decode. Writing a new radio thermostat library just to wrap a couple of URL get/post calls is kind of crazy. There are tons of existing HA components that make exactly that kind of URL call already. Those seem fine so why is this one not fine? I'm not trying to be combative - it just seems like adding tons of work (or picking a very sub-optimal solution) when a very simple solution works fine doesn't seem like a good idea to me.

@mezz64
Copy link
Contributor

mezz64 commented Sep 8, 2017

@TD22057 First off, thanks for digging into this. I had stopping using this component due to reliability issues so i'm glad someone is working to make things better.

I don't think @MartinHjelmare is suggesting you have to use the existing library, just that the preference for device integration is to keep anything device specific in an external library. You could try to submit a PR to the existing library to implement your changes or propose a switch to a new one you create. A main reason for this is to ensure Home Assistant gives back to the python community. By keeping device integrations in separate libraries others can benefit from the work we do on HA in their own projects.

@TD22057
Copy link
Contributor

TD22057 commented Sep 8, 2017

@mezz64 Sounds good. I was going to submit a PR with just the stability improvements and then work on adding the missing functionality (hold toggle, fan control, retry of failed set functions). But that new functionality might make a difference on deciding whether or not to factor out the code into a new radio therm library so I'll just go ahead and implement it and push it to my repo where it can be reviewed and we can make a decision on the best thing to do.

@TD22057
Copy link
Contributor

TD22057 commented Sep 9, 2017

OK - if anyone wants to help test the updated components/climate/radiotherm.py code, you can find it here or by grabbing the radio_thermostat_fix branch in this repository.

I'm getting vastly fewer timeout warnings and I've added fan control which will show up in the GUI now. Timeouts seem to occur now when the thermostat is actively heating and cooling (~4/hour on mine). No idea why the thermostat is under more load then but I don't see any thing that can be done about it.

@einstein2883
Copy link

I would love to help test but I'm using hass.io. Any idea how I can load the code?

@einstein2883
Copy link

Got it working. I had to make a dir in the config folder /config/custom_components/climate. Your changes have fixed my connection issues. Before I would normally only get 3-4hrs of reading before it quit working. Its been working for the last 12hrs perfectly. Thanks for the improvements!!

@TD22057
Copy link
Contributor

TD22057 commented Sep 14, 2017

Awesome! But - I would caution you to give it longer. I noticed that when the AC is running, even my fixes start having time outs. Still an order of magnitude less than the original code but they happen. My conclusion is that the thermostat firmware is terrible. There must be something about holding the relay open that causes the http server to have trouble responding to requests.

I haven't put in a pull request yet for the fixes because I'm going to try and replace the request calls w/ asynic IO calls and see if that helps. I also want to implement an automatic retry system - especially for any set() functions so that changing the temperature has a higher likelihood of working even if the first call times out.

@andyat
Copy link
Contributor Author

andyat commented Oct 7, 2017

@TD22057 Thanks a lot for making these fixes! Are they available in any release yet? (Just catching up on this, didn't have time recently to play with HA.)

@TD22057
Copy link
Contributor

TD22057 commented Oct 7, 2017

@andyat Nope - I was waiting. But I should probably go ahead and push what I have because it does fix the issue and adds fan on/off support. I was going to try and also work on:

  • use async io http requests
  • add an auto-retry feature if a set function call fails (I really hate that it fails silently to set the temp every once in awhile and has to be re-called manually)
  • fix auto mode which doesn't work at all right now
  • add a hold toggle to the class and web UI instead of the always hold/never hold system in place now

But it's probably better to push what I have now and work on those items in a separate PR.

@andyat
Copy link
Contributor Author

andyat commented Oct 11, 2017

@TD22057 Will be great if you could push what you have, so that connectivity gets fixed.

@andyat
Copy link
Contributor Author

andyat commented Nov 8, 2017

@TD22057 Saw your pull requests on this, great to see progress!

By the way, since you are working on this already, would you mind adding the following to future work list?

  • Fix "state" to reflect current state accurately. Currently thermostat has the following 3 properties: "state", "mode", "operation_mode", all of them always set to "heat", even when heater is not running. I think probably mode/operation_mode should stay as it is, but "state" should reflect current state, such as "heat", "idle", "cool" of "off". Or maybe add another property if "state" is not good for that. Currently there is no way to know when heater is running and when it is idle. Thoughts?

@TD22057
Copy link
Contributor

TD22057 commented Nov 8, 2017

@andyat The whole climate module is a little confusing. A lot of the functions sound similar and there isn't much in the way of docs explaining what the differences are. Many of the thermostats have the same modes but use slightly different terminology as well. There is "current_operation" and "operation_mode". It looks like those are the same thing which is the thermostat mode (heat/cool/off) not whether it's currently heating or cooling but I'm not sure. I'm not sure there is a current API for getting when it's actually on.

The radiothermostat class has the active mode already in it. It's self._tstate which will be either STATE_IDLE, STATE_HEAT, or STATE_COOL. There just isn't any system for pushing that out to HA as far as I can see.

I think there are couple of ways forward with that. Either the 1) thermostat API needs to be updated to have an active mode attribute or 2) the device should create extra binary sensors (one for heating on and one for cooling on), or 3) the device should create some new multi-state sensor that indicates the active mode. In any case, it would be nice to access that attribute from other places in HA and it would be nice if the detailed web view were updated to show when it's one - maybe highlighting the plot in blue or red to show active cooling and heating.

@home-assistant home-assistant locked and limited conversation to collaborators Mar 3, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
7 participants