-
-
Notifications
You must be signed in to change notification settings - Fork 32k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Radiotherm thermostats keep losing connectivity #8404
Comments
I've been seeing a good bit of "taking longer than 10 seconds" but no "longer than the scheduled update interval 0:01:00." What model/firmware? |
Yes, there are some "taking longer than 10 seconds" messages right after reboot, but thermostats work during that time. They stop working when "longer than 0:01:00" messages start to flood the log. Models are Radiotherm CT50 and Filtrete 3M50, firmware version 1.04.84 on both, api_version 113, wlan_fw_version v10.105576. |
I tried setting "scan_interval" to 2, then 4 minutes, and now I'm getting "Updating radiotherm climate took longer than the scheduled update interval 0:04:00" every 4 minutes. When I try to change temperature in home assistant, I get "Update for climate.livingroom is already in progress". Could it be that some network request got stuck somehow and hass is still waiting for it to complete, meanwhile rejecting all the other requests to this device? How can something like this be diagnosed? |
That's what it seems like, maybe. When you restart HASS, does the errant behavior begin immediately? |
No, after few hours, usually <24. Btw, if I just restart HASS (by calling service homeassistant -> restart) it never comes back, so I need to reboot raspberry pi, only this way it comes back. Not sure if this is related, but also somewhat odd. |
FWIW: I had to make a static arp entry on my machine for the thermostat. I was seeing timeouts as my thermostat doesn't seem to talk much. That might help you. |
All my thermostats have static arp entries too, that does not help. There is another interesting data point. There are 11 lifx bulbs in my network, so I tried disabling lifx component in HA for a week (did not turn off the bulbs, but controlled them via app instead of HA). During that week connectivity loses were less frequent, thermostats worked for 3 or 4 days before one of them lost connectivity again. So looks like this might be dependent on the amount of network traffic, I guess. Is there any way to change default settings for polling, such as those 10 seconds it complains about in the log? Maybe I could reduce the amount of traffic this way. |
FYI - I'm pretty sure I know what the problem here is. The HA thermostat call makes 6-7 requests to the underlying radiotherm for data during update(). radiotherm does no caching, so each of those calls is a round trip to the thermostat. These thermostats are REALLY slow, and if the receive too many requests too quickly, they just drop a lot of them. The good news is that this is fixable - all the data that update() needs can be made in one request to the thermostat. IMO, the best solution would be for radiothem to implement a caching system or an update() method to set the attributes. But the author seems to be against that model. So the solutions seem to be to a) use a different library, b) write a new library (not that hard - there is very little to the API), c) update the HA radio thermostat code to decode the lower level data itself. I'll probably try and implement c) since it's the smallest delta to the current code some time this/next week. |
In looking at this problem, I think there are multiple issues in the current radiotherm code. The basic rule is to avoid all comm except in update() and keep that to a minimum.
1&2 are bugs which I'll work on making a PR for this weekend. 3&4 are features - I'll open a new issue and start working on those as well. The real issue is to fix 2). I have 2 radio thermostats set to update every 120 seconds. I ran a ~20 hour test with 3 different implementations. The first is the current code using the radiotherm library and I got 380 of the 10 second error messages averaging about 8-10/hour. The second test was changing the code to use the radiotherm library tstat raw accessor (1 call per update()) and I got 74 of the 10 second errors - about 10 for the first 13 hours, then a lot of them after that. The final test was changing the code to call the thermostat directly using the Python requests packages w/ a valid timeout and I got 1 timeout for the whole test. That makes it pretty clear to me that I should rewrite this class to just call the thermostat directly. It's only about a few more lines of code than the current class and seems to work much more efficiently. |
Not using a library for device communication is against the ground philosophy for device integration in home assistant. https://home-assistant.io/developers/code_review_platform/#6-communication-with-devicesservices |
@MartinHjelmare: Are you saying I'm forced to use that library just because it exists even though that library is what's causing the problem in first place? In this case, the "API" is basically a single URL get with json decode. Writing a new radio thermostat library just to wrap a couple of URL get/post calls is kind of crazy. There are tons of existing HA components that make exactly that kind of URL call already. Those seem fine so why is this one not fine? I'm not trying to be combative - it just seems like adding tons of work (or picking a very sub-optimal solution) when a very simple solution works fine doesn't seem like a good idea to me. |
@TD22057 First off, thanks for digging into this. I had stopping using this component due to reliability issues so i'm glad someone is working to make things better. I don't think @MartinHjelmare is suggesting you have to use the existing library, just that the preference for device integration is to keep anything device specific in an external library. You could try to submit a PR to the existing library to implement your changes or propose a switch to a new one you create. A main reason for this is to ensure Home Assistant gives back to the python community. By keeping device integrations in separate libraries others can benefit from the work we do on HA in their own projects. |
@mezz64 Sounds good. I was going to submit a PR with just the stability improvements and then work on adding the missing functionality (hold toggle, fan control, retry of failed set functions). But that new functionality might make a difference on deciding whether or not to factor out the code into a new radio therm library so I'll just go ahead and implement it and push it to my repo where it can be reviewed and we can make a decision on the best thing to do. |
OK - if anyone wants to help test the updated components/climate/radiotherm.py code, you can find it here or by grabbing the radio_thermostat_fix branch in this repository. I'm getting vastly fewer timeout warnings and I've added fan control which will show up in the GUI now. Timeouts seem to occur now when the thermostat is actively heating and cooling (~4/hour on mine). No idea why the thermostat is under more load then but I don't see any thing that can be done about it. |
I would love to help test but I'm using hass.io. Any idea how I can load the code? |
Got it working. I had to make a dir in the config folder /config/custom_components/climate. Your changes have fixed my connection issues. Before I would normally only get 3-4hrs of reading before it quit working. Its been working for the last 12hrs perfectly. Thanks for the improvements!! |
Awesome! But - I would caution you to give it longer. I noticed that when the AC is running, even my fixes start having time outs. Still an order of magnitude less than the original code but they happen. My conclusion is that the thermostat firmware is terrible. There must be something about holding the relay open that causes the http server to have trouble responding to requests. I haven't put in a pull request yet for the fixes because I'm going to try and replace the request calls w/ asynic IO calls and see if that helps. I also want to implement an automatic retry system - especially for any set() functions so that changing the temperature has a higher likelihood of working even if the first call times out. |
@TD22057 Thanks a lot for making these fixes! Are they available in any release yet? (Just catching up on this, didn't have time recently to play with HA.) |
@andyat Nope - I was waiting. But I should probably go ahead and push what I have because it does fix the issue and adds fan on/off support. I was going to try and also work on:
But it's probably better to push what I have now and work on those items in a separate PR. |
@TD22057 Will be great if you could push what you have, so that connectivity gets fixed. |
@TD22057 Saw your pull requests on this, great to see progress! By the way, since you are working on this already, would you mind adding the following to future work list?
|
@andyat The whole climate module is a little confusing. A lot of the functions sound similar and there isn't much in the way of docs explaining what the differences are. Many of the thermostats have the same modes but use slightly different terminology as well. There is "current_operation" and "operation_mode". It looks like those are the same thing which is the thermostat mode (heat/cool/off) not whether it's currently heating or cooling but I'm not sure. I'm not sure there is a current API for getting when it's actually on. The radiothermostat class has the active mode already in it. It's self._tstate which will be either STATE_IDLE, STATE_HEAT, or STATE_COOL. There just isn't any system for pushing that out to HA as far as I can see. I think there are couple of ways forward with that. Either the 1) thermostat API needs to be updated to have an active mode attribute or 2) the device should create extra binary sensors (one for heating on and one for cooling on), or 3) the device should create some new multi-state sensor that indicates the active mode. In any case, it would be nice to access that attribute from other places in HA and it would be nice if the detailed web view were updated to show when it's one - maybe highlighting the plot in blue or red to show active cooling and heating. |
Make sure you are running the latest version of Home Assistant before reporting an issue.
You should only file an issue if you found a bug. Feature and enhancement requests should go in the Feature Requests section of our community forum:
Home Assistant release (
hass --version
): 0.48.0Python release (
python3 --version
): Python 3.4.2Component/platform: Radiotherm
Description of problem: I have 2 radiotherm thermostats at home (Radiotherm CT50 and Filtrete 3M50). Both of them lose connectivity with hass intermittently every single day. Usually one of them loses connectivity, while the other keeps working, and the next day it could be vice versa. When it happens, temperature graph does not update and shows straight line; changing temperature actually works on thermostat itself, but not reflected in hass UI. The log is flooded with messages like this every minute:
2017-07-07 21:14:12 WARNING (MainThread) [homeassistant.components.climate] Updating radiotherm climate took longer than the scheduled update interval 0:01:00
and occasionally
2017-07-07 22:00:12 WARNING (MainThread) [homeassistant.helpers.entity] Update for climate.livingroom is already in progress
I logged in to hass machine (raspberry pi) and checked that both thermostats are pingable and work fine via their http interface via curl. Yet home assistant still thinks one of them is not working. Rebooting raspberry pi fixes the problem for few hours, and then it happens again. I do reboots every night, and still by the next evening one of the thermostats usually gets into this state.
Any idea how to debug and fix this? This has been happening for few weeks if not months with various frequency. At some point nightly reboots mostly helped, but still the problem is there and it would be great to understand it.
Expected: Home assistant should not "lose connectivity" with radiotherm thermostats. If an update takes long time (which is possible since thermostats are single-threaded and slow), it should wait some time and then retry the update.
Problem-relevant
configuration.yaml
entries and steps to reproduce:Traceback (if applicable):
Additional info:
The text was updated successfully, but these errors were encountered: