Error 429 connecting to solcast on HA restart, zapped all solcast sensors #12
Comments
I don't know if this will help you, but this helped me when I had the 429 error: wait around a day without making a call to Solcast, and then refresh at a random time, not on the hour. If you have an automation to refresh data from Solcast exactly on the hour, add a delay to it. For example, I am using a 2 min 45 sec delay on my automation that updates Solcast data.
Yes, I have this in my Solcast automation and I'm getting pretty reliable Solcast polls now. This problem occurred when Home Assistant was restarted (at a random time): the integration reached out to Solcast to get the site data, got a 429 error, and that zeroed out all the Solcast sensors. The enhancement request is to either prevent the initial Solcast call, or trap the 429 error and not overwrite the existing sensor values.
I am looking into adding a configuration option to suppress the initial call to Solcast... With two PV arrays I run out of available calls real quick doing restarts. The initial call is not easily suppressed, as from what I can tell it returns the 'sites' along with forecast data. So I'll need to cache the sites data to survive the restart, which will result in data still showing up on the Energy dashboard right after restart. Doable, but it'll take time.
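To make the idea concrete, here is a minimal sketch of that kind of sites cache, not the integration's actual implementation; the helper names are placeholders, and /config/sites.json is the path that later appears in this thread.

```python
# Minimal sketch of a sites cache that survives restarts (illustration only).
import json
from pathlib import Path

SITES_CACHE = Path("/config/sites.json")

def save_sites_cache(sites: dict) -> None:
    """Persist the most recent successful sites response to disk."""
    SITES_CACHE.write_text(json.dumps(sites))

def load_sites_cache() -> dict | None:
    """Return previously cached sites data, or None if no cache exists yet."""
    if SITES_CACHE.exists():
        return json.loads(SITES_CACHE.read_text())
    return None
```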
One other integration I use has a toggle to enable/disable cloud updates at integration startup and HA reboots. I obviously don't know if this could be translated over to this integration, but it's worth mentioning. Someone who knows the technical side of things and Python can check the PR here from the other integration to see if something can be borrowed.
Just an alternative thought... as hobbyist accounts can only have two rooftops, it might be simpler to capture two additional user-entered resource_ids in the configuration flow and, if either/both are present, skip the startup discovery call to Solcast and use the supplied resource_id(s) instead?
I did some code examining, only to find that the sites data caching is rigged up in the code but left disabled. However, earlier today I did notice a 429 from Solcast after restarting, with forecast data unavailable in HA. The solar production yellow bars were also missing. Checking the log later did not show a 429, because the log is cleared on restart, so I need to wait until it happens again to see if it's the same issue reported at line 144. An immediate restart of HA fixed the forecast and solar display. So in summary, this should never happen on HA start once the sites data is cached.
Delving further: on startup the integration gathers the rooftop sites from the Solcast API. It doesn't really need to cache the sites collection, because doing that gather doesn't impact the count of daily API calls. So no impact, you'd think? Kind of. What I think is happening here is that Solcast limits API calls when things get too busy, probably only when you have a hobbyist account, and there is already much said by Oziee on that by guiding folks to not gather exactly on the hour. Many folks don't listen, so if your restart occurred near the top of the hour it might fail for that reason. But it didn't. It failed at a quarter to five.

I am wondering whether some form of retry is in order on failure. Caching is also probably a better option, so that the API call never happens. But what if sites change? Get added? This is probably why Oziee left it permanently disabled, most likely to cut down on the number of issues logged against his repository. Not everyone's a Python coder... And not everyone knows how to find and delete a file in Linux. But he wrote the code to do so. It'd be amusing to try and make it work in the real world. So I suspect that API limiting when Solcast is busy is behind this. I just did about twenty restarts in a row and none of them failed to get the sites from the API, despite me having just one weird and seemingly random failure earlier today.
…for-up-to-1-minute Add retry for rooftop sites collection #12
I am also going to expand on the retry concept, @gcoan. It seems sensible to me that if the rooftop sites collect fails, but there is a cache available to be read (because we restarted prior and all was good), then read it, log the circumstance with a warning, and move on. I like that. #26 just got merged. What could possibly go wrong? So it's another branch/PR for me to implement the belt/braces fall-back-on-cache.
@autoSteve yes, that seems a sensible approach to use the cached sites data if we get a 429 error. Do you want me to try the above release or is there another version coming with more changes?
Yeah, hold off @gcoan. I want the fall-back in there. Makes TOTAL sense.
I'm going to pull the pin for tonight (migraine kicking in to curl-up-and-hide levels), but I will build a release tomorrow (or you can, @autoSteve - welcome as a collaborator) based on your fallback as noted above, when ready. The intention is to tidy up info.md as well (rather than just link to release notes), and to update the PR that requests this repo gets included in HACS (link to the latest release and runs of the HACS and hassfest actions). I don't think I'm going to get around to implementing randomised polling before I travel, but I might also update the readme on making the randomised polling the first automation.
Ugh. I hate the flashy-flashy migraines, @BJReplay. Much spekky, but. Until the pain kicks in... I'm growing out of them, and thankfully haven't had one for two years. (Not to mozz meself, so touch wood...) Thanks for the welcome.
Would you believe it, looks like I had a power cut and HA restart last night, and a Solcast 429 error on startup! Same error: the sites entities are no longer being provided by Solcast and all the other forecast entities are empty. It took two reloads before Solcast came back; the first reload at 9:17:39 (which you would think would be a random enough time) failed with a 429. Thanks @autoSteve and @BJReplay for looking into this. Hope you manage to hold off the migraine.
Line 144. There she goes, @gcoan. That's what we're chasing...
@autoSteve do I need to manually copy the files from that GitHub release into the appropriate HA folder? It's just that for other integrations I download beta versions from HACS, and I'm not seeing any for Solcast.
Ummm... I've not got that clever yet. Do you know your way around GitHub well enough to take the modified solcastapi.py from the commit's file contents and manually replace your current one with it? I may need to get more clever with beta releases...
Thanks @autoSteve, I figured it out: finding the updated solcastapi.py file in the GitHub release and copying that to /config/custom_components/solcast_solar (making a copy first!). I restarted HA and the /config/sites.json file has been created, and it appears to contain my sites data. Oh, and the Solcast entities remained OK throughout all these restarts.
If a rooftop sites gather has previously been successful, then the data from that gather will be used (with a warning issued in the log) should a 429 occur three times in a row, with five-second pauses between calls.
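As a rough sketch of that retry-then-fall-back behaviour (the fetch and cache helpers here are placeholders, not the integration's real functions), assuming each attempt returns a status code and payload:

```python
# Illustration only: try the sites API a few times, then fall back to the cache.
import asyncio
import logging

_LOGGER = logging.getLogger(__name__)

async def get_sites_with_retry(fetch_sites_from_api, load_sites_cache,
                               attempts: int = 3, pause: float = 5.0):
    """Return sites from the API, or from the cache after repeated 429 responses."""
    for attempt in range(attempts):
        status, sites = await fetch_sites_from_api()
        if status == 200:
            return sites
        if attempt < attempts - 1:
            await asyncio.sleep(pause)  # five-second pause between calls
    cached = load_sites_cache()
    if cached is not None:
        _LOGGER.warning("Solcast returned %s collecting sites; using cached sites data", status)
        return cached
    raise RuntimeError("Could not collect Solcast sites and no cache is available")
```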
I have good news, and I have bad news. So... the good news first, @gcoan. By restarting almost exactly on the hour I was able to replicate a 429 for get sites. The retry mechanism for getting sites data worked perfectly, warned of the occurrence in the log, and the integration used the cached data. So that bit of code checks out as tested. The not-so-good news is that more caching needs to be done, with the call to GetUserUsageAllowance subsequently failing with a 429. This call is also not subject to the 10-call limit, pretty well confirming that this is only occurring at generally busy times for Solcast, and probably for hobbyist accounts only. It also needs a retry mechanism and a separate cache to cope seamlessly with startup. Back to some good news, though: should the call to GetUserUsageAllowance fail, it will not interfere with the sensors or have them go AWOL with resulting data gaps. It just makes my eye twitch, so it will be worked around... It would also be interesting to potentially build in a much longer retry interval for getting forecast data, say a one-minute interval. If that coincidentally happens at a busy time, then I think that forecast update would be skipped until the next scheduled one occurs. I'm on it. Problem number one first, then I'll move on to forecast retries.
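Presumably (this is an assumption about the eventual fix, not the committed code) the usage-allowance call could reuse the same retry-and-cache pattern, with its own cache file; the file name and helper names below are made up for the sketch.

```python
# Illustration only: cache the usage-allowance response separately so a 429
# at startup does not blank anything. File name and helpers are hypothetical.
import json
from pathlib import Path

USAGE_CACHE = Path("/config/usage.json")

async def get_usage_with_retry(fetch_usage_from_api):
    """Fetch the usage allowance, falling back to the last cached response."""
    status, usage = await fetch_usage_from_api()
    if status == 200:
        USAGE_CACHE.write_text(json.dumps(usage))
        return usage
    if USAGE_CACHE.exists():
        return json.loads(USAGE_CACHE.read_text())
    return None  # a miss here should not zero any sensors
```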
@autoSteve I fixed the crash in solcastapi.py, changing the code to:

    if self.get_last_updated_datetime() + timedelta(minutes=15) < dt.now(timezone.utc):
        _LOGGER.warning(f"SOLCAST - not requesting forecast because time is within fifteen minutes of {self.get_last_updated_datetime()}")

(added "self." to each of the get_last_updated_datetime() calls)

I did a manual run and it decided not to update Solcast, which I'm not sure is correct as I haven't updated (successfully) since 10:15:

    2024-06-17 13:38:09.712 INFO (MainThread) [custom_components.solcast_solar] SOLCAST - Service call: update_forecasts

Looking at the code I see it gets the 'last_updated' from itself. When I restarted HA to put this new code in, was this zeroed, and so the 'it's not 15 minutes elapsed' calculation isn't working?
Belay that idea, the comparison isn't right somehow. It's 13:50, or 12:50 UTC. The last update was 10:15 (9:15 UTC). It appears to be checking against the correct last update time in UTC. I tried again and then left it, so HA has been running 23 minutes, but the comparison still fails:

    2024-06-17 13:50:50.550 INFO (MainThread) [custom_components.solcast_solar] SOLCAST - Service call: update_forecasts
Much. Need. Focus. I've just been working around an issue for version= that oziee got us into... Epic failure logs. I should frame that as my biggest Python fail. Yeah, something is odd. The comparison is to dt.now(timezone.utc), but it is the wrong way around. Replace the comparison so the forecast update is only skipped when the last successful update was within the past fifteen minutes.
("self." ... Gah.) |
OMG. It must be late. Line 589: change the < to a >.
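The exact replacement lines weren't captured in this thread; as a self-contained sketch of the intended logic (a reconstruction, not the committed code), the guard amounts to:

```python
# Reconstruction of the intended fifteen-minute guard: skip the forecast
# update only when the last successful update was less than 15 minutes ago.
from datetime import datetime, timedelta, timezone

def should_skip_update(last_updated: datetime) -> bool:
    """True when the last update is recent enough that a new poll should be skipped."""
    return last_updated + timedelta(minutes=15) > datetime.now(timezone.utc)  # '>' rather than '<'
```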
@autoSteve so why did the Python interpreter not understand what you meant? Too literal, these computers. <, >, it's all very similar really... I changed the two lines as above, restarted HA, got 3 successive fails at reading the sites file, so it fell back to using the cache and started up OK. I ran my automation to pull the forecast, got a 429 error, waited 59 seconds, and then it worked fine this time, updating my forecast ;-) Ran it again a few minutes later, and it correctly decided it didn't want to run:

A couple of suggestions from seeing the above behaviours:
Just in case it helps:
Solcast started sweet as a nut.
System Info:
Happy days
This should localise it.
@autoSteve I messed about with the log messages a bit more to make them readable. Here's my final version of the code fragment, which works perfectly, thanks.
I had made almost exactly the same change, @gcoan. Just dropped the time now because that is already referenced on the log entry.
My last forecast poll was very close to timing out... 366 seconds total, so very near the total async timeout of 480 seconds. I have another idea for retries to further improve reliability: should the successive back-offs eventually fail, then possibly wait a further delay and try the back-off sequence again. I have no idea of the best way to actually implement that at the moment, because it is an HA automation that triggers Solcast collection. I wonder whether somebody way smarter than me at automations could think about that with me. The logic would be something like: if the last failure to poll is more recent than the last successful poll, and twenty minutes have elapsed since the last failure, then fire a call. Thoughts?
Trying not to over-think it (as I spin up Oracle Cloud instances as Ampere capacity becomes available again - woot); my inclination is to just skip a poll. Here's why: (other than on first start-up of the integration) you've got a forecast. It might be seven days out of date, but it's there. It might not be up to date, but it's there. Now, in my use case, for my actual battery control, I'm looking for near-term forecasts to work out whether or not to fine-tune my battery charging behaviour as I approach the start of expensive power - but really, that's just (as my old boss used to say) pencil f_#%ing: I know what my load is going to be, I know what my insolation is going to be ± cloud cover changes in the next hour or so, and any updated forecast is just going to slightly change my already planned charging activity. Whether or not I get a new forecast won't change the fact that the wind might blow in a slightly different direction and make that updated forecast wrong in any case - so if I don't get it, I still might make the wrong decision. Which is a really long way of saying: try a few times, and if you can't get a result, skip that cycle. Which is what my PV control algo has been doing for the last few years (blindly, mind you - it always uses the latest available forecast, always tries to update the forecast, and doesn't know whether or not it got an updated one - it just re-requests a forecast regularly through the day, and always re-adjusts charging targets based on remaining forecast insolation, load and state of charge).
From a practical point of view, I absolutely agree. From an "I'm getting 429 errors, so raise an issue" point of view, I think it has merit... All the logging of retries happens silently at debug level, so it goes on in the background.
I poll roughly once per hour (but not on the hour, to avoid 429s). Not every hour, though, as the evening is more important for me to get regular updates. I have an "Allowed API use" sensor for specific times of the day. Here is the template:
It returns the maximum API calls that should have been allowed up until the current time. In another automation, offset 30 minutes from my normal polling automation, I then opportunistically check whether I am allowed to do another poll (i.e. if APIs used < sensor.solcast_api_usage_max_allowed), and if so, I poll again (whilst making sure I stay within my allowed usage). This will help if polling soon after 429s is important.
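The Jinja template itself isn't reproduced above, so purely as an illustration of the idea (with an assumed polling schedule, not the author's), the calculation amounts to something like:

```python
# Illustration only: how many polls "should" have been allowed by now, given a
# planned schedule. The schedule below is an assumption for the example.
from datetime import datetime, time

PLANNED_POLLS = [time(6, 10), time(10, 10), time(13, 10),
                 time(18, 10), time(20, 10), time(22, 10)]

def max_allowed_polls_so_far(now: datetime | None = None) -> int:
    """Count planned polls whose scheduled time has already passed today."""
    current = (now or datetime.now()).time()
    return sum(1 for t in PLANNED_POLLS if t <= current)
```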
Yeah, thinking about it as I was just mowing our little strip of nature strip, I can see a change to the readme, which I'll add to your draft PR, along the lines of: As noted below, we suggest you automate calling the update_forecasts service. It is up to you to build a belt-and-braces approach if it is absolutely essential for you to retrieve updated forecasts on a regular basis, bearing in mind that the integration already caches the forecasts that it has retrieved, and these forecasts - even if out of date - are available for display and analysis.
It retries silently, but fails loudly. 😂
I think we should make it clear when an update has failed. I agree with @BJReplay's thoughts about there being limited criticality to having an absolutely up-to-date forecast anyway; at best it's just a forecast, and clouds and stuff happen, so it's never going to be 100% accurate. I update the forecast at 6am, 10am, 1pm, 6pm and 11pm to give Predbat (my battery automation software) the best chance to include a more refined view for the start of the solar day/afternoon peak/what tomorrow's solar will be, etc., but if I miss an update it's not the end of the world. But failing silently to do something that was requested doesn't feel like good practice to me, and I do feel we ought to log the failure in an accessible way so that those who feel so inclined can automate using the information.
Like I said, it fails loudly with an error, but retries at debug log level. Folks would have no idea that it had actually thoughtfully retried before the failure occurred unless debug logging is on. I could log the retries about Solcast being busy as warnings if you both feel that that's better.
The logic would then go something like: if the last poll was not successful, and the current time equals the last attempted poll time plus some number of minutes, then trigger the poll.
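As a rough sketch of that condition (standalone Python rather than an HA automation, with hypothetical timestamps as inputs):

```python
# Sketch of the retry condition described above: fire another poll only when
# the most recent attempt failed and a cool-down has passed since that attempt.
from datetime import datetime, timedelta, timezone

def should_retry_poll(last_success: datetime, last_attempt: datetime,
                      cooldown: timedelta = timedelta(minutes=20)) -> bool:
    now = datetime.now(timezone.utc)
    return last_attempt > last_success and now >= last_attempt + cooldown
```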
Why not add another sensor to the integration that returns a status, e.g. last_result {OK, CACHE, ERROR}? This way it isn't a silent fail that needs further (debug) investigation…
Technically 'no cache' is not an error in my mind, @swests. A user either instructed the integration to delete it, or it's a fresh install. Either way, a successful poll then needs to occur for forecast data to be usable, and that is not being in an error state, just limbo until something happens.
What we're getting into here is the rapid spiral into home manualisation - that is, home automation that ought to be saving time is costing time as we spend more and more time working on edge cases that deliver no additional value, other than the asymptotic ideal of perfection. I'm after engineering completion here - NEIGE. That's not to say that there isn't a relatively simple solution able to be cobbled together here, but more sensors and more logic to determine what each combination means is more complexity and more bug reports that aren't bugs.
Thinking about it, the retries probably shouldn't be totally silent to normal users who don't have debug turned on. It should at least log a warning that it's retrying. We don't need all the lines that are in the debug log, but at least one message so 'normal users' can see in the log that Solcast has had a 429 and it retried.
@swests I was trying to reduce the proliferation of sensors wherever possible by adding extra attributes to an existing sensor where there's a common 'concept' such as 'polling information'. It just seems a cleaner way of doing it.
That'd work, assuming it's done as attributes of an existing sensor rather than yet more new sensors. I think there's value in exposing the information so that if others want to automate with it they can, rather than writing lots of complex edge cases within the integration (as you say) or hiding it as if we know better.
Logging improved, BTW.
And it might be a silly idea from the point of view of: add another sensor? Then add another six language translations for it, where Google may or may not get it right... My Polish and Slovak are a bit rusty, and Germans have five words for some things 😂
I think we're all fixed on this one now, thanks @autoSteve.
Describe the bug
I restarted Home Assistant, and when the Solcast integration started up it got an error 429, I believe from retrieving the rooftop sites info:
This caused the rooftop sensor in HA to change to Unknown:
And ALL the solcast PV forecast sensors to change to zero:
Fortunately I spotted this pretty quickly and was able to resolve it by reloading the Solcast integration, after which all the sensors populated correctly.
To Reproduce
Restart HA; if Solcast is busy it's likely that the same will occur.
Expected behavior
On HA start, if an error 429 occurs, use the previous solcast sensor values. Do not set any sensors to Unknown or zero.
Desktop (please complete the following information):
HA 2024.5.5
HAOS 12.3
Solcast v4.0.25