
Error 429 connecting to solcast on HA restart, zapped all solcast sensors #12

Closed
gcoan opened this issue Jun 5, 2024 · 107 comments
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@gcoan commented Jun 5, 2024

Describe the bug
I restarted Home Assistant, and when the Solcast integration started up it got an error 429, I believe from retrieving the rooftop sites info:
IMG_1030

This caused the rooftop sensor in HA to change to Unknown:
IMG_1032

And ALL the solcast PV forecast sensors to change to zero:
IMG_1031

Fortunately I spotted this pretty quickly and was able to resolve it by reloading the Solcast integration, after which all the sensors populated correctly.

To Reproduce
Restart HA; if Solcast is busy, it's likely that the same will occur.

Expected behavior
On HA start, if an error 429 occurs, use the previous Solcast sensor values. Do not set any sensors to Unknown or zero.

Desktop (please complete the following information):
HA 2024.5.5
HAOS 12.3
Solcast v4.0.25

@BJReplay added the enhancement and help wanted labels and removed the enhancement label Jun 5, 2024
@oleg-d commented Jun 9, 2024

I don't know if this will help you, but this helped me when I had the 429 error.

Wait around a day without making a call to Solcast, and then refresh at a random time, not on the hour.

If you have an automation to refresh data from Solcast exactly on the hour, then add a delay to it. For example, I am using a 2 min 45 sec delay on my automation that updates Solcast data.

@gcoan (Author) commented Jun 9, 2024

> Wait around a day without making a call to Solcast, and then refresh at a random time, not on the hour.

Yes, I have this in my Solcast automation and I'm getting pretty reliable Solcast polls now.

This problem occurred when Home Assistant was restarted (at a random time): the integration reached out to Solcast to get the site data, got a 429 error, and that zeroed out all the Solcast sensors. The enhancement request is to either prevent the initial Solcast call, or trap the 429 error and not overwrite the sensors if it occurs.

@autoSteve (Collaborator) commented Jun 12, 2024

> The enhancement request is to either prevent the initial Solcast call, or trap the 429 error

I am looking into adding a configuration option to suppress the initial call to Solcast... With two PV arrays I run out of available calls real quick doing restarts.

The initial call is not easily suppressed, as from what I can tell it returns the 'sites' along with forecast data. So I'll need to cache the sites data to survive the restart, which will mean data still shows up on the Energy dashboard right after restart. Doable, but it'll take time.
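
A cache like that could be as simple as a JSON file written after each successful sites gather and read back when the API is unavailable. A minimal sketch, assuming the /config/sites.json filename discussed later in this thread (the helper names are illustrative, not the integration's actual code):

    import json
    from pathlib import Path

    SITES_CACHE = Path("/config/sites.json")  # filename per later discussion in this thread

    def save_sites(sites: dict) -> None:
        """Persist the sites collection after a successful gather."""
        SITES_CACHE.write_text(json.dumps(sites))

    def load_cached_sites() -> dict | None:
        """Return the previously gathered sites, or None if no cache exists yet."""
        if SITES_CACHE.is_file():
            return json.loads(SITES_CACHE.read_text())
        return None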

@oleg-d commented Jun 12, 2024

One other integration I use has a toggle to enable/disable cloud updates at integration startup and HA reboots. Obviously I don't know if this could be translated over to this integration, but it's worth mentioning.

Someone who knows the technical side of things and Python could check the PR from the other integration to see if something can be borrowed here.

@timverlander

> So I'll need to cache the sites data to survive the restart

Just an alternative thought: as hobbyist accounts can only have two rooftops, it might be simpler to capture two additional user-entered resource_ids in the configuration flow, and if either/both are present, skip the startup discovery call to Solcast and use the supplied resource_id(s) instead?

@autoSteve (Collaborator) commented Jun 13, 2024

I did some code examining, only to find that the sites data should definitely already be cached by solcastapi.py. Then I did a few restarts of HA and checked the API use count at solcast.com.au. It did not go up, so getting the sites must not contribute to API calls, confirming the cache read of sites data.

However, earlier today I did notice a 429 from Solcast after restarting, with forecast data unavailable in HA. The solar production yellow bars were also missing. When I tried to check the log later, no 429 was present because the log clears on restart, so I need to wait until it happens again to see if it's the same issue reported at line 144.

An immediate restart of HA fixed the forecast and solar display.

So in summary, this should never happen on HA start with /config/sites.json (not solcast.json) present and the class variable apiCacheEnabled set True, because that section of code in the try/except block should read the cache and not make a REST call. But apiCacheEnabled is always False.

Debug logging for individual components also seems to turn off on HA restart, so I might need to enable system-wide debug logging for a while. Ugh. (Update: I got logging sorted.)

@autoSteve (Collaborator)

Delving further:

On startup, apiCacheEnabled is always False (hard-coded), so it's not caching the sites collection. It is caching forecasts only.

But it doesn't really need to cache the sites collection, because doing so doesn't impact the count of daily API calls. So no impact, you'd think? Kind of.

What I think is happening here is that Solcast limits API calls when things get too busy, probably only for hobbyist accounts, and Oziee has already said much on that by guiding folks not to gather exactly on the hour. Many folks don't listen, so if your restart occurred near the top of the hour it might fail for that reason.

But it didn't. It failed at a quarter to five.

I am wondering whether some form of retry is in order on failure. Caching is also probably a better option, so that the API call never happens. But what if sites change? Get added? This is probably why Oziee left it permanently disabled: most likely to cut down on the number of issues logged against his repository. Not everyone's a Python coder... and not everyone knows how to find and delete a file in Linux.

But he wrote the code to do so. It'd be amusing to try to make it work in the real world.

So I suspect that API limiting when busy is behind this. I just did about twenty restarts in a row and none of them failed to get the sites from the API, despite me having had just one weird and seemingly random failure earlier today.

@autoSteve (Collaborator)

@gcoan, you could try out 490494c, which adds a retry mechanism, but testing my suspicion is going to be tough.

BJReplay added a commit that referenced this issue Jun 13, 2024: Add retry for rooftop sites collection #12 (branch …for-up-to-1-minute)
@autoSteve (Collaborator)

I am also going to expand on the retry concept, @gcoan. It seems sensible to me that if the rooftop sites collection fails but there is a cache available to be read (because we restarted prior and all was good), then read it, log the circumstance with a warning, and move on.

I like that.

#26 just got merged. What could possibly go wrong? So it's another branch/PR for me to implement the belt-and-braces fall-back-on-cache.

@gcoan (Author) commented Jun 13, 2024

@autoSteve yes, that seems a sensible approach: use the cached sites data if we get a 429 error.

Do you want me to try the above release or is there another version coming with more changes?

@autoSteve (Collaborator)

Yeah, hold off @gcoan. I want the fall-back in there. Makes TOTAL sense.

@BJReplay (Owner)

I'm going to pull the pin for tonight (migraine kicking in to curl-up-and-hide levels), but will (or you can, @autoSteve - welcome as a collaborator) build a release tomorrow (based on your fallback as noted above) when ready. The intention is to tidy up info.md as well (rather than just link to release notes), and to update the PR that requests this repo gets included in HACS (link to the latest release and runs of the HACS and hassfest actions).

I don't think I'm going to get around to implementing randomised polling before I travel, but I might also update the readme on making the randomised polling the first automation.

@autoSteve (Collaborator)

Ugh. I hate the flashy-flashy migraines, @BJReplay. Much spekky, but. Until the pain kicks in... I'm growing out of them, and thankfully haven't had one for two years. (Not to mozz meself, so touch wood...)

Thanks for the welcome.

@gcoan (Author) commented Jun 13, 2024

Would you believe it: looks like I had a power cut and HA restart last night, and a Solcast 429 error on startup!
[screenshot]

Same error: the sites entities are no longer being provided by Solcast, and all the other forecast entities are empty. It took two reloads before Solcast came back; the first reload, at 9:17:39 (which you would think would be a random enough time), failed with a 429.

Thanks @autoSteve and @BJReplay for looking into this. Hope you manage to hold off the migraine.

@autoSteve (Collaborator)

Line 144. There she goes, @gcoan. That's what we're chasing...

@autoSteve (Collaborator)

b38c62a is good to test, @gcoan.

I'll hold off merging until someone other than me has tried it.

@autoSteve (Collaborator)

After a successful start you should find the cache (containing json) has been created, which will be used in subsequent failure scenarios.

[screenshot]

@gcoan (Author) commented Jun 13, 2024

@autoSteve, do I need to manually copy the files from that GitHub release into the appropriate HA folder?

It's just that for other integrations I download beta versions from HACS, and I'm not seeing any for Solcast.

[screenshot]

@autoSteve (Collaborator)

> @autoSteve, do I need to manually copy the files from that GitHub release into the appropriate HA folder?

Ummm... I've not got that clever yet. Do you know your way around GitHub well enough to take the modified solcastapi.py from the commit and manually replace your current one with it?

[screenshot]

I may need to get more clever with beta releases...

@gcoan (Author) commented Jun 13, 2024

Thanks @autoSteve, I figured it out: finding the updated solcastapi.py file in the GitHub release and copying that to /config/custom_components/solcast_solar (making a copy first!).

Restarted HA, and the /config/sites.json file has been created; it appears to contain my sites data.
I have restarted HA 3 more times, and on each restart I've not seen any Solcast messages in the HA Core logfile. So I'm not sure whether this means Solcast is not reporting 429 errors (and so no retry/use of the cached sites.json is required), or whether it silently used the cached file. Should I see anything in the logfile so I can debug whether it is working or not?

Oh, and the Solcast entities remained OK throughout all these restarts.

@autoSteve (Collaborator)

If a rooftop sites gather has previously been successful, then the data from that gather will be used, with a warning issued in the log, should a 429 occur three times in a row (with five-second pauses between calls).
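
As a sketch of that behaviour (illustrative only; SolcastBusyError and the callables are stand-ins, not the integration's actual names):

    import asyncio
    import logging

    _LOGGER = logging.getLogger(__name__)

    class SolcastBusyError(Exception):
        """Stand-in for a 429 (too busy) response."""

    async def gather_sites(fetch_sites, read_cache, retries=3, pause=5):
        """Try the API up to three times with five-second pauses, then fall
        back to the cached sites data with a warning."""
        for attempt in range(retries):
            try:
                return await fetch_sites()  # assumed to raise SolcastBusyError on a 429
            except SolcastBusyError:
                if attempt < retries - 1:
                    await asyncio.sleep(pause)
        _LOGGER.warning("Sites gather failed %d times; using cached sites data", retries)
        return read_cache()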

@autoSteve (Collaborator)

I have good news, and I have bad news.

So... the good news first, @gcoan. By restarting almost exactly on the hour I was able to replicate a 429 for get sites. The retry mechanism for getting sites data worked perfectly, warned of the occurrence in the log, and the integration used the cached data. So that bit of code checks out as tested.

[screenshot]

The not-so-good news is that more caching needs to be done, with the call to GetUserUsageAllowance subsequently failing with a 429. This call is also not subject to the 10-call limit, pretty well confirming that this only occurs at generally busy times for Solcast, and probably for hobbyist accounts only. It also needs a retry mechanism and a separate cache to cope seamlessly with startup circumstances.

Back to some good news, though: should the call to GetUserUsageAllowance fail, it will not interfere or make sensors go AWOL, resulting in data gaps. It just makes my eye twitch, so it will be worked around...

It would also be interesting to build in a much longer retry interval for periodically getting forecast data, say a one-minute interval. If that coincidentally happens at a busy time, then I think that forecast update would be skipped until the next scheduled one occurs.

I'm on it. Problem number one first, then I'll move on to get forecast retries.

@gcoan (Author) commented Jun 17, 2024

@autoSteve I fixed the crash in solcastapi.py, changing the code to:

        if self.get_last_updated_datetime() + timedelta(minutes=15) < dt.now(timezone.utc):
            _LOGGER.warning(f"SOLCAST - not requesting forecast because time is within fifteen minutes of {self.get_last_updated_datetime()}")

(added "self." to each of the get_last_updated_datetime() calls)

Did a manual run; it decided not to update Solcast. I'm not sure that is correct, as I haven't updated (successfully) since 10:15.
[screenshot]

2024-06-17 13:38:09.712 INFO (MainThread) [custom_components.solcast_solar] SOLCAST - Service call: update_forecasts
2024-06-17 13:38:09.712 WARNING (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - not requesting forecast because time is within fifteen minutes of 2024-06-17 09:15:55.606974+00:00

Looking at the code, I see it gets the 'last_updated' from itself. When I restarted HA to put this new code in, was this zeroed, making the 'it's not 15 minutes elapsed' calculation not work?
Either code around this, or use sensor.solcast_pv_forecast_api_last_polled?

@gcoan (Author) commented Jun 17, 2024

Belay that idea; the comparison isn't right somehow. It's 13:50, or 12:50 UTC. The last update was 10:15 (9:15 UTC). It appears to be checking against the correct last update time in UTC. I tried again and then left it, so HA has been running 23 minutes, but the comparison still fails:

2024-06-17 13:50:50.550 INFO (MainThread) [custom_components.solcast_solar] SOLCAST - Service call: update_forecasts
2024-06-17 13:50:50.550 WARNING (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - not requesting forecast because time is within fifteen minutes of 2024-06-17 09:15:55.606974+00:00
2024-06-17 13:59:58.227 INFO (MainThread) [custom_components.solcast_solar] SOLCAST - Service call: update_forecasts
2024-06-17 13:59:58.228 WARNING (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - not requesting forecast because time is within fifteen minutes of 2024-06-17 09:15:55.606974+00:00

@autoSteve (Collaborator)

Much. Need. Focus.

I've just been working around an issue for version= that oziee got us into...

Epic failure logs. I should frame them as my biggest Python fail. The missing "self." is "duh"...

Yeah, something is odd. The comparison is to dt.now(timezone.utc), so what the? I might need to log both the last updated time and the comparison time.

@autoSteve (Collaborator) commented Jun 17, 2024

Replace line 590 with this. What do you get?

            _LOGGER.warning(f"SOLCAST - not requesting forecast because time is within fifteen minutes of {self.get_last_updated_datetime()}, time now is {dt.now(timezone.utc)}")

@autoSteve (Collaborator)

("self." ... Gah.)

@autoSteve (Collaborator) commented Jun 17, 2024

OMG. It must be late.

On line 589, change < to >...

        if self.get_last_updated_datetime() + timedelta(minutes=15) > dt.now(timezone.utc):
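
To see why the flipped comparison is the right one, here is an illustrative check (not integration code) using the timestamps from the log above:

    from datetime import datetime, timedelta, timezone

    last_updated = datetime(2024, 6, 17, 9, 15, 55, tzinfo=timezone.utc)  # last successful update
    now = datetime(2024, 6, 17, 13, 50, 50, tzinfo=timezone.utc)          # time of the service call

    # "Within fifteen minutes of the last update" means the 15-minute window
    # still extends past the current time:
    print(last_updated + timedelta(minutes=15) > now)  # False -> OK to request a forecast
    print(last_updated + timedelta(minutes=15) < now)  # True  -> the old test wrongly skipped the update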

@gcoan (Author) commented Jun 17, 2024

@autoSteve so why did the Python interpreter not understand what you meant? Too literal, these computers. <, >, it's all very similar really...

I changed the two lines as above, restarted HA, got 3 successive failures reading the sites file, so it fell back to using the cache and started up OK.

Ran my automation to pull the forecast, got a 429 error, waited 59 seconds, then it worked fine this time, updating my forecast ;-)

Ran it again a few minutes later, and it correctly decided it didn't need to run:

2024-06-17 20:02:16.315 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Data for 2024-06-21 contains all 48 records
2024-06-17 20:02:16.316 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Data for 2024-06-22 contains all 48 records
2024-06-17 20:04:17.383 WARNING (MainThread) [homeassistant.components.websocket_api.http.connection] [139689409521136] from 195.166.158.163 (Home Assistant/2024.5.1 (io.robbie.HomeAssistant; build:2024.688; iPadOS 17.5.1)): Disconnected: Did not receive auth message within 10 seconds
2024-06-17 20:05:38.856 INFO (MainThread) [custom_components.solcast_solar] SOLCAST - Service call: update_forecasts
2024-06-17 20:05:38.896 WARNING (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - not requesting forecast because time is within fifteen minutes of 2024-06-17 19:02:16.296654+00:00, time now is 2024-06-17 19:05:38.896497+00:00

A couple of suggestions from seeing the above behaviours:

  • Suggest that the pause-and-retry in the initial load of the sites file be increased. At the moment it's only a 5-second delay between the 3 retries, which may well not be enough for any Solcast overloading to have passed. I know this will further extend the startup time of the Solcast integration, but something around 30-60 seconds between retries seems more likely to give Solcast a chance to sort itself out.

  • In the logging of 'not less than 15 minutes have passed', the last run time is in UTC. I suspect people won't realise this is the case, and we'll get spurious issues raised that 'the time is wrong'. Convert to the local time zone? (I hate the different date/time routines: far too many options, and you need to think carefully about which time structure you are passing to them.)

@ProphetOfDoom commented Jun 17, 2024

Just in case it helps:

  • Disabled my V4.0.23.1 install.
  • Cleared solcast json files from config directory.
  • Upgraded to V4.0.31.
  • Cleared logs.
  • Restarted HA.

Solcast started sweet as a nut.

  • All expected sensors populated.
  • System option "Enable polling for updates" stayed disabled.
  • Log shows
    2024-06-17 22:58:05.462 INFO (MainThread) [homeassistant.components.automation.solcast_pv_forecast] Initialized trigger Solcast PV Forecast
    2024-06-17 22:59:15.576 INFO (MainThread) [homeassistant.setup] Setting up solcast_solar
    2024-06-17 22:59:22.545 INFO (MainThread) [custom_components.solcast_solar] Solcast Integration version number: v4.0.31
    2024-06-17 22:59:22.565 INFO (MainThread) [homeassistant.components.sensor] Setting up solcast_solar.sensor
    2024-06-17 22:59:22.655 INFO (MainThread) [homeassistant.components.select] Setting up solcast_solar.select
    2024-06-17 22:59:22.656 INFO (MainThread) [custom_components.solcast_solar] SOLCAST - Solcast API data UTC times are converted to Europe/London

System Info:
Core 2024.6.3
Supervisor 2024.06.0
Operating System 12.3
Frontend 20240610.1

Happy days

@autoSteve (Collaborator)

> last run time is in UTC

This should localise it:

_LOGGER.warning(f"SOLCAST - not requesting forecast because time is within fifteen minutes of {self.get_last_updated_datetime().astimezone(self._tz)}")

@gcoan (Author) commented Jun 18, 2024

@autoSteve I messed about with the log messages a bit more to make them more readable. Here's my final version of the code fragment, which works perfectly. Thanks!

    async def http_data(self, dopast = False):
        """Request forecast data via the Solcast API."""
        if self.get_last_updated_datetime() + timedelta(minutes=15) > dt.now(timezone.utc):
            _LOGGER.warning(f"SOLCAST - not requesting forecast because time now ({dt.now(timezone.utc).astimezone(self._tz)}) is within fifteen minutes of last forecast update at {self.get_last_updated_datetime().astimezone(self._tz)}")
            return

@autoSteve (Collaborator)

I had made almost exactly the same change, @gcoan. I just dropped the 'time now', because that is already referenced on the log entry.

@autoSteve (Collaborator) commented Jun 19, 2024

My last forecast poll came very close to timing out... 366 seconds in total, so very near the overall async timeout of 480 seconds.

I have another idea for retries to further improve reliability.

Should the successive back-offs eventually fail, then possibly wait a further delay and try the back-off sequence again. I have no idea of the best way to actually implement that at the moment, because it is an HA automation that triggers Solcast collection.

I wonder whether somebody way smarter than me at automations could think about that with me.

API Last Polled is exposed as a diagnostic, but there is no diagnostic info about the last failure to poll. If the last failure datetime were exposed, then I think writing an automation to trigger a retry after, say, 20 minutes might be trivial. Randomness would not be required, because the prior attempt used randomness.

Logic like: if the last failure to poll is more recent than the last successful poll, and twenty minutes have elapsed since the last failure, then fire a call.

Thoughts?

2024-06-19 13:57:00.500 INFO (MainThread) [custom_components.solcast_solar] SOLCAST - Service call: update_forecasts
2024-06-19 13:57:00.500 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - API polling for rooftop b68d-c05a-c2b3-2cf9
2024-06-19 13:57:00.500 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Polling API for rooftop_id b68d-c05a-c2b3-2cf9
2024-06-19 13:57:00.500 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - fetch_data code url - https://api.solcast.com.au/rooftop_sites/b68d-c05a-c2b3-2cf9/forecasts
2024-06-19 13:57:00.500 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Fetching forecast
2024-06-19 13:57:00.652 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Solcast API is busy, pausing 47 seconds before retry
2024-06-19 13:57:07.790 INFO (MainThread) [hass_nabucasa.google_report_state] Timeout while waiting to receive message
2024-06-19 13:57:47.654 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Fetching forecast
2024-06-19 13:57:47.691 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Solcast API is busy, pausing 86 seconds before retry
2024-06-19 13:58:10.258 INFO (MainThread) [hass_nabucasa.google_report_state] Timeout while waiting to receive message
2024-06-19 13:59:12.693 INFO (MainThread) [homeassistant.components.automation.state_of_charge_to_mqtt] State of charge to MQTT: Running automation actions
2024-06-19 13:59:12.693 INFO (MainThread) [homeassistant.components.automation.state_of_charge_to_mqtt] State of charge to MQTT: Executing step call service
2024-06-19 13:59:13.693 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Fetching forecast
2024-06-19 13:59:13.787 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Solcast API is busy, pausing 96 seconds before retry
2024-06-19 13:59:50.048 INFO (MainThread) [hass_nabucasa.google_report_state] Timeout while waiting to receive message
2024-06-19 14:00:49.788 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Fetching forecast
2024-06-19 14:00:49.874 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Solcast API is busy, pausing 138 seconds before retry
2024-06-19 14:00:59.326 INFO (MainThread) [homeassistant.components.recorder.backup] Backup start notification, locking database for writes
2024-06-19 14:00:59.879 INFO (MainThread) [homeassistant.components.recorder.backup] Backup end notification, releasing write lock
2024-06-19 14:00:59.881 INFO (Recorder) [homeassistant.components.recorder.core] Database queue backlog reached 2 entries during backup
2024-06-19 14:01:12.854 INFO (MainThread) [hass_nabucasa.google_report_state] Timeout while waiting to receive message
2024-06-19 14:02:57.602 INFO (MainThread) [hass_nabucasa.google_report_state] Timeout while waiting to receive message
2024-06-19 14:03:07.875 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Fetching forecast
2024-06-19 14:03:08.382 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - API returned data. API Counter incremented from 4 to 5
2024-06-19 14:03:08.382 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - writing usage cache
2024-06-19 14:03:08.405 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - fetch_data code http_session returned data type is <class 'dict'>
2024-06-19 14:03:08.405 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - fetch_data code http_session status is 200
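
The growing, slightly irregular pauses in that log (47, 86, 96, 138 seconds) suggest a jittered back-off between attempts. A minimal sketch of that idea; the formula here is a guess, not the integration's actual code:

    import asyncio
    import random

    async def fetch_with_backoff(fetch, attempts=5):
        """Retry a fetch while the API is busy, pausing a random, growing
        interval each time. fetch is assumed to return the forecast data,
        or None on a 429 response."""
        for attempt in range(1, attempts + 1):
            data = await fetch()
            if data is not None:
                return data
            pause = random.randrange(30, 60) * attempt  # jittered, growing pause
            print(f"Solcast API is busy, pausing {pause} seconds before retry")
            await asyncio.sleep(pause)
        return None  # retries exhausted; the caller keeps using the cached forecast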

@BJReplay (Owner)

> Thoughts?

Trying not to over-think it (as I spin up Oracle Cloud instances now that Ampere availability is back - woot), my inclination is to just skip a poll.

Here's why: other than on first start-up of the integration, you've got a forecast. It might be seven days out of date, but it's there.

It might not be up to date, but it's there.

Now, in my use case, for my actual battery control, I'm looking for near-term forecasts to work out whether or not to fine-tune my battery charging behaviour as I approach the start of expensive power - but really, that's just (as my old boss used to say) pencil f_#%ing: I know what my load is going to be, I know what my insolation is going to be ± cloud-cover changes in the next hour or so, and any updated forecast is just going to slightly change my already-planned charging activity. Whether or not I get a new forecast won't change the fact that the wind might blow in a slightly different direction and make that updated forecast wrong in any case - so if I don't get it, I still might make the wrong decision.

Which is a really long way of saying: try a few times, and if you can't get a result, skip that cycle.

Which is what my PV control algo has been doing for the last few years (blindly, mind you - it always uses the latest available forecast, always tries to update the forecast, and doesn't know whether or not it got an updated one; it just re-requests a forecast regularly through the day and re-adjusts charging targets based on remaining forecast insolation, load and state of charge).

@autoSteve (Collaborator)

From a practical point of view, I absolutely agree.

From an "I'm getting 429 errors, so raise an issue" point of view, I think it has merit...

All the logging of retries currently happens silently at debug level, so it goes on in the background.

@wimdebruyn commented Jun 19, 2024

I poll roughly once per hour (but not on the hour, to avoid 429s) - though not every hour, as the evening is more important for me to get regular updates. I have an "Allowed API use" sensor for specific times of the day. Here is the template:

    - name: "Solcast API usage max allowed"
      state: >
        {% set hours = [[0,48],[300,6],[400,10],[500,12],[600,16],[700,20],[1200,24],[1500,32],[2000,40]] %}
        {% set t = states('sensor.time').replace(':','')|int %}
        {% set ns = namespace(level = 0) %}
        {% for h in hours if t >= h[0] %}
          {% set ns.level = h[1] %}
        {% endfor %}
        {{ ns.level }}

It returns the max API calls that should have been allowed up to the current time. I then opportunistically check, in another automation with a 30-min offset from my normal polling automation, whether I am allowed another poll (i.e. if APIs used < sensor.solcast_api_usage_max_allowed), and if so I poll again (whilst making sure I stay within my allowed usage). This will help if polling soon after 429s is important.

@BJReplay (Owner)

Yeah, thinking about it as I was just mowing our little strip of nature strip: I can see a change to the readme, which I'll add to your draft PR, along the lines of:

As noted below, we suggest you automate calling solcast_solar.update_forecasts regularly through the day. Internally, solcast_solar.update_forecasts will attempt several retries if it gets a 429 (busy) response from the Solcast servers. However, if after a number of attempts it is unable to retrieve a forecast, it will note this in the logs (which you will see if you enable debug logging) but will otherwise fail silently. You will still have a forecast for the period (if one had been retrieved earlier) rather than a gap - but it will not be as up to date as you might have hoped.

It is up to you to build a belt-and-braces approach if it is absolutely essential for you to retrieve updated forecasts on a regular basis, bearing in mind that the integration already caches the forecasts it has retrieved, and these forecasts - even if out of date - remain available for display and analysis.

@autoSteve (Collaborator)

> but will otherwise fail silently

It retries silently, but fails loudly. 😂

@gcoan (Author) commented Jun 19, 2024

> > but will otherwise fail silently
>
> It retries silently, but fails loudly. 😂

I think we should make it clear that sensor.solcast_pv_forecast_api_last_polled contains the last successful poll time, so if a poll fails due to 429s, this sensor's time won't be updated.

I agree with @BJReplay's thoughts about the limited criticality of having an absolutely up-to-date forecast; at best it's just a forecast, and clouds and stuff happen, so it's never going to be 100% accurate. I update the forecast at 6am, 10am, 1pm, 6pm and 11pm to give Predbat (my battery automation software) the best chance to include a more refined view for the start of the solar day/afternoon peak/what tomorrow's solar will be, etc., but if I miss an update it's not the end of the world.
[As an aside, I find I almost never hit the PV 50% estimate; it's over-optimistic, and I usually achieve somewhere around 30% of the way between the PV 10% and PV 50% estimates. Having 3 different arrays on the East/West sides of the roof is probably a key factor in this; the forecast is thus a composite sum.]

But failing silently to do something that was requested doesn't feel like good practice to me, and I do feel we ought to log the failure in an accessible way so that those who feel so inclined can automate using the information.
We have sensor.solcast_pv_forecast_api_last_polled, which is the last successful poll time; how about adding two attributes: last_poll_successful (True/False, indicating whether the retrieval succeeded) and last_poll_time (the time of the actual last poll, regardless of whether it was successful)?
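
For illustration, exposing those two attributes on the existing sensor could look something like the sketch below, written against Home Assistant's SensorEntity API. The attribute names are the ones proposed above, not anything implemented:

    from datetime import datetime
    from homeassistant.components.sensor import SensorEntity

    class ApiLastPolledSensor(SensorEntity):
        """Last successful poll time, with poll-outcome attributes (sketch)."""

        _attr_name = "Solcast PV Forecast API Last Polled"

        def __init__(self) -> None:
            self._last_success: datetime | None = None  # updated on successful polls only
            self._last_attempt: datetime | None = None  # updated on every poll attempt
            self._last_poll_successful: bool = False

        @property
        def native_value(self) -> datetime | None:
            return self._last_success  # state stays the last successful poll time

        @property
        def extra_state_attributes(self) -> dict:
            return {
                "last_poll_successful": self._last_poll_successful,
                "last_poll_time": self._last_attempt,
            }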

@autoSteve (Collaborator)

> But failing silently to do something that was requested

Like I said, it fails loudly with an error but retries at debug log level. Folks would have no idea that it had actually thoughtfully retried before the failure occurred unless debug logging is on. I could log the retries about Solcast being busy as warnings if you both feel that's better.

Last Poll Time confuses me. We have one that's already called API Last Polled, and we can't rename it (it should be called something like Forecast Last Received), so another name is needed for Last Poll Time, I think. API Last Attempted Poll???

The logic would then go something like: if the last poll was not successful, and the current time = last attempted poll time + some minutes, then trigger the poll.

@swests commented Jun 19, 2024

Why not add another sensor to the integration that returns a status, e.g. last_result {OK, CACHE, ERROR}?
OK = got fresh data
CACHE = using the cached data
ERROR = no cache, or another API error

This way it isn't a silent fail, and it flags that further (debug) investigation is needed…

@autoSteve (Collaborator)

> ERROR = no cache, or another API error

Technically, 'no cache' is not an error in my mind, @swests. Either a user instructed the integration to delete it, or it's a fresh install. Either way, a successful poll then needs to occur for forecast data to be usable, and that is not an error state, just limbo until something happens.

@BJReplay (Owner)

What we're getting into here is the rapid spiral into home manualisation - that is, home automation that ought to be saving time instead costs time, as we spend more and more of it working on edge cases that deliver no additional value other than the asymptotic ideal of perfection. I'm after engineering completion here - NEIGE.

That doesn't mean there isn't a relatively simple solution to be cobbled together here, but more sensors and more logic to determine what each combination means is more complexity and more bug reports that aren't bugs.

Last Solcast Forecast Retrieved Time (or similar) tells you when you last actually retrieved something.
Last Solcast Forecast Attempted Time (or similar) tells you when you last attempted to retrieve something.

@gcoan (Author) commented Jun 19, 2024

> Like I said, it fails loudly with an error but retries at debug log level. Folks would have no idea that it had actually thoughtfully retried before the failure occurred unless debug logging is on. I could log the retries about Solcast being busy as warnings if you both feel that's better.
>
> Last Poll Time confuses me. We have one that's already called API Last Polled, and we can't rename it (it should be called something like Forecast Last Received), so another name is needed for Last Poll Time, I think. API Last Attempted Poll???
>
> The logic would then go something like: if the last poll was not successful, and the current time = last attempted poll time + some minutes, then trigger the poll.

Thinking about it, the retries probably shouldn't be totally silent for normal users who don't have debug turned on. It should at least log a warning that it's retrying. We don't need all the lines that are in the debug log, but at least one message so 'normal users' can see in the log that Solcast returned a 429 and the integration retried.

last_poll_time is, I agree, confusing, and whilst we should rename the entity, that would be an undesirable breaking change. api_last_attempted_poll sounds fine, as long as we make the purpose clear in the docs.

@swests I was trying to reduce the proliferation of sensors wherever possible by adding extra attributes to an existing sensor where there's a common 'concept', such as 'polling information'. It just seems a cleaner way of doing it.

@gcoan (Author) commented Jun 19, 2024

> Last Solcast Forecast Retrieved Time (or similar) tells you when you last actually retrieved something.
> Last Solcast Forecast Attempted Time (or similar) tells you when you last attempted to retrieve something.

That would work, assuming they are attributes of the existing sensor rather than yet more new sensors.

I think there's value in exposing the information so that if others want to automate with it they can, rather than writing lots of complex edge cases within the integration (as you say) or hiding the information as if we know better.

@autoSteve (Collaborator)

Logging improved, BTW:

  • Info for forecast gather start
  • Warning for retry in xx seconds
  • Error for ultimate 429 failure, post retries
  • Info for successful gather

@autoSteve (Collaborator)

And it might be a silly idea from the point of view of: add another sensor? Then add another six language translations for it, where Google may or may not get it right... My Polish and Slovak are a bit rusty, and German has five words for some things 😂

autoSteve added a commit that referenced this issue Jun 21, 2024: Force all cache files to /config #43, track all account usage #42, prevent rapid forecast re-call #12
@gcoan (Author) commented Jun 21, 2024

I think we're all fixed on this one now, thanks @autoSteve

gcoan closed this as completed Jun 21, 2024