Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[netatmo] API limit reached handling #16489

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

clinique
Copy link
Contributor

@clinique clinique commented Mar 5, 2024

Distinct timing (60 minutes) when API LIMIT REACHED has been issued.

Partially resolves issue #16485

@clinique clinique self-assigned this Mar 5, 2024
@clinique clinique requested a review from lolodomo as a code owner March 5, 2024 13:50
Copy link
Contributor

@jlaur jlaur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, that was fast! Thanks for taking care. I will take a closer look at my logs, your fixes and also give it a run on my production system.

@clinique
Copy link
Contributor Author

clinique commented Mar 5, 2024

Wow, that was fast! Thanks for taking care. I will take a closer look at my logs, your fixes and also give it a run on my production system.

You were lucky enough that I had some spare time today - I did not launch eclipse since around 3 month.

Copy link
Contributor

@jlaur jlaur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ported your changes to 4.1.x and put it on my production system. All Things went offline pretty quickly - the API bridge with: "Request timed out - will attempt reconnection later".

Does it work well for you?

@jlaur jlaur added bug An unexpected problem or unintended behavior of an add-on additional testing preferred The change works for the pull request author. A test from someone else is preferred though. labels Mar 6, 2024
@jlaur jlaur added the work in progress A PR that is not yet ready to be merged label Mar 9, 2024
@clinique clinique force-pushed the netatmo_16485 branch 2 times, most recently from 07c499b to f420e73 Compare March 16, 2024 08:20
@clinique clinique requested a review from jlaur March 16, 2024 08:21
@clinique
Copy link
Contributor Author

@jlaur : if you can test on this base. I've reached something that seems pretty strong on heavy workload and normally solved the ghost process issue.

@jlaur jlaur removed the work in progress A PR that is not yet ready to be merged label Mar 16, 2024
@jlaur
Copy link
Contributor

jlaur commented Mar 16, 2024

@jlaur : if you can test on this base. I've reached something that seems pretty strong on heavy workload and normally solved the ghost process issue.

Thanks! I've backported your changes to 4.1.x and just put the binding on my production system. It started without issues, but quickly after I noticed this:

2024-03-16 12:41:35.835 [WARN ] [.core.thing.binding.BaseThingHandler] - Handler ApiBridgeHandler of thing netatmo:account:home tried checking if channel monitoring#request-count is linked although the handler was already disposed.

and:

java.lang.ClassCastException: Cannot cast org.openhab.binding.netatmo.internal.handler.ModuleHandler to org.openhab.binding.netatmo.internal.handler.CommonInterface
	at java.lang.Class.cast(Class.java:3889) ~[?:?]
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197) ~[?:?]
	at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179) ~[?:?]
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197) ~[?:?]
	at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179) ~[?:?]
	at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179) ~[?:?]
	at java.util.AbstractList$RandomAccessSpliterator.forEachRemaining(AbstractList.java:720) ~[?:?]
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509) ~[?:?]
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499) ~[?:?]
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:575) ~[?:?]
	at java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:260) ~[?:?]
	at java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:616) ~[?:?]
	at java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:622) ~[?:?]
	at java.util.stream.ReferencePipeline.toList(ReferencePipeline.java:627) ~[?:?]
	at org.openhab.binding.netatmo.internal.handler.CommonInterface.getActiveChildren(CommonInterface.java:156) ~[?:?]
	at org.openhab.binding.netatmo.internal.handler.CommonInterface.updateReadings(CommonInterface.java:213) ~[?:?]
	at org.openhab.binding.netatmo.internal.handler.CommonInterface.proceedWithUpdate(CommonInterface.java:207) ~[?:?]
	at org.openhab.binding.netatmo.internal.handler.capability.RefreshCapability.proceedWithUpdate(RefreshCapability.java:83) ~[?:?]
	at org.openhab.binding.netatmo.internal.handler.capability.RefreshCapability.lambda$3(RefreshCapability.java:124) ~[?:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:840) [?:?]

I'm not sure if the exception is related to the undisposed task from the previous version. In that case, it can be ignored.

I have now restarted openHAB and will let the binding run for some time, and I will also start analyzing your changes in relation to the experienced issues and see if I can figure out which changes addresses which issues. 🙂

@clinique clinique requested a review from jlaur March 18, 2024 08:33
@jlaur
Copy link
Contributor

jlaur commented Mar 18, 2024

@clinique - strangely, if I grep "Module refreshed, next one in" /var/log/openhab/openhab.log, I see on only three minutes poll interval since yesterday when restarting the binding:

2024-03-18 22:28:30.940 [DEBUG] [handler.capability.RefreshCapability] - Module refreshed, next one in 180s
2024-03-18 22:28:31.065 [DEBUG] [handler.capability.RefreshCapability] - Module refreshed, next one in 180s
2024-03-18 22:31:33.954 [DEBUG] [handler.capability.RefreshCapability] - Module refreshed, next one in 180s
2024-03-18 22:31:34.200 [DEBUG] [handler.capability.RefreshCapability] - Module refreshed, next one in 180s
2024-03-18 22:34:34.491 [DEBUG] [handler.capability.RefreshCapability] - Module refreshed, next one in 180s
2024-03-18 22:34:34.664 [DEBUG] [handler.capability.RefreshCapability] - Module refreshed, next one in 180s

Also, I do not see anything logged regarding probing, i.e. "Data validity period identified to be" nor "Data validity period not yet found, data timestamp unchanged".

I cannot rule out that I made some mistake when backporting the changes, although I went over everything a few times. Do you get different results?

@clinique
Copy link
Contributor Author

clinique commented Mar 18, 2024

@clinique - strangely, if I grep "Module refreshed, next one in" /var/log/openhab/openhab.log, I see on only three minutes poll interval since yesterday when restarting the binding:

2024-03-18 22:28:30.940 [DEBUG] [handler.capability.RefreshCapability] - Module refreshed, next one in 180s
2024-03-18 22:28:31.065 [DEBUG] [handler.capability.RefreshCapability] - Module refreshed, next one in 180s
2024-03-18 22:31:33.954 [DEBUG] [handler.capability.RefreshCapability] - Module refreshed, next one in 180s
2024-03-18 22:31:34.200 [DEBUG] [handler.capability.RefreshCapability] - Module refreshed, next one in 180s
2024-03-18 22:34:34.491 [DEBUG] [handler.capability.RefreshCapability] - Module refreshed, next one in 180s
2024-03-18 22:34:34.664 [DEBUG] [handler.capability.RefreshCapability] - Module refreshed, next one in 180s

Also, I do not see anything logged regarding probing, i.e. "Data validity period identified to be" nor "Data validity period not yet found, data timestamp unchanged".

I cannot rule out that I made some mistake when backporting the changes, although I went over everything a few times. Do you get different results?

Yes, I see the same. Looking at the thing definition, it has a refreshInterval configuration :
image
That's weird.
Ok, got it.

@jlaur
Copy link
Contributor

jlaur commented Mar 19, 2024

Yes, I see the same. Looking at the thing definition, it has a refreshInterval configuration :
image
That's weird.

I see you removed the default value of 180 seconds. But should it even have this configuration parameter? According to the documentation this only applies to the home Thing: https://www.openhab.org/addons/bindings/netatmo/#list-of-supported-things

@clinique
Copy link
Contributor Author

Yes, I see the same. Looking at the thing definition, it has a refreshInterval configuration :
image
That's weird.

I see you removed the default value of 180 seconds. But should it even have this configuration parameter? According to the documentation this only applies to the home Thing: https://www.openhab.org/addons/bindings/netatmo/#list-of-supported-things

Yes, it's only a quick fix. A change has been introduced here that created this issue. I'll take some to think of it and partially revert or correct PR #16492

@clinique
Copy link
Contributor Author

@jlaur : can you give a test or your feed-back on this ? I rebased with last merges.

@jlaur
Copy link
Contributor

jlaur commented Mar 23, 2024

@jlaur : can you give a test or your feed-back on this ? I rebased with last merges.

Yes, I've had it running in production since Thursday after #16546 was merged. I'm currently running some tests after your latest commit today:

Additional module:

  • Take out batteries and wait for it to go offline (when API returns "reachable": false). 🟢
  • Insert batteries again and verify it goes back online. 🟢

Main module:

  • Cut power and wait for it to go offline (when last timestamp is more than one hour ago). 🟢
  • Power on again and verify it goes back online. 🔴

Restart binding:

  • Verify number of calls (no disposed handlers still working). 🟢

Simulate API responses:

  • OAuth authentication call failed. 🔴
  • OAuth authentication call failed twice. 🔴
  • Normal call failed. 🔴
  • HTTP code 429. 🟢

Currently I'm checking the results of main module getting back online which doesn't seem to work.

For the last three tests I have provided a new console command for simulating exceptions and HTTP status codes.

I will keep you updated on the results.

@jlaur
Copy link
Contributor

jlaur commented Mar 23, 2024

On first OAuth failure there are two status updates:

2024-03-23 16:42:16.408 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:account:home' changed from ONLINE to OFFLINE (CONFIGURATION_ERROR): Complete the configuration by granting the binding to Netatmo Connect : `/netatmo/connect/xxx`.
2024-03-23 16:42:16.420 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:account:home' changed from OFFLINE (CONFIGURATION_ERROR): Complete the configuration by granting the binding to Netatmo Connect : `/netatmo/connect/xxx`. to OFFLINE (COMMUNICATION_ERROR)

Retry is correctly scheduled:

2024-03-23 16:42:16.416 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - Reconnection scheduled in 300 seconds

After second failure:

2024-03-23 16:47:16.470 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:account:home' changed from OFFLINE (COMMUNICATION_ERROR) to OFFLINE (CONFIGURATION_ERROR): Complete the configuration by granting the binding to Netatmo Connect : `/netatmo/connect/xxx`.

and it doesn't recover after that.

jlaur added a commit to jlaur/openhab-addons that referenced this pull request Mar 23, 2024
Signed-off-by: Jacob Laursen <[email protected]>
@jlaur
Copy link
Contributor

jlaur commented Mar 23, 2024

For the last three tests I have provided a new console command for simulating exceptions and HTTP status codes.

In case you are interested, see 62ae29d.

@clinique clinique deleted the netatmo_16485 branch March 30, 2024 10:00
@jlaur
Copy link
Contributor

jlaur commented Apr 1, 2024

@clinique - why did you close this PR?

@clinique clinique restored the netatmo_16485 branch April 1, 2024 15:28
@clinique clinique reopened this Apr 1, 2024
@clinique
Copy link
Contributor Author

clinique commented Apr 1, 2024

@clinique - why did you close this PR?

I don't know. Probably a mistake, I made some cleaning in my repo...sorry.

@jlaur
Copy link
Contributor

jlaur commented Apr 1, 2024

@clinique - I'm now back to this.

Can I ask how you tested this? I'm asking because I have applied my "console error simulator", and I immediately ran into an issue when being authenticated already, but then hitting an HTTP error code 429:

2024-04-01 21:11:11.736 [DEBUG] [tatmo.internal.handler.DeviceHandler] - bridgeStatusChanged for bridge netatmo:weather-station:home:indoor to OFFLINE (COMMUNICATION_ERROR): @text/maximum-usage-reached [ "3600" ]
2024-04-01 21:11:11.736 [DEBUG] [tatmo.internal.handler.DeviceHandler] - bridgeStatusChanged for bridge netatmo:weather-station:home:indoor_small_bathroom to OFFLINE (COMMUNICATION_ERROR): @text/maximum-usage-reached [ "3600" ]
2024-04-01 21:11:11.738 [DEBUG] [tatmo.internal.handler.ModuleHandler] - bridgeStatusChanged for thing netatmo:indoor:home:indoor:indoor3 to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.740 [DEBUG] [tatmo.internal.handler.ModuleHandler] - bridgeStatusChanged for thing netatmo:indoor:home:indoor:indoor2 to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.742 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - Reconnection scheduled in 3600 seconds
2024-04-01 21:11:11.742 [DEBUG] [tatmo.internal.handler.ModuleHandler] - bridgeStatusChanged for thing netatmo:outdoor:home:indoor:outdoor to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.743 [DEBUG] [tatmo.internal.handler.ModuleHandler] - bridgeStatusChanged for thing netatmo:indoor:home:indoor_small_bathroom:indoor_office to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.743 [DEBUG] [tatmo.internal.handler.ModuleHandler] - bridgeStatusChanged for thing netatmo:indoor:home:indoor_small_bathroom:indoor_alva to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.743 [DEBUG] [tatmo.internal.handler.ModuleHandler] - bridgeStatusChanged for thing netatmo:outdoor:home:indoor_small_bathroom:outdoor_shed to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.743 [DEBUG] [tatmo.internal.handler.ModuleHandler] - bridgeStatusChanged for thing netatmo:indoor:home:indoor_small_bathroom:indoor_linus to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.745 [WARN ] [handler.capability.WeatherCapability] - Error retrieving weather data '32': 
2024-04-01 21:11:11.748 [DEBUG] [handler.capability.RefreshCapability] - Thing 'netatmo:weather-station:home:indoor' is not ONLINE, using special refresh interval
2024-04-01 21:11:11.750 [DEBUG] [handler.capability.RefreshCapability] - 'netatmo:weather-station:home:indoor' next refresh in PT15M
2024-04-01 21:11:11.740 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:account:home' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Maximum usage reached, will reconnect in 3600 seconds.
2024-04-01 21:11:11.746 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:weather-station:home:indoor_small_bathroom' changed from ONLINE to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.748 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:weather-station:home:indoor' changed from ONLINE to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.752 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:indoor:home:indoor:indoor3' changed from ONLINE to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.753 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:indoor:home:indoor:indoor2' changed from ONLINE to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.754 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:indoor:home:indoor_small_bathroom:indoor_office' changed from ONLINE to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.755 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:indoor:home:indoor_small_bathroom:indoor_alva' changed from ONLINE to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.755 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:indoor:home:indoor_small_bathroom:indoor_linus' changed from ONLINE to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.756 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:outdoor:home:indoor_small_bathroom:outdoor_shed' changed from ONLINE to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.757 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:outdoor:home:indoor:outdoor' changed from ONLINE to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:26:11.751 [DEBUG] [l.handler.capability.CacheCapability] - MeasureCapability requesting fresh data for netatmo:weather-station:home:indoor

(and then everything came back online)

Then problem I see here is that we seem not to be protected at all from worsening after MAXIMUM_USAGE_REACHED or HTTP error code 429, since the child handlers are still permitted to call the API, and are actively doing so.

Looking at the code, I think the "API limiting" is only for the corner case of missing authentication or request errors?

Maybe I'm testing it the wrong way or misunderstood the intentions.

EDIT: It seems that after this occurred, API calls are still made periodically, but nothing is updated anymore. Example:

2024-04-01 21:45:47.840 [DEBUG] [l.handler.capability.CacheCapability] - MeasureCapability requesting fresh data for netatmo:weather-station:home:indoor_small_bathroom
2024-04-01 21:45:47.842 [DEBUG] [l.handler.capability.CacheCapability] - WeatherCapability requesting fresh data for netatmo:weather-station:home:indoor_small_bathroom
2024-04-01 21:45:47.843 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=46&get_favorites=false 
2024-04-01 21:45:47.850 [TRACE] [mo.internal.handler.ApiBridgeHandler] -  -with headers: Accept-Encoding: gzip, User-Agent: Jetty/9.4.52.v20230823, Authorization: Bearer xxx 
2024-04-01 21:45:48.166 [TRACE] [mo.internal.handler.ApiBridgeHandler] -  -returned: code [200 OK] body [...]
2024-04-01 21:45:48.170 [DEBUG] [l.handler.capability.CacheCapability] - MeasureCapability requesting fresh data for netatmo:indoor:home:indoor_small_bathroom:indoor_office
2024-04-01 21:45:48.172 [DEBUG] [l.handler.capability.CacheCapability] - MeasureCapability requesting fresh data for netatmo:outdoor:home:indoor_small_bathroom:outdoor_shed
2024-04-01 21:45:48.173 [DEBUG] [l.handler.capability.CacheCapability] - MeasureCapability requesting fresh data for netatmo:indoor:home:indoor_small_bathroom:indoor_alva
2024-04-01 21:45:48.174 [DEBUG] [l.handler.capability.CacheCapability] - MeasureCapability requesting fresh data for netatmo:indoor:home:indoor_small_bathroom:indoor_linus
2024-04-01 21:45:48.183 [DEBUG] [handler.capability.RefreshCapability] - 'netatmo:weather-station:home:indoor_small_bathroom' next refresh in PT10M4.816913174S

@clinique
Copy link
Contributor Author

clinique commented Apr 2, 2024

@clinique - I'm now back to this.

Can I ask how you tested this? I'm asking because I have applied my "console error simulator", and I immediately ran into an issue when being authenticated already, but then hitting an HTTP error code 429:

2024-04-01 21:11:11.736 [DEBUG] [tatmo.internal.handler.DeviceHandler] - bridgeStatusChanged for bridge netatmo:weather-station:home:indoor to OFFLINE (COMMUNICATION_ERROR): @text/maximum-usage-reached [ "3600" ]
2024-04-01 21:11:11.736 [DEBUG] [tatmo.internal.handler.DeviceHandler] - bridgeStatusChanged for bridge netatmo:weather-station:home:indoor_small_bathroom to OFFLINE (COMMUNICATION_ERROR): @text/maximum-usage-reached [ "3600" ]
2024-04-01 21:11:11.738 [DEBUG] [tatmo.internal.handler.ModuleHandler] - bridgeStatusChanged for thing netatmo:indoor:home:indoor:indoor3 to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.740 [DEBUG] [tatmo.internal.handler.ModuleHandler] - bridgeStatusChanged for thing netatmo:indoor:home:indoor:indoor2 to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.742 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - Reconnection scheduled in 3600 seconds
2024-04-01 21:11:11.742 [DEBUG] [tatmo.internal.handler.ModuleHandler] - bridgeStatusChanged for thing netatmo:outdoor:home:indoor:outdoor to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.743 [DEBUG] [tatmo.internal.handler.ModuleHandler] - bridgeStatusChanged for thing netatmo:indoor:home:indoor_small_bathroom:indoor_office to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.743 [DEBUG] [tatmo.internal.handler.ModuleHandler] - bridgeStatusChanged for thing netatmo:indoor:home:indoor_small_bathroom:indoor_alva to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.743 [DEBUG] [tatmo.internal.handler.ModuleHandler] - bridgeStatusChanged for thing netatmo:outdoor:home:indoor_small_bathroom:outdoor_shed to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.743 [DEBUG] [tatmo.internal.handler.ModuleHandler] - bridgeStatusChanged for thing netatmo:indoor:home:indoor_small_bathroom:indoor_linus to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.745 [WARN ] [handler.capability.WeatherCapability] - Error retrieving weather data '32': 
2024-04-01 21:11:11.748 [DEBUG] [handler.capability.RefreshCapability] - Thing 'netatmo:weather-station:home:indoor' is not ONLINE, using special refresh interval
2024-04-01 21:11:11.750 [DEBUG] [handler.capability.RefreshCapability] - 'netatmo:weather-station:home:indoor' next refresh in PT15M
2024-04-01 21:11:11.740 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:account:home' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Maximum usage reached, will reconnect in 3600 seconds.
2024-04-01 21:11:11.746 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:weather-station:home:indoor_small_bathroom' changed from ONLINE to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.748 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:weather-station:home:indoor' changed from ONLINE to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.752 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:indoor:home:indoor:indoor3' changed from ONLINE to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.753 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:indoor:home:indoor:indoor2' changed from ONLINE to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.754 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:indoor:home:indoor_small_bathroom:indoor_office' changed from ONLINE to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.755 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:indoor:home:indoor_small_bathroom:indoor_alva' changed from ONLINE to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.755 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:indoor:home:indoor_small_bathroom:indoor_linus' changed from ONLINE to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.756 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:outdoor:home:indoor_small_bathroom:outdoor_shed' changed from ONLINE to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.757 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:outdoor:home:indoor:outdoor' changed from ONLINE to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:26:11.751 [DEBUG] [l.handler.capability.CacheCapability] - MeasureCapability requesting fresh data for netatmo:weather-station:home:indoor

(and then everything came back online)

Then problem I see here is that we seem not to be protected at all from worsening after MAXIMUM_USAGE_REACHED or HTTP error code 429, since the child handlers are still permitted to call the API, and are actively doing so.

Looking at the code, I think the "API limiting" is only for the corner case of missing authentication or request errors?

Maybe I'm testing it the wrong way or misunderstood the intentions.

EDIT: It seems that after this occurred, API calls are still made periodically, but nothing is updated anymore. Example:

2024-04-01 21:45:47.840 [DEBUG] [l.handler.capability.CacheCapability] - MeasureCapability requesting fresh data for netatmo:weather-station:home:indoor_small_bathroom
2024-04-01 21:45:47.842 [DEBUG] [l.handler.capability.CacheCapability] - WeatherCapability requesting fresh data for netatmo:weather-station:home:indoor_small_bathroom
2024-04-01 21:45:47.843 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=46&get_favorites=false 
2024-04-01 21:45:47.850 [TRACE] [mo.internal.handler.ApiBridgeHandler] -  -with headers: Accept-Encoding: gzip, User-Agent: Jetty/9.4.52.v20230823, Authorization: Bearer xxx 
2024-04-01 21:45:48.166 [TRACE] [mo.internal.handler.ApiBridgeHandler] -  -returned: code [200 OK] body [...]
2024-04-01 21:45:48.170 [DEBUG] [l.handler.capability.CacheCapability] - MeasureCapability requesting fresh data for netatmo:indoor:home:indoor_small_bathroom:indoor_office
2024-04-01 21:45:48.172 [DEBUG] [l.handler.capability.CacheCapability] - MeasureCapability requesting fresh data for netatmo:outdoor:home:indoor_small_bathroom:outdoor_shed
2024-04-01 21:45:48.173 [DEBUG] [l.handler.capability.CacheCapability] - MeasureCapability requesting fresh data for netatmo:indoor:home:indoor_small_bathroom:indoor_alva
2024-04-01 21:45:48.174 [DEBUG] [l.handler.capability.CacheCapability] - MeasureCapability requesting fresh data for netatmo:indoor:home:indoor_small_bathroom:indoor_linus
2024-04-01 21:45:48.183 [DEBUG] [handler.capability.RefreshCapability] - 'netatmo:weather-station:home:indoor_small_bathroom' next refresh in PT10M4.816913174S

I'll be back on this one very soon. I'm currently focused on another PR I had open since a while.

@lolodomo
Copy link
Contributor

@clinique @jlaur : What is the status here ?
Is it work on progress or finished and ready for a review?

@clinique
Copy link
Contributor Author

@clinique @jlaur : What is the status here ?
Is it work on progress or finished and ready for a review?

Unfinished - I have to get back on it

@lolodomo lolodomo added the work in progress A PR that is not yet ready to be merged label Apr 19, 2024
@clinique
Copy link
Contributor Author

@jlaur : If you are ready to reactivate your console error simulator, I'm ready to give this a new push.

@jlaur
Copy link
Contributor

jlaur commented Sep 29, 2024

If you are ready to reactivate your console error simulator, I'm ready to give this a new push.

I will give it another look within the next week. In the meantime you could also give it a spin yourself: 62ae29d. 😉 What is the current state of the PR, perhaps you can give a short update in relation to this comment?

Copy link
Contributor

@jlaur jlaur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not yet tested it, did you try emulating some error conditions? Also it seems like this comment is not yet addressed? I think when we hit 429, we should make sure we don't call the service at all for some time.

@clinique
Copy link
Contributor Author

I have not yet tested it, did you try emulating some error conditions? Also it seems like this comment is not yet addressed? I think when we hit 429, we should make sure we don't call the service at all for some time.

When we have a 429 the bridge will be put offline (then all childs) and a new connection will be tried in 3600s. Is this not enough or am I missing something in your comment ?

Improve handling of interrupted request by alllowing retries
Rebased
Reintroducing TOO_MANY_REQUESTS

Signed-off-by: clinique <[email protected]>
Signed-off-by: [email protected] <[email protected]>
Signed-off-by: Gaël L'hopital <[email protected]>
@jlaur
Copy link
Contributor

jlaur commented Oct 30, 2024

I have not yet tested it, did you try emulating some error conditions? Also it seems like this comment is not yet addressed? I think when we hit 429, we should make sure we don't call the service at all for some time.

When we have a 429 the bridge will be put offline (then all childs) and a new connection will be tried in 3600s. Is this not enough or am I missing something in your comment ?

It's all in the referenced comment. I don't see anything related changed since making that comment, so I just assumed it would still be an issue. I need to get my focus back to this topic and run the tests again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
additional testing preferred The change works for the pull request author. A test from someone else is preferred though. awaiting feedback bug An unexpected problem or unintended behavior of an add-on work in progress A PR that is not yet ready to be merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants