Make LIFX update handle transient communication failures #90891
Conversation
Hey there @bdraco, mind taking a look at this pull request as it has been labeled with an integration (lifx) you are listed as a code owner for?
So the delay I noticed is failing the tests. I'll look into it deeper after the long weekend here in Australia, unless someone else fixes it first. 😉
Converting to draft because I may have stumbled upon a significant performance improvement refactor that needs more testing/tests.
Force-pushed from 41e8f66 to 88af53c
Force-pushed from 4c89314 to 2fcf49a
Force-pushed from 2fcf49a to aca6817
(I was waiting for the tests to complete before clicking that button... 😛 )
What about this option:

```python
async with asyncio_timeout(MESSAGE_TIMEOUT):
    while len(self.device.message) > 0:
        await asyncio.sleep(0)
```
It's still using 100% of the CPU/available event loop run time until the timeout fires or the while condition returns False, since every time the event loop has free time it's going to do the len check and then return control to the loop runner via the await.
And yet it's recommended by the Python docs (https://docs.python.org/3/library/asyncio-task.html#asyncio.sleep): "Setting the delay to 0 provides an optimized path to allow other tasks to run. This can be used by long-running functions to avoid blocking the event loop for the full duration of the function call." The nature of the fail/success cycle is that it appears to rectify itself immediately after the
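For context, here is a tiny self-contained sketch (not integration code; the names are illustrative) of what sleep(0) polling does: the polling task yields to the event loop on every pass, so other tasks still run, but it re-checks its condition as often as the loop can schedule it.

```python
import asyncio

async def spinner(counter: dict) -> None:
    # Busy-polls a condition, yielding to the event loop on every pass.
    while counter["n"] < 3:
        await asyncio.sleep(0)  # optimized path: let other ready tasks run

async def worker(counter: dict) -> None:
    # Simulates real I/O completing three times.
    for _ in range(3):
        await asyncio.sleep(0.1)
        counter["n"] += 1

async def main() -> None:
    counter = {"n": 0}
    await asyncio.gather(spinner(counter), worker(counter))
    print("spinner exited after", counter["n"], "events")

asyncio.run(main())
```

Both tasks make progress, which matches the docs quote; the trade-off debated below is how many times the polling task gets scheduled while it waits.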
These devices are sometimes flaky and generate a lot of noise from dropouts, since communication is UDP best-effort. We should only mark them unavailable if it's not a momentary blip. Fixes home-assistant#78876
…ing device Signed-off-by: Avi Miller <[email protected]>
Signed-off-by: Avi Miller <[email protected]>
Includes a test that should get coverage of the new code too. Signed-off-by: Avi Miller <[email protected]>
There is nothing wrong with running the sleep once or twice. The problem is running it thousands of times or more.
This makes no sense to me: why would relinquishing the event loop consume a CPU? It should have the exact opposite effect of not using any CPU until it actually has stuff to do. I'll try to get the Python profiler working again, but until then, my subjective experience is that it uses way less CPU because it takes exponentially less time overall to finish. I'm talking orders of magnitude faster this way: from 2-3 seconds per bulb per update to ~0.1-0.3 seconds per bulb.
Signed-off-by: Avi Miller <[email protected]>
Force-pushed from c53bf11 to 88ddc83
If there is nothing else going on and it's waiting for the dict to be empty, you get:
- Task runs and gets to the len check
- Len check returns True
- Return control to the event loop via the await
- Loop resumes the task, len check returns True again
- Return control to the event loop via the await
- ... (repeated as fast as the loop can run) ...
- Len check finally returns False
- Task continues on
I'm not sure if I'm getting your point or if you're making mine, but that's what I want to happen. I need "loop tasks run" to happen more often to get the queue to empty in the first place. Overall, more stuff should happen more often because the loop has more opportunity to run more things while we wait for both the bulb and ...

Are you sure it's not 100% because Home Assistant is just able to do lots of other stuff while it's waiting? :)
It will do what you want, but you could end up with thousands of loop runs until the condition returns False, which it will do as fast as the system can perform. All of those loop runs will never block, because the task will always be ready to check the condition again, and it will consume all available CPU time while it's looping since the len check isn't an asyncio wait primitive.
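For illustration only, here is a minimal sketch of what waiting on an asyncio primitive (rather than polling a dict length) could look like. The ResponseTracker class, its method names, and the idea that the UDP response handler calls ack() are assumptions invented for this example; this is not the aiolifx or integration API.

```python
import asyncio

class ResponseTracker:
    """Hypothetical helper that signals when all outstanding messages are answered."""

    def __init__(self) -> None:
        self.pending: dict[int, object] = {}
        self._empty = asyncio.Event()
        self._empty.set()  # nothing is pending yet

    def add(self, seq: int, msg: object) -> None:
        # Called when a request is sent to the bulb.
        self.pending[seq] = msg
        self._empty.clear()

    def ack(self, seq: int) -> None:
        # Called from the UDP response handler for each reply.
        self.pending.pop(seq, None)
        if not self.pending:
            self._empty.set()

    async def wait_empty(self, timeout: float) -> None:
        # The waiting task sleeps here until ack() empties the dict; no busy polling.
        async with asyncio.timeout(timeout):
            await self._empty.wait()
```

With an Event (or Future), the waiting task is only scheduled when a reply actually arrives or the timeout fires, instead of on every free slice of the event loop. (asyncio.timeout requires Python 3.11+; async_timeout.timeout is the older equivalent.)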
So what? If there is literally nothing to do except wait for a LIFX bulb to respond, who cares if a CPU is being pegged at 100% while it waits? I'd be willing to bet that most folks are running on low-power, low-core-count devices, so the overall impact will still be a subjective (and objective) improvement in overall performance.
There are always other things to do.
Which is exactly why I want to release the event loop: so those things can be done.
There are also other processes and threads on the system, as well as heat concerns with pegging the CPU in one thread. You/we will need to find some other method of resolving the issue.
Except it doesn't happen: I've been running this way for days and my CPUs are not pegged. In fact, the Home Assistant container appears to be using less CPU overall than before. Either way, this is now your problem, not mine. I'll just keep using my working alternative implementation.
Proposed change
Replaces #90872 from @bdraco with a more thorough refactoring of the update process to remove the unnecessary lock and to allow for up to three timeouts before actually offlining a device.
It also disables polling on all the entities except the Light entity as that update grabs all the required data for all the entities anyway. This makes things significantly faster and doesn't overwhelm the bulbs.
Note that I consider this a code quality improvement rather than a bug fix, though the result is the same.
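For a rough sense of the "up to three timeouts" behaviour described above, here is a sketch; the class, method names, and threshold constant are placeholders for this example, not the actual integration code.

```python
import asyncio

MAX_UPDATE_ATTEMPTS = 3  # placeholder for the "three timeouts" threshold

class LIFXUpdateCoordinatorSketch:
    """Sketch only: tolerate transient UDP timeouts before letting a bulb go unavailable."""

    def __init__(self) -> None:
        self._failures = 0

    async def _async_update_data(self) -> None:
        try:
            # Hypothetical helper doing the UDP round trip for all entity data.
            await self._async_poll_device()
        except asyncio.TimeoutError:
            self._failures += 1
            if self._failures >= MAX_UPDATE_ATTEMPTS:
                raise  # only now let the device be marked unavailable
            # Otherwise treat it as a momentary blip and keep the last known state.
        else:
            self._failures = 0  # any successful update resets the counter

    async def _async_poll_device(self) -> None:
        ...
```

Any single successful update resets the counter, so only several consecutive timeouts take the device offline.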
Type of change
Additional information
Checklist
The code has been formatted using Black (black --fast homeassistant tests)

If user exposed functionality or configuration variables are added/changed:

If the code communicates with devices, web services, or third-party tools:
Updated and included derived files by running: python3 -m script.hassfest.
New or updated dependencies have been added to requirements_all.txt. Updated by running: python3 -m script.gen_requirements_all.
Untested files have been added to .coveragerc.

To help with the load of incoming pull requests: