-
Notifications
You must be signed in to change notification settings - Fork 823
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Test flakiness investigation and attempted fixes ❄ #3498
Conversation
…e cursor immediately becomes visible. Ensure we turn of cursor blink inside the input suggetions snapshot test.
…ette snapshot test
…ion in snapshot test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pre-review review. I think you're on the right track. These changes seem to be worthwhile, even if we didn't have the flaky tests issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only found a couple of small things that could be improved.
There were also a couple of Return docstrings which I requested you indent. I recall doing this for longer docstrings and I think there was a reason we did it, so I suggested changes in line with that.
There are also a couple of return types AwaitComplete
that are generics but that you didn't fill in and the typecheckers will complain.
@@ -152,7 +154,7 @@ def __init__( | |||
) | |||
self.path = path | |||
|
|||
def _add_to_load_queue(self, node: TreeNode[DirEntry]) -> None: | |||
def _add_to_load_queue(self, node: TreeNode[DirEntry]) -> AwaitComplete: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring is missing the return.
If I'm understanding this, the await complete we get is one that waits for the whole load queue to be processed, right?
I don't know how Queue
works, but if you call join
and then add more nodes to the load queue, won't the previously called join
also wait for those nodes to be loaded?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might be on to something - I thought join()
returned a Future. I'll investigate this more.
(I misread)
Yes, if you call join
and then add more nodes to the queue, the join
will wait until the queue is completely empty. An alternative would be to post a "marker" message on to the queue and wait for that to be processed, then stop waiting - maybe that'd be better for since some people might be polling their filesystem for changes faster than the queue is processed.
@@ -172,47 +172,6 @@ async def test_screen_stack_preserved(ModesApp: Type[App]): | |||
await pilot.press("o") | |||
|
|||
|
|||
async def test_inactive_stack_is_alive(): | |||
"""This tests that timers in screens outside the active stack keep going.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it safe to remove this test?
Maybe it wasn't the best test, but I feel like it was testing a relevant thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't really sure how to rewrite this in a way that wouldn't be flaky - open to suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably with larger time intervals and explicitly pausing between mode switches and the final assert.
We also probably don't need to switch twice.
So, start in a mode that sets a timer to append something to a list or whatever.
Switch to another mode that does nothing.
Wait for long enough and make sure the initial list append happened.
I'll review this after it has been Rodrigoed. |
Co-authored-by: Rodrigo Girão Serrão <[email protected]>
Co-authored-by: Rodrigo Girão Serrão <[email protected]>
Co-authored-by: Rodrigo Girão Serrão <[email protected]>
@willmcgugan Ready whenever. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work. Some very satisfying changes here (removing all those pauses). Some questions and suggestions.
…table cursor into view
Signed-off-by: Darren Burns <[email protected]>
Signed-off-by: Darren Burns <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of things to consider, but LGTM
ReturnType = TypeVar("ReturnType") | ||
|
||
|
||
@rich.repr.auto(angular=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it would be helpful if the repr included the coroutines.
Maybe see what it generates. If it looks like noise, we can leave it as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it already include the coroutines, since the parameter is named the same as the attribute?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you're right. What a nice feature. I must thank the dev who implemented that.
Curious what the repr will look like.
…it's a variable length param.
Here are my theories and the corresponding changes for each of the flaky tests identified by Rodrigo here: #3484 (comment).
Work-in-progress, but feel free to comment/discuss the theories.
✅ = I think I've fixed it
test_schedule_reverse_animations
✅This test essentially was starting two timers one line after the other, and assuming the order in which the timers finished was guaranteed. This isn't the case.
Solution: I've modified the test to make it almost certain that the reverse animation begins to run while the forward animation is in progress.
test_scheduling_animation
✅I think that the
pause
was actually detrimental here - the delay0.1
and0.9 * 0.1
are too close to each other, and so sometimes the execution didn't continue after thepause
until the animation was already complete.Solution: I opted to remove the
pause
here rather than reduce the0.9
value, as I don't think the pause adds much.test_inactive_stack_is_alive
✅switch_mode
leads to important asynchronous (queue-based) work being done, but it cannot be awaited.When you perform any action after switching modes, you cannot be sure what state the application is in - you don't know, for example, if the root screen associated with the mode has been mounted yet.
Solution: I've made
switch_mode
return anAwaitMount
, and updated the test toawait
it.Edit:
Turns out the problem was simpler - the test was querying for a
Label
in the DOM every 0.1 seconds. Since the app was switching screens, there were small windows where there was noLabel
in the DOM, and this error wasn't accounted for.I opted to remove this test because I think the approach it took was generally flawed and I don't think it gave much value.
test_remove_tabs
The problem is that the
AwaitRemove
returned byremove_tab
only waits for the removal of theTab
from the DOM. The addition of the.-active
class happens some time in the future. This ultimately means that we cannot safely useTabs.active_tab
in an application that supports tab removal.Solution: I've added a more general object for optional awaits called
AwaitComplete
.remove_tab
now returns anAwaitComplete
which waits for both the DOM node being removed, and for the corresponding internal state updates.test_remove_tabs_messages
This looks like it was closely related to the issue above.
remove_tab
was being called in a loop, which was ultimately resulting in many messages being dispatched after refresh.Solution: Same as above. Also, the messages are no longer dispatched after refresh. I tested an example to see if this reintroduced an animation issue with the underline bar, but couldn't see any problems.
There's still some flakiness here I can't work out because it's rare:
It looks like _on_mount can, very occasionally, still be called before its children can be queried. I can’t see any other explanation after tracing through code as much as possible. I got 2 failures in a row within Tabs where the child Underline wasn’t being returned from a query inside Tabs._on_mount.
test_textual_dev_border_preview
✅Looks like this may have been due to the button press animation.
The test code uses
wait_for_scheduled_animations
, but theButton
widget doesn't use animations for the active/click/press effect.test_command_palette
✅This failure happened during a period where snapshot reports weren't being produced in CI, so it's not possible to see why this failure occurred.
There were 2 issues here:
Input
widget cursor blinking in theCommandPalette
- all other snapshot tests which use anInput
setcursor_blink = False
.cursor_blink
reactive didn't do anything when changed at runtime. It could only be set before the widget was actually mounted.Solution: I've switched off the blinking cursor for this test. I've fixed the
Input
widget such that it handles toggling thecursor_blink
value at runtime.test_input_suggestions
✅This failed during investigating the above issue - very similar to the above failure.
Solution: I've switched off the blinking cursor for this test. I also added a new watcher to
Input
which ensures that as soon as you set thecursor_blink
toFalse
, the cursor becomes visible again.test_directory_tree_reload_other_node
✅This is a problem that affects many
DirectoryTree
tests.Loading and reloading of directories are done inside a worker. The test doesn't know when this worker is complete - yet it's not safe to use the
DirectoryTree
unless you know the worker is complete.Solution: I've made the processing of the queue which performs the "reload" awaitable.
test_app_with_notifications_that_expire
✅This test used
time.sleep
instead ofasyncio.sleep
which blocked the event loop and likely contributed to flakiness - giving less time for notification timers to expire the notifications correctly.Solution: I've reworked the timeouts used in the test so the notifications will time out much faster, and we'll wait for a long time (relative to their timeout) for them to be expired. Also switched to
asyncio.sleep
.test_loading
✅There's a known issue where
render
can fire beforeon_mount
. This means if you're initialising state insideon_mount
instead of__init__
, and yourrender
method uses that state, there may be a crash.This caused a crash in the test for
LoadingIndicator
.Solution: The
LoadingIndicator
now initialises the state inside the constructor instead of inon_mount
. This doesn't solve the root problem, but avoids a crash which can occur in this test (and presumably also in any real applications which useLoadingIndictor
).Other issues
Tabs.active
wasn't being re-assigned when tabs get removed or cleared (fixesTabs
still has active tab after cleared #3523).cursor_blink
reactive onInput
didn't do anything when changed at runtime.Fixes #3484