test_stress_creation_and_deletion: ValueError: Could not find dependent #5172
Comments
There is a suspicious counter on the worker. If it cannot find a dependency after 5 attempts, it raises this message. Looking at how the test is written, nothing protects us from breaching this threshold, so this failure is anticipated in X% of runs.
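To make the counter behaviour described above concrete, here is a minimal sketch. It is not the actual worker code: `MissCounter`, `fetch`, and the simulated `outcomes` list are hypothetical names made up for illustration, and the only details taken from this thread are the threshold of 5 and the error message from the issue title.

```python
from collections import defaultdict


class MissCounter:
    """Hypothetical stand-in for the worker's per-dependency attempt counter."""

    def __init__(self, max_attempts=5):
        self.max_attempts = max_attempts  # 5 is the threshold quoted in this thread
        self.misses = defaultdict(int)

    def record_miss(self, key):
        """Count one failed lookup and raise once the threshold is reached."""
        self.misses[key] += 1
        if self.misses[key] >= self.max_attempts:
            raise ValueError(f"Could not find dependent {key}")


def fetch(key, outcomes, counter):
    """Walk through simulated lookup outcomes (True = found, False = missing).

    Returns the 1-based attempt number on which the dependency was found.
    """
    for attempt, found in enumerate(outcomes, start=1):
        if found:
            return attempt
        counter.record_miss(key)
    raise RuntimeError("ran out of simulated outcomes before a result")


# Four misses followed by a hit stays under the threshold.
print(fetch("x", [False] * 4 + [True], MissCounter()))

# One more miss, e.g. another restart at the wrong moment, trips the counter.
try:
    fetch("y", [False] * 5 + [True], MissCounter())
except ValueError as exc:
    print(exc)
```

In this toy model, nothing ties the number of misses to the number of restarts the test triggers, which is the concern raised above: enough badly timed restarts for one key will exhaust the budget even when the data could still be recomputed.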
This looks a lot like an error I have been seeing since updating to 2021.07.0, and it persists and blocks my workflow in 2021.07.1 and 2021.07.2. The workflow is not complicated: data load, per-element math, a reduction (mean), and a rechunk operation. I'm not deep in the dask code, but I might take a look at this test to see if it can help in creating a reproducer.
@woodcockr If you have a reliable reproducer, that would be extremely helpful. I'll take another look at the test to see if I can find something.
My apologies, I still can't get a reliable reproducer that isn't my entire code set. I'll chip away at it, but 2021.07.0 succeeds more often than not, whereas 2021.07.1 and 2021.07.2 just fail to complete. When I have some clear air from current tasks I'll try again.
I recently updated to 2021.9.1 and see this issue in a run:
I'm running on GKE and searched the logs for the task name and found many, many hits of the form. There is also an interesting assertion failing where the code is expecting "OK" but gets a JSON reply:
This test fails intermittently but frequently on CI with the exception named in the title (ValueError: Could not find dependent),
e.g. https://github.com/dask/distributed/pull/5168/checks?check_run_id=3249880498
The more I look at this test, the more convinced I am that the test itself is fine and that it is genuinely reporting a real problem: the one property it is supposed to verify, that a computation must be resilient to nanny restarts, does not hold.