-
-
Notifications
You must be signed in to change notification settings - Fork 720
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generalize UCX errors on connect()
and correct pytest fixtures
#6434
Generalize UCX errors on connect()
and correct pytest fixtures
#6434
Conversation
Various different errors may happen when trying to connect to a remote endpoint. By generalizing the exception we catch, we leave the responsibility of raising the appropriate error depending on known communication issues to Distributed.
Looks like things timed out on |
rerun tests |
with #6428 in we can rerun and hopefully resolve UCX issues |
rerun tests |
1 similar comment
rerun tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good. Does this change mean we need to bump minimum versions anywhere?
I can reproduce the CI hang locally but only if running the entire gpuCI test set, (i.e., not when running on |
No, it just has been generalized so that the exception is (hopefully) always an |
Unit Test Results 15 files ± 0 15 suites ±0 6h 7m 23s ⏱️ - 26m 20s For more details on these failures and errors, see this check. Results for commit e95fb9f. ± Comparison against base commit d32f4b0. ♻️ This comment has been updated with latest results. |
The fixture was not being properly used, it must be added to each function that requires it as an argument. Fixture was moved to `utils_test.ucx_loop` and the argument added to all pytest functions that do UCX testing.
connect()
connect()
and correct pytest fixtures
The failures here are for known failing test |
To add to @quasiben's comment above, here is a more detailed list of failing tests:
I believe none of them are related to this PR, but please let me know if you think they are related somehow. |
|
Failed tests now are:
I still seems that neither of the errors are related to this PR, but let me know if there's reason to believe otherwise. |
Yeah I agree they are unrelated. Thanks for handling this @pentschev. |
Various different errors may happen when trying to connect to a remote endpoint. By generalizing the exception we catch, we leave the responsibility of raising the appropriate error depending on known communication issues to Distributed.
Additionally moved UCX pytest fixture to
utils_test.ucx_loop
and the argument added to all pytest functions that do UCX testing, which was not being properly used.Closes #6429
pre-commit run --all-files