UCP: handle a case of a null cm on the worker - v1.10.x #6765

Merged

Contributor

alinask commented May 6, 2021

What

Handle a case of a null cm on the worker.

Why?

A null cm can appear on the worker when the list of components that support CM
is longer than the list of cms available on the host (worker->cms).

Needed to fix #6755
Backport of #6759

How?

Skip a null cm on the worker.
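
A minimal sketch of the skip described above, assuming a per-worker array in which an entry's cm pointer may be NULL. The `demo_*` types and functions are hypothetical stand-ins for illustration only; they are not the UCX structures or the code changed in this PR.

```c
#include <stddef.h>

/* Hypothetical stand-in for a connection manager; not a UCX type. */
typedef struct {
    const char *name;
} demo_cm_t;

/* Hypothetical per-worker slot: cm is NULL when the component's
 * connection manager is not available on this host. */
typedef struct {
    demo_cm_t *cm;
} demo_worker_cm_t;

/* Return the first non-NULL cm in the array, or NULL if none exists. */
const demo_cm_t *demo_first_usable_cm(const demo_worker_cm_t *cms,
                                      size_t num_cms)
{
    size_t i;

    for (i = 0; i < num_cms; ++i) {
        if (cms[i].cm == NULL) {
            continue; /* skip the null cm instead of dereferencing it */
        }
        return cms[i].cm;
    }

    return NULL;
}

/* Count the slots that hold a usable (non-NULL) cm. */
size_t demo_count_usable_cms(const demo_worker_cm_t *cms, size_t num_cms)
{
    size_t i, count = 0;

    for (i = 0; i < num_cms; ++i) {
        if (cms[i].cm != NULL) {
            ++count;
        }
    }

    return count;
}
```

In this sketch, a caller that iterates over every slot never touches a NULL entry, and a caller that needs a single cm must still handle a NULL return when no slot is usable.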

@alinask alinask requested a review from yosefe May 6, 2021 11:03
@yosefe yosefe changed the title UCP: handle a case of a null cm on the worker. UCP: handle a case of a null cm on the worker - v1.10.x May 6, 2021
Contributor

yosefe commented May 6, 2021

@alinask can you pls add NEWS entry?

NEWS Outdated
@@ -17,6 +17,7 @@
* Fixes in RPM dependency on libibverbs
* Fixes in ABI backward compatibility for active message protocol
* Add support for DC full-handshake mode (off by default).
* Fixes for handling a NULL cm on the ucp worker.
Contributor

Fixes in TCP connection establishment (issue #6755)

Contributor Author

Why TCP? The null cm can be any cm in the array (most likely rdmacm, though), but this is in the UCP layer.

Contributor

How can we describe the error at a higher level, in terms of how it affects a user?

Contributor Author

"Fixes for handling a missing sockaddr transport on a host"?

Contributor

"Fixes for connection establishment protocol (tcpcm, rdmacm, etc.)"?

Contributor

How about "Fixes for segmentation fault while listening for connections"?

@alinask alinask force-pushed the topic/topic/ucp-fix-access-null-cm-v1-10 branch from 55df280 to 45400db Compare May 6, 2021 13:39
Contributor Author

alinask commented May 6, 2021

failure on /labhome/swx-azure-svc/workspace/azure/io_demo_sputnik/1/s/buildlib/../test/apps/iodemo/io_demo: No such file or directory

Contributor

yosefe commented May 7, 2021

bot:pipe:retest

Contributor

yosefe commented May 8, 2021

bot:pipe:retest

Contributor

yosefe commented May 9, 2021

@alinask can you pls squash?

@alinask alinask force-pushed the topic/topic/ucp-fix-access-null-cm-v1-10 branch from 8a69295 to 3a307d4 Compare May 9, 2021 08:50
@yosefe yosefe merged commit f633e85 into openucx:v1.10.x May 9, 2021