ucp_client_server doesn't seem to work #6244

Open
shuki-zanyovka opened this issue Feb 1, 2021 · 2 comments

shuki-zanyovka commented Feb 1, 2021

Describe the bug

I built ucp_client_server from the latest UCX master (build steps below). Running it with no arguments crashes with a segmentation fault on my VM:
$ examples/ucp_client_server

[localhost:14891:0:14891] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))

/home/shukiz/projects/ucx-github/ucx/src/uct/base/uct_cm.c: [ uct_listener_create() ]
      ...
      207                                  uct_listener_h *listener_p)
      208 {
      209     if (!(params->field_mask & UCT_LISTENER_PARAM_FIELD_CONN_REQUEST_CB)) {
==>   210         return UCS_ERR_INVALID_PARAM;
      211     }
      212 
      213     return cm->ops->listener_create(cm, saddr, socklen, params, listener_p);

==== backtrace (tid:  14891) ====
 0 0x0000000000056958 ucs_debug_print_backtrace()  /home/shukiz/projects/ucx-github/ucx/src/ucs/debug/debug.c:656
 1 0x00000000000134e5 uct_listener_create()  /home/shukiz/projects/ucx-github/ucx/src/uct/base/uct_cm.c:210
 2 0x00000000000252e7 ucp_listen()  /home/shukiz/projects/ucx-github/ucx/src/ucp/core/ucp_listener.c:153
 3 0x000000000002568d ucp_listener_create()  /home/shukiz/projects/ucx-github/ucx/src/ucp/core/ucp_listener.c:267
 4 0x00000000004015a3 start_server()  /home/shukiz/projects/ucx-github/ucx/examples/ucp_client_server.c:737
 5 0x00000000004015a3 run_server()  /home/shukiz/projects/ucx-github/ucx/examples/ucp_client_server.c:822
 6 0x00000000004015a3 main()  /home/shukiz/projects/ucx-github/ucx/examples/ucp_client_server.c:969
 7 0x0000000000022555 __libc_start_main()  /usr/src/debug/glibc-2.17-c758a686/csu/../csu/libc-start.c:266
 8 0x00000000004018c4 _start()  ???:0
=================================
Segmentation fault (core dumped)

$ cat /proc/version
Linux version 3.10.0-1127.13.1.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC) ) #1 SMP Tue Jun 23 15:46:38 UTC 2020

Steps to Reproduce

$ git log
commit fce0e47
Merge: 6c3588b f233126
Author: Yossi Itigin [email protected]
Date: Sun Jan 31 12:18:50 2021 +0200

Merge pull request #6237 from dmitrygx/topic/ucp/gtest_ka_fix

GTEST/UCP: Don't ask for RMA feature in KA test

$ ./contrib/configure-prof --with-mlx5-hw CC=mpicc CXX=mpic++ --enable-examples
$ make -j

Setup and versions

$ uname -a
Linux localhost.localdomain 3.10.0-1127.13.1.el7.x86_64 #1 SMP Tue Jun 23 15:46:38 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/redhat-release
CentOS Linux release 7.8.2003 (Core)

Additional information (depending on the issue)

  • OpenMPI version
  • Output of ucx_info -d to show transports and devices recognized by UCX
  • Configure result - config.log
  • Log file - configure UCX with "--enable-logging" - and run with "UCX_LOG_LEVEL=data"

shuki-zanyovka commented

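Here are some more details from a gdb session. The crash happens on the second loop iteration in ucp_listen() (i = 1), where worker->cms[1].cm is NULL, so uct_listener_create() is called with cm=0x0: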
[shukiz@localhost examples]$ gcc ucp_client_server.c -lucp -lucs
[shukiz@localhost examples]$ gdb ./a.out
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-119.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/...
Reading symbols from /home/shukiz/projects/ucx-github/ucx/examples/a.out...(no debugging symbols found)...done.
(gdb) run
Starting program: /home/shukiz/projects/ucx-github/ucx/examples/./a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff2865700 (LWP 17209)]

Program received signal SIGSEGV, Segmentation fault.
uct_listener_create (cm=0x0, saddr=0x7fffffffd880, socklen=16, params=0x7fffffffd860, listener_p=0x6413d8) at base/uct_cm.c:214
214 return cm->ops->listener_create(cm, saddr, socklen, params, listener_p);
Missing separate debuginfos, use: debuginfo-install libibcm-41mlnx1-OFED.4.1.0.1.0.47100.x86_64 libibverbs-41mlnx1-OFED.4.7.0.0.2.47100.x86_64 libmlx5-41mlnx1-OFED.4.7.0.3.3.47100.x86_64 libnl3-3.2.28-4.el7.x86_64 librdmacm-41mlnx1-OFED.4.7.0.3.3.47100.x86_64 numactl-libs-2.0.12-5.el7.x86_64
(gdb) bt
#0 uct_listener_create (cm=0x0, saddr=0x7fffffffd880, socklen=16, params=0x7fffffffd860, listener_p=0x6413d8) at base/uct_cm.c:214
#1 0x00007ffff7b6e257 in ucp_listen (listener=listener@entry=0x641310, params=params@entry=0x7fffffffdb10) at core/ucp_listener.c:153
#2 0x00007ffff7b6e5fd in ucp_listener_create (worker=0x616360, params=0x7fffffffdb10, listener_p=0x7fffffffdbf8) at core/ucp_listener.c:267
#3 0x000000000040249a in start_server ()
#4 0x00000000004026fa in run_server ()
#5 0x00000000004029d1 in main ()
(gdb) frame 1
#1 0x00007ffff7b6e257 in ucp_listen (listener=listener@entry=0x641310, params=params@entry=0x7fffffffdb10) at core/ucp_listener.c:153
153 status = uct_listener_create(ucp_cm->cm, addr,
(gdb) list
148
149 listener->listeners = uct_listeners;
150
151 for (i = 0; i < num_cms; ++i) {
152 ucp_cm = &worker->cms[i];
153 status = uct_listener_create(ucp_cm->cm, addr,
154 params->sockaddr.addrlen, &uct_params,
155 &uct_listeners[listener->num_rscs]);
156 if (status != UCS_OK) {
157 ucs_debug("failed to create UCT listener on CM %p (component %s) "
(gdb) print worker
$1 = (struct ucp_worker *) 0x616360
(gdb) print worker->cms[0]
$2 = {cm = 0x612f70, attr = {field_mask = 1, max_conn_priv = 2032}, cmpt_idx = 3 '\003'}
(gdb) print i
$3 = 1 '\001'
(gdb) print worker->cms[1]
$4 = {cm = 0x0, attr = {field_mask = 0, max_conn_priv = 0}, cmpt_idx = 0 '\000'}
(gdb)
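
The gdb session above shows the loop in ucp_listen() reaching a slot whose cm pointer is NULL and passing it on to a function that dereferences it. The following is a minimal, self-contained sketch of that failure pattern and one possible defensive guard. All `demo_*` names and types are hypothetical stand-ins invented for illustration. They are not the real UCX structures, and this is not the fix adopted by the UCX maintainers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the structures seen in the gdb session;
 * these are NOT the real UCX definitions. */
typedef struct {
    int id;
} demo_cm_t;

typedef struct {
    demo_cm_t *cm; /* NULL for a slot that was never initialized */
} demo_worker_cm_t;

/* Hypothetical listener factory: rejects a NULL CM instead of
 * dereferencing it. Returns 0 on success, -1 on invalid input. */
int demo_listener_create(const demo_cm_t *cm, int *listener_id_p)
{
    if (cm == NULL) {
        return -1;
    }
    *listener_id_p = cm->id;
    return 0;
}

/* Try every CM slot, skip the ones that cannot produce a listener,
 * and return how many listeners were created. */
size_t demo_listen(const demo_worker_cm_t *cms, size_t num_cms,
                   int *listeners)
{
    size_t i, num_rscs = 0;

    assert(listeners != NULL);
    for (i = 0; i < num_cms; ++i) {
        if (demo_listener_create(cms[i].cm, &listeners[num_rscs]) == 0) {
            ++num_rscs;
        }
    }
    return num_rscs;
}
```

With two slots, one valid and one with a NULL cm (mirroring worker->cms[0] and worker->cms[1] in the session), `demo_listen` creates one listener and skips the empty slot instead of crashing. A guard like this only hides the symptom. Why the CM count exceeds the number of initialized slots would still need to be answered in the real code.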

alinask commented May 5, 2021

Hi @shuki-zanyovka, can you please check whether the environment parameter proposed in #6755 resolves your issue as well?
Set UCX_SOCKADDR_TLS_PRIORITY=tcp on both sides (client and server), assuming you are using UCX v1.10.
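
If exporting the variable in the shell is inconvenient, it could also be set from inside the program before UCX is initialized. The sketch below uses only POSIX `setenv`. The helper name `demo_prefer_tcp_sockaddr` is hypothetical, and it assumes UCX picks the variable up from the process environment when its configuration is read, so the call would have to happen before any UCX configuration or initialization call.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>

/* Hypothetical helper: set UCX_SOCKADDR_TLS_PRIORITY=tcp for the current
 * process, overwriting any existing value.
 * Returns 0 on success and -1 on failure (same convention as setenv). */
int demo_prefer_tcp_sockaddr(void)
{
    return setenv("UCX_SOCKADDR_TLS_PRIORITY", "tcp", 1);
}
```

As the comment above says, the setting is needed on both sides, so both the client and the server process would need the call.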
