UCP server callback is not invoked upon a client connection #8398

Closed
rrgargeya opened this issue Jul 22, 2022 · 22 comments

@rrgargeya

Describe the bug

We are trying to use UCP to implement clients and servers.
This is to replace our Verbs-based implementation, in which clients and servers are connected via RC QPs.

The problem is that the server's connection callback is not called when the client tries to connect.

The client's error callback is not called either (perhaps correctly, since no error occurred).

These are IB interfaces, yet the trace shows that a TCP listener is created rather than an RDMACM listener.

Is that the problem?

I see that
UCX_SOCKADDR_TLS_PRIORITY=rdmacm,tcp,sockcm

so I assumed that rdmacm would take precedence over tcp.
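A priority list can only order the components that are actually available. It cannot make UCX use one that was not built in or that failed to open. If the rdmacm connection manager is present, `ucx_info -d` would be expected to show a `Connection manager: rdmacm` entry next to the `Connection manager: tcp` one.

The sketch below is a simplified model of "the first available entry in priority order wins", written for illustration only. `first_available` is a hypothetical helper, not UCX code. UCX's real selection logic may differ; for example, it may open listeners on several connection managers.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Simplified model, not UCX source: walk a comma-separated priority list such
// as "rdmacm,tcp,sockcm" and return the first entry that is actually available.
static std::string first_available(const std::string &priority,
                                   const std::set<std::string> &available)
{
    std::istringstream in(priority);
    std::string name;
    while (std::getline(in, name, ',')) {
        if (available.count(name) != 0) {
            return name;  // highest-priority component that exists
        }
    }
    return "";  // no listed component is available
}
```

In this model, `first_available("rdmacm,tcp,sockcm", {"tcp"})` yields `"tcp"`. That would be consistent with the TCP listener seen in the trace if rdmacm were not available.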

Server's listener creation code snippet:
params.field_mask = UCP_LISTENER_PARAM_FIELD_SOCK_ADDR |
                    UCP_LISTENER_PARAM_FIELD_CONN_HANDLER;
params.sockaddr.addr = (const struct sockaddr*)&server_addr;
// Length of the concrete address structure (sockaddr_in or sockaddr_in6),
// not of the generic struct sockaddr.
params.sockaddr.addrlen = sizeof(server_addr);
params.conn_handler.cb = server_conn_handle_cb;
params.conn_handler.arg = nullptr;
status = ucp_listener_create(ucp_worker, &params, &ucp_listener_);
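`sizeof(struct sockaddr)` is 16 bytes. That happens to equal `sizeof(struct sockaddr_in)`, but it is too short for an IPv6 `sockaddr_in6` (28 bytes on Linux).

Below is a minimal sketch of a helper that fills a `sockaddr_storage` from a numeric host string and reports the matching length. It uses only POSIX `getaddrinfo()`. The helper name `make_sockaddr` and the port value are illustrative assumptions.

```cpp
#include <cassert>
#include <cstring>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>

// Hypothetical helper: turn a numeric IPv4/IPv6 host (or nullptr for a
// wildcard listen address) and a numeric port string into a sockaddr_storage,
// reporting the length of the concrete address type that was filled in.
static bool make_sockaddr(const char *host, const char *port,
                          sockaddr_storage &out, socklen_t &out_len)
{
    addrinfo hints{};
    hints.ai_family   = AF_UNSPEC;      // accept IPv4 or IPv6
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags    = AI_NUMERICHOST | AI_NUMERICSERV | AI_PASSIVE;

    addrinfo *res = nullptr;
    if (getaddrinfo(host, port, &hints, &res) != 0 || res == nullptr) {
        return false;                   // not a numeric address/port
    }
    assert(res->ai_addrlen <= sizeof(out));
    std::memcpy(&out, res->ai_addr, res->ai_addrlen);
    out_len = res->ai_addrlen;          // 16 for sockaddr_in, 28 for sockaddr_in6 on Linux
    freeaddrinfo(res);
    return true;
}
```

To use it in the listener snippet, `server_addr` could be declared as `sockaddr_storage` and filled with `make_sockaddr(nullptr, "13337", server_addr, len)`; then set `params.sockaddr.addrlen` to `len`. Passing `"0.0.0.0"` instead of `nullptr` forces an IPv4 wildcard.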

Client's endpoint creation code snippet:
ep_params.field_mask = UCP_EP_PARAM_FIELD_FLAGS |
                       UCP_EP_PARAM_FIELD_SOCK_ADDR |
                       UCP_EP_PARAM_FIELD_USER_DATA |
                       UCP_EP_PARAM_FIELD_ERR_HANDLER |
                       UCP_EP_PARAM_FIELD_ERR_HANDLING_MODE;
ep_params.err_mode = UCP_ERR_HANDLING_MODE_PEER;
ep_params.err_handler.cb = client_err_cb;
ep_params.err_handler.arg = nullptr;
ep_params.user_data = &connect_user_data;
ep_params.flags = UCP_EP_PARAMS_FLAGS_CLIENT_SERVER |
                  UCP_EP_PARAMS_FLAGS_SEND_CLIENT_ID;
ep_params.sockaddr.addr = (struct sockaddr*) &connect_addr;
ep_params.sockaddr.addrlen = sizeof(connect_addr);

status = ucp_ep_create(ucp_worker, &ep_params, &ucp_client_ep);
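In client-server mode, `ucp_ep_create()` returns before the connection to the server is established. As we understand UCX's progress model, the handshake and any error callback only advance while the client calls `ucp_worker_progress()` on its worker. A small helper that progresses until a condition holds or a deadline passes turns a silent hang into a reportable timeout.

In this sketch, `progress_until` is a hypothetical helper. The progress step is passed in as a callable so that the code builds without UCX; in real code it would wrap `ucp_worker_progress(ucp_worker)`.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: call `progress` until `done()` holds or `timeout`
// expires. `progress` stands in for ucp_worker_progress(worker), which
// returns non-zero when it made progress.
static bool progress_until(const std::function<unsigned()> &progress,
                           const std::function<bool()> &done,
                           std::chrono::milliseconds timeout)
{
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!done()) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;   // timed out: report it instead of hanging
        }
        progress();
    }
    return true;
}
```

For example, the client could pass a lambda that calls `ucp_worker_progress(ucp_worker)` together with a condition that checks a flag set by one of its own callbacks.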

Server connection handler prototype:
static void server_conn_handle_cb(ucp_conn_request_h conn_request,
                                  void *arg);
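UCX's own client-server example keeps the connection handler short. It records the `ucp_conn_request_h`, and the server loop later either accepts it (with `ucp_ep_create()` and `UCP_EP_PARAM_FIELD_CONN_REQUEST`) or rejects it (with `ucp_listener_reject()`).

Because several clients connect here, the sketch below queues requests instead of keeping a single slot. It uses a stand-in `conn_request_h` type so that it compiles without UCX; the `server_ctx` and `take_pending` names are hypothetical. For this pattern, `params.conn_handler.arg` would point at the context rather than being `nullptr` as in the listener snippet above. A queue without a lock is only safe if, as assumed here, the server thread is the only thread progressing the server worker.

```cpp
#include <cassert>
#include <deque>

// Stand-in for ucp_conn_request_h so the sketch builds without UCX; real code
// would include <ucp/api/ucp.h> and use ucp_conn_request_h directly.
using conn_request_h = void *;

// Hypothetical server context shared by the callback and the server loop.
struct server_ctx {
    std::deque<conn_request_h> pending;  // requests not yet accepted or rejected
};

// Same shape as the prototype above: only record the request, do no real work.
static void server_conn_handle_cb(conn_request_h conn_request, void *arg)
{
    auto *ctx = static_cast<server_ctx *>(arg);
    assert(ctx != nullptr);  // conn_handler.arg must point at the context
    ctx->pending.push_back(conn_request);
}

// Called by the server loop after progressing the worker: returns the oldest
// pending request, or nullptr if none has arrived yet.
static conn_request_h take_pending(server_ctx &ctx)
{
    if (ctx.pending.empty()) {
        return nullptr;
    }
    conn_request_h req = ctx.pending.front();
    ctx.pending.pop_front();
    return req;
}
```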

I have the server in its own thread.

I experimented with all clients in one thread, and later with each client in its own thread.

Neither arrangement made a difference.
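As far as we understand UCX's progress model, the thread layout matters less than this: the listener's connection handler runs inside `ucp_worker_progress()` on the worker that owns the listener. The server thread therefore has to call it repeatedly, or wait on the descriptor from `ucp_worker_get_efd()` after `ucp_worker_arm()` and then call it. In addition, unless a worker was created with `UCS_THREAD_MODE_MULTI`, several threads must not use it concurrently.

The toy model below is not UCX code. `toy_worker` is a hypothetical class that mimics a single property: delivered events only reach their callbacks when `progress()` is called, and they run on the calling thread.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Toy model, not UCX: an incoming event (e.g. a connection request) is queued
// and its callback runs only when someone calls progress() on this worker.
class toy_worker {
public:
    void deliver(std::function<void()> event) { events_.push_back(std::move(event)); }

    // Returns the number of events dispatched, loosely mirroring how
    // ucp_worker_progress() returns non-zero when it made progress.
    unsigned progress()
    {
        unsigned n = 0;
        while (!events_.empty()) {
            auto ev = std::move(events_.front());
            events_.pop_front();
            ev();  // callback runs on the thread calling progress()
            ++n;
        }
        return n;
    }

private:
    std::deque<std::function<void()>> events_;
};
```

In the real server thread, the equivalent is a loop around `ucp_worker_progress(ucp_worker)`. If that loop never runs, or runs on a different worker than the listener's, the connection handler would never fire.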

Steps to Reproduce

  • Command line
  • A client-server program started on two hosts, hostA and hostB
  • UCX version used (from github branch XX or release YY) + UCX configure flags (can be checked by ucx_info -v)

$ ucx_info -v

UCT version=1.12.1 revision dc92435

configured with: --disable-logging --disable-debug --disable-assertions --disable-params-check

  • Any UCX environment variables used
    UCX_LOG_LEVEL=diag

Setup and versions

  • OS version (e.g Linux distro) + CPU architecture (x86_64/aarch64/ppc64le/...)
    • cat /etc/issue or cat /etc/redhat-release + uname -a

$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.4 (Maipo)
$ uname -a
Linux hostA 3.10.0-693.17.1.rt56.636.el7.x86_64 #1 SMP PREEMPT RT Tue Jan 16 16:25:18 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

  • For Nvidia Bluefield SmartNIC include cat /etc/mlnx-release (the string identifies software and firmware setup)
  • For RDMA/IB/RoCE related issues:
    • Driver version:
      • rpm -q rdma-core or rpm -q libibverbs
      • or: MLNX_OFED version ofed_info -s

$ ofed_info -s
MLNX_OFED_LINUX-4.2-1.2.0.0:

The test nodes have four HCAs: two with IB interfaces and two with RoCE interfaces.
For this test, we are using ib0 (mlx5_0:1), an IB interface.

$ lspci|grep Mellanox
3b:00.0 Infiniband controller: Mellanox Technologies MT27800 Family [ConnectX-5]
3b:00.1 Infiniband controller: Mellanox Technologies MT27800 Family [ConnectX-5]
d8:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
d8:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]

  • HW information from ibstat or ibv_devinfo -vv command

$ ibv_devinfo -vv
hca_id: mlx5_3
transport: InfiniBand (0)
fw_ver: 16.21.2010
node_guid: ec0d:9a03:0043:15dd
sys_image_guid: ec0d:9a03:0043:15dc
vendor_id: 0x02c9
vendor_part_id: 4119
hw_ver: 0x0
board_id: MT_0000000008
phys_port_cnt: 1
max_mr_size: 0xffffffffffffffff
page_size_cap: 0xfffffffffffff000
max_qp: 262144
max_qp_wr: 32768
device_cap_flags: 0xe5721c36
BAD_PKEY_CNTR
BAD_QKEY_CNTR
AUTO_PATH_MIG
CHANGE_PHY_PORT
PORT_ACTIVE_EVENT
SYS_IMAGE_GUID
RC_RNR_NAK_GEN
XRC
Unknown flags: 0xe5620000
device_cap_exp_flags: 0x500DF8F000000000
EXP_CROSS_CHANNEL
EXP_MR_ALLOCATE
EXT_ATOMICS
EXT_SEND NOP
EXP_UMR
EXP_ODP
EXP_RX_CSUM_TCP_UDP_PKT
EXP_RX_CSUM_IP_PKT
EXP_MASKED_ATOMICS
EXP_RX_TCP_UDP_PKT_TYPE
EXP_SCATTER_FCS
EXP_WQ_DELAY_DROP
EXP_PHYSICAL_RANGE_MR
Unknown flags: 0x200000000000
max_sge: 30
max_sge_rd: 30
max_cq: 16777216
max_cqe: 4194303
max_mr: 16777216
max_pd: 16777216
max_qp_rd_atom: 16
max_ee_rd_atom: 0
max_res_rd_atom: 4194304
max_qp_init_rd_atom: 16
max_ee_init_rd_atom: 0
atomic_cap: ATOMIC_HCA (1)
log atomic arg sizes (mask) 0x8
masked_log_atomic_arg_sizes (mask) 0x3c
masked_log_atomic_arg_sizes_network_endianness (mask) 0x34
max fetch and add bit boundary 64
log max atomic inline 5
max_ee: 0
max_rdd: 0
max_mw: 16777216
max_raw_ipv6_qp: 0
max_raw_ethy_qp: 0
max_mcast_grp: 2097152
max_mcast_qp_attach: 240
max_total_mcast_qp_attach: 503316480
max_ah: 2147483647
max_fmr: 0
max_srq: 8388608
max_srq_wr: 32767
max_srq_sge: 31
max_pkeys: 128
local_ca_ack_delay: 16
hca_core_clock: 78125
max_klm_list_size: 65536
max_send_wqe_inline_klms: 20
max_umr_recursion_depth: 4
max_umr_stride_dimension: 1
general_odp_caps:
ODP_SUPPORT
ODP_SUPPORT_IMPLICIT
max_size: 0xFFFFFFFFFFFFFFFF
rc_odp_caps:
SUPPORT_SEND
SUPPORT_RECV
SUPPORT_WRITE
SUPPORT_READ
uc_odp_caps:
NO SUPPORT
ud_odp_caps:
SUPPORT_SEND
dc_odp_caps:
SUPPORT_SEND
SUPPORT_WRITE
SUPPORT_READ
xrc_odp_caps:
NO SUPPORT
raw_eth_odp_caps:
NO SUPPORT
max_dct: 0
max_device_ctx: 1020
Multi-Packet RQ supported
Supported for QP types: RAW_PACKET
Supported payload shifts:
2 bytes
Log number of strides for single WQE: 9 - 16
Log number of bytes in single stride: 6 - 13

    VLAN offloads caps:
                                    C-VLAN stripping offload
                                    C-VLAN insertion offload
    rx_pad_end_addr_align:  64
    tso_caps:
    max_tso:                        262144
    supported_qp:
                                    SUPPORT_RAW_PACKET
    packet_pacing_caps:
    qp_rate_limit_min:              0kbps
    qp_rate_limit_max:              0kbps
    ooo_caps:
    ooo_rc_caps  = 0x1
    ooo_xrc_caps = 0x1
    ooo_dc_caps  = 0x1
    ooo_ud_caps  = 0x0
                                    SUPPORT_RC_RW_DATA_PLACEMENT
                                    SUPPORT_XRC_RW_DATA_PLACEMENT
                                    SUPPORT_DC_RW_DATA_PLACEMENT
    sw_parsing_caps:
                                    SW_PARSING
                                    SW_PARSING_CSUM
                                    SW_PARSING_LSO
    supported_qp:
                                    SUPPORT_RAW_PACKET
    tag matching not supported
    tunnel_offloads_caps:
                                    TUNNEL_OFFLOADS_VXLAN
                                    TUNNEL_OFFLOADS_GRE
                                    TUNNEL_OFFLOADS_GENEVE
    Device ports:
            port:   1
                    state:                  PORT_ACTIVE (4)
                    max_mtu:                4096 (5)
                    active_mtu:             1024 (3)
                    sm_lid:                 0
                    port_lid:               0
                    port_lmc:               0x00
                    link_layer:             Ethernet
                    max_msg_sz:             0x40000000
                    port_cap_flags:         0x04010000
                    max_vl_num:             invalid value (0)
                    bad_pkey_cntr:          0x0
                    qkey_viol_cntr:         0x0
                    sm_sl:                  0
                    pkey_tbl_len:           1
                    gid_tbl_len:            256
                    subnet_timeout:         0
                    init_type_reply:        0
                    active_width:           4X (2)
                    active_speed:           25.0 Gbps (32)
                    phys_state:             LINK_UP (5)
                    GID[  0]:               fe80:0000:0000:0000:ee0d:9aff:fe43:15dd
                    GID[  1]:               fe80:0000:0000:0000:ee0d:9aff:fe43:15dd
                    GID[  2]:               0000:0000:0000:0000:0000:ffff:0ac8:c942
                    GID[  3]:               0000:0000:0000:0000:0000:ffff:0ac8:c942

hca_id: mlx5_2
transport: InfiniBand (0)
fw_ver: 16.21.2010
node_guid: ec0d:9a03:0043:15dc
sys_image_guid: ec0d:9a03:0043:15dc
vendor_id: 0x02c9
vendor_part_id: 4119
hw_ver: 0x0
board_id: MT_0000000008
phys_port_cnt: 1
max_mr_size: 0xffffffffffffffff
page_size_cap: 0xfffffffffffff000
max_qp: 262144
max_qp_wr: 32768
device_cap_flags: 0xe5721c36
BAD_PKEY_CNTR
BAD_QKEY_CNTR
AUTO_PATH_MIG
CHANGE_PHY_PORT
PORT_ACTIVE_EVENT
SYS_IMAGE_GUID
RC_RNR_NAK_GEN
XRC
Unknown flags: 0xe5620000
device_cap_exp_flags: 0x500DF8F000000000
EXP_CROSS_CHANNEL
EXP_MR_ALLOCATE
EXT_ATOMICS
EXT_SEND NOP
EXP_UMR
EXP_ODP
EXP_RX_CSUM_TCP_UDP_PKT
EXP_RX_CSUM_IP_PKT
EXP_MASKED_ATOMICS
EXP_RX_TCP_UDP_PKT_TYPE
EXP_SCATTER_FCS
EXP_WQ_DELAY_DROP
EXP_PHYSICAL_RANGE_MR
Unknown flags: 0x200000000000
max_sge: 30
max_sge_rd: 30
max_cq: 16777216
max_cqe: 4194303
max_mr: 16777216
max_pd: 16777216
max_qp_rd_atom: 16
max_ee_rd_atom: 0
max_res_rd_atom: 4194304
max_qp_init_rd_atom: 16
max_ee_init_rd_atom: 0
atomic_cap: ATOMIC_HCA (1)
log atomic arg sizes (mask) 0x8
masked_log_atomic_arg_sizes (mask) 0x3c
masked_log_atomic_arg_sizes_network_endianness (mask) 0x34
max fetch and add bit boundary 64
log max atomic inline 5
max_ee: 0
max_rdd: 0
max_mw: 16777216
max_raw_ipv6_qp: 0
max_raw_ethy_qp: 0
max_mcast_grp: 2097152
max_mcast_qp_attach: 240
max_total_mcast_qp_attach: 503316480
max_ah: 2147483647
max_fmr: 0
max_srq: 8388608
max_srq_wr: 32767
max_srq_sge: 31
max_pkeys: 128
local_ca_ack_delay: 16
hca_core_clock: 78125
max_klm_list_size: 65536
max_send_wqe_inline_klms: 20
max_umr_recursion_depth: 4
max_umr_stride_dimension: 1
general_odp_caps:
ODP_SUPPORT
ODP_SUPPORT_IMPLICIT
max_size: 0xFFFFFFFFFFFFFFFF
rc_odp_caps:
SUPPORT_SEND
SUPPORT_RECV
SUPPORT_WRITE
SUPPORT_READ
uc_odp_caps:
NO SUPPORT
ud_odp_caps:
SUPPORT_SEND
dc_odp_caps:
SUPPORT_SEND
SUPPORT_WRITE
SUPPORT_READ
xrc_odp_caps:
NO SUPPORT
raw_eth_odp_caps:
NO SUPPORT
max_dct: 0
max_device_ctx: 1020
Multi-Packet RQ supported
Supported for QP types: RAW_PACKET
Supported payload shifts:
2 bytes
Log number of strides for single WQE: 9 - 16
Log number of bytes in single stride: 6 - 13

    VLAN offloads caps:
                                    C-VLAN stripping offload
                                    C-VLAN insertion offload
    rx_pad_end_addr_align:  64
    tso_caps:
    max_tso:                        262144
    supported_qp:
                                    SUPPORT_RAW_PACKET
    packet_pacing_caps:
    qp_rate_limit_min:              0kbps
    qp_rate_limit_max:              0kbps
    ooo_caps:
    ooo_rc_caps  = 0x1
    ooo_xrc_caps = 0x1
    ooo_dc_caps  = 0x1
    ooo_ud_caps  = 0x0
                                    SUPPORT_RC_RW_DATA_PLACEMENT
                                    SUPPORT_XRC_RW_DATA_PLACEMENT
                                    SUPPORT_DC_RW_DATA_PLACEMENT
    sw_parsing_caps:
                                    SW_PARSING
                                    SW_PARSING_CSUM
                                    SW_PARSING_LSO
    supported_qp:
                                    SUPPORT_RAW_PACKET
    tag matching not supported
    tunnel_offloads_caps:
                                    TUNNEL_OFFLOADS_VXLAN
                                    TUNNEL_OFFLOADS_GRE
                                    TUNNEL_OFFLOADS_GENEVE
    Device ports:
            port:   1
                    state:                  PORT_ACTIVE (4)
                    max_mtu:                4096 (5)
                    active_mtu:             1024 (3)
                    sm_lid:                 0
                    port_lid:               0
                    port_lmc:               0x00
                    link_layer:             Ethernet
                    max_msg_sz:             0x40000000
                    port_cap_flags:         0x04010000
                    max_vl_num:             invalid value (0)
                    bad_pkey_cntr:          0x0
                    qkey_viol_cntr:         0x0
                    sm_sl:                  0
                    pkey_tbl_len:           1
                    gid_tbl_len:            256
                    subnet_timeout:         0
                    init_type_reply:        0
                    active_width:           4X (2)
                    active_speed:           25.0 Gbps (32)
                    phys_state:             LINK_UP (5)
                    GID[  0]:               fe80:0000:0000:0000:ee0d:9aff:fe43:15dc
                    GID[  1]:               fe80:0000:0000:0000:ee0d:9aff:fe43:15dc
                    GID[  2]:               0000:0000:0000:0000:0000:ffff:0ac8:c842
                    GID[  3]:               0000:0000:0000:0000:0000:ffff:0ac8:c842

hca_id: mlx5_1
transport: InfiniBand (0)
fw_ver: 16.21.2010
node_guid: ec0d:9a03:0043:1765
sys_image_guid: ec0d:9a03:0043:1764
vendor_id: 0x02c9
vendor_part_id: 4119
hw_ver: 0x0
board_id: MT_0000000008
phys_port_cnt: 1
max_mr_size: 0xffffffffffffffff
page_size_cap: 0xfffffffffffff000
max_qp: 262144
max_qp_wr: 32768
device_cap_flags: 0xe17e1c36
BAD_PKEY_CNTR
BAD_QKEY_CNTR
AUTO_PATH_MIG
CHANGE_PHY_PORT
PORT_ACTIVE_EVENT
SYS_IMAGE_GUID
RC_RNR_NAK_GEN
XRC
Unknown flags: 0xe16e0000
device_cap_exp_flags: 0x5048F8F100000000
EXP_DC_TRANSPORT
EXP_CROSS_CHANNEL
EXP_MR_ALLOCATE
EXT_ATOMICS
EXT_SEND NOP
EXP_UMR
EXP_ODP
EXP_RX_CSUM_TCP_UDP_PKT
EXP_RX_CSUM_IP_PKT
EXP_DC_INFO
EXP_MASKED_ATOMICS
EXP_RX_TCP_UDP_PKT_TYPE
EXP_PHYSICAL_RANGE_MR
Unknown flags: 0x200000000000
max_sge: 30
max_sge_rd: 30
max_cq: 16777216
max_cqe: 4194303
max_mr: 16777216
max_pd: 16777216
max_qp_rd_atom: 16
max_ee_rd_atom: 0
max_res_rd_atom: 4194304
max_qp_init_rd_atom: 16
max_ee_init_rd_atom: 0
atomic_cap: ATOMIC_HCA (1)
log atomic arg sizes (mask) 0x8
masked_log_atomic_arg_sizes (mask) 0x3c
masked_log_atomic_arg_sizes_network_endianness (mask) 0x34
max fetch and add bit boundary 64
log max atomic inline 5
max_ee: 0
max_rdd: 0
max_mw: 16777216
max_raw_ipv6_qp: 0
max_raw_ethy_qp: 0
max_mcast_grp: 2097152
max_mcast_qp_attach: 240
max_total_mcast_qp_attach: 503316480
max_ah: 2147483647
max_fmr: 0
max_srq: 8388608
max_srq_wr: 32767
max_srq_sge: 31
max_pkeys: 128
local_ca_ack_delay: 16
hca_core_clock: 78125
max_klm_list_size: 65536
max_send_wqe_inline_klms: 20
max_umr_recursion_depth: 4
max_umr_stride_dimension: 1
general_odp_caps:
ODP_SUPPORT
ODP_SUPPORT_IMPLICIT
max_size: 0xFFFFFFFFFFFFFFFF
rc_odp_caps:
SUPPORT_SEND
SUPPORT_RECV
SUPPORT_WRITE
SUPPORT_READ
uc_odp_caps:
NO SUPPORT
ud_odp_caps:
SUPPORT_SEND
dc_odp_caps:
SUPPORT_SEND
SUPPORT_WRITE
SUPPORT_READ
xrc_odp_caps:
NO SUPPORT
raw_eth_odp_caps:
NO SUPPORT
max_dct: 262144
max_device_ctx: 1020
Multi-Packet RQ supported
Supported for QP types: RAW_PACKET
Supported payload shifts:
2 bytes
Log number of strides for single WQE: 9 - 16
Log number of bytes in single stride: 6 - 13
rx_pad_end_addr_align: 64
tso_caps:
max_tso: 0
packet_pacing_caps:
qp_rate_limit_min: 0kbps
qp_rate_limit_max: 0kbps
ooo_caps:
ooo_rc_caps = 0x1
ooo_xrc_caps = 0x1
ooo_dc_caps = 0x1
ooo_ud_caps = 0x0
SUPPORT_RC_RW_DATA_PLACEMENT
SUPPORT_XRC_RW_DATA_PLACEMENT
SUPPORT_DC_RW_DATA_PLACEMENT
sw_parsing_caps:
supported_qp:
max_rndv_hdr_size: 0x40
max_num_tags: 0x7f
max_ops: 0x8000
max_sge: 0x1
capability_flags:
IBV_EXP_TM_CAP_RC
IBV_EXP_TM_CAP_DC
tunnel_offloads_caps:
Device ports:
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 1
port_lid: 103
port_lmc: 0x00
link_layer: InfiniBand
max_msg_sz: 0x40000000
port_cap_flags: 0x2651e848
max_vl_num: 4 (3)
bad_pkey_cntr: 0x0
qkey_viol_cntr: 0x0
sm_sl: 0
pkey_tbl_len: 128
gid_tbl_len: 8
subnet_timeout: 18
init_type_reply: 0
active_width: 4X (2)
active_speed: 25.0 Gbps (32)
phys_state: LINK_UP (5)
GID[ 0]: fe80:0000:0000:0000:ec0d:9a03:0043:1765

hca_id: mlx5_0
transport: InfiniBand (0)
fw_ver: 16.21.2010
node_guid: ec0d:9a03:0043:1764
sys_image_guid: ec0d:9a03:0043:1764
vendor_id: 0x02c9
vendor_part_id: 4119
hw_ver: 0x0
board_id: MT_0000000008
phys_port_cnt: 1
max_mr_size: 0xffffffffffffffff
page_size_cap: 0xfffffffffffff000
max_qp: 262144
max_qp_wr: 32768
device_cap_flags: 0xe17e1c36
BAD_PKEY_CNTR
BAD_QKEY_CNTR
AUTO_PATH_MIG
CHANGE_PHY_PORT
PORT_ACTIVE_EVENT
SYS_IMAGE_GUID
RC_RNR_NAK_GEN
XRC
Unknown flags: 0xe16e0000
device_cap_exp_flags: 0x5048F8F100000000
EXP_DC_TRANSPORT
EXP_CROSS_CHANNEL
EXP_MR_ALLOCATE
EXT_ATOMICS
EXT_SEND NOP
EXP_UMR
EXP_ODP
EXP_RX_CSUM_TCP_UDP_PKT
EXP_RX_CSUM_IP_PKT
EXP_DC_INFO
EXP_MASKED_ATOMICS
EXP_RX_TCP_UDP_PKT_TYPE
EXP_PHYSICAL_RANGE_MR
Unknown flags: 0x200000000000
max_sge: 30
max_sge_rd: 30
max_cq: 16777216
max_cqe: 4194303
max_mr: 16777216
max_pd: 16777216
max_qp_rd_atom: 16
max_ee_rd_atom: 0
max_res_rd_atom: 4194304
max_qp_init_rd_atom: 16
max_ee_init_rd_atom: 0
atomic_cap: ATOMIC_HCA (1)
log atomic arg sizes (mask) 0x8
masked_log_atomic_arg_sizes (mask) 0x3c
masked_log_atomic_arg_sizes_network_endianness (mask) 0x34
max fetch and add bit boundary 64
log max atomic inline 5
max_ee: 0
max_rdd: 0
max_mw: 16777216
max_raw_ipv6_qp: 0
max_raw_ethy_qp: 0
max_mcast_grp: 2097152
max_mcast_qp_attach: 240
max_total_mcast_qp_attach: 503316480
max_ah: 2147483647
max_fmr: 0
max_srq: 8388608
max_srq_wr: 32767
max_srq_sge: 31
max_pkeys: 128
local_ca_ack_delay: 16
hca_core_clock: 78125
max_klm_list_size: 65536
max_send_wqe_inline_klms: 20
max_umr_recursion_depth: 4
max_umr_stride_dimension: 1
general_odp_caps:
ODP_SUPPORT
ODP_SUPPORT_IMPLICIT
max_size: 0xFFFFFFFFFFFFFFFF
rc_odp_caps:
SUPPORT_SEND
SUPPORT_RECV
SUPPORT_WRITE
SUPPORT_READ
uc_odp_caps:
NO SUPPORT
ud_odp_caps:
SUPPORT_SEND
dc_odp_caps:
SUPPORT_SEND
SUPPORT_WRITE
SUPPORT_READ
xrc_odp_caps:
NO SUPPORT
raw_eth_odp_caps:
NO SUPPORT
max_dct: 262144
max_device_ctx: 1020
Multi-Packet RQ supported
Supported for QP types: RAW_PACKET
Supported payload shifts:
2 bytes
Log number of strides for single WQE: 9 - 16
Log number of bytes in single stride: 6 - 13
rx_pad_end_addr_align: 64
tso_caps:
max_tso: 0
packet_pacing_caps:
qp_rate_limit_min: 0kbps
qp_rate_limit_max: 0kbps
ooo_caps:
ooo_rc_caps = 0x1
ooo_xrc_caps = 0x1
ooo_dc_caps = 0x1
ooo_ud_caps = 0x0
SUPPORT_RC_RW_DATA_PLACEMENT
SUPPORT_XRC_RW_DATA_PLACEMENT
SUPPORT_DC_RW_DATA_PLACEMENT
sw_parsing_caps:
supported_qp:
max_rndv_hdr_size: 0x40
max_num_tags: 0x7f
max_ops: 0x8000
max_sge: 0x1
capability_flags:
IBV_EXP_TM_CAP_RC
IBV_EXP_TM_CAP_DC
tunnel_offloads_caps:
Device ports:
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 1
port_lid: 15
port_lmc: 0x00
link_layer: InfiniBand
max_msg_sz: 0x40000000
port_cap_flags: 0x2651e848
max_vl_num: 4 (3)
bad_pkey_cntr: 0x0
qkey_viol_cntr: 0x0
sm_sl: 0
pkey_tbl_len: 128
gid_tbl_len: 8
subnet_timeout: 18
init_type_reply: 0
active_width: 4X (2)
active_speed: 25.0 Gbps (32)
phys_state: LINK_UP (5)
GID[ 0]: fe80:0000:0000:0000:ec0d:9a03:0043:1764

  • For GPU related issues:
    • GPU type
    • Cuda:
      • Drivers version
      • Check if peer-direct is loaded: lsmod|grep nv_peer_mem and/or gdrcopy: lsmod|grep gdrdrv

Additional information (depending on the issue)

  • OpenMPI version

  • Output of ucx_info -d to show transports and devices recognized by UCX

  • Configure result - config.log

  • Log file - configure UCX with "--enable-logging" - and run with "UCX_LOG_LEVEL=data"
    Publisher.ucx.docx
    Subscriber.ucx.docx

@rrgargeya rrgargeya added the Bug label Jul 22, 2022
@rrgargeya

$ ucx_info -d

Memory domain: posix

Component: posix

allocate: <= 97650420K

remote key: 24 bytes

rkey_ptr is supported

Transport: posix

Device: memory

Type: intra-node

System device:

capabilities:

bandwidth: 0.00/ppn + 12179.00 MB/sec

latency: 80 nsec

overhead: 10 nsec

put_short: <= 4294967295

put_bcopy: unlimited

get_bcopy: unlimited

am_short: <= 100

am_bcopy: <= 8256

domain: cpu

atomic_add: 32, 64 bit

atomic_and: 32, 64 bit

atomic_or: 32, 64 bit

atomic_xor: 32, 64 bit

atomic_fadd: 32, 64 bit

atomic_fand: 32, 64 bit

atomic_for: 32, 64 bit

atomic_fxor: 32, 64 bit

atomic_swap: 32, 64 bit

atomic_cswap: 32, 64 bit

connection: to iface

device priority: 0

device num paths: 1

max eps: inf

device address: 8 bytes

iface address: 8 bytes

error handling: ep_check

Memory domain: sysv

Component: sysv

allocate: unlimited

remote key: 12 bytes

rkey_ptr is supported

Transport: sysv

Device: memory

Type: intra-node

System device:

capabilities:

bandwidth: 0.00/ppn + 12179.00 MB/sec

latency: 80 nsec

overhead: 10 nsec

put_short: <= 4294967295

put_bcopy: unlimited

get_bcopy: unlimited

am_short: <= 100

am_bcopy: <= 8256

domain: cpu

atomic_add: 32, 64 bit

atomic_and: 32, 64 bit

atomic_or: 32, 64 bit

atomic_xor: 32, 64 bit

atomic_fadd: 32, 64 bit

atomic_fand: 32, 64 bit

atomic_for: 32, 64 bit

atomic_fxor: 32, 64 bit

atomic_swap: 32, 64 bit

atomic_cswap: 32, 64 bit

connection: to iface

device priority: 0

device num paths: 1

max eps: inf

device address: 8 bytes

iface address: 8 bytes

error handling: ep_check

Memory domain: self

Component: self

register: unlimited, cost: 0 nsec

remote key: 0 bytes

Transport: self

Device: memory0

Type: loopback

System device:

capabilities:

bandwidth: 0.00/ppn + 6911.00 MB/sec

latency: 0 nsec

overhead: 10 nsec

put_short: <= 4294967295

put_bcopy: unlimited

get_bcopy: unlimited

am_short: <= 8K

am_bcopy: <= 8K

domain: cpu

atomic_add: 32, 64 bit

atomic_and: 32, 64 bit

atomic_or: 32, 64 bit

atomic_xor: 32, 64 bit

atomic_fadd: 32, 64 bit

atomic_fand: 32, 64 bit

atomic_for: 32, 64 bit

atomic_fxor: 32, 64 bit

atomic_swap: 32, 64 bit

atomic_cswap: 32, 64 bit

connection: to iface

device priority: 0

device num paths: 1

max eps: inf

device address: 0 bytes

iface address: 8 bytes

error handling: ep_check

Memory domain: tcp

Component: tcp

register: unlimited, cost: 0 nsec

remote key: 0 bytes

Transport: tcp

Device: lo

Type: network

System device:

capabilities:

bandwidth: 11.91/ppn + 0.00 MB/sec

latency: 10960 nsec

overhead: 50000 nsec

put_zcopy: <= 18446744073709551590, up to 6 iov

put_opt_zcopy_align: <= 1

put_align_mtu: <= 0

am_short: <= 8K

am_bcopy: <= 8K

am_zcopy: <= 64K, up to 6 iov

am_opt_zcopy_align: <= 1

am_align_mtu: <= 0

am header: <= 8037

connection: to ep, to iface

device priority: 1

device num paths: 1

max eps: 256

device address: 18 bytes

iface address: 2 bytes

ep address: 10 bytes

error handling: peer failure, ep_check, keepalive

Transport: tcp

Device: ib0

Type: network

System device:

capabilities:

bandwidth: 11142.51/ppn + 0.00 MB/sec

latency: 5206 nsec

overhead: 50000 nsec

put_zcopy: <= 18446744073709551590, up to 6 iov

put_opt_zcopy_align: <= 1

put_align_mtu: <= 0

am_short: <= 8K

am_bcopy: <= 8K

am_zcopy: <= 64K, up to 6 iov

am_opt_zcopy_align: <= 1

am_align_mtu: <= 0

am header: <= 8037

connection: to ep, to iface

device priority: 1

device num paths: 1

max eps: 256

device address: 6 bytes

iface address: 2 bytes

ep address: 10 bytes

error handling: peer failure, ep_check, keepalive

Transport: tcp

Device: ib1

Type: network

System device:

capabilities:

bandwidth: 11142.51/ppn + 0.00 MB/sec

latency: 5206 nsec

overhead: 50000 nsec

put_zcopy: <= 18446744073709551590, up to 6 iov

put_opt_zcopy_align: <= 1

put_align_mtu: <= 0

am_short: <= 8K

am_bcopy: <= 8K

am_zcopy: <= 64K, up to 6 iov

am_opt_zcopy_align: <= 1

am_align_mtu: <= 0

am header: <= 8037

connection: to ep, to iface

device priority: 1

device num paths: 1

max eps: 256

device address: 6 bytes

iface address: 2 bytes

ep address: 10 bytes

error handling: peer failure, ep_check, keepalive

Transport: tcp

Device: net4

Type: network

System device:

capabilities:

bandwidth: 11316.36/ppn + 0.00 MB/sec

latency: 5206 nsec

overhead: 50000 nsec

put_zcopy: <= 18446744073709551590, up to 6 iov

put_opt_zcopy_align: <= 1

put_align_mtu: <= 0

am_short: <= 8K

am_bcopy: <= 8K

am_zcopy: <= 64K, up to 6 iov

am_opt_zcopy_align: <= 1

am_align_mtu: <= 0

am header: <= 8037

connection: to ep, to iface

device priority: 1

device num paths: 1

max eps: 256

device address: 6 bytes

iface address: 2 bytes

ep address: 10 bytes

error handling: peer failure, ep_check, keepalive

Transport: tcp

Device: net5

Type: network

System device:

capabilities:

bandwidth: 11316.36/ppn + 0.00 MB/sec

latency: 5206 nsec

overhead: 50000 nsec

put_zcopy: <= 18446744073709551590, up to 6 iov

put_opt_zcopy_align: <= 1

put_align_mtu: <= 0

am_short: <= 8K

am_bcopy: <= 8K

am_zcopy: <= 64K, up to 6 iov

am_opt_zcopy_align: <= 1

am_align_mtu: <= 0

am header: <= 8037

connection: to ep, to iface

device priority: 1

device num paths: 1

max eps: 256

device address: 6 bytes

iface address: 2 bytes

ep address: 10 bytes

error handling: peer failure, ep_check, keepalive

Transport: tcp

Device: net0

Type: network

System device:

capabilities:

bandwidth: 113.16/ppn + 0.00 MB/sec

latency: 5776 nsec

overhead: 50000 nsec

put_zcopy: <= 18446744073709551590, up to 6 iov

put_opt_zcopy_align: <= 1

put_align_mtu: <= 0

am_short: <= 8K

am_bcopy: <= 8K

am_zcopy: <= 64K, up to 6 iov

am_opt_zcopy_align: <= 1

am_align_mtu: <= 0

am header: <= 8037

connection: to ep, to iface

device priority: 1

device num paths: 1

max eps: 256

device address: 6 bytes

iface address: 2 bytes

ep address: 10 bytes

error handling: peer failure, ep_check, keepalive

Connection manager: tcp

max_conn_priv: 2064 bytes

Memory domain: mlx5_3

Component: ib

register: unlimited, cost: 180 nsec

remote key: 8 bytes

local memory handle is required for zcopy

Transport: rc_verbs

Device: mlx5_3:1

Type: network

System device: mlx5_3 (0)

capabilities:

bandwidth: 10957.84/ppn + 0.00 MB/sec

latency: 800 + 1.000 * N nsec

overhead: 75 nsec

put_short: <= 124

put_bcopy: <= 8256

put_zcopy: <= 1G, up to 8 iov

put_opt_zcopy_align: <= 512

put_align_mtu: <= 1K

get_bcopy: <= 8256

get_zcopy: 65..1G, up to 8 iov

get_opt_zcopy_align: <= 512

get_align_mtu: <= 1K

am_short: <= 123

am_bcopy: <= 8255

am_zcopy: <= 8255, up to 7 iov

am_opt_zcopy_align: <= 512

am_align_mtu: <= 1K

am header: <= 127

domain: device

atomic_add: 64 bit

atomic_fadd: 64 bit

atomic_cswap: 64 bit

connection: to ep

device priority: 38

device num paths: 1

max eps: 256

device address: 18 bytes

ep address: 5 bytes

error handling: peer failure, ep_check

Transport: rc_mlx5

Device: mlx5_3:1

Type: network

System device: mlx5_3 (0)

capabilities:

bandwidth: 10957.84/ppn + 0.00 MB/sec

latency: 800 + 1.000 * N nsec

overhead: 40 nsec

put_short: <= 2K

put_bcopy: <= 8256

put_zcopy: <= 1G, up to 14 iov

put_opt_zcopy_align: <= 512

put_align_mtu: <= 1K

get_bcopy: <= 8256

get_zcopy: 65..1G, up to 14 iov

get_opt_zcopy_align: <= 512

get_align_mtu: <= 1K

am_short: <= 2046

am_bcopy: <= 8254

am_zcopy: <= 8254, up to 3 iov

am_opt_zcopy_align: <= 512

am_align_mtu: <= 1K

am header: <= 186

domain: device

atomic_add: 32, 64 bit

atomic_and: 32, 64 bit

atomic_or: 32, 64 bit

atomic_xor: 32, 64 bit

atomic_fadd: 32, 64 bit

atomic_fand: 32, 64 bit

atomic_for: 32, 64 bit

atomic_fxor: 32, 64 bit

atomic_swap: 32, 64 bit

atomic_cswap: 32, 64 bit

connection: to ep

device priority: 38

device num paths: 1

max eps: 256

device address: 18 bytes

ep address: 7 bytes

error handling: buffer (zcopy), remote access, peer failure, ep_check

Transport: ud_verbs

Device: mlx5_3:1

Type: network

System device: mlx5_3 (0)

capabilities:

bandwidth: 10957.84/ppn + 0.00 MB/sec

latency: 830 nsec

overhead: 105 nsec

am_short: <= 116

am_bcopy: <= 1016

am_zcopy: <= 1016, up to 7 iov

am_opt_zcopy_align: <= 512

am_align_mtu: <= 1K

am header: <= 880

connection: to ep, to iface

device priority: 38

device num paths: 1

max eps: inf

device address: 18 bytes

iface address: 3 bytes

ep address: 6 bytes

error handling: peer failure, ep_check

Transport: ud_mlx5

Device: mlx5_3:1

Type: network

System device: mlx5_3 (0)

capabilities:

bandwidth: 10957.84/ppn + 0.00 MB/sec

latency: 830 nsec

overhead: 80 nsec

am_short: <= 180

am_bcopy: <= 1016

am_zcopy: <= 1016, up to 3 iov

am_opt_zcopy_align: <= 512

am_align_mtu: <= 1K

am header: <= 132

connection: to ep, to iface

device priority: 38

device num paths: 1

max eps: inf

device address: 18 bytes

iface address: 3 bytes

ep address: 6 bytes

error handling: peer failure, ep_check

Memory domain: mlx5_2

Component: ib

register: unlimited, cost: 180 nsec

remote key: 8 bytes

local memory handle is required for zcopy

Transport: rc_verbs

Device: mlx5_2:1

Type: network

System device: mlx5_2 (1)

capabilities:

bandwidth: 10957.84/ppn + 0.00 MB/sec

latency: 800 + 1.000 * N nsec

overhead: 75 nsec

put_short: <= 124

put_bcopy: <= 8256

put_zcopy: <= 1G, up to 8 iov

put_opt_zcopy_align: <= 512

put_align_mtu: <= 1K

get_bcopy: <= 8256

get_zcopy: 65..1G, up to 8 iov

get_opt_zcopy_align: <= 512

get_align_mtu: <= 1K

am_short: <= 123

am_bcopy: <= 8255

am_zcopy: <= 8255, up to 7 iov

am_opt_zcopy_align: <= 512

am_align_mtu: <= 1K

am header: <= 127

domain: device

atomic_add: 64 bit

atomic_fadd: 64 bit

atomic_cswap: 64 bit

connection: to ep

device priority: 38

device num paths: 1

max eps: 256

device address: 18 bytes

ep address: 5 bytes

error handling: peer failure, ep_check

Transport: rc_mlx5

Device: mlx5_2:1

Type: network

System device: mlx5_2 (1)

capabilities:

bandwidth: 10957.84/ppn + 0.00 MB/sec

latency: 800 + 1.000 * N nsec

overhead: 40 nsec

put_short: <= 2K

put_bcopy: <= 8256

put_zcopy: <= 1G, up to 14 iov

put_opt_zcopy_align: <= 512

put_align_mtu: <= 1K

get_bcopy: <= 8256

get_zcopy: 65..1G, up to 14 iov

get_opt_zcopy_align: <= 512

get_align_mtu: <= 1K

am_short: <= 2046

am_bcopy: <= 8254

am_zcopy: <= 8254, up to 3 iov

am_opt_zcopy_align: <= 512

am_align_mtu: <= 1K

am header: <= 186

domain: device

atomic_add: 32, 64 bit

atomic_and: 32, 64 bit

atomic_or: 32, 64 bit

atomic_xor: 32, 64 bit

atomic_fadd: 32, 64 bit

atomic_fand: 32, 64 bit

atomic_for: 32, 64 bit

atomic_fxor: 32, 64 bit

atomic_swap: 32, 64 bit

atomic_cswap: 32, 64 bit

connection: to ep

device priority: 38

device num paths: 1

max eps: 256

device address: 18 bytes

ep address: 7 bytes

error handling: buffer (zcopy), remote access, peer failure, ep_check

Transport: ud_verbs

Device: mlx5_2:1

Type: network

System device: mlx5_2 (1)

capabilities:

bandwidth: 10957.84/ppn + 0.00 MB/sec

latency: 830 nsec

overhead: 105 nsec

am_short: <= 116

am_bcopy: <= 1016

am_zcopy: <= 1016, up to 7 iov

am_opt_zcopy_align: <= 512

am_align_mtu: <= 1K

am header: <= 880

connection: to ep, to iface

device priority: 38

device num paths: 1

max eps: inf

device address: 18 bytes

iface address: 3 bytes

ep address: 6 bytes

error handling: peer failure, ep_check

Transport: ud_mlx5

Device: mlx5_2:1

Type: network

System device: mlx5_2 (1)

capabilities:

bandwidth: 10957.84/ppn + 0.00 MB/sec

latency: 830 nsec

overhead: 80 nsec

am_short: <= 180

am_bcopy: <= 1016

am_zcopy: <= 1016, up to 3 iov

am_opt_zcopy_align: <= 512

am_align_mtu: <= 1K

am header: <= 132

connection: to ep, to iface

device priority: 38

device num paths: 1

max eps: inf

device address: 18 bytes

iface address: 3 bytes

ep address: 6 bytes

error handling: peer failure, ep_check

Memory domain: mlx5_1

Component: ib

register: unlimited, cost: 180 nsec

remote key: 8 bytes

local memory handle is required for zcopy

Transport: rc_verbs

Device: mlx5_1:1

Type: network

System device: mlx5_1 (2)

capabilities:

bandwidth: 11794.23/ppn + 0.00 MB/sec

latency: 600 + 1.000 * N nsec

overhead: 75 nsec

put_short: <= 124

put_bcopy: <= 8256

put_zcopy: <= 1G, up to 8 iov

put_opt_zcopy_align: <= 512

put_align_mtu: <= 4K

get_bcopy: <= 8256

get_zcopy: 65..1G, up to 8 iov

get_opt_zcopy_align: <= 512

get_align_mtu: <= 4K

am_short: <= 123

am_bcopy: <= 8255

am_zcopy: <= 8255, up to 7 iov

am_opt_zcopy_align: <= 512

am_align_mtu: <= 4K

am header: <= 127

domain: device

atomic_add: 64 bit

atomic_fadd: 64 bit

atomic_cswap: 64 bit

connection: to ep

device priority: 38

device num paths: 1

max eps: 256

device address: 3 bytes

ep address: 5 bytes

error handling: peer failure, ep_check

Transport: rc_mlx5

Device: mlx5_1:1

Type: network

System device: mlx5_1 (2)

capabilities:

bandwidth: 11794.23/ppn + 0.00 MB/sec

latency: 600 + 1.000 * N nsec

overhead: 40 nsec

put_short: <= 2K

put_bcopy: <= 8256

put_zcopy: <= 1G, up to 14 iov

put_opt_zcopy_align: <= 512

put_align_mtu: <= 4K

get_bcopy: <= 8256

get_zcopy: 65..1G, up to 14 iov

get_opt_zcopy_align: <= 512

get_align_mtu: <= 4K

am_short: <= 2046

am_bcopy: <= 8254

am_zcopy: <= 8254, up to 3 iov

am_opt_zcopy_align: <= 512

am_align_mtu: <= 4K

am header: <= 186

domain: device

atomic_add: 32, 64 bit

atomic_and: 32, 64 bit

atomic_or: 32, 64 bit

atomic_xor: 32, 64 bit

atomic_fadd: 32, 64 bit

atomic_fand: 32, 64 bit

atomic_for: 32, 64 bit

atomic_fxor: 32, 64 bit

atomic_swap: 32, 64 bit

atomic_cswap: 32, 64 bit

connection: to ep

device priority: 38

device num paths: 1

max eps: 256

device address: 3 bytes

ep address: 7 bytes

error handling: buffer (zcopy), remote access, peer failure, ep_check

Transport: dc_mlx5

Device: mlx5_1:1

Type: network

System device: mlx5_1 (2)

capabilities:

bandwidth: 11794.23/ppn + 0.00 MB/sec

latency: 660 nsec

overhead: 40 nsec

put_short: <= 2K

put_bcopy: <= 8256

put_zcopy: <= 1G, up to 11 iov

put_opt_zcopy_align: <= 512

put_align_mtu: <= 4K

get_bcopy: <= 8256

get_zcopy: 65..1G, up to 11 iov

get_opt_zcopy_align: <= 512

get_align_mtu: <= 4K

am_short: <= 2046

am_bcopy: <= 8254

am_zcopy: <= 8254, up to 3 iov

am_opt_zcopy_align: <= 512

am_align_mtu: <= 4K

am header: <= 138

domain: device

atomic_add: 32, 64 bit

atomic_and: 32, 64 bit

atomic_or: 32, 64 bit

atomic_xor: 32, 64 bit

atomic_fadd: 32, 64 bit

atomic_fand: 32, 64 bit

atomic_for: 32, 64 bit

atomic_fxor: 32, 64 bit

atomic_swap: 32, 64 bit

atomic_cswap: 32, 64 bit

connection: to iface

device priority: 38

device num paths: 1

max eps: inf

device address: 3 bytes

iface address: 5 bytes

error handling: buffer (zcopy), remote access, peer failure, ep_check

Transport: ud_verbs

Device: mlx5_1:1

Type: network

System device: mlx5_1 (2)

capabilities:

bandwidth: 11794.23/ppn + 0.00 MB/sec

latency: 630 nsec

overhead: 105 nsec

am_short: <= 116

am_bcopy: <= 4088

am_zcopy: <= 4088, up to 7 iov

am_opt_zcopy_align: <= 512

am_align_mtu: <= 4K

am header: <= 3952

connection: to ep, to iface

device priority: 38

device num paths: 1

max eps: inf

device address: 3 bytes

iface address: 3 bytes

ep address: 6 bytes

error handling: peer failure, ep_check

Transport: ud_mlx5

Device: mlx5_1:1

Type: network

System device: mlx5_1 (2)

capabilities:

bandwidth: 11794.23/ppn + 0.00 MB/sec

latency: 630 nsec

overhead: 80 nsec

am_short: <= 180

am_bcopy: <= 4088

am_zcopy: <= 4088, up to 3 iov

am_opt_zcopy_align: <= 512

am_align_mtu: <= 4K

am header: <= 132

connection: to ep, to iface

device priority: 38

device num paths: 1

max eps: inf

device address: 3 bytes

iface address: 3 bytes

ep address: 6 bytes

error handling: peer failure, ep_check

Memory domain: mlx5_0

Component: ib

register: unlimited, cost: 180 nsec

remote key: 8 bytes

local memory handle is required for zcopy

Transport: rc_verbs

Device: mlx5_0:1

Type: network

System device: mlx5_0 (3)

capabilities:

bandwidth: 11794.23/ppn + 0.00 MB/sec

latency: 600 + 1.000 * N nsec

overhead: 75 nsec

put_short: <= 124

put_bcopy: <= 8256

put_zcopy: <= 1G, up to 8 iov

put_opt_zcopy_align: <= 512

put_align_mtu: <= 4K

get_bcopy: <= 8256

get_zcopy: 65..1G, up to 8 iov

get_opt_zcopy_align: <= 512

get_align_mtu: <= 4K

am_short: <= 123

am_bcopy: <= 8255

am_zcopy: <= 8255, up to 7 iov

am_opt_zcopy_align: <= 512

am_align_mtu: <= 4K

am header: <= 127

domain: device

atomic_add: 64 bit

atomic_fadd: 64 bit

atomic_cswap: 64 bit

connection: to ep

device priority: 38

device num paths: 1

max eps: 256

device address: 3 bytes

ep address: 5 bytes

error handling: peer failure, ep_check

Transport: rc_mlx5

Device: mlx5_0:1

Type: network

System device: mlx5_0 (3)

capabilities:

bandwidth: 11794.23/ppn + 0.00 MB/sec

latency: 600 + 1.000 * N nsec

overhead: 40 nsec

put_short: <= 2K

put_bcopy: <= 8256

put_zcopy: <= 1G, up to 14 iov

put_opt_zcopy_align: <= 512

put_align_mtu: <= 4K

get_bcopy: <= 8256

get_zcopy: 65..1G, up to 14 iov

get_opt_zcopy_align: <= 512

get_align_mtu: <= 4K

am_short: <= 2046

am_bcopy: <= 8254

am_zcopy: <= 8254, up to 3 iov

am_opt_zcopy_align: <= 512

am_align_mtu: <= 4K

am header: <= 186

domain: device

atomic_add: 32, 64 bit

atomic_and: 32, 64 bit

atomic_or: 32, 64 bit

atomic_xor: 32, 64 bit

atomic_fadd: 32, 64 bit

atomic_fand: 32, 64 bit

atomic_for: 32, 64 bit

atomic_fxor: 32, 64 bit

atomic_swap: 32, 64 bit

atomic_cswap: 32, 64 bit

connection: to ep

device priority: 38

device num paths: 1

max eps: 256

device address: 3 bytes

ep address: 7 bytes

error handling: buffer (zcopy), remote access, peer failure, ep_check

Transport: dc_mlx5

Device: mlx5_0:1

Type: network

System device: mlx5_0 (3)

capabilities:

bandwidth: 11794.23/ppn + 0.00 MB/sec

latency: 660 nsec

overhead: 40 nsec

put_short: <= 2K

put_bcopy: <= 8256

put_zcopy: <= 1G, up to 11 iov

put_opt_zcopy_align: <= 512

put_align_mtu: <= 4K

get_bcopy: <= 8256

get_zcopy: 65..1G, up to 11 iov

get_opt_zcopy_align: <= 512

get_align_mtu: <= 4K

am_short: <= 2046

am_bcopy: <= 8254

am_zcopy: <= 8254, up to 3 iov

am_opt_zcopy_align: <= 512

am_align_mtu: <= 4K

am header: <= 138

domain: device

atomic_add: 32, 64 bit

atomic_and: 32, 64 bit

atomic_or: 32, 64 bit

atomic_xor: 32, 64 bit

atomic_fadd: 32, 64 bit

atomic_fand: 32, 64 bit

atomic_for: 32, 64 bit

atomic_fxor: 32, 64 bit

atomic_swap: 32, 64 bit

atomic_cswap: 32, 64 bit

connection: to iface

device priority: 38

device num paths: 1

max eps: inf

device address: 3 bytes

iface address: 5 bytes

error handling: buffer (zcopy), remote access, peer failure, ep_check

Transport: ud_verbs

Device: mlx5_0:1

Type: network

System device: mlx5_0 (3)

capabilities:

bandwidth: 11794.23/ppn + 0.00 MB/sec

latency: 630 nsec

overhead: 105 nsec

am_short: <= 116

am_bcopy: <= 4088

am_zcopy: <= 4088, up to 7 iov

am_opt_zcopy_align: <= 512

am_align_mtu: <= 4K

am header: <= 3952

connection: to ep, to iface

device priority: 38

device num paths: 1

max eps: inf

device address: 3 bytes

iface address: 3 bytes

ep address: 6 bytes

error handling: peer failure, ep_check

Transport: ud_mlx5

Device: mlx5_0:1

Type: network

System device: mlx5_0 (3)

capabilities:

bandwidth: 11794.23/ppn + 0.00 MB/sec

latency: 630 nsec

overhead: 80 nsec

am_short: <= 180

am_bcopy: <= 4088

am_zcopy: <= 4088, up to 3 iov

am_opt_zcopy_align: <= 512

am_align_mtu: <= 4K

am header: <= 132

connection: to ep, to iface

device priority: 38

device num paths: 1

max eps: inf

device address: 3 bytes

iface address: 3 bytes

ep address: 6 bytes

error handling: peer failure, ep_check

Memory domain: cma

Component: cma

register: unlimited, cost: 9 nsec

Transport: cma

Device: memory

Type: intra-node

System device:

capabilities:

bandwidth: 0.00/ppn + 11145.00 MB/sec

latency: 80 nsec

overhead: 2000 nsec

put_zcopy: unlimited, up to 16 iov

put_opt_zcopy_align: <= 1

put_align_mtu: <= 1

get_zcopy: unlimited, up to 16 iov

get_opt_zcopy_align: <= 1

get_align_mtu: <= 1

connection: to iface

device priority: 0

device num paths: 1

max eps: inf

device address: 8 bytes

iface address: 4 bytes

error handling: peer failure, ep_check

Memory domain: knem

Component: knem

register: unlimited, cost: 18446744073709551616000000000 nsec

remote key: 16 bytes

Transport: knem

Device: memory

Type: intra-node

System device:

capabilities:

bandwidth: 13862.00/ppn + 0.00 MB/sec

latency: 80 nsec

overhead: 2000 nsec

put_zcopy: unlimited, up to 16 iov

put_opt_zcopy_align: <= 1

put_align_mtu: <= 1

get_zcopy: unlimited, up to 16 iov

get_opt_zcopy_align: <= 1

get_align_mtu: <= 1

connection: to iface

device priority: 0

device num paths: 1

max eps: inf

device address: 8 bytes

iface address: 0 bytes

error handling: none

@evgeny-leksikov
Contributor

@rrgargeya could you please share config.log, which is produced by the configure command? It seems that rdmacm was disabled at configure/compile time for some reason.

@rrgargeya
Author

We are gathering the 'config.log' file. In the meantime, here is the information from 'ucx_info' which shows how ucx was configured.

$ ucx_info -v

UCT version=1.12.1 revision dc92435

configured with: --disable-logging --disable-debug --disable-assertions --disable-params-check

@evgeny-leksikov
Contributor

here is the information from 'ucx_info' which shows how ucx was configured.

We need to understand why the rdmacm listener was not even attempted. I think it was disabled by configure, and the reason should be clear from config.log.

@rrgargeya
Author

We do have an rdmacm issue that can be seen in config.log. Please check whether any of the other errors need our attention. Is there a minimum OFED version requirement? We have MLNX_OFED_LINUX-4.2-1.2.0.0 on our nodes.

ucx.1.12.1.config.log

@yosefe
Contributor

yosefe commented Jul 26, 2022

@rrgargeya UCX currently requires MLNX_OFED 5.0 or later.
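To make the requirement concrete, here is a small hypothetical C helper (not part of UCX or MLNX_OFED) that compares an MLNX_OFED version string against the 5.0 minimum stated above. It assumes the `MLNX_OFED_LINUX-<major>.<minor>-...` naming seen in this thread; any other naming scheme is reported as unparseable rather than guessed.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of UCX): returns 1 if an MLNX_OFED version
 * string such as "MLNX_OFED_LINUX-5.5-1.0.3.2" is at least
 * want_major.want_minor, 0 if it is older, and -1 if it cannot be parsed. */
static int mlnx_ofed_at_least(const char *version, int want_major,
                              int want_minor)
{
    int major, minor;

    assert(version != NULL);
    /* The literal prefix must match exactly; the two %d read major and minor. */
    if (sscanf(version, "MLNX_OFED_LINUX-%d.%d", &major, &minor) != 2) {
        return -1;
    }
    if (major != want_major) {
        return major > want_major;
    }
    return minor >= want_minor;
}
```

With a 5.0 minimum, `MLNX_OFED_LINUX-4.2-1.2.0.0` yields 0 (too old) and `MLNX_OFED_LINUX-5.5-1.0.3.2` yields 1.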

@evgeny-leksikov
Contributor

UCX currently requires MLNX_OFED 5.0 or later.

@yosefe maybe we should document it? For example, here

@shamisp
Contributor

shamisp commented Jul 26, 2022

@yosefe @evgeny-leksikov Can you catch this with a configure check?

evgeny-leksikov added a commit that referenced this issue Jul 26, 2022
add requirements for IB and RoCE transports

#8398
@evgeny-leksikov
Contributor

evgeny-leksikov commented Jul 26, 2022

@yosefe @evgeny-leksikov Can you catch this with a configure check?

It's already there:

configure:28791: WARNING: RDMACM requested but librdmacm is not found or does not provide rdma_establish() API
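The warning means configure could not find `rdma_establish()` in the installed librdmacm. The sketch below is a runtime analogue of that check, not UCX's actual configure test: it uses `dlopen()`/`dlsym()` to ask whether a shared library exports a given symbol. The soname `librdmacm.so.1` used in the usage note is an assumption about the local installation, and `lib_exports_symbol` is a hypothetical name.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Runtime analogue of the configure check quoted above (NOT how UCX's
 * configure implements it): returns 1 if the named shared library can be
 * loaded and exports `symbol`, 0 otherwise. */
static int lib_exports_symbol(const char *soname, const char *symbol)
{
    void *handle;
    int found;

    assert(soname != NULL && symbol != NULL);
    handle = dlopen(soname, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        return 0; /* library not installed or not loadable */
    }
    found = (dlsym(handle, symbol) != NULL);
    dlclose(handle);
    return found;
}
```

Calling `lib_exports_symbol("librdmacm.so.1", "rdma_establish")` should return 0 on a system where configure printed the warning above. On glibc older than 2.34 the program must be linked with `-ldl`.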

@rrgargeya
Author

We are installing a recent OFED and will post here if it resolves this issue.

While you are updating the FAQ section with the latest minimum required OFED version, could you also please update the other hardware and software compatible versions, such as Linux version, (Mellanox) HCA version, and others that may be relevant?

@shamisp
Contributor

shamisp commented Jul 26, 2022

@evgeny-leksikov Does it fail, or does it just print a warning?

@evgeny-leksikov
Contributor

@evgeny-leksikov Does it fail, or does it just print a warning?

@shamisp It prints a warning, since rdmacm is not a mandatory dependency. UCP can fall back to TCP if rdmacm is absent.

@shamisp
Contributor

shamisp commented Jul 26, 2022

I think that if --with-rdmacm was explicitly requested, configure should fail.
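The policy under discussion (warn and fall back to TCP when rdmacm is merely optional, fail when it was explicitly requested) can be modeled as a small decision function. This is a hypothetical illustration only; the enum and function names are made up and are not taken from UCX's configure script.

```c
#include <assert.h>

/* Hypothetical model of the policy discussed above; names are illustrative. */
typedef enum {
    RDMACM_DEFAULT,   /* neither --with-rdmacm nor --without-rdmacm given */
    RDMACM_REQUESTED, /* --with-rdmacm passed explicitly */
    RDMACM_DISABLED   /* --without-rdmacm passed explicitly */
} rdmacm_request_t;

typedef enum {
    RDMACM_BUILD,       /* build the rdmacm transport */
    RDMACM_SKIP_WARN,   /* print a warning; UCP falls back to TCP at run time */
    RDMACM_SKIP_SILENT, /* user opted out; nothing to report */
    RDMACM_FAIL         /* abort configure */
} rdmacm_action_t;

static rdmacm_action_t rdmacm_decide(rdmacm_request_t req, int lib_usable)
{
    assert(lib_usable == 0 || lib_usable == 1);
    if (req == RDMACM_DISABLED) {
        return RDMACM_SKIP_SILENT;
    }
    if (lib_usable) {
        return RDMACM_BUILD;
    }
    /* An explicit request that cannot be satisfied is an error; an implicit
     * one only degrades the build, since rdmacm is an optional dependency. */
    return req == RDMACM_REQUESTED ? RDMACM_FAIL : RDMACM_SKIP_WARN;
}
```

Under this model, the build in this issue (no --with-rdmacm, unusable librdmacm) lands in the warn-and-fall-back branch.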

@rrgargeya
Author

We installed MLNX_OFED_LINUX-5.5-1.0.3.2 and UCX 1.13.

Closer, but still no cigar.

As the log below shows, both an rdmacm listener and a TCP listener are created, but the server callback is still not called.

I also tried setting this environment variable:
UCS_SOCKADDR_TLS_PRIORITY=rdmacm

The default value was 'rdmacm,tcp'.

What should I be doing? Thanks.

[1658861965.246635] [hostA:53924:1] rdmacm_listener.c:104 UCX DEBUG listener 0x7ffbb40009a0: created on cm 0x24e98d0 172.200.10.72:10920 rdma_cm_id 0x7ffbb40009d0
[1658861965.246672] [hostA:53924:1] async.c:231 UCX DEBUG added async handler 0x7ffbb4000c60 [id=54 ref 1] ???() to hash
[1658861965.246680] [hostA:53924:1] async.c:509 UCX DEBUG listening to async event fd 54 events 0x5 mode thread_spinlock
[1658861965.246685] [hostA:53924:1] tcp_listener.c:136 UCX DEBUG created a TCP listener 0x7ffbb4000c30 on cm 0x24bc030 with fd: 54 listening on 172.200.10.72:10920
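To illustrate how a priority list such as 'rdmacm,tcp' is read, here is a hypothetical helper (not UCX's parser) that returns a transport's position in the comma-separated list; lower positions are preferred. The log above is consistent with UCX opening the listener on every available connection manager, so the order presumably matters mainly for which CM a client endpoint tries first. Note also that UCX reads its settings from UCX_-prefixed variables (as in UCX_SOCKADDR_TLS_PRIORITY earlier in this issue), so a UCS_-prefixed name is most likely ignored.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns the 0-based position of `name` in a
 * comma-separated priority list such as "rdmacm,tcp", or -1 if absent.
 * Earlier entries are the preferred ones. */
static int priority_index(const char *list, const char *name)
{
    size_t name_len;
    int index = 0;
    const char *p = list;

    assert(list != NULL && name != NULL);
    name_len = strlen(name);
    for (;;) {
        const char *comma = strchr(p, ',');
        size_t token_len = comma ? (size_t)(comma - p) : strlen(p);

        /* Compare whole tokens only, so "tc" does not match "tcp". */
        if (token_len == name_len && strncmp(p, name, name_len) == 0) {
            return index;
        }
        if (comma == NULL) {
            return -1;
        }
        p = comma + 1;
        ++index;
    }
}
```

For the default value, `priority_index("rdmacm,tcp", "rdmacm")` is 0 and `priority_index("rdmacm,tcp", "tcp")` is 1.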

@evgeny-leksikov
Contributor

I think that if --with-rdmacm was explicitly requested, configure should fail.

@shamisp Definitely, if it had been requested, but it was not:

configure --disable-logging --disable-debug --disable-assertions --disable-params-check

@evgeny-leksikov
Contributor

@rrgargeya could you share the client-side log as well?
Do you have IPoIB (or a RoCE interface) configured properly? Does a simple RDMA test work? For example:
server:

$ ib_send_lat --rdma_cm

client:

$ ib_send_lat --rdma_cm <server IP address>

@rrgargeya
Author

Subscriber.v2.docx
Publisher.v2.docx

  1. The full logs (log level=debug) are attached above.

  2. The ib_send_lat output below looks good.

$ ib_send_lat --rdma_cm 172.200.10.72

                Send Latency Test

Dual-port : OFF Device : mlx5_0
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
PCIe relax order: ON
ibv_wr* API : ON
TX depth : 1
Mtu : 4096[B]
Link type : IB
Max inline data : 236[B]
rdma_cm QPs : ON
Data ex. method : rdma_cm

local address: LID 0x0d QPN 0x00a8 PSN 0x491c9e
remote address: LID 0x15 QPN 0x00bc PSN 0x99d7a

#bytes #iterations t_min[usec] t_max[usec] t_typical[usec] t_avg[usec] t_stdev[usec] 99% percentile[usec] 99.9% percentile[usec]
2 1000 0.88 7.41 0.94 0.96 0.33 1.00 7.41

@evgeny-leksikov
Contributor

@rrgargeya The Subscriber log looks odd:

/// INITIALIZATION

[1658860716.672097] [hostD:25487:a]       wireup_cm.c:579  UCX  DEBUG client created ep 0x7f013401a000 on device mlx5_0:1, tl_bitmap 0xf80 0x0 on cm rdmacm
took
Sensor 245 indicates 1

/// NOTHING from UCX here
...
Sensor 245 indicates 31
[1658860724.526822] [hostD:25487:0]         pgtable.c:618  UCX  DEBUG purge empty page table
[1658860724.526826] [hostD:25487:0]           mpool.c:154  UCX  DEBUG mpool rcache_mp destroyed

/// DESTROY

It seems that UCX is completely blocked, including its async rdmacm thread and/or event channel.

@rrgargeya
Author

rrgargeya commented Jul 27, 2022

@evgeny-leksikov On the UCP server side, I do call ucp_worker_progress() in a loop that is broken by a flag set in the server callback function. I do not call ucp_worker_progress() on the client side.
I take it from your response that ucp_worker_progress() is also needed on the client side to progress communication. If so, what do I use to break out of the progress loop? Should I send a non-blocking test message to the server and use its send status as the progress loop sentinel? Thank you.
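The sentinel approach described above (call ucp_worker_progress() in a loop until a completion flag is set, for example by the callback of a non-blocking test send) can be sketched without UCX by hiding the progress call behind a function pointer. `progress_until`, `fake_worker` and `fake_progress` are hypothetical names. In a real client, `progress` would wrap `ucp_worker_progress(worker)` and `done` would be set by the send's completion callback, which UCX invokes from inside progress; alternatively the loop could poll `ucp_request_check_status()` on the returned request.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for ucp_worker_progress(worker): returns how many internal events
 * were handled (0 if none). Abstracted so the loop compiles without UCX. */
typedef unsigned (*progress_fn_t)(void *worker);

/* Hypothetical helper: call progress until *done becomes non-zero (set from
 * a callback run inside progress) or max_iters calls have been made.
 * Returns 1 if the sentinel fired, 0 if it gave up. */
static int progress_until(progress_fn_t progress, void *worker, int *done,
                          unsigned long max_iters)
{
    unsigned long i;

    assert(progress != NULL && done != NULL);
    for (i = 0; i < max_iters && !*done; ++i) {
        progress(worker);
    }
    return *done != 0;
}

/* Demo-only fake worker: "completes" the send after `remaining` calls. */
struct fake_worker {
    unsigned remaining;
    int *done;
};

static unsigned fake_progress(void *arg)
{
    struct fake_worker *w = arg;

    if (w->remaining == 0) {
        return 0; /* nothing to do */
    }
    if (--w->remaining == 0) {
        *w->done = 1; /* plays the role of the send completion callback */
    }
    return 1;
}
```

The iteration cap is a design choice for the sketch: a real application would more likely bound the wait by time or by an error callback, so that a peer that never answers does not spin the client forever.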

@evgeny-leksikov
Contributor

ucp_worker_progress() is needed on both sides; it is up to the application how to break the loop and when to call ucp_worker_progress(). ucp_worker_progress() returns the number of internal events it handled, or 0 if there were none. In general, communication can be progressed whenever the application has nothing else to do, or in a separate thread. UCP_FEATURE_WAKEUP can also be useful if saving CPU cycles is important and latency is less critical.
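With UCP_FEATURE_WAKEUP, the usual pattern is to drain events with ucp_worker_progress() until it returns 0, call ucp_worker_arm(), progress again if it returns UCS_ERR_BUSY, and otherwise sleep on the worker's event file descriptor obtained from ucp_worker_get_efd(). The sketch below shows only the sleeping step, using poll(); to stay self-contained it accepts any file descriptor, so a pipe can stand in for the worker's event fd. The helper name `wait_for_events` is hypothetical.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical sketch of the blocking step. With UCX, the surrounding loop
 * would look roughly like this (not compiled here):
 *
 *   while (!finished) {
 *       while (ucp_worker_progress(worker) != 0) {}
 *       if (ucp_worker_arm(worker) == UCS_ERR_BUSY) continue;
 *       wait_for_events(efd, -1);  // efd from ucp_worker_get_efd()
 *   }
 *
 * Returns 1 if efd became readable, 0 on timeout, -1 on error. */
static int wait_for_events(int efd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    assert(efd >= 0);
    pfd.fd      = efd;
    pfd.events  = POLLIN;
    pfd.revents = 0;
    rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0) {
        return -1; /* includes EINTR; the caller may simply retry */
    }
    return rc > 0 && (pfd.revents & POLLIN) != 0;
}
```

Arming before sleeping matters: if events arrived between the last progress call and the sleep, ucp_worker_arm() reports them via UCS_ERR_BUSY instead of letting the thread block on an fd that will not be signaled again.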

@rrgargeya
Author

@evgeny-leksikov Server callback gets called now. Thank you.
Also thanks for updating the FAQ pages with the OFED version. Please consider adding any other hardware or software dependencies too.

@evgeny-leksikov
Contributor

@rrgargeya So I think the issue can be closed, since the initial problem is solved?

ct-clmsn pushed a commit to tactcomplabs/ucx that referenced this issue Jul 26, 2023