-
Notifications
You must be signed in to change notification settings - Fork 434
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ignore PFs without RDMA cap on BF2 according to its gid_tlb_len==0 #5965
Conversation
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
src/uct/ib/base/ib_device.c
Outdated
return UCS_ERR_UNSUPPORTED; | ||
} | ||
|
||
num_ports = dev->num_ports < UCT_IB_DEV_MAX_PORTS ? dev->num_ports : UCT_IB_DEV_MAX_PORTS; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ucs_min
src/uct/ib/base/ib_device.c
Outdated
ucs_error("%s has %d ports, but only up to %d are supported", | ||
ibv_get_device_name(ibv_device), dev->num_ports, | ||
UCT_IB_DEV_MAX_PORTS); | ||
return UCS_ERR_UNSUPPORTED; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's move this to line 532: if num ports is > MAX_PORTS, print warning, but continue using only 2 ports, and NOT fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- So for BF2/ARM switchdev mode, it will always have warnings. Is this good?
- This will change the original logic before my changes since printing error and return UNSUPPORTED was the original logic. But I also think it's fine to change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so let's make it ucs_debug (not ucs_warn)
src/uct/ib/base/ib_device.c
Outdated
if (dev->port_attr[i].gid_tbl_len > 0) { | ||
has_gid = 1; | ||
break; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (dev->port_attr[i].gid_tbl_len == 0) {
return UCS_ERR_UNSUPPORTED;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yosefe One device has max 2 ports are allowed. If any one port with gid_tbl_len == 0 then we think the device is UNSUPPORTED even if the other port of the same device with gid_tbl_len > 0. Right? Maybe we won't have this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can check num of gid in uct_ib_device_port_check() to enable each port separately
bot:pipe:retest |
src/uct/ib/base/ib_device.c
Outdated
} | ||
|
||
num_ports = ucs_min(dev->num_ports, UCT_IB_DEV_MAX_PORTS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- remove num_ports
if (dev->num_ports > UCT_IB_DEV_MAX_PORTS) {
ucs_debug(...)
dev->num_ports = UCT_IB_DEV_MAX_PORTS;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yosefe Fixed.
bot:pipe:retest |
What
Ignore PFs without RDMA cap on BF2 according to its gid_tlb_len==0
Why ?
On BF2/ARM switchdev mode only SFs have RDMA cap, PFs don't have RDMA cap. We should filter out the SF devices with RDMA cap and ignore the PF devices without RDMA cap.
How ?
We can filter out the devices with RDMA cap according to gid_tlb_len > 0, and ignore the devices without RDMA cap according to gid_tlb_len==0.