-
Notifications
You must be signed in to change notification settings - Fork 434
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UCT/CM/RDMACM: add UCX_RDMA_CM_SOURCE_ADDRESS config #6007
Conversation
a152b88
to
a7a2b52
Compare
|
||
{NULL} | ||
}; | ||
|
||
static ucs_config_field_t uct_rdmacm_cm_config_table[] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move to the rdmacm_cm.c file please. this file is approaching its EOL :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no way, component is registered here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so..?
uct_rdmacm_cm_config_t is defined in rdmacm_cm.h, it would still work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, uct_rdmacm_cm_config_table
should be defined in *.c as well as component to avoid global external
variable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we already have an external param in uct_cm.h for uct_cm_config_table[], don't see why not do it here as well.
i don't thinks it's a good idea to keep adding code to a file we plan to remove if it's possible not to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can remove this file easily, likely just cleanup and rename
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we already have an external param in uct_cm.h for uct_cm_config_table[]
I would try to avoid this...
src/uct/ib/rdmacm/rdmacm_md.c
Outdated
ucs_offsetof(uct_rdmacm_cm_config_t, super), UCS_CONFIG_TYPE_TABLE(uct_cm_config_table)}, | ||
|
||
{"SOURCE_ADDRESS", "", | ||
"Source address used in rdma_resolve_addr", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add the expected format in the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it ok now?
#
# Source IPv4 or IPv6 address used in rdma_resolve_addr or empty (equivalent to NULL)
#
# syntax: string
#
UCX_RDMA_CM_SOURCE_ADDRESS=
src/uct/ib/rdmacm/rdmacm_cm.c
Outdated
if (rdmacm_config->src_addr[0] == '\0') { | ||
self->config.src_addr = NULL; | ||
} else { | ||
self->config.src_addr = ucs_calloc(1, sizeof(struct sockaddr), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be sizeof sockaddr_storage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
|
||
{NULL} | ||
}; | ||
|
||
static ucs_config_field_t uct_rdmacm_cm_config_table[] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so..?
uct_rdmacm_cm_config_t is defined in rdmacm_cm.h, it would still work.
src/uct/ib/rdmacm/rdmacm_md.c
Outdated
|
||
{"SOURCE_ADDRESS", "", | ||
"Source IPv4 or IPv6 address used in rdma_resolve_addr or empty " | ||
"(equivalent to NULL)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can the user specify the port to use too? imo need to address this in the comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so, limitation are the same as for mentioned rdma_resolve_addr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but then if the user sets the port, need to parse it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't inet_pton
do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure it will.
if it will, how should the user provide the port? this param is a string..
if it won't, we either should not support setting the port or not use this function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as discussed offline, let's mention in the comment that the port would be selected by rdmacm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
src/uct/ib/rdmacm/rdmacm_cm.c
Outdated
goto ucs_async_remove_handler; | ||
} | ||
|
||
src_af = (strchr(rdmacm_config->src_addr, '.') != NULL) ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add an af detection function to sock.c?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is too light weight detection for common code, here I pass address as is to rdmacm without any validation and if user pass wrong addr/format rdmacm would report error
5f1eb51
to
696e3de
Compare
src/uct/ib/rdmacm/rdmacm_cm.h
Outdated
@@ -17,8 +17,17 @@ | |||
typedef struct uct_rdmacm_cm { | |||
uct_cm_t super; | |||
struct rdma_event_channel *ev_ch; | |||
|
|||
struct { | |||
struct sockaddr *src_addr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does it need to be pointer?
can use AF_UNSPEC to mark it as invalid to pass NULL to rdmacm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO better to initialize once and reuse the exact value on control path every time, also NULL-pointer looks much simpler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO it could be just "struct sockadd_storage" here which is initialized once, would be simpler to avoid malloc/free/free in err flow
src/uct/ib/rdmacm/rdmacm_cm.c
Outdated
@@ -567,9 +567,57 @@ static uct_iface_ops_t uct_rdmacm_cm_iface_ops = { | |||
.iface_is_reachable = (uct_iface_is_reachable_func_t)ucs_empty_function_return_zero | |||
}; | |||
|
|||
static ucs_status_t uct_rdmacm_cm_straddr2sockaddr(const char *addr_str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uct_rdmacm_cm_str_addr_to_sockaddr
src/uct/ib/rdmacm/rdmacm_cm.c
Outdated
sin = sa_storage; | ||
sin->sin_family = AF_INET; | ||
ret = inet_pton(AF_INET, addr_str, &sin->sin_addr); | ||
if (ret == 1) { | ||
goto out; | ||
} | ||
|
||
/* try IPv6 */ | ||
sin6 = sa_storage; | ||
sin6->sin6_family = AF_INET6; | ||
ret = inet_pton(AF_INET6, addr_str, &sin6->sin6_addr); | ||
if (ret == 1) { | ||
goto out; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put this in UCS ? + small gtest
src/uct/ib/rdmacm/rdmacm_cm.h
Outdated
@@ -17,8 +17,17 @@ | |||
typedef struct uct_rdmacm_cm { | |||
uct_cm_t super; | |||
struct rdma_event_channel *ev_ch; | |||
|
|||
struct { | |||
struct sockaddr *src_addr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO it could be just "struct sockadd_storage" here which is initialized once, would be simpler to avoid malloc/free/free in err flow
test/gtest/uct/ib/test_sockaddr.cc
Outdated
@@ -397,6 +397,18 @@ class test_uct_cm_sockaddr : public uct_test { | |||
m_listen_addr = GetParam()->listen_sock_addr; | |||
m_connect_addr = GetParam()->connect_sock_addr; | |||
|
|||
if (ucs::rand() % 2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can it be separate variant? to ensure coverage
test/gtest/uct/ib/test_sockaddr.cc
Outdated
m_connect_addr.get_sock_addr_in_buf(), | ||
src_addr_cstr, UCS_SOCKADDR_STRING_LEN); | ||
EXPECT_EQ(src_addr_cstr, ret); | ||
UCS_TEST_MESSAGE << "RDMA_CM_SOURCE_ADDRESS=" << src_addr_cstr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to improve message
src/ucs/sys/sock.c
Outdated
return UCS_OK; | ||
} | ||
|
||
ucs_error("cannot parse %s address", ip_str); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ucs_error("invalid address '%s'", ip_str);
ifa->ifa_addr, sizeof(struct sockaddr_in)); | ||
ifa->ifa_addr, | ||
sizeof(struct sockaddr_in), | ||
init_src_addr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pass false here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO: init_src_addr
improves readability
resource rsc(cmpt, std::string(name), local_cpus, "sockaddr", | ||
std::string(ifa->ifa_name), UCT_DEVICE_TYPE_NET); | ||
bool init_src_addr = (i == 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this to 228
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO better localize it and move to the relevant if part only
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now it's closer to comment before the loop, so it's simpler to maintain them in consistency
test/gtest/uct/ib/test_sockaddr.cc
Outdated
UCS_TEST_MESSAGE << "testing with RDMA_CM_SOURCE_ADDRESS=" | ||
<< src_addr_cstr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add this to line 437?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO better to log near action
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but then log appears twice per test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do you mean?
i see only 1:
[ RUN ] sockaddr/test_uct_cm_sockaddr.cm_server_reject/34 <sockaddr/ib3>
[ INFO ] Testing component: tcp
[ INFO ] testing with RDMA_CM_SOURCE_ADDRESS=172.24.1.10
[ INFO ] Testing 172.24.1.10:41656 Interface: ib3
[ OK ] sockaddr/test_uct_cm_sockaddr.cm_server_reject/34 (1 ms)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- [ INFO ] Testing component: tcp
- [ INFO ] testing with RDMA_CM_SOURCE_ADDRESS=172.24.1.10
- [ INFO ] Testing 172.24.1.10:41656 Interface: ib3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be:
[ INFO ] Testing tcp on 172.24.1.10:41656 interface: ib3 with RDMA_CM_SOURCE_ADDRESS=172.24.1.10
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it will complicate the code in case if config is not set, are you fine with this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's try to make simple change to have less log messages, if not possible - waive
test/gtest/uct/ib/test_sockaddr.cc
Outdated
char src_addr_cstr[UCS_SOCKADDR_STRING_LEN]; | ||
int sa_family = src_sock_addr.get_sock_addr_ptr()->sa_family; | ||
const char *ret = inet_ntop(sa_family, | ||
src_sock_addr.get_sock_addr_in_buf(), | ||
src_addr_cstr, UCS_SOCKADDR_STRING_LEN); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use ucs_sockaddr_str?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ucs_sockaddr_str
also adds port, inet_ntop
simplifies the test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so ucs_str_sockaddr does not parse port, but ucs_sockaddr_str adds port?
seems confusing..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
want to move back to uct/rdmacm code? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename, for example: ucs_sock_ipstr_to_sockaddr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
bool is_set; | ||
char cstr[UCS_SOCKADDR_STRING_LEN]; | ||
} src_addr = { | ||
.is_set = false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we just strcmp to ""?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if inet_ntop
initializes cstr
with ""
or we would like to test empty value? then we can miss it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see
then IMO no need for struct - just declare 2 local vars..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the vars always goes together so why dont unite them to the structure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
src/uct/ib/rdmacm/rdmacm_cm.c
Outdated
@@ -672,6 +713,10 @@ UCS_CLASS_CLEANUP_FUNC(uct_rdmacm_cm_t) | |||
{ | |||
ucs_status_t status; | |||
|
|||
if (self->config.src_addr != NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not needed
src/uct/ib/rdmacm/rdmacm_md.c
Outdated
"rdma_resolve_addr or empty (equivalent to NULL), the port would be\n" | ||
"selected by rdmacm", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If non-empty, specify the local source address (IPv4 or IPv6) to use when creating a client connection
test/gtest/common/test_helpers.cc
Outdated
EXPECT_NE((const struct sockaddr*)NULL, saddr); | ||
EXPECT_TRUE((saddr->sa_family == AF_INET) || | ||
(saddr->sa_family == AF_INET6)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these should be ucs_assert()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is not ucs_assert
for internal usage only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not only, and the problem here is that EXPECT_EQ does not prevent the execution of following statements
src/uct/ib/rdmacm/rdmacm_cm.c
Outdated
status = ucs_sock_ipstr_to_sockaddr(ip_str, sa_storage); | ||
if (status != UCS_OK) { | ||
ucs_free(sa_storage); | ||
return UCS_ERR_INVALID_ADDR; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leftover: return status
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe use goto for err flow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
test/gtest/common/test_helpers.cc
Outdated
const void* sock_addr_storage::get_sock_addr_in_buf() const { | ||
const struct sockaddr* saddr = get_sock_addr_ptr(); | ||
|
||
EXPECT_NE((const struct sockaddr*)NULL, saddr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ucs_assert_always
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually there is a dereferencing on next line, but ok will change as well
3270652
to
9788cb3
Compare
9788cb3
to
ac17a6f
Compare
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
only issue from CI is the doc. |
What
add UCX_RDMA_CM_SOURCE_ADDRESS config
Why ?
to have a way to specify rdmacm src address