Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UCT/CM/RDMACM: add UCX_RDMA_CM_SOURCE_ADDRESS config #6007

Merged
merged 1 commit into from
Dec 17, 2020

Conversation

evgeny-leksikov
Copy link
Contributor

What

add UCX_RDMA_CM_SOURCE_ADDRESS config

Why ?

to have a way to specify rdmacm src address

@evgeny-leksikov evgeny-leksikov added the WIP-DNM Work in progress / Do not review label Dec 9, 2020

{NULL}
};

static ucs_config_field_t uct_rdmacm_cm_config_table[] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to the rdmacm_cm.c file please. this file is approaching its EOL :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no way, component is registered here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so..?
uct_rdmacm_cm_config_t is defined in rdmacm_cm.h, it would still work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, uct_rdmacm_cm_config_table should be defined in *.c as well as component to avoid global external variable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we already have an external param in uct_cm.h for uct_cm_config_table[], don't see why not do it here as well.
i don't thinks it's a good idea to keep adding code to a file we plan to remove if it's possible not to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can remove this file easily, likely just cleanup and rename

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we already have an external param in uct_cm.h for uct_cm_config_table[]

I would try to avoid this...

ucs_offsetof(uct_rdmacm_cm_config_t, super), UCS_CONFIG_TYPE_TABLE(uct_cm_config_table)},

{"SOURCE_ADDRESS", "",
"Source address used in rdma_resolve_addr",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add the expected format in the comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it ok now?

#
# Source IPv4 or IPv6 address used in rdma_resolve_addr or empty (equivalent to NULL)
#
# syntax:    string
#
UCX_RDMA_CM_SOURCE_ADDRESS=

@evgeny-leksikov evgeny-leksikov removed the WIP-DNM Work in progress / Do not review label Dec 9, 2020
if (rdmacm_config->src_addr[0] == '\0') {
self->config.src_addr = NULL;
} else {
self->config.src_addr = ucs_calloc(1, sizeof(struct sockaddr),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be sizeof sockaddr_storage?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


{NULL}
};

static ucs_config_field_t uct_rdmacm_cm_config_table[] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so..?
uct_rdmacm_cm_config_t is defined in rdmacm_cm.h, it would still work.


{"SOURCE_ADDRESS", "",
"Source IPv4 or IPv6 address used in rdma_resolve_addr or empty "
"(equivalent to NULL)",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can the user specify the port to use too? imo need to address this in the comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, limitation are the same as for mentioned rdma_resolve_addr

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but then if the user sets the port, need to parse it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't inet_pton do this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure it will.
if it will, how should the user provide the port? this param is a string..
if it won't, we either should not support setting the port or not use this function

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as discussed offline, let's mention in the comment that the port would be selected by rdmacm

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

goto ucs_async_remove_handler;
}

src_af = (strchr(rdmacm_config->src_addr, '.') != NULL) ?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add an af detection function to sock.c?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is too light weight detection for common code, here I pass address as is to rdmacm without any validation and if user pass wrong addr/format rdmacm would report error

@evgeny-leksikov evgeny-leksikov force-pushed the uct_rdmacm_local_addr branch 2 times, most recently from 5f1eb51 to 696e3de Compare December 10, 2020 13:47
@@ -17,8 +17,17 @@
typedef struct uct_rdmacm_cm {
uct_cm_t super;
struct rdma_event_channel *ev_ch;

struct {
struct sockaddr *src_addr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does it need to be pointer?
can use AF_UNSPEC to mark it as invalid to pass NULL to rdmacm

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO better to initialize once and reuse the exact value on control path every time, also NULL-pointer looks much simpler

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it could be just "struct sockadd_storage" here which is initialized once, would be simpler to avoid malloc/free/free in err flow

src/uct/ib/rdmacm/rdmacm_cm.h Show resolved Hide resolved
src/uct/ib/rdmacm/rdmacm_cm.c Outdated Show resolved Hide resolved
@@ -567,9 +567,57 @@ static uct_iface_ops_t uct_rdmacm_cm_iface_ops = {
.iface_is_reachable = (uct_iface_is_reachable_func_t)ucs_empty_function_return_zero
};

static ucs_status_t uct_rdmacm_cm_straddr2sockaddr(const char *addr_str,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uct_rdmacm_cm_str_addr_to_sockaddr

Comment on lines 592 to 647
sin = sa_storage;
sin->sin_family = AF_INET;
ret = inet_pton(AF_INET, addr_str, &sin->sin_addr);
if (ret == 1) {
goto out;
}

/* try IPv6 */
sin6 = sa_storage;
sin6->sin6_family = AF_INET6;
ret = inet_pton(AF_INET6, addr_str, &sin6->sin6_addr);
if (ret == 1) {
goto out;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put this in UCS ? + small gtest

@@ -17,8 +17,17 @@
typedef struct uct_rdmacm_cm {
uct_cm_t super;
struct rdma_event_channel *ev_ch;

struct {
struct sockaddr *src_addr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it could be just "struct sockadd_storage" here which is initialized once, would be simpler to avoid malloc/free/free in err flow

@@ -397,6 +397,18 @@ class test_uct_cm_sockaddr : public uct_test {
m_listen_addr = GetParam()->listen_sock_addr;
m_connect_addr = GetParam()->connect_sock_addr;

if (ucs::rand() % 2) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can it be separate variant? to ensure coverage

m_connect_addr.get_sock_addr_in_buf(),
src_addr_cstr, UCS_SOCKADDR_STRING_LEN);
EXPECT_EQ(src_addr_cstr, ret);
UCS_TEST_MESSAGE << "RDMA_CM_SOURCE_ADDRESS=" << src_addr_cstr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to improve message

return UCS_OK;
}

ucs_error("cannot parse %s address", ip_str);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ucs_error("invalid address '%s'", ip_str);

ifa->ifa_addr, sizeof(struct sockaddr_in));
ifa->ifa_addr,
sizeof(struct sockaddr_in),
init_src_addr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pass false here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO: init_src_addr improves readability

resource rsc(cmpt, std::string(name), local_cpus, "sockaddr",
std::string(ifa->ifa_name), UCT_DEVICE_TYPE_NET);
bool init_src_addr = (i == 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this to 228

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO better localize it and move to the relevant if part only

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now it's closer to comment before the loop, so it's simpler to maintain them in consistency

Comment on lines 409 to 410
UCS_TEST_MESSAGE << "testing with RDMA_CM_SOURCE_ADDRESS="
<< src_addr_cstr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add this to line 437?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO better to log near action

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but then log appears twice per test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you mean?
i see only 1:

[ RUN      ] sockaddr/test_uct_cm_sockaddr.cm_server_reject/34 <sockaddr/ib3>
[     INFO ] Testing component: tcp
[     INFO ] testing with RDMA_CM_SOURCE_ADDRESS=172.24.1.10
[     INFO ] Testing 172.24.1.10:41656 Interface: ib3
[       OK ] sockaddr/test_uct_cm_sockaddr.cm_server_reject/34 (1 ms)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. [ INFO ] Testing component: tcp
  2. [ INFO ] testing with RDMA_CM_SOURCE_ADDRESS=172.24.1.10
  3. [ INFO ] Testing 172.24.1.10:41656 Interface: ib3

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be:
[ INFO ] Testing tcp on 172.24.1.10:41656 interface: ib3 with RDMA_CM_SOURCE_ADDRESS=172.24.1.10

Copy link
Contributor Author

@evgeny-leksikov evgeny-leksikov Dec 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will complicate the code in case if config is not set, are you fine with this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's try to make simple change to have less log messages, if not possible - waive

Comment on lines 403 to 407
char src_addr_cstr[UCS_SOCKADDR_STRING_LEN];
int sa_family = src_sock_addr.get_sock_addr_ptr()->sa_family;
const char *ret = inet_ntop(sa_family,
src_sock_addr.get_sock_addr_in_buf(),
src_addr_cstr, UCS_SOCKADDR_STRING_LEN);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use ucs_sockaddr_str?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ucs_sockaddr_str also adds port, inet_ntop simplifies the test

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so ucs_str_sockaddr does not parse port, but ucs_sockaddr_str adds port?
seems confusing..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

want to move back to uct/rdmacm code? :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename, for example: ucs_sock_ipstr_to_sockaddr

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

bool is_set;
char cstr[UCS_SOCKADDR_STRING_LEN];
} src_addr = {
.is_set = false,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we just strcmp to ""?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if inet_ntop initializes cstr with "" or we would like to test empty value? then we can miss it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see
then IMO no need for struct - just declare 2 local vars..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the vars always goes together so why dont unite them to the structure?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@@ -672,6 +713,10 @@ UCS_CLASS_CLEANUP_FUNC(uct_rdmacm_cm_t)
{
ucs_status_t status;

if (self->config.src_addr != NULL) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not needed

Comment on lines 32 to 33
"rdma_resolve_addr or empty (equivalent to NULL), the port would be\n"
"selected by rdmacm",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If non-empty, specify the local source address (IPv4 or IPv6) to use when creating a client connection

Comment on lines 691 to 693
EXPECT_NE((const struct sockaddr*)NULL, saddr);
EXPECT_TRUE((saddr->sa_family == AF_INET) ||
(saddr->sa_family == AF_INET6));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these should be ucs_assert()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is not ucs_assert for internal usage only?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not only, and the problem here is that EXPECT_EQ does not prevent the execution of following statements

status = ucs_sock_ipstr_to_sockaddr(ip_str, sa_storage);
if (status != UCS_OK) {
ucs_free(sa_storage);
return UCS_ERR_INVALID_ADDR;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leftover: return status

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe use goto for err flow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

const void* sock_addr_storage::get_sock_addr_in_buf() const {
const struct sockaddr* saddr = get_sock_addr_ptr();

EXPECT_NE((const struct sockaddr*)NULL, saddr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ucs_assert_always

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually there is a dereferencing on next line, but ok will change as well

@evgeny-leksikov
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@alinask
Copy link
Contributor

alinask commented Dec 17, 2020

only issue from CI is the doc.

@alinask alinask merged commit 5c5fe65 into openucx:master Dec 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants