Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UCT/CM/RDMACM: share dummy CQ per device #6016

Merged
merged 1 commit into from
Dec 15, 2020

Conversation

evgeny-leksikov
Copy link
Contributor

What

share dummy CQ per device

Why ?

to reduce resources consumption

How ?

ported from yosefe#105

@yosefe
Copy link
Contributor

yosefe commented Dec 11, 2020

@evgeny-leksikov was it clean CP?

@evgeny-leksikov
Copy link
Contributor Author

@evgeny-leksikov was it clean CP?

no, there were 2 conflicts around dev_name parameter

@yosefe
Copy link
Contributor

yosefe commented Dec 13, 2020

bot:pipe:retest

@@ -624,6 +674,8 @@ UCS_CLASS_CLEANUP_FUNC(uct_rdmacm_cm_t)

ucs_trace("destroying event_channel %p on cm %p", self->ev_ch, self);
rdma_destroy_event_channel(self->ev_ch);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove space line

#include <uct/base/uct_cm.h>
#include <ucs/datastruct/khash.h>

#include <infiniband/verbs.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it's not needed

khiter_t iter;
int ret;

iter = kh_put(uct_rdmacm_cm_cqs, &cm->cqs, device_name, &ret);
Copy link
Contributor

@yosefe yosefe Dec 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does it work?? looks like the hash table key can point to memory on the stack (device_name->dev_name)
IMO better use rdma_cm_id::pd pointer as the key
Any chance it's using stack garbage as the key and actually creating new CQ every time?

@evgeny-leksikov evgeny-leksikov force-pushed the uct_rdmacm_cm_shared_cq_m branch from 2309b48 to 7d38a90 Compare December 14, 2020 15:06
@evgeny-leksikov
Copy link
Contributor Author

re-cherry-picked squashed yosefe#105 + yosefe#113 to avoid conflicts which were appeared due to wrong approach with dev_name


/* create a dummy qp in order to get a unique qp_num to provide to librdmacm */
status = uct_rdmacm_cm_create_dummy_cq_qp(cep->id, &cq, &qp);
status = uct_rdmacm_cm_create_dummy_qp(cep->id, cq, &cep->qp);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe call uct_rdmacm_cm_get_cq from uct_rdmacm_cm_create_dummy_qp()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to avoid hidden side effect of such function, let's keep things simpler

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@yosefe yosefe merged commit 02782d0 into openucx:master Dec 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants