Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIREUP: exclude ep_check lanes where KA supported from map #6039

Merged
merged 1 commit into from
Dec 23, 2020

Conversation

hoopoepg
Copy link
Contributor

  • exclude from ep_check map lanes where built-in keepalive
    feature is supported
  • updated ep-check gtests to disable built-in keepalive

@hoopoepg
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Comment on lines 1525 to 1536
for (lane = 0; lane < key->num_lanes; ++lane) {
/* add lanes to ep_check map */
rsc_index = key->lanes[lane].rsc_index;
if (rsc_index == UCP_NULL_RESOURCE) {
continue;
}

dev_index = context->tl_rscs[rsc_index].dev_index;
ucs_assert(dev_index < (sizeof(dev_map_used) * 8));
iface_attr = ucp_worker_iface_get_attr(worker, rsc_index);
if (iface_attr->cap.flags & UCT_IFACE_FLAG_EP_KEEPALIVE) {
dev_map_used |= UCS_BIT(dev_index);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you do it inside one loop (below)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nop :(

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

@hoopoepg hoopoepg force-pushed the topic/update-ep-check-map branch from 2706e0a to 94a1f3e Compare December 17, 2020 13:18
Comment on lines 1523 to 1524
/* exclude from ep_check map all devices where built-in
* keepalive is supported - mark all such devices as used */
for (lane = 0; lane < key->num_lanes; ++lane) {
Copy link
Contributor

@yosefe yosefe Dec 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we unite the 2 loops?
with the one in line 1539

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no: we need 2 loops (nested or pipe-lined) in any case
@brminich wanted to propose any unification, but still no news from him

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems to be no clear way to unite given generic use-case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, I didn't find any was... I can add some macro to make code bit shorter, but in fact there will be same pair of loops

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about

for (lane ...) {
    ...
    dev_index = ...
    iface_attr = ...
    if (FLAG_EP_KEEPALIVE) {
        dev_map_used |= UCS_BIT(dev_index);
    } else if ((!(UCS_BIT(dev_index) & dev_map_used) && EP_CHECK) {
        kep_ep_check_map |= ...
        dev_map_used |= UCS_BIT(dev_index);
    }
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or maybe precompute it during worker init?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about

will not work: in case if any lane-bit is added to ep_check map and later we found that another lane on same device has built-in keepalive, then we have to remove such lane-bit from ep_check map - this is issue. and we can't pre-compile it in worker init because in general case there is no correllation between ep_check_map and devices

Copy link
Contributor

@yosefe yosefe Dec 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. so please add comments.
to 1st loop: /* find all devices with built-in keepalive support */
for 2nd loop: /* send ep_check on devices without built-in keepalive */

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated comments

- exclude from ep_check map lanes where built-in keepalive
  feature is supported
- updated ep-check gtests to disable built-in keepalive
@hoopoepg hoopoepg force-pushed the topic/update-ep-check-map branch from 94a1f3e to ea23a9d Compare December 18, 2020 15:04
@hoopoepg
Copy link
Contributor Author

io_demo failed

@hoopoepg
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@hoopoepg
Copy link
Contributor Author

failed due to #6027

@hoopoepg
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@brminich brminich merged commit c27e623 into openucx:master Dec 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants