Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[oneCCL] enable bf16 comm. #73

Closed
wants to merge 1 commit into from
Closed

[oneCCL] enable bf16 comm. #73

wants to merge 1 commit into from

Conversation

marvin-Yu
Copy link
Contributor

@marvin-Yu marvin-Yu commented Nov 23, 2023

Enable communication using XFT_ONECCL_BF16 for bf16 data type.

@@ -78,7 +78,9 @@ class bfloat16_t {
}

static void cvt_float_to_bfloat16(const float *src, bfloat16_t *dst, int size);
static void batch_cvt_float_to_bfloat16(const float *src, bfloat16_t *dst, int size);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@changqi1 Do you accept adding it in bfloat16.h?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pujiang2018 @marvin-Yu Sure. It is great to add batch cvt. But the number 1024 is not make sense.

bf16_enable = (getenv("XFT_ONECCL_BF16") ? atoi(getenv("XFT_ONECCL_BF16")) : 0);
if (bf16_enable) {
printf("got 'XFT_ONECCL_BF16=%d', enable BF16 dtype comm.\n", bf16_enable);
buf_bf16 = new bfloat16_t[MAX_BF16_BUFFER];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest using aligned_alloc or malloc.
as new will initialize all the data to 0, which is not needed here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add this macro XFT_ONECCL_BF16 into wiki.

TimeLine t("Messenger.reduceAdd");
if (bf16_enable) {
bfloat16_t::batch_cvt_float_to_bfloat16(sendBuf, buf_bf16, count);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if count > MAX_BF16_BUFFER?

if (bf16_enable) {
bfloat16_t::batch_cvt_float_to_bfloat16(sendBuf, buf_bf16, count);
ccl::allreduce(
buf_bf16, buf_bf16, count, ccl::datatype::bfloat16, ccl::reduction::sum, *pcomm).wait();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here the code looks like not formatted.

@@ -175,7 +201,8 @@ class Messenger {
int size;
int rank;
bool local_ranks_flag;

bfloat16_t* buf_bf16 = nullptr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's initialize it in the constructor to make sure the same style.

@@ -128,6 +130,18 @@ inline void bfloat16_t::cvt_float_to_bfloat16(const float *src, bfloat16_t *dst,
}
}

inline void bfloat16_t::batch_cvt_float_to_bfloat16(const float *src, bfloat16_t *dst, int count){
constexpr int sizePerSplit = 1024;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use template to define blockSize or use totalSize/totalSystemThreads to define the blockSize.

bf16_enable = (getenv("XFT_ONECCL_BF16") ? atoi(getenv("XFT_ONECCL_BF16")) : 0);
if (bf16_enable) {
printf("got 'XFT_ONECCL_BF16=%d', enable BF16 dtype comm.\n", bf16_enable);
buf_bf16 = new bfloat16_t[MAX_BF16_BUFFER];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add this macro XFT_ONECCL_BF16 into wiki.

bf16_enable = (getenv("XFT_ONECCL_BF16") ? atoi(getenv("XFT_ONECCL_BF16")) : 0);
if (bf16_enable) {
printf("got 'XFT_ONECCL_BF16=%d', enable BF16 dtype comm.\n", bf16_enable);
buf_bf16 = new bfloat16_t[MAX_BF16_BUFFER];

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Init buf_bf16 with max length may cause performance communication issue, please double confirm this.

@pujiang2018 pujiang2018 marked this pull request as draft December 6, 2023 07:17
@marvin-Yu marvin-Yu closed this Jan 23, 2024
@marvin-Yu marvin-Yu deleted the feature/comm_by_bf16 branch February 2, 2024 01:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants