Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adds new CPU kernel for SGD op supporting BF16 data type #32162

Merged
merged 10 commits into from
Apr 14, 2021

Conversation

arogowie-intel
Copy link
Contributor

PR types

New features

PR changes

OPs

Describe

This PR adds new CPU kernel for SGD op supporting BF16 data type.

@paddle-bot-old
Copy link

paddle-bot-old bot commented Apr 8, 2021

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@arogowie-intel
Copy link
Contributor Author

@wozna @arlesniak @jczaja Could you start review please?

Copy link
Contributor

@wozna wozna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything I mentioned is about PADDLE_ENFORCE
On this site, there is information on how we should write error messages: https://github.com/PaddlePaddle/Paddle/wiki/Paddle-Error-Message-Writing-Specification-(English-Verison)

paddle/fluid/operators/optimizers/sgd_op.h Outdated Show resolved Hide resolved
paddle/fluid/operators/optimizers/sgd_op.h Outdated Show resolved Hide resolved
paddle/fluid/operators/optimizers/sgd_op.h Outdated Show resolved Hide resolved
paddle/fluid/operators/optimizers/sgd_op.h Outdated Show resolved Hide resolved
paddle/fluid/operators/optimizers/sgd_op.h Outdated Show resolved Hide resolved
@arlesniak
Copy link
Contributor

Please make integration test i.e. test_fit_a_line.py to verify this implementation (it can be done in next PR)

@arogowie-intel
Copy link
Contributor Author

@wozna, @arlesniak Please review again.

@arogowie-intel
Copy link
Contributor Author

arogowie-intel commented Apr 12, 2021

@luotao1
The PR-CI-APPROVAL fails because of:
1.

  1. Unittest is not allowed to be disabled.

I'm using

@unittest.skipIf(not core.supports_bfloat16(),

To not run tests on platforms which does not support bfloat16 data type.

  1. Developers are not allowed to set the check_dygraph field directly, which is set to True by default.

I'm turning off dygraph mode since we does not support it with bfloat16 yet.

@arogowie-intel
Copy link
Contributor Author

@luotao1 Could you help with PR-CI-ROCM-Compile?
It somehow fails during compilation, however locally I do not see such an error. Moreover no other CI run failed with such an error.

@luotao1
Copy link
Contributor

luotao1 commented Apr 13, 2021

@qili93 Could you help see error of PR-CI-ROCM-Compile?

@luotao1
Copy link
Contributor

luotao1 commented Apr 13, 2021

@qili93
Copy link
Contributor

qili93 commented Apr 13, 2021

Hi @arogowie-intel to fix the build error or "PR-CI-ROCM-Compile", please change your code as following:

 PADDLE_ENFORCE(grad_var->IsType<framework::SelectedRows>(),
# ------ change the above code to below ----------
 PADDLE_ENFORCE_EQ(grad_var->IsType<framework::SelectedRows>(), true, 

ROCM defined PADDLE_ENFORCE as a device function in enforce.h, so please avoid to use it in host functions.

2021-04-12 21:28:13 In file included from /paddle/paddle/fluid/operators/optimizers/sgd_op.cu:16:
2021-04-12 21:28:13 /paddle/paddle/fluid/operators/optimizers/sgd_op.h:250:22: error: expected ')'
2021-04-12 21:28:13                      platform::errors::InvalidArgument(
2021-04-12 21:28:13                      ^
2021-04-12 21:28:13 /paddle/paddle/fluid/operators/optimizers/sgd_op.h:249:7: note: to match this '('
2021-04-12 21:28:13       PADDLE_ENFORCE(grad_var->IsType<framework::SelectedRows>(),
2021-04-12 21:28:13       ^
2021-04-12 21:28:13 /paddle/paddle/fluid/platform/enforce.h:438:13: note: expanded from macro 'PADDLE_ENFORCE'
2021-04-12 21:28:13       printf("Error: %s:%d Assertion `%s` failed. " __FORMAT "\n", __FILE__, \
2021-04-12 21:28:13             ^

* Use specialized PADDLE_ENFORCE_xx functions.
Copy link
Contributor

@arlesniak arlesniak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@arogowie-intel
Copy link
Contributor Author

@luotao1 Could you please start your review?

@arogowie-intel
Copy link
Contributor Author

@luotao1 I'm waiting for the last CI pm-test run, which is in "Pending" status from yesterday. Moreover it is not required one. Regarding the PR-CI-APPROVAL I have explained why it is failing in my earlier comment.
Could you start your review please? This PR is quite important for continuation of word2vec training work.

@luotao1
Copy link
Contributor

luotao1 commented Apr 14, 2021

I'm waiting for the last CI pm-test run, which is in "Pending" status from yesterday. Moreover it is not required one.

@arogowie-intel You can ignore the CI which is not the required one.

@luotao1 luotao1 merged commit 3ac6c18 into PaddlePaddle:develop Apr 14, 2021
@luotao1 luotao1 changed the title Aosewski/sgd bf16 adds new CPU kernel for SGD op supporting BF16 data type Apr 14, 2021
@arogowie-intel arogowie-intel deleted the aosewski/sgd_bf16 branch April 14, 2021 10:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants