This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

GELU Operator #14449

Merged: 1 commit into apache:master on Apr 4, 2019

Conversation

@haojin2 (Contributor) commented Mar 16, 2019

Description

Addresses #12984; implements the GELU operator from https://arxiv.org/abs/1606.08415.
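For reference, the paper defines GELU(x) = x·Φ(x) and gives a fast tanh-based approximation. A minimal sketch of that approximation in plain Python (for illustration only; the PR implements this as a C++ mshadow kernel, not this code):

```python
import math

# Tanh approximation of GELU from Hendrycks & Gimpel (arXiv:1606.08415):
#   GELU(x) ~= 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x**3)))
CUBE_CONSTANT = 0.044715
TWO_OVER_PI = 0.7978845608028654  # sqrt(2 / pi)

def gelu(x):
    inner = TWO_OVER_PI * (x + CUBE_CONSTANT * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))
```

The function behaves like the identity for large positive inputs and tends to zero for large negative inputs, with a smooth transition near the origin.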

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, a README.md is added explaining what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change or have been fixed to be compatible with it

Changes

  • GELU operator (sym/nd.LeakyReLU(act_type='gelu'))
  • Unit test
  • Gluon interface (gluon.nn.GELU)
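A unit test for an operator like this typically validates the backward pass against finite differences. A hedged sketch of that style of check in plain Python (hypothetical helper names; this is not the PR's actual test code, which lives in test_operator.py):

```python
import math

SQRT_2_OVER_PI = 0.7978845608028654  # sqrt(2 / pi)
C = 0.044715

def gelu_forward(x):
    # Tanh approximation: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + C * x^3)))
    return 0.5 * x * (1.0 + math.tanh(SQRT_2_OVER_PI * (x + C * x ** 3)))

def gelu_backward(x):
    # Analytic derivative of the approximation:
    #   d/dx = 0.5 * (1 + tanh(g)) + 0.5 * x * (1 - tanh(g)^2) * g'(x)
    # with g(x) = sqrt(2/pi) * (x + C * x^3), g'(x) = sqrt(2/pi) * (1 + 3*C*x^2)
    g = SQRT_2_OVER_PI * (x + C * x ** 3)
    gp = SQRT_2_OVER_PI * (1.0 + 3.0 * C * x * x)
    t = math.tanh(g)
    return 0.5 * (1.0 + t) + 0.5 * x * (1.0 - t * t) * gp

# Central finite-difference check of the analytic gradient
h = 1e-6
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    numeric = (gelu_forward(x + h) - gelu_forward(x - h)) / (2.0 * h)
    assert abs(numeric - gelu_backward(x)) < 1e-5
```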

Comments

Flakiness was also checked:

MXNET_TEST_COUNT=10000 nosetests tests/python/unittest/test_operator.py:test_gelu
[INFO] Setting module np/mx/python random seeds, use MXNET_MODULE_SEED=1909868054 to reproduce.
.
----------------------------------------------------------------------
Ran 1 test in 314.370s

OK
MXNET_TEST_COUNT=10000 nosetests tests/python/gpu/test_operator_gpu.py:test_gelu
[INFO] Setting module np/mx/python random seeds, use MXNET_MODULE_SEED=2117426005 to reproduce.
.
----------------------------------------------------------------------
Ran 1 test in 446.883s

OK

@haojin2 haojin2 requested a review from szha as a code owner March 16, 2019 05:29
@haojin2 (Contributor, Author) commented Mar 16, 2019

@eric-haibin-lin @szha for review

import numpy as np

CUBE_CONSTANT = 0.044715
TWO_OVER_PI = 0.7978845608028654  # sqrt(2 / pi)

def g(x):
    return TWO_OVER_PI * (x + CUBE_CONSTANT * np.power(x, 3))


Often `x * x * x` is cheaper than `x ** 3`. Also, taking inspiration from https://en.wikipedia.org/wiki/Horner's_method, which is often more numerically stable, g(x) could be

    return TWO_OVER_PI * x * (1 + CUBE_CONSTANT * x * x)
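A quick numeric sanity check (a sketch, not part of the PR) confirms the factored form agrees with the original polynomial:

```python
import math

CUBE_CONSTANT = 0.044715
TWO_OVER_PI = 0.7978845608028654  # sqrt(2 / pi)

def g_pow(x):
    # Original form with an explicit cube
    return TWO_OVER_PI * (x + CUBE_CONSTANT * x ** 3)

def g_horner(x):
    # Factored (Horner-style) form: avoids the pow call
    return TWO_OVER_PI * x * (1.0 + CUBE_CONSTANT * x * x)

# The two forms agree to floating-point precision
for x in (-3.0, -1.0, 0.0, 0.5, 2.0, 10.0):
    assert math.isclose(g_pow(x), g_horner(x), rel_tol=1e-12, abs_tol=1e-12)
```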

@haojin2 (Contributor, Author) replied:

This is not the actual implementation, so performance is not critical here. I'll change the internal mshadow math kernel in C++ based on the link you provided.


This looks correct.

@haojin2 (Contributor, Author) replied:

@hendrycks Thanks for your prompt reply! I'll wait for @eric-haibin-lin and @szha to give a review before we merge this in.

@haojin2 haojin2 force-pushed the gelu branch 2 times, most recently from 3bf1279 to e0a565c Compare March 16, 2019 06:39
@karan6181 (Contributor) commented:
@mxnet-label-bot add [Feature request, Operator, pr-awaiting-review]

@eric-haibin-lin (Member) left a review:


Great contribution! One comment about the macro.

(Review comment on src/operator/mshadow_op.h; outdated, resolved.)
@szha (Member) left a review:


LGTM. Remember to also add a test for the new Gluon block.

@haojin2 (Contributor, Author) commented Mar 18, 2019

@szha Gluon test added; also fixed a problem with the existing SELU test.

@haojin2 haojin2 force-pushed the gelu branch 3 times, most recently from 347e355 to 26d6fba Compare March 25, 2019 19:46
@haojin2 haojin2 force-pushed the gelu branch 2 times, most recently from fdda420 to b6cfc08 Compare March 28, 2019 01:17
@haojin2 (Contributor, Author) commented Apr 3, 2019

@szha Build finally passed, good to merge?

@haojin2 haojin2 merged commit b3ab101 into apache:master Apr 4, 2019
@haojin2 haojin2 deleted the gelu branch April 4, 2019 06:45
nswamy pushed a commit that referenced this pull request Apr 5, 2019
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019