This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

Fix BERT fp16 bugs, add test #1270

Merged · 5 commits merged into dmlc:master on Sep 4, 2020

Conversation

@MoisesHer (Contributor) commented Jul 18, 2020

Description

This fixes an fp16 bug introduced in a previous PR: #1264. The bug was introduced in that PR's last commit (a merge with the numpy upstream) while resolving a conflict; apologies. It affects only the fp16 case, where it returns NaNs. In addition, the model was not re-hybridized after being cast.

- [x] Both issues are solved in this PR.
- [x] A test was added comparing FP32 vs. FP16 results in BERT inference.
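As background on why an fp16 bug surfaces as NaNs: float16 overflows at a magnitude of about 65504, so a constant that is safe in fp32 can become inf in fp16, and downstream arithmetic on infinities then yields NaN. A minimal NumPy sketch of this failure mode (illustrative only; not necessarily the exact path fixed in this PR):

```python
import numpy as np

# A large negative constant (e.g. for masking) is representable in fp32...
mask_value = np.float32(-1e18)
assert np.isfinite(mask_value)

# ...but overflows float16 (max finite magnitude ~65504) to -inf,
scores16 = np.float16(mask_value)
assert np.isneginf(scores16)

# and arithmetic on infinities, such as inf - inf, then produces NaN.
with np.errstate(invalid='ignore'):
    nan_result = scores16 - scores16
assert np.isnan(nan_result)
```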

cc @dmlc/gluon-nlp-team @sxjscience

@codecov (bot) commented Jul 18, 2020

Codecov Report

Merging #1270 into master will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##           master    #1270   +/-   ##
=======================================
  Coverage   81.75%   81.75%           
=======================================
  Files          52       52           
  Lines        6862     6862           
=======================================
  Hits         5610     5610           
  Misses       1252     1252           
Impacted Files                                     Coverage Δ
src/gluonnlp/data/tokenizers/sentencepiece.py      75.44% <0.00%> (-0.60%) ⬇️
src/gluonnlp/data/tokenizers/yttm.py               82.75% <0.00%> (+0.86%) ⬆️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e0b293e...9ce2552. Read the comment docs.

@szha (Member) commented Jul 18, 2020

Could you add a test for the expected data type?
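Such an expected-dtype check is typically a one-line assertion on the forward output. A minimal NumPy sketch (the zero array is a hypothetical stand-in for the fp16-cast model's output):

```python
import numpy as np

# Stand-in for the output of the fp16-cast model's forward pass.
out = np.zeros((3, 32, 64), dtype=np.float16)

# The expected-data-type test reduces to asserting the output dtype.
assert out.dtype == np.float16
```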

@sxjscience (Member) commented

@MoisesHer Would you help add a test in https://github.com/dmlc/gluon-nlp/blob/numpy/tests/test_models_bert.py? You can do a forward test similar to:

```python
@pytest.mark.remote_required
@pytest.mark.parametrize('model_name', list_pretrained_roberta())
def test_roberta(model_name):
    # test from pretrained
    assert len(list_pretrained_roberta()) > 0
    with tempfile.TemporaryDirectory() as root:
        cfg, tokenizer, params_path = \
            get_pretrained_roberta(model_name, root=root)
        assert cfg.MODEL.vocab_size == len(tokenizer.vocab)
        roberta_model = RobertaModel.from_cfg(cfg)
        roberta_model.load_parameters(params_path)
        # test forward
        batch_size = 3
        seq_length = 32
        vocab_size = len(tokenizer.vocab)
        input_ids = mx.np.array(
            np.random.randint(2, vocab_size, (batch_size, seq_length)),
            dtype=np.int32)
        valid_length = mx.np.array(
            np.random.randint(seq_length // 2, seq_length, (batch_size,)),
            dtype=np.int32)
        x = roberta_model(input_ids, valid_length)
```
and compare the final result.
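Comparing the final fp32 and fp16 results needs far looser tolerances than the defaults, since half precision carries only about three decimal digits. A NumPy sketch of just the comparison step (the random array is a stand-in for the model output; shapes and tolerances are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the fp32 model output: (batch, seq_length, units).
out_fp32 = rng.standard_normal((3, 32, 64)).astype(np.float32)
# Stand-in for the fp16 model output: the same values at half precision.
out_fp16 = out_fp32.astype(np.float16)

# float16 rounding error is ~2**-11 relative, so compare with loose tolerances,
# casting back to fp32 so the comparison itself runs in full precision.
np.testing.assert_allclose(out_fp16.astype(np.float32), out_fp32,
                           rtol=1e-2, atol=1e-2)
```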

@MoisesHer MoisesHer changed the title from "Fix fp16 bug: not passing dtype to TransformerEncoderLayer" to "Fix BERT fp16 bugs, add test" Jul 21, 2020
@sxjscience (Member) commented

Need to wait for the GPU CI.

@szha szha changed the base branch from numpy to master August 13, 2020 02:29
@sxjscience (Member) commented Sep 1, 2020

@MoisesHer The GPU CI should be functional now. Would you try to create a new PR to add FP16 functionality?
Sorry for the confusion, we may still need to fix the GPU CI test.

@sxjscience (Member) commented

You can refer to `def test_bert_small_cfg(compute_layout, ctx):`. Here, we just add a `ctx` argument so it becomes a fixture.
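In pytest, such a `ctx` fixture would typically live in `conftest.py`. A minimal sketch, assuming a hypothetical `gpu_available()` probe (the repository's actual fixture may differ):

```python
import pytest

def gpu_available():
    # Hypothetical stand-in for a real GPU probe
    # (e.g. attempting a small allocation on gpu(0)).
    return False

@pytest.fixture(params=['cpu', 'gpu'])
def ctx(request):
    # Any test that takes `ctx` as an argument runs once per param;
    # the GPU run is skipped automatically when no device is present.
    if request.param == 'gpu' and not gpu_available():
        pytest.skip('GPU context not available')
    return request.param
```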

@MoisesHer MoisesHer requested a review from a team as a code owner September 3, 2020 23:11
@sxjscience (Member) left a comment

LGTM

@sxjscience (Member) commented

CC @zheyuye @szha @hymzoque I'll merge this in first.

@sxjscience sxjscience merged commit 9711e5e into dmlc:master Sep 4, 2020
zheyuye added a commit to zheyuye/gluon-nlp that referenced this pull request Oct 20, 2020
* Fix BERT fp16 bugs, add test (dmlc#1270)

* Fix fp16 bug: not passing dtype to TransformerEncoderLayer

* Re-hybridize after casting & add BERT test

* Skip fp16 test if CPU ctx

* remove debugging messages

Co-authored-by: root <[email protected]>

* [Fix][SageMaker] Make sure that the installation works in SageMaker (dmlc#1348)

* Fasttext to 0.9.1

* Update setup.py

* [CI] Add Codecov and Test Logs (dmlc#1349)

* [Fix] Some minor fixes for AMLC Tutorial (dmlc#1355)

* update

update

update

update

* Update test_utils_misc.py

* update

* update

* Update test_layers.py

* Update misc.py

* Update mobilebert.py

* add in_units and in_channels

* Update __init__.py

* Update mobilebert.py

* Update README.md

* fix test case

* fix

* Update test_utils_misc.py

* fix bug

* [FEATURE] gpt2 generation scripts (dmlc#1354)

* remove prev_len in hybrid_forward parameters

* update

* sample

* update

* add gpt2_1558M

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

Co-authored-by: Hu <[email protected]>

* [Fix] Minor fix for AMLC Tutorial - QA (dmlc#1359)

* update

Update README.md

update

try to use dataclasses

* Update squad_utils.py

* Update preprocessing.py

* Update squad_utils.py

* Update run_squad.py

* [Log Message Improvement] Improve nlp process (dmlc#1362)

* Update learn_subword.py

* Update learn_subword.py

* Update learn_subword.py

* Update apply_subword.py

* Set default ctx in conftest (dmlc#1363)

* Fix the correctness of the Horovod support on squad (dmlc#1353)

* revise squad

* tiny fix

* fix total_norm logging

* shuffle before and after splitting

* make pre_shuffle_seed fixed

* fix flags

* remove do_pre_shuffle

* remove inside_split_shuffle

Co-authored-by: Ubuntu <[email protected]>

* [CI][BUGFIX] Custom Step for Uploading Code Coverage in Pull Request Event (dmlc#1364)

* [FEATURE]Generation script improvement (dmlc#1365)

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

Co-authored-by: Hu <[email protected]>

* [Website][CI] Build Website without Warnings + Add Workflow for Building Website  (dmlc#1327)

* [Website] Documentation warnings Fixed + Create Makefile

[Website] Documentation bug fix

[Website] Bug fix

[Website] Build without model_zoo

[Website] Fix notebook

* [Website][CI] Add workflow for building website

* [CI] Add more dependencies

* [CI] Update buildwebsite.yml

[CI] Update buildwebsite.yml

* [CI] Update buildwebsite.yml

* [CI] Update buildwebsite.yml

* [CI] Update buildwebsite.yml

* [CI] Update buildwebsite.yml

* [CI] Update buildwebsite.yml

* [CI] Update buildwebsite.yml

* [CI] Update buildwebsite.yml

* [CI] Update buildwebsite.yml

[CI] Update buildwebsite.yml

[CI] Update buildwebsite.yml

[CI] Update buildwebsite.yml

[CI] Update buildwebsite.yml

[CI] Update buildwebsite.yml

[CI] Update buildwebsite.yml

[CI] Update buildwebsite.yml

[CI] Update buildwebsite.yml

* [CI] Update buildwebsite.yml

* [CI] Update buildwebsite.yml

* [Website] Add more dependencies

* [Website][CI] Add Compile notebook step + Preview website

* [CI] Add shell script for compiling notebooks

* [CI] Add permission for shell script

* [Website] Update

* [Website] Update

* [CI] Add uploading build artifacts

* [CI] Update

* [CI] Update Indentation

* [CI] Remove some dependencies

* [BUGFIX] Fix URL encoding (dmlc#1370)

* [FEATURE]Update readme of nmt (dmlc#1373)

* update

* update

* update

* update

* update

* update

* update

* update

Co-authored-by: Hu <[email protected]>

* [CI] Improve website building workflow (dmlc#1377)

* BERT pretraining (dmlc#1376)

* bert

* update

* address comments

* update

* [Fix][Docker] Fix the docker image + Fix pretrain_corpus document. (dmlc#1378)

* update

* Update ubuntu18.04-devel-gpu.Dockerfile

* fix the docker image

* Update README.md

* Update ubuntu18.04-devel-gpu.Dockerfile

* Update README.md

* fix readme

* Add CPU DockerFile

* update

* update

* Update ubuntu18.04-devel-gpu.Dockerfile

* update

* prepare to add TVM to docker

* try to update

* Update ubuntu18.04-devel-gpu.Dockerfile

* Update ubuntu18.04-devel-gpu.Dockerfile

* Update install_openmpi.sh

* update

* Create install_llvm.sh

* Update ubuntu18.04-base-gpu.Dockerfile

* Update ubuntu18.04-base-gpu.Dockerfile

* Update run_squad2_albert_base.sh

* Update prepare_squad.py

* Update prepare_squad.py

* Update prepare_squad.py

* fix

* Update README.md

* update

* update

* Update README.md

* Update README.md

* Update ubuntu18.04-devel-gpu.Dockerfile

* update

* Update README.md

* fix

* Update ubuntu18.04-base-cpu.Dockerfile

* update

* add tvm to lazy import

* update

* Update README.md

* update

* Update README.md

* Update run_squad2_albert_base.sh

* update

* update

* update

* update

* update

* Update README.md

* Update install_ubuntu18.04_core.sh

* update

* update

* update

* fix

* Update README.md

* Update run_batch_squad.sh

* update

* Update run_batch_squad.sh

* Update run_batch_squad.sh

* update

* Update README.md

* fix

* Update gluon_nlp_job.sh

* update

* Update README.md

* Update README.md

* Update README.md

* update

* Update README.md

* update

* Update install_python_packages.sh

* Update install_llvm.sh

* Update install_python_packages.sh

* Update install_llvm.sh

* update

* Update install_ubuntu18.04_core.sh

* fix

* Update submit-job.py

* Update submit-job.py

* Update README.md

* Update README.md

* Update prepare_gutenberg.py

* Delete gluon_nlp_cpu_job.sh

* Update prepare_gutenberg.py

* Update prepare_gutenberg.py

* Update prepare_gutenberg.py

* Update conf.py

* update

* Update generate_commands.py

* fix readme

* use os.link for hard link

* Update README.md

* Update README.md

* Update gluon_nlp_job.sh

* Update __init__.py

* Update benchmark_utils.py

* try to use multi-stage build

* Update benchmark_utils.py

* multi-stage build

* Update README.md

* Update README.md

* update

* Update submit-job.py

* fix documentation

* fix

* update

* Update test.sh

* Update test.sh

* Update test.sh

* Update test.sh

* Update README.md

* Update test.sh

* fix

* Update README.md

* Update gluon_nlp_job.sh

* [Website] Add AMLC Tutorial to Website (dmlc#1379)

* [Website] Add AMLC Tutorial

* [Website] Add tsv encoding

* [Website] Add model zoo

* [Website] Update Makefile

* [Website] Update Makefile

* [Website] Update Makefile

* [Website] Update compile_notebooks.sh

* [Website] Update Makefile

* [Website] Add title to generation

* [Website] Update workflow

* update

* [Website] Update model_zoo.rst

* [Website] Update model_zoo.rst

* [BUGFIX] Fix Codecov (dmlc#1391)

* Update coveragerc

* Update coveragerc

* Update coveragerc

* Update workflow

* Update workflow

* update

* update

Co-authored-by: MoisesHer <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: Xingjian Shi <[email protected]>
Co-authored-by: barry-jin <[email protected]>
Co-authored-by: ht <[email protected]>
Co-authored-by: Hu <[email protected]>
Co-authored-by: Leonard Lausen <[email protected]>
Co-authored-by: Ubuntu <[email protected]>
Co-authored-by: Ziyue Huang <[email protected]>