
Clip floating point constants to bf16 range to avoid inf conversion #20562

Closed
wants to merge 26 commits into from

Conversation

sangeethabal
Contributor

When running the HuggingFace BERT (any size) fine-tuning tutorial with transformers version >= 4.21.0 and XLA_USE_BF16=1 or XLA_DOWNCAST_BF16=1 set, I see NaNs in the loss after the first step.

What does this PR do?

This PR addresses the issue where the model code passes a value that is out of the bfloat16 range when XLA_USE_BF16=1 or XLA_DOWNCAST_BF16=1 is set, so the downcast converts it to -inf.

The NaNs likely come from the transformers library change #17306. That PR replaced many lines that used -float("inf") (or other large negative constants) with torch.finfo().min. For torch.float32 the min value is -3.4028234663852886e+38, which is more negative than the bfloat16 minimum of -3.3895313892515355e+38. So the problem is that torch.finfo(torch.float32).min = -3.4028234663852886e+38 gets converted to -inf under the bf16 downcast. When the original encoder_extended_attention_mask is 1, encoder_extended_attention_mask becomes (1.0 - 1.0) * -inf, which is NaN (via the IEEE rule 0.0 * inf = NaN).

This PR ensures the constant used is torch.finfo(torch.bfloat16).min = -3.3895313892515355e+38 rather than -inf, so the results no longer contain NaNs.
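
For illustration, a minimal reproduction sketch of the overflow and the resulting NaN (not the PR's diff; the tensor names here are made up):

import torch

# float32's most negative finite value rounds past bfloat16's finite range,
# so the downcast overflows to -inf.
mask_value = torch.tensor(torch.finfo(torch.float32).min)   # -3.4028e+38 (float32)
print(mask_value.to(torch.bfloat16))                        # tensor(-inf, dtype=torch.bfloat16)

# With an attention mask of 1, (1.0 - 1.0) * -inf = 0.0 * -inf = NaN.
attention_mask = torch.tensor(1.0)
print((1.0 - attention_mask).to(torch.bfloat16) * mask_value.to(torch.bfloat16))  # tensor(nan, ...)

# Clipping to the bfloat16 minimum keeps the product finite.
safe_value = torch.tensor(torch.finfo(torch.bfloat16).min)  # -3.3895e+38
print((1.0 - attention_mask) * safe_value)                  # tensor(-0.) -- finite, no NaN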

The following lines check for the XLA_USE_BF16 or XLA_DOWNCAST_BF16 environment variables and set the dtype accordingly (`t` is the tensor whose dtype is being inspected):

if is_torch_tpu_available():
    if os.environ.get("XLA_USE_BF16"):
        return torch.bfloat16
    if os.environ.get("XLA_DOWNCAST_BF16"):
        if t.dtype == torch.float:
            return torch.bfloat16
        if t.dtype == torch.double:
            return torch.float32
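
Downstream, the returned dtype is what feeds torch.finfo(...).min. The following is only a sketch under that assumption (the toy tensor and values are made up, not the PR's literal diff):

import torch

# Suppose the helper above returned bfloat16 because XLA_USE_BF16=1 is set.
dtype = torch.bfloat16

# Taking the min of the effective dtype keeps the constant finite after the
# implicit downcast, instead of torch.finfo(torch.float32).min overflowing to -inf.
mask_value = torch.finfo(dtype).min

encoder_extended_attention_mask = torch.ones(1, 1, 1, 4)   # toy mask: all positions attended
encoder_extended_attention_mask = (1.0 - encoder_extended_attention_mask) * mask_value
print(encoder_extended_attention_mask)  # finite (zeros), no NaN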

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@sgugger

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Dec 3, 2022

The documentation is not available anymore as the PR was closed or merged.

@sangeethabal sangeethabal changed the title from "Clip floating point constants to bf16 or fp16 range to avoid inf conversion" to "Clip floating point constants to bf16 range to avoid inf conversion" on Dec 3, 2022
Collaborator

@sgugger sgugger left a comment


Thanks for your PR! Just have a small comment on style and we should be good to go.

# Fixes issue where the model code passes a value that is out of range for XLA_USE_BF16=1
# and XLA_DOWNCAST_BF16=1 so the conversion would cast it to -inf
if is_torch_tpu_available():
    if os.environ.get("XLA_USE_BF16"):
Collaborator


Can we just make an explicit comparison here? We usually dislike relying on Python bool conversion magic :-) We have in the utils some constants to regroup all kinds of truthy values that could be useful here.
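
For example, such an explicit comparison could look like this (a sketch; assuming the utils constant being referred to is `ENV_VARS_TRUE_VALUES` from `transformers.utils`):

import os

from transformers.utils import ENV_VARS_TRUE_VALUES  # {"1", "ON", "YES", "TRUE"}

# Compare against the known set of truthy strings instead of relying on
# implicit bool conversion of whatever the environment variable contains.
if os.environ.get("XLA_USE_BF16", "0").upper() in ENV_VARS_TRUE_VALUES:
    ...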

Contributor Author


Sure, let me change that and make it an explicit comparison.

ydshieh and others added 25 commits December 5, 2022 21:19
* fix small nit

* add last file
* Remove is_encoder_decoder from some vision models

* cleanup more

* cleanup more

Co-authored-by: ydshieh <[email protected]>
* biogpt initial commit

* updated init

* fix faster decoding with use_cache

* 1. fix input_ids and input_embeds with correct device
2. added _keys_to_ignore_on_load_missing
3. updated prepare_inputs_for_generation

* add activation_dropout and scale_embedding

* replace fsmt attention with bart attention

* added test

* run make fix-copies

* doc init and fix build

* updated README with proper information

* 1. added tips to docs
2. updated BioGptTokenizer func

* 1. added tokenizer test
2. refactor tokenizer

* make fixup

* add biogpt fairseq to hf converter

* updated layer names more
similar to original checkpoints

* config update doc string and set defaults

* added "#copied" from bart model and
updated doc strings

* enable model_input_names in tokenizer

* 1. positional embedding depending on attention_mask
2. added attention mask to prepare for generation

* added test to verify past and generation

* BioGptLMHeadModel -> BioGptForCausalLM

* fix typo

* tokenization and test
Copyright and updated assertion

* updated Copyright and
one func at time in line

* Copyright updates and
minor doc fix

* replace assertion with ValueError

* rm extra space

* added code syntax

* revert cmnt position change

* add tokenizer to auto

* updated doc string

* tokenizer doc string update

* biogpt hub model update to microsoft/biogpt

* make fixup

* rm cmnt to fix flake8 5.0.4 vs 6 error
* Expected output for the test changed

* fix failing asr test
* add support for `from_pt`

* add tf_flax utility file

* Update src/transformers/modeling_tf_flax_utils.py

Co-authored-by: Sylvain Gugger <[email protected]>

* remove flax related modifications

* add test

* remove FLAX related commits

* fixup

* remove safetensor todos

* revert deletion

Co-authored-by: Sylvain Gugger <[email protected]>
* Make convert_to_onnx runnable as script again

Fix `convert_graph_to_onnx.py` relative import so it can be run as a script again.

* Trigger CI
* add type annotations for esm chunk_utils

use isinstance builtin instead of 'type(x) is y'; add assertions to aid in type inferencing; use bools instead of ints in _get_minimal_slice_set for improved type clarity; refactor to avoid re-assigning to the same variable with a different type

* add type annotations for esm data_transforms

refactor to avoid re-assigning to the same variable with a different type

* add type annotations for esm feats utils

refactor to avoid re-assigning to the same variable with a different type

* add type annotations for esm loss utils

* add/fix type annotations for esm rigid_utils

refactor to avoid re-assigning to the same variable with a different type; fix Callable, Tuple type hints; match conditional structure to other methods; fix return type on Rotation.cat and Rotation.unsqueeze

* add type annotations for esm tensor_utils

overload for tree_map; use isinstance builtin instead of 'type(x) is y'; export dict_multimap, flatten_final_dims, permute_final_dims in openfold_utils

* add type annotations for esm protein utils

add FIXME for attempted string mutation; add missing None check in get_pdb_headers; fix potentially unbound variable 'chain_tag' in to_pdb; modify get_pdb_headers return type

* add type annotations for esm residue constants

hints on collection constants; remove magic trailing comma to reduce number of lines; change list -> tuple for rigid_group_atom_positions for improved hinting

* code style fixup

Co-authored-by: Matt <[email protected]>
* rembert onnx config

* formatting

Co-authored-by: Ho <[email protected]>
* Fix link to table transformer detection microsoft model

* Fix doc styles
* Fix whisper and speech to text doc
# What does this PR do?
Previously the documentation was badly indented for both models and indicated that
> If `decoder_input_ids` and `decoder_inputs_embeds` are both unset, `decoder_inputs_embeds` takes the value of `inputs_embeds`.
which is only valid for the forward pass of the `ForConditionalGeneration` variant, not for the model alone.

* other fixes
* remove set-output

Co-authored-by: ydshieh <[email protected]>
* add v1 with tests

* add checker

* simplified version

* update docstring

* better version

* fix docstring + change order

* make style

* tests + change conditions

* final tests

* modify docstring

* Update src/transformers/feature_extraction_utils.py

Co-authored-by: amyeroberts <[email protected]>

* replace by `ValueError`

* fix logic

* apply suggestions

* `dtype` is not needed

* adapt suggestions

* remove `_parse_args_to_device`

Co-authored-by: amyeroberts <[email protected]>
* [Whisper] Fix decoder ids methods

* enum property
* add whisper conversion script

* update conversion script

* update arg names

* fix missing encoder_ffn_dim

* fixup

* ast nits
@sgugger
Collaborator

sgugger commented Dec 5, 2022

Oh, looks like something went wrong in your rebase (see the diff showing lots of files). You can either force-push a commit (with --force) to repair the history for git, or close this PR and open a fresh one.
