Clip floating point constants to bf16 range to avoid inf conversion #20562
Conversation
The documentation is not available anymore as the PR was closed or merged.
Thanks for your PR! Just have a small comment on style and we should be good to go.
```python
# Fixes issue where the model code passes a value that is out of range for XLA_USE_BF16=1
# and XLA_DOWNCAST_BF16=1 so the conversion would cast it to -inf
if is_torch_tpu_available():
    if os.environ.get("XLA_USE_BF16"):
```
Can we just make an explicit comparison here? We usually dislike relying on Python bool conversion magic :-) We have some constants in the utils that regroup all kinds of truthy values, which could be useful here.
Sure, let me change that and make it an explicit comparison
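For reference, a minimal sketch of what such an explicit comparison could look like. It assumes the truthy-value constant the reviewer mentions is `ENV_VARS_TRUE_VALUES` and that it can be imported from `transformers.utils`; this is not the PR's actual diff.

```python
import os

from transformers.utils import ENV_VARS_TRUE_VALUES  # a set of truthy strings such as {"1", "ON", "YES", "TRUE"}

# Explicit membership test instead of relying on Python's implicit bool
# conversion, which treats any non-empty string (even "0") as truthy.
if os.environ.get("XLA_USE_BF16", "0").upper() in ENV_VARS_TRUE_VALUES:
    print("bf16 is enabled for this TPU run")
```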
[Commit list omitted: the rebase accidentally pulled a large number of unrelated upstream commits into this PR, which is what the reviewer's comment below refers to.]
Oh, looks like something went wrong in your rebase (see the diff showing lots of files). You can either force-push (with `--force`) to repair the history for git, or close this PR and open a fresh one.
When running the HuggingFace BERT (any size) fine-tuning tutorial with transformers version >= 4.21.0 and XLA_USE_BF16=1 or XLA_DOWNCAST_BF16=1, I see NaNs in the loss after the first step.
What does this PR do?
This PR addresses the issue where the model code passes a value that is out of range for XLA_USE_BF16=1 or XLA_DOWNCAST_BF16=1, so the conversion would cast it to -inf.
The NaNs likely come from the transformers library change #17306. That PR replaced many lines that used to use -float("inf") (or other small constants) with torch.finfo().min. For torch.float32 the min value is -3.4028234663852886e+38, which is smaller than the bfloat16 minimum of -3.3895313892515355e+38. So the problem is that torch.finfo(torch.float32).min = -3.4028234663852886e+38 gets converted to -inf. When the original encoder_extended_attention_mask is 1, encoder_extended_attention_mask becomes (1.0 - 1.0) * -inf, which is NaN (via the IEEE rule inf * 0.0 = NaN).
This PR clips the constant to torch.finfo(torch.bfloat16).min = -3.3895313892515355e+38 instead of letting it overflow to -inf, so the results no longer contain NaNs.
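A small standalone reproduction of the overflow described above (not part of the PR, just illustrating the cast and the 0 * inf rule):

```python
import torch

# float32's most negative finite value is outside the bfloat16 range, so the
# cast rounds it to -inf ...
fill = torch.tensor(torch.finfo(torch.float32).min).to(torch.bfloat16)
print(fill)                # tensor(-inf, dtype=torch.bfloat16)

# ... and the (1.0 - mask) * fill pattern then produces 0.0 * -inf = NaN
# wherever the original mask is 1.
print((1.0 - 1.0) * fill)  # tensor(nan, dtype=torch.bfloat16)
```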
The following lines check for the XLA_USE_BF16 or XLA_DOWNCAST_BF16 environment variable and set the dtype accordingly (see the hunk quoted in the review above); a sketch of the idea follows.
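The PR's actual diff is not reproduced here; the sketch below only illustrates the approach with a hypothetical helper, assuming `ENV_VARS_TRUE_VALUES` and `is_torch_tpu_available` can be imported from `transformers.utils`.

```python
import os

import torch
from transformers.utils import ENV_VARS_TRUE_VALUES, is_torch_tpu_available


def masking_min_value(dtype: torch.dtype = torch.float32) -> float:
    """Hypothetical helper: return a masking constant that stays finite under bf16."""
    if is_torch_tpu_available():
        # XLA_USE_BF16 maps float32 to bfloat16, and XLA_DOWNCAST_BF16 downcasts
        # float32 -> bfloat16 (and float64 -> float32), so the constant must fit
        # in the bfloat16 range or it overflows to -inf after the cast.
        use_bf16 = os.environ.get("XLA_USE_BF16", "0").upper() in ENV_VARS_TRUE_VALUES
        downcast_bf16 = os.environ.get("XLA_DOWNCAST_BF16", "0").upper() in ENV_VARS_TRUE_VALUES
        if use_bf16 or downcast_bf16:
            dtype = torch.bfloat16
    return torch.finfo(dtype).min
```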
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@sgugger