
Rewrite TensorFlow train_step and test_step #17057

Merged: 8 commits into main on May 17, 2022
Conversation

Member

@Rocketknight1 Rocketknight1 commented May 3, 2022

Draft PR for a full rewrite of the TF train/test steps. I swear this will fix like 50% of our TF issues in one PR.

Current status:

  • Correctly handles output mapping across most model classes for losses + metrics
  • Keras metrics are back, even with the dummy loss. (!!!!)
  • Keras metrics work correctly even for multi-output models (like QA)
  • In most cases, users can pass label tensors either in the input dict or as separate Keras labels, and the model will handle them correctly.
  • No more errors when calling fit() on models with nested output structures (e.g. models that output a past tuple)

What's left to do:

  • Models with multiple unusual outputs that do not match label names may still have issues with metrics. This is relatively uncommon. We support adding a property to those classes to tell Keras what to do with the labels, but we haven't added it to any models yet. (None are failing in tests, so hopefully we won't need to worry too much about this!)
  • Testing testing testing! I want to rerun all notebooks/examples and make sure the user experience is good.
  • CI testing - We need to make sure we don't regress on any of this
  • Discoverability: After this is merged we should update notebooks/examples to show off the cool new features, and document our TF workflow/philosophy somewhere that new users will find.
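The flexible label handling described above can be sketched in plain Python (a simplified illustration only, not the actual implementation; the function name `split_inputs_and_labels` and the label names shown are hypothetical):

```python
def split_inputs_and_labels(data, label_names=("labels", "start_positions", "end_positions")):
    """Split a user-supplied input dict into model inputs and labels.

    Mirrors the idea that users may pass label tensors inside the input
    dict: any entry whose key matches a known label name is routed to y,
    everything else stays in x and is fed to the model.
    """
    x = {k: v for k, v in data.items() if k not in label_names}
    y = {k: v for k, v in data.items() if k in label_names}
    return x, y

# Labels passed inside the input dict are still found and used for the loss:
x, y = split_inputs_and_labels({"input_ids": [[0, 1, 2]], "labels": [[1, 2, 3]]})
```

The real train/test steps do this with tensors inside a tf.function, but the routing decision is the same.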

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented May 3, 2022

The documentation is not available anymore as the PR was closed or merged.

@Rocketknight1 Rocketknight1 marked this pull request as ready for review May 3, 2022 13:54
@Rocketknight1
Member Author

(Requesting reviews now that @gante is back)

Member

@gante gante left a comment

<3 This is great, Keras users will definitely feel more at home

I've added two comments: a suggestion (for potentially more organized code) and a question. Other than that, LGTM!

Comment on lines 964 to 973
if self._label_to_output_map is not None:
    label_to_output = self._label_to_output_map
elif "start_positions" in arg_names:
    label_to_output = {"start_positions": "start_logits", "end_positions": "end_logits"}
elif "sentence_order_label" in arg_names:
    label_to_output = {"labels": "prediction_logits", "sentence_order_label": "sop_logits"}
elif "next_sentence_label" in arg_names:
    label_to_output = {"labels": "prediction_logits", "next_sentence_label": "seq_relationship_logits"}
elif "mc_labels" in arg_names:
    label_to_output = {"labels": "logits", "mc_labels": "mc_logits"}
Member

Suggestion for discussion: Could these hardcoded defaults be part of the corresponding TFXXXLoss classes? The actual models inherit from these classes, and thus we could add them on a per-loss basis, as opposed to having a big if/else in the train/test steps :D

Member Author

I explored this when writing the PR! I think that would work in a lot of cases, but there are some models which have their own custom losses, and other models that define hf_compute_loss in the model class itself.

So I'm not sure if moving this to the Loss classes would be that easy, but for cleanliness, I can extract this to a method called something like infer_label_to_output_map() and just call that in train_step instead?
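A minimal sketch of what such an extracted helper could look like (hypothetical; the mappings simply mirror the hardcoded defaults in the quoted diff, and the real method would also consult `self._label_to_output_map` first):

```python
def infer_label_to_output_map(arg_names):
    """Map label argument names to model output names for Keras metrics.

    Sketch of the proposed extraction: given a model's forward-pass
    argument names, return the default label -> output-name mapping,
    or None when no special-cased label names are present.
    """
    if "start_positions" in arg_names:
        return {"start_positions": "start_logits", "end_positions": "end_logits"}
    elif "sentence_order_label" in arg_names:
        return {"labels": "prediction_logits", "sentence_order_label": "sop_logits"}
    elif "next_sentence_label" in arg_names:
        return {"labels": "prediction_logits", "next_sentence_label": "seq_relationship_logits"}
    elif "mc_labels" in arg_names:
        return {"labels": "logits", "mc_labels": "mc_logits"}
    return None

# A QA-style signature maps its label names onto the matching logits:
mapping = infer_label_to_output_map(["input_ids", "start_positions", "end_positions"])
```

Both train_step and test_step could then call this one method, as suggested below.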

Member

Extracting to an external function sounds good 👍 (especially because it is reused between train and test)

Comment on lines 1005 to 1006
if len(y) == 1:
    _, y = y.popitem()
Member

This converts y from a dictionary with one item to the value of that dictionary entry. Looking below, it seems like it should handle dicts correctly. What's happening here?

Member Author

The reason I did it this way is to catch more cases, but I realize now I could have been a lot smarter about it. One sec!

Member Author

Fixed! This code was added because users often pass a dict whose only key is "labels", which is not the name of any of the model's outputs. The correct thing to do for those models is to map the "labels" tensor to the first model output - I changed this line so that it checks that the single key is called "labels" before doing so.

Member Author

Fixed it up a little more - now we try to map by key name before falling back to mapping to the first output as a last resort.
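That matching order (by key name first, then the first-output fallback for a lone "labels" entry) can be sketched like this (a simplified illustration; `match_labels_to_outputs` is a hypothetical name, and the real code works on output objects rather than plain dicts):

```python
def match_labels_to_outputs(y, output_names):
    """Match label dict entries to model output names.

    First try a direct match by key name; as a last resort, map a
    "labels" entry to the first model output.
    """
    matched = {}
    for name, tensor in y.items():
        if name in output_names:
            matched[name] = tensor  # direct match by key name
        elif name == "labels":
            matched[output_names[0]] = tensor  # last resort: first output
        else:
            raise KeyError(f"Could not match label {name!r} to any model output")
    return matched

# A lone "labels" entry falls back to the first output ("logits" here):
preds = match_labels_to_outputs({"labels": [[1, 2]]}, ["logits", "past_key_values"])
```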

Member

@gante gante left a comment

LGTM 👍

@Rocketknight1 Rocketknight1 merged commit 349f1c8 into main May 17, 2022
@Rocketknight1 Rocketknight1 deleted the tf_train_step_rewrite branch May 17, 2022 13:36
ArthurZucker added a commit to ArthurZucker/transformers that referenced this pull request May 20, 2022
commit 349f1c8
Author: Matt <[email protected]>
Date:   Tue May 17 14:36:23 2022 +0100

    Rewrite TensorFlow train_step and test_step (huggingface#17057)

    * Initial commit

    * Better label renaming

    * Remove breakpoint before pushing (this is your job)

    * Test a lot more in the Keras fit() test

    * make fixup

    * Clarify the case where we flatten y dicts into tensors

    * Clarify the case where we flatten y dicts into tensors

    * Extract label name remapping to a method

Narsil pushed a commit to Narsil/transformers that referenced this pull request May 30, 2022
fix tokenizer autodoc

fix minor CI issues

fix minor CI issues

fix minor CI issues

fix style issue

fix minor import issues

fix few issues

remove def main on the test

add require torch

replace decorator with 'with'

fix style

change to bloom

add quick fix tokenizer

fix tokenizer file

fix tokenizer

- merge tests
- small fixes

fix import issue

add bloom to readme

fix consistency

Update docs/source/en/model_doc/bloom.mdx

Co-authored-by: Sylvain Gugger <[email protected]>

Apply suggestions from code review

fix comment issues on file headers

Co-authored-by: Sylvain Gugger <[email protected]>

fix doc issue

small fix - modeling test

some changes

- refactor some code
- taking into account reviews
- more tests should pass
- removed pruning tests

remove useless division

more tests should pass

more tests should pass

more tests should pass

let's try this one

-add alibi offset
- remove all permutes to make the grad operations work
- finger crossed

Update data2vec.mdx to include a Colab Notebook link (that shows fine-tuning) (huggingface#17194)

* Update data2vec.mdx

* Update data2vec.mdx

* Update docs/source/en/model_doc/data2vec.mdx

Co-authored-by: Sylvain Gugger <[email protected]>

Co-authored-by: Sylvain Gugger <[email protected]>

Dev version

Add test to ensure models can take int64 inputs (huggingface#17210)

* Add test to ensure models can take int64 inputs

* is_integer is an attribute, not a method

* Fix test when some inputs aren't tensors

* Add casts to blenderbot and blenderbot-small

* Add casts to the other failing models

Fix dependency table

update BART docs (huggingface#17212)

Black preview (huggingface#17217)

* Black preview

* Fixup too!

* Fix check copies

* Use the same version as the CI

* Bump black

Fix typo in bug report template (huggingface#17178)

* Fix typo

* Force rerun workflows

Co-authored-by: Felix Marty <[email protected]>

Added translation of installation.mdx to Portuguese Issue huggingface#16824 (huggingface#16979)

* Added translation of installation.mdx to Portuguese, as well
as default templates of _toctree.yml and _config.py

* [ build_documentation.yml ] - Updated doc_builder to build
documentation in Portuguese.
[ pipeline_tutorial.mdx ] - Created translation for the pipeline_tutorial.mdx.

* [ build_pr_documentation.yml ] - Added pt language to pr_documentation builder.

[ pipeline_tutorial.mdx ] - Grammar changes.

* [ accelerate.mdx ] - Translated to Portuguese the acceleration tutorial.

* [ multilingual.mdx ] - Added portuguese translation for multilingual tutorial.

[ training.mdx ] - Added portuguese translation for training tutorial.

* [ preprocessing.mdx ] - WIP

* Update _toctree.yml

* Adding Pré-processamento to _toctree.yml

* Update accelerate.mdx

* Nits and eliminate preprocessing file while it is ready

Co-authored-by: Omar U. Espejel <[email protected]>

OPT-fix (huggingface#17229)

* try fixes

* Revert "try fixes"

This reverts commit a8ad75e.

* add correct shape

* add correct path

OPT - fix docstring and improve tests slighly (huggingface#17228)

* correct some stuff

* fix doc tests

* make style

Update self-push workflow (huggingface#17177)

* update push ci

* install git-python

* update comment

* update deepspeed jobs

* fix report

* skip 2 more tests that require fairscale

* Fix changes in test_fetcher.py (to deal with `setup.py` is changed)

* set RUN_PT_TF_CROSS_TESTS=1 and final clean-up

* remove SIGOPT_API_TOKEN

* remove echo "$matrix_folders"

Co-authored-by: ydshieh <[email protected]>

fix --gpus option for docker (huggingface#17235)

Co-authored-by: ydshieh <[email protected]>

Handle copyright in add-new-model-like (huggingface#17218)

Fix Trainer for Datasets that don't have dict items (huggingface#17239)

install dev. version of accelerate (huggingface#17243)

Co-authored-by: ydshieh <[email protected]>

Fix push CI channel (huggingface#17242)

Co-authored-by: ydshieh <[email protected]>

Add PR title to push CI report (huggingface#17246)

* add PR title to push CI report

* add link

Co-authored-by: ydshieh <[email protected]>

[ fast_tokenizers.mdx ] - Added translation to portuguese to tutorial (huggingface#17076)

* [ fast_tokenizers.mdx ] - Added translation to portuguese to tutorial

* Delete docs/source/pt-br directory

* [ fast_tokenizers.mdx ] - Continuing work on file

* [ fast_tokenizers.mdx ] - Continuing work on file

* Add fast tokenizers to _toctree.yml

* Eliminated config and toctree.yml

* Nits in fast_tokenizers.mdx

Co-authored-by: Omar U. Espejel <[email protected]>

Translated version of model_sharing.mdx doc to spanish (huggingface#16184)

* Translated version of model_sharing to spanish

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Addind model sharing to _toctree.yml

Co-authored-by: Omar U. Espejel <[email protected]>

Guide to create custom models in Spanish (huggingface#17158)

* file copied and toctree updated

* Intro and configuration translated

* model section translated

* enter hotfix

* Translation over, correction pending

* Typos and corrections

* Update docs/source/es/create_a_model.mdx

Co-authored-by: Omar U. Espejel <[email protected]>

* Update docs/source/es/create_a_model.mdx

Co-authored-by: Omar U. Espejel <[email protected]>

* Update docs/source/es/create_a_model.mdx

Co-authored-by: Omar U. Espejel <[email protected]>

* Update docs/source/es/create_a_model.mdx

Co-authored-by: Omar U. Espejel <[email protected]>

Co-authored-by: Omar U. Espejel <[email protected]>

Fix obvious typos in flax decoder impl (huggingface#17279)

Change config.encoder_ffn_dim -> config.decoder_ffn_dim for decoder.

TF - Fix convnext classification example (huggingface#17261)

[WIP] [doc] performance/scalability revamp (huggingface#15723)

* [doc] performance/scalability revamp

* link the new docs

* no :

* mixed precision

* work on the first doc

* expand the main doc

* Trigger CI

* style

* revamp single GPU training section

* work on training performance

* remove files not used anymore or will be added later

* final touches

* fix rebase

* Add hardware section to toctree

* fix toctree again

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <[email protected]>

* remove `fast_tokenizers` entry that was copied in rebase

* add warning about DP vs DDP

* remove todo

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <[email protected]>

* fix missing closure of codeblock

* Update docs/source/en/perf_train_gpu_many.mdx

Co-authored-by: Sylvain Gugger <[email protected]>

* sync with huggingface#16860

* update toc

Co-authored-by: leandro <[email protected]>
Co-authored-by: Leandro von Werra <[email protected]>
Co-authored-by: Sylvain Gugger <[email protected]>

fixed bug in run_mlm_flax_stream.py (huggingface#17203)

* fixed bug run_mlm_flax_stream.py

Fixed bug caused by an update to tokenizer keys introduced in recent transformers versions (between `4.6.2` and `4.18.0`) where additional keys were introduced to the tokenizer output.

* Update run_mlm_flax_stream.py

* adding missing paranthesis

* formatted to black

* remove cols from dataset instead

* reformat to black

* moved rem. columns to map

* formatted to black

Co-authored-by: KennethEnevoldsen <[email protected]>

 Updated checkpoint support for Sagemaker Model Parallel (huggingface#17219)

* adding partial checkpoint support for optimizer state

* formatted trainer.py

* Refactoring based on comments

* reformatting

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <[email protected]>

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <[email protected]>

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <[email protected]>

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <[email protected]>

Co-authored-by: Cavdar <[email protected]>
Co-authored-by: Sylvain Gugger <[email protected]>

Update codeparrot data preprocessing (huggingface#16944)

* add new preprocessing arguments

* add new filters

* add new filters to readme

* fix config and test count, update function names and docstrings

* reformat code

* update readme

* Update readme

* rename config_test filter

Co-authored-by: Leandro von Werra <[email protected]>

* rename few_assignments filter

Co-authored-by: Leandro von Werra <[email protected]>

* rename tokenizer in arguments

Co-authored-by: Leandro von Werra <[email protected]>

* rename functions and add limit_line argument for config_test filter

* update threshold for config_test filter

Co-authored-by: Leandro von Werra <[email protected]>
Co-authored-by: Loubna ben allal <[email protected]>

CodeParrot data pretokenization (huggingface#16932)

* add pretokenization arguments

* add pretokenization script

* add support for pretokenized data

* reformat code

* fix run command for training

* fix model call from config

* remove a package

* add comments on pretokenization in the readme

* remove explicit parallelization

Co-authored-by: Leandro von Werra <[email protected]>

* update readme

Co-authored-by: Leandro von Werra <[email protected]>

* update readme -remove username

Co-authored-by: Leandro von Werra <[email protected]>

* update readme -remove username

Co-authored-by: Leandro von Werra <[email protected]>

* keep data parallelization

* reformat code

* reformat code

* update readme

* reformat code

* Update examples/research_projects/codeparrot/README.md

Co-authored-by: Leandro von Werra <[email protected]>

Co-authored-by: Leandro von Werra <[email protected]>
Co-authored-by: Loubna ben allal <[email protected]>

Remove next sentence prediction from supported ONNX tasks (huggingface#17276)

Align logits and labels in OPT (huggingface#17237)

Mlflowcallback fix nonetype error (huggingface#17171)

* Fix edge cases TypeError: 'NoneType' object is not callable

* fix style

Automatically sort auto mappings (huggingface#17250)

* Automatically sort auto mappings

* Better class extraction

* Some auto class magic

* Adapt test and underlying behavior

* Remove re-used config

* Quality

Make TrainerHyperParameterSigOptIntegrationTest slow test (huggingface#17288)

Co-authored-by: ydshieh <[email protected]>

Better error in the Auto API when a dep is missing (huggingface#17289)

Fix FlavaForPreTrainingIntegrationTest CI test (huggingface#17232)

Co-authored-by: ydshieh <[email protected]>

Use the PR URL in CI report (huggingface#17269)

Co-authored-by: ydshieh <[email protected]>

logging documentation update (huggingface#17174)

* logging documentation

* style

Co-authored-by: Sander Land <[email protected]>

docs(transformers): fix typo (huggingface#17263)

Add Tensorflow Swin model (huggingface#16988)

Co-authored-by: Matt <[email protected]>
Co-authored-by: Sylvain Gugger <[email protected]>

[Tests] Fix slow opt tests (huggingface#17282)

* fix opt tests

* remove unused tok

* make style

* make flake8 happy

* Update tests/models/opt/test_modeling_opt.py

Fix test_model_parallelization (huggingface#17249)

* Fix test_model_parallelization

* Modify

Add Wav2Vec2Conformer (huggingface#16812)

* save intermediate

* add wav2vec2 conformer

* add more code

* more

* first test passes

* make all checkpoints work

* update

* up

* more clean ups

* save clean-up

* save clean-up

* save more

* remove bogus

* finalize design conformer

* remove vision

* finish all tests

* more changes

* finish code

* add doc tests

* add slow tests

* fix autoconfig test

* up

* correct docstring

* up

* update

* fix

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <[email protected]>
Co-authored-by: Anton Lozhkov <[email protected]>

* Update docs/source/en/model_doc/wav2vec2-conformer.mdx

* upload

* save copied from

* correct configs

* fix model outputs

* add to docs

* fix imports

* finish

* finish code

* correct copied from

* correct again

* correct make fix

* improve make fix copies

* save

* correct fix copy from

* correct init structure

* correct

* fix import

* apply suggestions

Co-authored-by: Sylvain Gugger <[email protected]>
Co-authored-by: Anton Lozhkov <[email protected]>

Fix missing job action button in CI report (huggingface#17270)

* use matrix.machine_type

* fix job names used in job_link

Co-authored-by: ydshieh <[email protected]>

Fix wrong PT/TF categories in CI report (huggingface#17272)

Co-authored-by: ydshieh <[email protected]>

[ConvNeXT] Fix drop_path_rate (huggingface#17280)

* Fix drop_path_rate

* Fix TF's drop path rate
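Drop path (stochastic depth) is the regularization the fixed `drop_path_rate` controls: during training, each sample's residual branch is zeroed with probability `drop_path_rate` and the survivors are rescaled; at inference it is a no-op. A minimal illustrative implementation, not the Transformers code:

```python
import tensorflow as tf

# Illustrative stochastic-depth ("drop path") sketch, not the Transformers
# implementation: during training each sample's residual branch is zeroed with
# probability drop_path_rate and rescaled by 1 / keep_prob; at inference it is
# the identity, so expected activations match between the two modes.
def drop_path(x, drop_path_rate, training):
    if not training or drop_path_rate == 0.0:
        return x
    keep_prob = 1.0 - drop_path_rate
    # One Bernoulli draw per sample, broadcast across all remaining dims.
    shape = tf.concat(
        [tf.shape(x)[:1], tf.ones(len(x.shape) - 1, dtype=tf.int32)], axis=0
    )
    mask = tf.floor(keep_prob + tf.random.uniform(shape, dtype=x.dtype))
    return x / keep_prob * mask

x = tf.ones((4, 3))
inference_out = drop_path(x, 0.5, training=False)     # identity at inference
no_drop_out = drop_path(x, 0.0, training=True)        # identity at rate 0
train_out = drop_path(x, 0.5, training=True)
# Each row of train_out is either all zeros or all 1 / keep_prob == 2.0.
row_values = {float(v) for v in tf.reshape(train_out, [-1]).numpy()}
```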

fix retribert's `test_torch_encode_plus_sent_to_model` (huggingface#17231)

Fix tests of mixed precision now that experimental is deprecated (huggingface#17300)

* Fix tests of mixed precision now that experimental is deprecated

* Fix mixed precision in training_args_tf.py too
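The deprecation these commits address is the move from `tf.keras.mixed_precision.experimental` to the stable mixed-precision API (stable since TF 2.4). A minimal sketch of the modern usage:

```python
import tensorflow as tf

# Stable mixed-precision API that replaced the deprecated
# tf.keras.mixed_precision.experimental module.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

# Layers created under the policy compute in float16 but keep float32 variables.
layer = tf.keras.layers.Dense(4)
layer.build((None, 3))
compute_dtype = layer.compute_dtype    # dtype of the forward computation
variable_dtype = layer.variable_dtype  # dtype of the stored weights

# Reset so the policy does not leak into unrelated code.
tf.keras.mixed_precision.set_global_policy("float32")
```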

Rewrite TensorFlow train_step and test_step (huggingface#17057)

* Initial commit

* Better label renaming

* Remove breakpoint before pushing (this is your job)

* Test a lot more in the Keras fit() test

* make fixup

* Clarify the case where we flatten y dicts into tensors

* Clarify the case where we flatten y dicts into tensors

* Extract label name remapping to a method
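The behavior these commit messages describe — accepting label tensors either inside the input dict or as Keras's `y`, and remapping them before the loss — can be illustrated with a toy custom `train_step`. This is a hedged sketch with a hypothetical model and label names, not the actual Transformers implementation:

```python
import numpy as np
import tensorflow as tf

# Toy sketch (hypothetical model, not the Transformers code) of a train_step
# that pops label tensors out of the input dict into y before computing the
# loss, so users can pass labels either in x or as a separate y argument.
class ToyModel(tf.keras.Model):
    LABEL_KEYS = ("labels",)  # assumed label names for this sketch

    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(2)
        self.loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
        self.loss_tracker = tf.keras.metrics.Mean(name="loss")

    def call(self, inputs):
        return self.dense(inputs["input_values"])

    def train_step(self, data):
        x, y, _ = tf.keras.utils.unpack_x_y_sample_weight(data)
        x = dict(x)  # shallow copy so popping labels does not mutate user data
        for key in self.LABEL_KEYS:
            if key in x:
                y = x.pop(key)  # labels were passed inside the input dict
        with tf.GradientTape() as tape:
            logits = self(x, training=True)
            loss = self.loss_fn(y, logits)
        grads = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        self.loss_tracker.update_state(loss)
        return {"loss": self.loss_tracker.result()}

model = ToyModel()
model.compile(optimizer="sgd")
features = {
    "input_values": np.random.rand(8, 4).astype("float32"),
    "labels": np.array([0, 1, 0, 1, 0, 1, 0, 1]),
}
# Labels live inside the input dict; train_step moves them to y.
history = model.fit(features, epochs=1, batch_size=8, verbose=0)
```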

correct opt (huggingface#17301)

refactor

- refactor code
- style changes
- add new threshold for test

major changes

- change BLOOM to Bloom
- add quick doc on bloom.mdx
- move embeddings test on modeling test

modify readme

small fixes

small fix

- better threshold for a test

remove old test file from fetcher

fix small typo

major change

- change BloomLMHead to BloomForCausalLM

remove onnx config

major changes

- refactor the code
- remove asserts
- change tol for test

make style

small change

adding a slow test + commenting old ones for now

make style

Apply suggestions from code review

Co-authored-by: Sylvain Gugger <[email protected]>

make style

fix duplicates

cleaning comments on config

clean a bit conversion file

refactor the modeling file a bit
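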

refactor tokenizer file

fix tokenization test issue

fix tokenization issue second try

fix tokenization issue #2

fix test issue

make style + add suggestions

change test fetcher

try this one

- slow tests should pass
- fingers crossed

possible final changes

make style

try fix padding side issue

fix side

fix padding issue

fix ko-readme

fix config auto

cleaning modeling file

keep bloom in caps in ko

update config docs

remove pretraining_pp

remove model parallel

update config

- add correct config files

fix duplicates

fix fetcher

fix refactor issue

- remove divide function

try to remove alibi

small fixes

- fix alibi
- remove seq length
- refactor a bit the code
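The ALiBi bias these commits wrestle with is Press et al.'s "Train Short, Test Long" technique: each attention head gets a geometric slope, and `slope * -(key distance)` is added to the attention scores. A generic sketch of the slope computation, assuming a power-of-two head count (this is not Bloom's exact code):

```python
# Generic ALiBi head-slope sketch, not the Bloom implementation: with n heads
# (n a power of two), head i (1-indexed) gets slope 2 ** (-8 * i / n); the
# bias added to the attention scores is then slope * -(key distance).
def alibi_slopes(num_heads: int) -> list:
    if num_heads & (num_heads - 1) != 0:
        raise ValueError("this sketch assumes a power-of-two head count")
    start = 2.0 ** (-8.0 / num_heads)
    return [start ** (i + 1) for i in range(num_heads)]

slopes = alibi_slopes(8)  # geometric sequence from 2**-1 down to 2**-8
```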

put correct values

- fix bos and eos token ids

fix attention mask loop

Co-authored-by: thomasw21 <[email protected]>

small fixes:

- remove skip bias add

small fixes

- fix typo in readme
- fix typos in config

small changes

- remove a test
- add reconstruction test
- change config

small changes

- change Scaled Softmax to BloomScaledSoftmax

small fixes

- fix alibi dtype

major changes

- removing explicit dtype when loading modules
- fixing test args (torch_dtype=auto)
- add docstring

fix readmes

major changes

- now bloom supports alibi shifting
- refactor a bit the code
- better test tolerance now

refactor a bit

refactor a bit

put correct name on test

change docstring

small changes

- fix docstring modeling
- fix test tolerance

fix small nit

- take dtype from tensors in the conversion script

minor fix

- fix mdx issue

minor fix

- change config docstring

forward contrib credits from PR14084

Apply suggestions from code review

Co-authored-by: Stas Bekman <[email protected]>

apply modifications

Co-authored-by: Stas Bekman <[email protected]>

resolve softmax upcast

Apply suggestions from code review

Co-authored-by: Stas Bekman <[email protected]>

Update src/transformers/models/bloom/modeling_bloom.py

Co-authored-by: Niklas Muennighoff <[email protected]>

final changes modeling

Co-authored-by: Stas Bekman <[email protected]>

Merge commit 'd156898f3b9b2c990e5963f5030a7143d57921a2'

merge commit

Apply suggestions from code review

Co-authored-by: Stas Bekman <[email protected]>

apply suggestions

Apply suggestions from Stas comments
Co-authored-by: Stas Bekman <[email protected]>
elusenji pushed a commit to elusenji/transformers that referenced this pull request Jun 12, 2022