Rewrite TensorFlow train_step and test_step #17057
(Requesting reviews now that @gante is back)
<3 This is great, Keras users will definitely feel more at home
I've added two comments: a suggestion (for potentially more organized code) and a question. Other than that, LGTM!
if self._label_to_output_map is not None:
    label_to_output = self._label_to_output_map
elif "start_positions" in arg_names:
    label_to_output = {"start_positions": "start_logits", "end_positions": "end_logits"}
elif "sentence_order_label" in arg_names:
    label_to_output = {"labels": "prediction_logits", "sentence_order_label": "sop_logits"}
elif "next_sentence_label" in arg_names:
    label_to_output = {"labels": "prediction_logits", "next_sentence_label": "seq_relationship_logits"}
elif "mc_labels" in arg_names:
    label_to_output = {"labels": "logits", "mc_labels": "mc_logits"}
I explored this when writing the PR! I think that would work in a lot of cases, but there are some models which have their own custom losses, and other models that define `hf_compute_loss` in the model class itself.
So I'm not sure if moving this to the `Loss` classes would be that easy, but for cleanliness, I can extract this to a method called something like `infer_label_to_output_map()` and just call that in `train_step` instead?
Extracting to an external function sounds good 👍 (especially because it is reused between train and test)
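As a rough illustration of the extraction agreed on here, the branches from the snippet under review could be pulled into a helper along these lines. This is only a sketch: it is written as a free function rather than a model method, the `override` parameter stands in for `self._label_to_output_map`, and the final fallback (an empty dict) is an assumption, since the reviewed snippet does not show what happens when no branch matches.

```python
from typing import Dict, Iterable, Optional


def infer_label_to_output_map(
    arg_names: Iterable[str],
    override: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Guess which model output each label column should be compared against."""
    arg_names = set(arg_names)
    # A user-supplied mapping (self._label_to_output_map in the PR) wins.
    if override is not None:
        return override
    if "start_positions" in arg_names:
        return {"start_positions": "start_logits", "end_positions": "end_logits"}
    if "sentence_order_label" in arg_names:
        return {"labels": "prediction_logits", "sentence_order_label": "sop_logits"}
    if "next_sentence_label" in arg_names:
        return {"labels": "prediction_logits", "next_sentence_label": "seq_relationship_logits"}
    if "mc_labels" in arg_names:
        return {"labels": "logits", "mc_labels": "mc_logits"}
    # Assumption: the reviewed snippet shows no final branch, so this sketch
    # returns an empty mapping when nothing matches.
    return {}


# Hypothetical argument names for a question-answering model's call signature.
qa_args = ["input_ids", "attention_mask", "start_positions", "end_positions"]
qa_map = infer_label_to_output_map(qa_args)
```

Because the helper depends only on the argument names, the same call can serve both the train and test steps, which is the reuse mentioned above.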
if len(y) == 1:
    _, y = y.popitem()
This converts `y` from a dictionary with one item to the value of that dictionary entry. Looking below, it seems like it should handle dicts correctly. What's happening here?
The reason I did it this way is to catch more cases, but I realize now I could have been a lot smarter about it. One sec!
Fixed! This code was added because the user often passes a dict where the key is "labels", which is not the name of any of the outputs. The correct thing to do for those models is to map the "labels" tensor to the first model output - I changed this line so that it checks the single key is called "labels" before doing so.
Fixed it up a little more - now we try to map by key name before falling back to mapping to the first output as a last resort.
LGTM 👍
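The matching order described in this thread (by key name first, then mapping a lone "labels" entry to the first output as a last resort) might look roughly like the following. It is a plain-Python sketch, not the code from the PR: `match_labels_to_outputs` is a hypothetical name, the values are ordinary lists rather than tensors, and `output_names` is assumed to list only the outputs a loss can be computed against.

```python
from typing import Any, Dict, List


def match_labels_to_outputs(
    y: Dict[str, Any],
    output_names: List[str],
    label_to_output: Dict[str, str],
) -> Dict[str, Any]:
    """Return labels keyed by the model output they should be compared with."""
    matched: Dict[str, Any] = {}
    for key, value in y.items():
        if key in output_names:
            # The label key already names a model output.
            matched[key] = value
        elif label_to_output.get(key) in output_names:
            # The label key is renamed through the label-to-output map.
            matched[label_to_output[key]] = value
    # Last resort: a single "labels" entry is paired with the first output.
    if not matched and list(y) == ["labels"] and output_names:
        matched[output_names[0]] = y["labels"]
    return matched


# A lone "labels" key matches no output name, so it falls back to the first output.
single = match_labels_to_outputs({"labels": [0, 1]}, ["logits"], {})

# Question-answering labels are renamed through the map instead.
qa = match_labels_to_outputs(
    {"start_positions": [3], "end_positions": [7]},
    ["start_logits", "end_logits"],
    {"start_positions": "start_logits", "end_positions": "end_logits"},
)
```

Checking the key name before falling back keeps a dict with a single, correctly named entry from being silently attached to the wrong output, which was the concern raised about the original `popitem()` shortcut.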
commit 349f1c8
Author: Matt
Date: Tue May 17 14:36:23 2022 +0100
Rewrite TensorFlow train_step and test_step (huggingface#17057)
* Initial commit
* Better label renaming
* Remove breakpoint before pushing (this is your job)
* Test a lot more in the Keras fit() test
* make fixup
* Clarify the case where we flatten y dicts into tensors
* Clarify the case where we flatten y dicts into tensors
* Extract label name remapping to a method
Draft PR for a full rewrite of the TF train/test steps. I swear this will fix like 50% of our TF issues in one PR.
Current status:
`fit()` when the model has nested output structure (e.g. the model outputting a `past` tuple)
What's left to do: