Layoutlmv2 onnx #2

Closed
wants to merge 1,841 commits into from

This pull request is big! We’re only showing the most recent 250 commits.

Commits on Jun 7, 2022

  1. Fix circular import in onnx.utils (huggingface#17577)

    * Fix circular import in onnx.utils
    
    * Add comment for test fetcher
    
    * Here too
    
    * Style
    sgugger authored Jun 7, 2022
    Commit b6a65ae
  2. Commit b118730
  3. Commit 9e72eb4
  4. Add examples telemetry (huggingface#17552)

    * Add examples telemetry
    
    * Alternative approach
    
    * Add to all other examples
    
    * Add to templates as well
    
    * Put framework separately
    
    * Same for TensorFlow
    sgugger authored Jun 7, 2022
    Commit 3cab902
  5. Fx support for Deberta-v[1-2], Hubert and LXMERT (huggingface#17539)

    * Support for deberta and deberta-v2
    
    * Support for LXMert
    
    * Support for Hubert
    
    * Fix for pt1.11
    
    * Trigger CI
    michaelbenayoun authored Jun 7, 2022
    Commit 5c8f601
  6. quicktour.mdx en -> pt translation (huggingface#17074)

    * Quicktour Portuguese Translation
    
    Translated quicktour.mdx until line 161
    
    * Finished translating quicktour.mdx
    
    Ready to upload; will adjust any remaining .mdx or translation mistakes.
    
    * Add _toctree.yml and fix nits
    
    * Fixed pt-br mdx syntax problem
    
    Closed <frameworkcontent> instance
    
    * Changed </frameworkcontent> line
    
    * Copied missing block from english version of quicktour.mdx
    
    * Reviewed the entire file once again. It should be working now.
    
    Co-authored-by: Omar U. Espejel <[email protected]>
    vitorfrois and omarespejel authored Jun 7, 2022
    Commit 706bb83
  7. M-CTC-T Model (huggingface#16402)

    * added cbs to notebooks, made copy-paste error fix in generation_utils
    
    * initial push for mctc model
    
    * mctc feature extractor done
    
    * added processor, tokenizer and their tests for MCTC. Have added an MCTC modeling test, adjusting model code accordingly.
    
    * added processor, tokenizer and their tests for MCTC. Have added an MCTC modeling test, adjusting model code accordingly.
    
    * passing attention, now struggling to figure out how attention masks make sense here
    
    * works when excluding attention masks; ask later how one would integrate attention masks here
    
    * bizarre configuration error (model prefix comes first in config dict json and messes up the order)
    
    * all passing, but bizarre config dict ordering issue when calling to_dict
    
    * passing all major tests
    
    * feature extraction, processor, tokenizer added & tests passing
    
    * style & consistency & other logistical fixes
    
    * copy paste fix
    
    * model after feature extraction working
    
    * committing final feature extraction results; need to fix normalization
    
    * feature extraction passing tests; probably should add tests on the specific flashlight-copied functions?
    
    * delete print ; format code a bit
    
    * fixing tests
    
    * passing major tests
    
    * fixing styles
    
    * completed tokenization test with real example; not sure if these values are entirely correct.
    
    * last test fixes from local
    
    * reverting accidentally included custom setup configs
    
    * remove load tf weights; fix config error
    
    * testing: couldn't import feature extractor
    
    * fix docs
    
    * fix docs
    
    * resolving comments
    
    * style fixes
    
    * style fixes
    
    * Update to MCTCConv1dSubSampler
    
    Co-authored-by: Patrick von Platen <[email protected]>
    
    * relposemb fixes
    
    * conv1d name issue; expecting config fail with parentheses
    
    * fix config issue
    
    * fix config issue
    
    * fix config issue
    
    * change everything to MCTCT
    
    * fixing naming change errors
    
    * archive list
    
    * copyrights and docs
    
    * copyrights and docs
    
    * copyrights and docs
    
    * merge resolution
    
    * move tests, fix to changed optionaldependency structure
    
    * test directories changed
    
    * fixing tests
    
    * how to avoid tf tests?
    
    * how to avoid tf tests?
    
    * tests passing locally
    
    * allow MCTCTProcessor to be imported in any env

    * allow MCTCTProcessor to be imported in any env
    
    * fixed second round of feedback, need to fix docs
    
    * doc changes not being applied
    
    * all fixed
    
    * style fix
    
    * feedback fixes
    
    * fix copies and feature extraction style fix
    
    * Update tests/models/visual_bert/test_modeling_visual_bert.py
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    * copy paste huggingface:main visual bert
    
    * added eof newline to visual bert; all tests are passing otherwise
    
    * fix slow tests by adding attention mask
    
    * change model id to speechbrain
    
    * make fix-copies
    
    * fix readme unwanted deletes
    
    * fixing readmes, make fix-copies
    
    * consistent M-CTC-T naming
    
    * Update src/transformers/models/mctct/__init__.py
    
    Co-authored-by: Patrick von Platen <[email protected]>
    
    * all fixed but variable naming
    
    * adjust double quotes
    
    * fixed variable names
    
    * copyright and mr quilter
    
    * Apply suggestions from code review
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    * correct slow tests
    
    * make fix-copies
    
    * Update src/transformers/models/mctct/configuration_mctct.py
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    * Update src/transformers/models/mctct/configuration_mctct.py
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    * m-ctc-t not mctct
    
    Co-authored-by: Patrick von Platen <[email protected]>
    Co-authored-by: Sylvain Gugger <[email protected]>
    3 people authored Jun 7, 2022
    Commit 119e3c0
  8. fix (huggingface#17589)

    Co-authored-by: ydshieh <[email protected]>
    ydshieh and ydshieh authored Jun 7, 2022
    Commit c6cea5a

Commits on Jun 8, 2022

  1. CLI: add stricter automatic checks to pt-to-tf (huggingface#17588)

    * Stricter pt-to-tf checks; Update docker image for related tests
    
    * check all attributes in the output
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    gante and sgugger authored Jun 8, 2022
    Commit 78c695e
  2. Add TFData2VecVision for semantic segmentation (huggingface#17271)

    * feat: initial implementation of data2vec segmentation model in TF.
    
    * chore: minor corrections to make the segmenter work.
    
    * chore: removed unnecessary files.
    
    * chore: add tests and other modifications.
    
    * fix: loss computation for segmentation.
    
    * chore: remove unused variable.
    
    * chore: formatting.
    
    * added a dummy adaptive pooling layer.
    
    * removed unnecessary file.
    
    * potentially add identifiers to layer names.
    
    * fix: layer naming.
    
    * chore: removed unnecessary print.
    
    * Skipping unneeded test
    
    * chore: add logging to debug tolerance.
    
    * fix: segmentation tests for tfdata2vecvision
    
    * chore: make style.
    
    * fix: layer names, assertion to be resolved.
    
    * Bumping test tolerance a bit
    
    * chore: bump the tol in PT test.
    
    Co-authored-by: matt <[email protected]>
    sayakpaul and Rocketknight1 authored Jun 8, 2022
    Commit 9d99489
  3. Explicit versions in docker files (huggingface#17586)

    * Update docker file
    
    Co-authored-by: ydshieh <[email protected]>
    ydshieh and ydshieh authored Jun 8, 2022
    Commit 264128c
  4. Commit ae7bae8
  5. Extend Transformers Trainer Class to Enable CPU AMP and Integrate Int…

    …el Extension for PyTorch (huggingface#17138)
    
    * init PR
    
    * fix import ipex
    
    * minor fix on bf16
    
    * refine optimizer
    
    * refine args notes
    
    * refine code
    
    * refine ipex optimize args
    
    * refine half_precision_backend
    
    * black format
    
    * isort format
    
    * isort format files
    
    * flake8 format
    
    * doc builder format
    
    * refine codes
    
    * remove jit and optim bits
    
    * black preview format
    
    * Update src/transformers/trainer.py
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    * refine code
    
    * refine notes
    
    * Update src/transformers/trainer.py
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    * Update src/transformers/trainer.py
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    * code refine
    
    * add ipex ut
    
    * add performance cpu doc
    
    * link to the cpu doc from main perf doc
    
    * install ipex into CI's docker
    
    * Update perf_train_cpu.mdx
    
    * Update docs/source/en/perf_train_cpu.mdx
    
    Co-authored-by: Stas Bekman <[email protected]>
    
    * Update perf_train_cpu.mdx
    
    * Update perf_train_cpu.mdx
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    Co-authored-by: Stas Bekman <[email protected]>
    Co-authored-by: Stas Bekman <[email protected]>
    4 people authored Jun 8, 2022
    Commit 34097b3
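    A minimal sketch of the CPU options this change adds to TrainingArguments (flag names follow the description above and are assumptions; intel_extension_for_pytorch and a PyTorch build with CPU bf16 support must be installed):

        from transformers import TrainingArguments

        # Hedged sketch: CPU training with IPEX optimizations and bf16 mixed precision.
        args = TrainingArguments(
            output_dir="out",
            no_cuda=True,   # stay on CPU
            use_ipex=True,  # enable Intel Extension for PyTorch optimizations
            bf16=True,      # CPU automatic mixed precision in bfloat16
        )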
  6. Fix link for community notebooks (huggingface#17602)

    * Fix link for community notebooks
    
    This fixes the link for community notebooks due to reorganization.
    
    * Replace old link with fully link to the doc page
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    ngoquanghuy99 and sgugger authored Jun 8, 2022
    Commit ee82c86
  7. Commit 7d0b6fc
  8. Commit e160a5d
  9. TF: Merge PT and TF behavior for Bart when no decoder_input_ids are p…

    …assed (huggingface#17593)
    
    * Merge PT and TF behavior
    gante authored Jun 8, 2022
    Commit e9d5138
  10. Commit 66e8656

Commits on Jun 9, 2022

  1. Commit dfc76b2
  2. BLOOM (huggingface#17474)

    * adding template
    
    * update model
    
    * model update
    
    * update conf for debug model
    
    * update conversion
    
    * update conversion script
    
    * update conversion script
    
    * fix missing keys check
    
    * add tests to test the tokenizer in the local machine
    
    * Change variable name
    
    * add tests on xnli dataset
    
    * add more description
    
    * add descriptions + clearer code
    
    * clearer code
    
    * adding new tests + skipping few tests because of env problems
    
    * change comment
    
    * add dtype on the configuration
    
    * add test embeddings
    
    * add hardcoded test
    
    * fix dtype issue
    
    * adding torch.float16 to config
    
    * adding more metrics (min, max, mean)
    
    * add sum
    
    * now the test passes with almost equal
    
    * add files for conversion - test passes on cpu & gpu
    
    * add final changes
    
    * cleaning code
    
    * add new args in the docstring
    
    * fix one liner function
    
    * remove macros
    
    * remove forward attention
    
    * clean up init function
    
    * add comments on the issue
    
    * rm scale mask softmax
    
    * do make style
    
    * fix dtype in init
    
    * fixing for loop on att probs
    
    * fix style with black
    
    * fix style + doc error
    
    * fix and debug CI errors (docs + style)
    
    * some updates
    
    - change new operations
    - finally add scaled softmax
    - added new args in the config
    
    * make use cache working
    
    * add changes
    
    - save sharded models
    - final changes on the modeling script
    
    * add changes
    
    - comment on alibi
    - add TODO on seq length
    
    * test commit
    
    - added a text to test the commit
    
    Co-authored-by: thomasw21 <[email protected]>
    
    * final changes
    
    - attention mask change
    - generation works on BS176b
    
    Co-authored-by: thomasw21 <[email protected]>
    
    * changes - model + conversion
    
    * move to correct dir
    
    * put ,
    
    * fex fixes
    
    * fix tokenizer autodoc
    
    * fix minor CI issues
    
    * fix minor CI issues
    
    * fix minor CI issues
    
    * fix style issue
    
    * fix minor import issues
    
    * fix few issues
    
    * remove def main on the test
    
    * add require torch
    
    * replace decorator with 'with'
    
    * fix style
    
    * change to bloom
    
    * add quick fix tokenizer
    
    * fix tokenizer file
    
    * fix tokenizer
    
    - merge tests
    - small fixes
    
    * fix import issue
    
    * add bloom to readme
    
    * fix consistency
    
    * Update docs/source/en/model_doc/bloom.mdx
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    * Apply suggestions from code review
    
    fix comment issues on file headers
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    * fix doc issue
    
    * small fix - modeling test
    
    * some changes
    
    - refactor some code
    - taking into account reviews
    - more tests should pass
    - removed pruning tests
    
    * remove useless division
    
    * more tests should pass
    
    * more tests should pass
    
    * more tests should pass
    
    * let's try this one
    
    -add alibi offset
    - remove all permutes to make the grad operations work
    - finger crossed
    
    * refactor
    
    - refactor code
    - style changes
    - add new threshold for test
    
    * major changes
    
    - change BLOOM to Bloom
    - add quick doc on bloom.mdx
    - move embeddings test on modeling test
    
    * modify readme
    
    * small fixes
    
    * small fix
    
    - better threshold for a test
    
    * remove old test file from fetcher
    
    * fix small typo
    
    * major change
    
    - change BloomLMHead to BloomForCausalLM
    
    * remove onnx config
    
    * major changes
    
    - refactor the code
    - remove asserts
    - change tol for test
    
    * make style
    
    * small change
    
    * adding a slow test + commenting old ones for now
    
    * make style
    
    * Apply suggestions from code review
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    * make style
    
    * fix duplicates
    
    * cleaning comments on config
    
    * clean a bit conversion file
    
    * refactor a bit the modeling file
    
    * refactor tokenizer file
    
    * fix tokenization test issue
    
    * fix tokenization issue #2
    
    * fix tokenization issue second try
    
    * fix test issue
    
    * make style + add suggestions
    
    * change test fetcher
    
    * try this one
    
    - slow tests should pass
    - finger crossed
    
    * possible final changes
    
    * make style
    
    * try fix padding side issue
    
    * fix side
    
    * fix padding issue
    
    * fix ko-readme
    
    * fix config auto
    
    * cleaning modeling file
    
    * keep bloom in caps in ko
    
    * update config docs
    
    * remove pretraining_pp
    
    * remove model parallel
    
    * update config
    
    - add correct config files
    
    * fix duplicates
    
    * fix fetcher
    
    * fix refactor issue
    
    - remove divide function
    
    * try to remove alibi
    
    * small fixes
    
    - fix alibi
    - remove seq length
    - refactor a bit the code
    
    * put correct values
    
    - fix bos and eos token ids
    
    * fix attention mask loop
    
    Co-authored-by: thomasw21 <[email protected]>
    
    * small fixes:
    
    - remove skip bias add
    
    * small fixes
    
    - fix typo in readme
    - fix typos in config
    
    * small changes
    
    - remove a test
    - add reconstruction test
    - change config
    
    * small changes
    
    - change Scaled Softmax to BloomScaledSoftmax
    
    * small fixes
    
    - fix alibi dtype
    
    * major changes
    
    - removing explicit dtype when loading modules
    - fixing test args (torch_dtype=auto)
    - add docstring
    
    * fix readmes
    
    * major changes
    
    - now bloom supports alibi shifting
    - refactor a bit the code
    - better test tolerance now
    
    * refactor a bit
    
    * refactor a bit
    
    * put correct name on test
    
    * change docstring
    
    * small changes
    
    - fix docstring modeling
    - fix test tolerance
    
    * fix small nit
    
    - take dtype from tensors in the conversion script
    
    * minor fix
    
    - fix mdx issue
    
    * minor fix
    
    - change config docstring
    
    * forward contrib credits from PR14084
    
    * Apply suggestions from code review
    
    Co-authored-by: Stas Bekman <[email protected]>
    
    * apply modifications
    
    Co-authored-by: Stas Bekman <[email protected]>
    
    * resolve softmax upcast
    
    * Apply suggestions from code review
    
    Co-authored-by: Stas Bekman <[email protected]>
    
    * Update src/transformers/models/bloom/modeling_bloom.py
    
    Co-authored-by: Niklas Muennighoff <[email protected]>
    
    * final changes modeling
    
    Co-authored-by: Stas Bekman <[email protected]>
    
    * Merge commit 'd156898f3b9b2c990e5963f5030a7143d57921a2'
    
    * merge commit
    
    * Apply suggestions from code review
    
    Co-authored-by: Stas Bekman <[email protected]>
    
    * apply suggestions
    
    Apply suggestions from Stas comments
    Co-authored-by: Stas Bekman <[email protected]>
    
    * Fix gradient checkpointing
    
    Co-authored-by: Stas Bekman <[email protected]>
    
    * add slow but exact
    
    * add accelerate compatibility
    
    Co-authored-by: Nicolas Patry <[email protected]>
    
    * forward contrib credits
    
    Co-authored-by: thomasw21 <[email protected]>
    Co-authored-by: sgugger <[email protected]>
    Co-authored-by: patrickvonplaten <[email protected]>
    Co-authored-by: Niklas Muennighoff <[email protected]>
    Co-authored-by: LysandreJik <[email protected]>
    
    * Apply suggestions from code review
    
    Co-authored-by: Patrick von Platen <[email protected]>
    
    * fix torch device on tests
    
    * make style
    
    * Apply suggestions from code review
    
    Co-authored-by: Patrick von Platen <[email protected]>
    
    * fix nits
    
    Co-authored-by: patrickvonplaten<[email protected]>
    
    * remove final nits
    
    * fix doc
    
    - add more details on the doc
    - add links to checkpoints
    
    * Update src/transformers/__init__.py
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    * Update src/transformers/models/bloom/modeling_bloom.py
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    * apply suggestions
    
    Co-authored-by: sgugger <[email protected]>
    
    * put test torchscript to false
    
    * Update src/transformers/models/bloom/modeling_bloom.py
    
    Co-authored-by: justheuristic <[email protected]>
    
    * fix alibi
    
    - create alibi only once
    
    * add small doc
    
    * make quality
    
    * replace torch.nn
    
    * remove token type emb
    
    * fix fused op + output bias
    
    * add fused op
    
    - now can control fused operation from config
    
    * remove fused op
    
    * make quality
    
    * small changes
    
    - remove unused args on config
    - removed bias gelu file
    - make the model torchscriptable
    - add torchscript slow tests
    
    * Update src/transformers/models/bloom/modeling_bloom.py
    
    * fix slow
    
    * make style
    
    * add accelerate support
    
    * add bloom to deepspeed tests
    
    * minor changes
    
    * Apply suggestions from code review
    
    Co-authored-by: Patrick von Platen <[email protected]>
    
    * minor change
    
    * slow tests pass
    
    * Apply suggestions from code review
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    * Update docs/source/en/model_doc/bloom.mdx
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    * minor changes:
    
    - change docstring
    - add link to paper
    
    Co-authored-by: Thomwolf <[email protected]>
    Co-authored-by: Thomas Wolf <[email protected]>
    Co-authored-by: thomasw21 <[email protected]>
    Co-authored-by: Sylvain Gugger <[email protected]>
    Co-authored-by: sIncerass <[email protected]>
    Co-authored-by: Stas Bekman <[email protected]>
    Co-authored-by: Niklas Muennighoff <[email protected]>
    Co-authored-by: Nicolas Patry <[email protected]>
    Co-authored-by: thomasw21 <[email protected]>
    Co-authored-by: sgugger <[email protected]>
    Co-authored-by: patrickvonplaten <[email protected]>
    Co-authored-by: LysandreJik <[email protected]>
    Co-authored-by: Patrick von Platen <[email protected]>
    Co-authored-by: justheuristic <[email protected]>
    Co-authored-by: Stas Bekman <[email protected]>
    16 people authored Jun 9, 2022
    Commit ca2a55e
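    A short usage sketch for the new model class; "bigscience/bloom-560m" is a small public checkpoint named here for illustration, not taken from the commit message:

        from transformers import AutoTokenizer, BloomForCausalLM

        tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
        model = BloomForCausalLM.from_pretrained("bigscience/bloom-560m")

        inputs = tokenizer("Hello, my name is", return_tensors="pt")
        output_ids = model.generate(**inputs, max_new_tokens=20)
        print(tokenizer.decode(output_ids[0], skip_special_tokens=True))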
  3. Add ONNX support for ResNet (huggingface#17585)

    * Add ONNX support for ResNet
    
    * Add ONNX test
    
    * make fix-copies
    regisss authored Jun 9, 2022
    Commit 5323094
  4. Commit e0be053
  5. Use shape_list to safely get shapes for Swin (huggingface#17591)

    * Use shape_list to safely get shapes
    
    * Add relevant test
    
    * Tidy and add metrics
    
    * Resolve dynamic shaping issues and move test
    
    * Tidy up and all samples in batch
    
    * Formatting
    amyeroberts authored Jun 9, 2022
    Commit 9fc3423
  6. Commit 2908064
  7. Adding top_k argument to text-classification pipeline. (huggingfa…

    …ce#17606)
    
    * Adding `top_k` and `sort` arguments to `text-classification` pipeline.
    
    - Deprecate `return_all_scores` as `top_k` is more uniform with other
      pipelines, and a superset of what `return_all_scores` can do.
      BC is maintained though.
      `return_all_scores=True` -> `top_k=None`
      `return_all_scores=False` -> `top_k=1`
    
    - Using `top_k` will imply sorting the results, but using no argument
      will keep the results unsorted for backward compatibility.
    
    * Remove `sort`.
    
    * Fixing the test.
    
    * Remove bad doc.
    Narsil authored Jun 9, 2022
    Commit 2351729
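    A minimal sketch of the top_k behaviour described above (the checkpoint name is an illustrative assumption):

        from transformers import pipeline

        classifier = pipeline("text-classification",
                              model="distilbert-base-uncased-finetuned-sst-2-english")

        # old return_all_scores=True  -> top_k=None (all labels, sorted by score)
        print(classifier("This library is great!", top_k=None))

        # old return_all_scores=False -> top_k=1 (only the best label)
        print(classifier("This library is great!", top_k=1))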
  8. Fix very long job failure text in Slack report (huggingface#17630)

    * Fix very long job failure text in Slack report
    
    Co-authored-by: ydshieh <[email protected]>
    ydshieh and ydshieh authored Jun 9, 2022
    Commit c70dacd
  9. Commit 90ed9ae
  10. Running a pipeline of float16. (huggingface#17637)

    When we're preparing the tensors for CPU for postprocessing, we need
    to upgrade the `float16` to `float32` since CPUs don't have instructions
    for `[b]float16`.
    Narsil authored Jun 9, 2022
    Commit c38f4e1
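    An illustrative sketch of the cast described above, not the pipeline's actual code path:

        import torch

        # Move half-precision tensors to float32 before CPU post-processing,
        # since most CPU ops lack (b)float16 kernels.
        logits = torch.randn(2, 3, dtype=torch.float16)
        if logits.dtype in (torch.float16, torch.bfloat16):
            logits = logits.to(torch.float32)
        scores = torch.softmax(logits, dim=-1).numpy()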
  11. [modeling_utils] torch_dtype/auto floating dtype fixes (huggingface#1…

    …7614)
    
    * [modeling_utils] torch_dtype/auto fixes
    
    * add test
    
    * apply suggestions
    
    * add missing fallback
    
    * Renaming things
    
    * Use for else
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    stas00 and sgugger authored Jun 9, 2022
    Commit 75343de
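    A small sketch of the option this fix touches (checkpoint name is illustrative):

        import torch
        from transformers import AutoModel

        # torch_dtype="auto" resolves the dtype from the checkpoint/config, falling
        # back to the framework default; an explicit torch.dtype also works.
        model_auto = AutoModel.from_pretrained("bert-base-uncased", torch_dtype="auto")
        model_fp16 = AutoModel.from_pretrained("bert-base-uncased", torch_dtype=torch.float16)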
  12. Pre-build DeepSpeed (huggingface#17607)

    * pre-build deepspeed
    
    Co-authored-by: ydshieh <[email protected]>
    ydshieh and ydshieh authored Jun 9, 2022
    Commit da0bed5
  13. convert assertion to raised exception in debertav2 (huggingface#17619)

    * convert assertion to raised exception in debertav2
    
    * change assert to raise exception in deberta
    
    * fix messages
    sam-h-bean authored Jun 9, 2022
    Commit fba0b6a
  14. Commit df1ec6b

Commits on Jun 10, 2022

  1. Translation/autoclass (huggingface#17615)

    * Add Italian translation for autoclass_tutorial.mdx
    
    * Fix synthesis
    
    Co-authored-by: martina.fumanelli <[email protected]>
    mfumanelli and martina.fumanelli authored Jun 10, 2022
    Commit e0b58fb
  2. Commit af4a1ec
  3. Move Clip image utils to image_utils.py (huggingface#17628)

    * move clip image utils to image_utils.py
    
    * dont default to square images
    
    * fix typo, revert change to test file
    
    * edit convert_rgb comments
    alaradirik authored Jun 10, 2022
    Commit 6e93d94
  4. Enable crop_center method to handle (W, H, C) images (huggingface#17626)

    * enable crop_center method to handle (W, H, C) images
    
    * minor style and comment edits
    alaradirik authored Jun 10, 2022
    Commit 49becba
  5. Bump cookiecutter in /examples/research_projects/decision_transformer (

    …huggingface#17645)
    
    Bumps [cookiecutter](https://github.com/cookiecutter/cookiecutter) from 1.7.2 to 2.1.1.
    - [Release notes](https://github.com/cookiecutter/cookiecutter/releases)
    - [Changelog](https://github.com/cookiecutter/cookiecutter/blob/master/HISTORY.md)
    - [Commits](cookiecutter/cookiecutter@1.7.2...2.1.1)
    
    ---
    updated-dependencies:
    - dependency-name: cookiecutter
      dependency-type: direct:production
    ...
    
    Signed-off-by: dependabot[bot] <[email protected]>
    
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    dependabot[bot] authored Jun 10, 2022
    Commit 1d46330
  6. Fix style

    LysandreJik committed Jun 10, 2022
    Commit 2bc3051
  7. Fix style

    LysandreJik committed Jun 10, 2022
    Commit cdaed36
  8. Commit fd1e670
  9. Commit b880909
  10. Fixes huggingface#17128 . (huggingface#17356)

    VisibleDeprecationWarning is addressed by specifying dtype=object when creating numpy array.
    Update code based on review feedback.
    Undo whitespace changes to tokenization_utils_base.py.
    
    Co-authored-by: I like data <[email protected]>
    mygithubid1 and ilikedata2 authored Jun 10, 2022
    Commit 35b1603
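    A tiny illustration of the fix described above (the token ids are made up):

        import numpy as np

        # Ragged token lists force NumPy to guess the dtype and emit a
        # VisibleDeprecationWarning; dtype=object makes the intent explicit.
        batch = [[101, 2023, 102], [101, 2023, 2003, 1037, 2307, 102]]
        arr = np.array(batch, dtype=object)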
  11. 🐛 Properly raise RepoNotFoundError when not authenticated (huggingf…

    …ace#17651)
    
    * Raise RepoNotFoundError in case of 401
    
    * Include changes from revert-17646-skip_repo_not_found
    
    * Add a comment
    
    * 💄 Code quality
    
    * 💚 Update `get_from_cache` test
    
    * 💚 Code quality & skip failing test
    SBrandeis authored Jun 10, 2022
    Commit c99ddcc
  12. update README.md (huggingface#17657)

    - use CodeParrot scores of v1.1
    - change evaluation command to use accelerate
    loubnabnl authored Jun 10, 2022
    Commit 3114df4
  13. [BigBirdFlaxTests] Make tests slow (huggingface#17658)

    * [BigBirdFlaxTests] Make tests slow
    
    * up
    
    * correct black with new version
    patrickvonplaten authored Jun 10, 2022
    Commit 5e428b7
  14. Commit b4eef63
  15. Commit 13e875c
  16. Commit 39e1461
  17. Avoid GPU OOM for a TF Rag test (huggingface#17638)

    Co-authored-by: ydshieh <[email protected]>
    ydshieh and ydshieh authored Jun 10, 2022
    Commit 224bde9

Commits on Jun 13, 2022

  1. Commit a5282ab
  2. Add Visual Question Answering (VQA) pipeline (huggingface#17286)

    * wip
    
    * rebase
    
    * all tests pass
    
    * rebase
    
    * ready for PR
    
    * address comments
    
    * fix styles
    
    * add require_torch to pipeline test
    
    * remove remote image to improve CI consistency
    
    * address comments; fix tf/flax tests
    
    * address comments; fix tf/flax tests
    
    * fix tests; add alias
    
    * repo consistency tests
    
    * Update src/transformers/pipelines/visual_question_answering.py
    
    Co-authored-by: NielsRogge <[email protected]>
    
    * address comments
    
    * Update src/transformers/pipelines/visual_question_answering.py
    
    Co-authored-by: NielsRogge <[email protected]>
    
    * merge
    
    * Update src/transformers/models/auto/modeling_auto.py
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    * merge
    
    Co-authored-by: Sijun He <[email protected]>
    Co-authored-by: NielsRogge <[email protected]>
    Co-authored-by: Patrick von Platen <[email protected]>
    Co-authored-by: Sylvain Gugger <[email protected]>
    5 people authored Jun 13, 2022
    Commit 66336dc
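    A minimal usage sketch for the new pipeline; the checkpoint name and image path are illustrative assumptions:

        from transformers import pipeline

        vqa = pipeline("visual-question-answering", model="dandelin/vilt-b32-finetuned-vqa")
        answers = vqa(image="path/to/photo.png", question="What is on the table?")
        print(answers)  # e.g. [{"answer": ..., "score": ...}, ...]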
  3. Fixed documentation typo, parameter name is evaluation_strategy, not …

    …eval_strategy (huggingface#17669)
    
    Co-authored-by: Saint <[email protected]>
    sainttttt and Saint authored Jun 13, 2022
    Commit c1daf72
  4. explicitly set utf8 for Windows (huggingface#17664)

    Bram Vanroy authored Jun 13, 2022
    Commit 7308358
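    The fix amounts to never relying on the platform default encoding (often cp1252 on Windows); a hedged illustration with a made-up file name:

        from pathlib import Path

        Path("report.txt").write_text("résumé of the run", encoding="utf-8")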
  5. Fix dtype getter (huggingface#17668)

    * Fix dtype getters
    
    * Proper fix for dtype getter
    
    * Style and comment
    
    * Always use last for consistency
    
    * Quality
    sgugger authored Jun 13, 2022
    Commit a1344db
  6. Update modeling_gpt_neox.py (huggingface#17575)

    I'm guessing that the intention was to have the `_no_split_modules` class attribute for `GPTNeoXPreTrainedModel` to be set to `["GPTNeoXLayer"]`, akin to how its set as `["GPTJBlock"]` for `GPTJPreTrainedModel`.
    
    If this is incorrect, please feel free to just close the PR.
    
    Thanks!
    willfrey authored Jun 13, 2022
    Commit 5483388
  7. Add Ray's scope to training arguments (huggingface#17629)

    * allow scope from trainer arg
    
    * add ray_scope to training args
    
    * escape double quotes
    
    * make style && quality
    
    * attempt to solve doc style issues
    
    * splitting up URLs for style
    
    * make fixup
    
    * Update src/transformers/training_args.py
    
    Co-authored-by: Antoni Baum <[email protected]>
    
    * make style
    
    Co-authored-by: Antoni Baum <[email protected]>
    Bram Vanroy and Yard1 authored Jun 13, 2022
    Commit 457d4a3
  8. enable cpu distribution training using mpirun (huggingface#17570)

    * enable cpu distribution training using mpirun
    
    * command like:
    *     mpirun -n 2 python3 run_qa.py --no_cuda --xpu_backend ccl xxxx
    * MASTER_ADDR and MASTER_PORT should be set as env variables:
    *     export MASTER_ADDR=127.0.0.1
    *     export MASTER_PORT=29500
    
    Signed-off-by: Wang, Yi A <[email protected]>
    
    * fix according to the review comment
    
    Signed-off-by: Wang, Yi A <[email protected]>
    
    * use accelerate logic for cpu distribution training to set "RANK","LOCAL_RANK","WORLD_SIZE" environment
    
    Signed-off-by: Wang, Yi A <[email protected]>
    sywangyi authored Jun 13, 2022
    Commit 4aabf9b
  9. Add FP16 Support for SageMaker Model Parallel (huggingface#17386)

    * Add FP16 support for SageMaker model parallel
    
    * minor fix
    
    * fix indentation
    
    * handle mix precision exception for smmp
    
    * minor fix
    
    * remove amp implementation on SMMP
    
    * remove redundant stuff
    
    * reformat trainer
    
    * restyling
    
    * reformat
    haohanchen-aws authored Jun 13, 2022
    Commit 1690094
  10. Add LongT5 model (huggingface#16792)

    * Initial commit
    
    * Make some fixes
    
    * Make PT model full forward pass
    
    * Drop TF & Flax implementation, fix copies etc
    
    * Add Flax model and update some corresponding stuff
    
    * Drop some TF things
    
    * Update config and flax local attn
    
    * Add encoder_attention_type to config
    
    * .
    
    * Update docs
    
    * Do some cleansing
    
    * Fix some issues -> make style; add some docs
    
    * Fix position_bias + mask addition + Update tests
    
    * Fix repo consistency
    
    * Fix model consistency by removing flax operation over attn_mask
    
    * [WIP] Add PT TGlobal LongT5
    
    * .
    
    * [WIP] Add flax tglobal model
    
    * [WIP] Update flax model to use the right attention type in the encoder
    
    * Fix flax tglobal model forward pass
    
    * Make use of global_relative_attention_bias
    
    * Add test suites for TGlobal model
    
    * Fix minor bugs, clean code
    
    * Fix pt-flax equivalence though not convinced with correctness
    
    * Fix LocalAttn implementation to match the original impl. + update READMEs
    
    * Few updates
    
    * Update: [Flax] improve large model init and loading huggingface#16148
    
    * Add ckpt conversion script according to huggingface#16853 + handle torch device placement
    
    * Minor updates to conversion script.
    
    * Typo: AutoModelForSeq2SeqLM -> FlaxAutoModelForSeq2SeqLM
    
    * gpu support + dtype fix
    
    * Apply some suggestions from code review
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    Co-authored-by: Patrick von Platen <[email protected]>
    
    * * Remove (de)parallelize stuff
    * Edit shape comments
    * Update README.md
    * make fix-copies
    
    * Remove caching logic for local & tglobal attention
    
    * Apply another batch of suggestions from code review
    
    * Add missing checkpoints
    * Format converting scripts
    * Drop (de)parallelize links from longT5 mdx
    
    * Fix converting script + revert config file change
    
    * Revert "Remove caching logic for local & tglobal attention"
    
    This reverts commit 2a61982.
    
    * Stash caching logic in Flax model
    
    * Make side relative bias used always
    
    * Drop caching logic in PT model
    
    * Return side bias as it was
    
    * Drop all remaining model parallel logic
    
    * Remove clamp statements
    
    * Move test files to the proper place
    
    * Update docs with new version of hf-doc-builder
    
    * Fix test imports
    
    * Make some minor improvements
    
    * Add missing checkpoints to docs
    * Make TGlobal model compatible with torch.onnx.export
    * Replace some np.ndarray with jnp.ndarray
    
    * Fix TGlobal for ONNX conversion + update docs
    
    * fix _make_global_fixed_block_ids and masked neg value
    
    * update flax model
    
    * style and quality
    
    * fix imports
    
    * remove load_tf_weights_in_longt5 from init and fix copies
    
    * add slow test for TGlobal model
    
    * typo fix
    
    * Drop obsolete is_parallelizable and one warning
    
    * Update __init__ files to fix repo-consistency
    
    * fix pipeline test
    
    * Fix some device placements
    
    * [wip]: Update tests -- need to generate summaries to update expected_summary
    
    * Fix quality
    
    * Update LongT5 model card
    
    * Update (slow) summarization tests
    
    * make style
    
    * rename checkpoints
    
    * finish
    
    * fix flax tests
    
    Co-authored-by: phungvanduy <[email protected]>
    Co-authored-by: Sylvain Gugger <[email protected]>
    Co-authored-by: Patrick von Platen <[email protected]>
    Co-authored-by: patil-suraj <[email protected]>
    5 people authored Jun 13, 2022
    Commit a72f1c9
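    A short usage sketch; "google/long-t5-tglobal-base" is one of the released checkpoints, named here as an assumption rather than taken from the commit message:

        from transformers import AutoTokenizer, LongT5ForConditionalGeneration

        tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")
        model = LongT5ForConditionalGeneration.from_pretrained("google/long-t5-tglobal-base")

        # LongT5 targets long inputs; the repeated filler stands in for a long document.
        inputs = tokenizer("summarize: " + "a very long report ... " * 200, return_tensors="pt")
        summary_ids = model.generate(**inputs, max_new_tokens=64)
        print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))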

Commits on Jun 14, 2022

  1. Fix doc builder Dockerfile (huggingface#17435)

    * Fix doc builder Dockerfile
    
    Co-authored-by: ydshieh <[email protected]>
    ydshieh and ydshieh authored Jun 14, 2022
    Commit df15703
  2. Extend Transformers Trainer Class to Enable PyTorch Torchscript for I…

    …nference (huggingface#17153)
    
    * add jit mode option and model wrap
    
    * Update src/transformers/training_args.py
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    * Update src/transformers/training_args.py
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    * refine code
    
    * Update src/transformers/trainer.py
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    * Update src/transformers/trainer.py
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    * add ut and refine code
    
    * code refine
    
    * refine code
    
    * add inference doc
    
    * Update src/transformers/trainer.py
    
    Co-authored-by: Stas Bekman <[email protected]>
    
    * Update src/transformers/trainer.py
    
    Co-authored-by: Stas Bekman <[email protected]>
    
    * add cpu inference performance doc
    
    * Update perf_infer_cpu.mdx
    
    * Update perf_infer_cpu.mdx
    
    * Update performance.mdx
    
    * Update _toctree.yml
    
    * refine jit func naming
    
    * Update _toctree.yml
    
    * Delete perf_infer_gpu_one.mdx
    
    * Update perf_infer_cpu.mdx
    
    * Update docs/source/en/perf_infer_cpu.mdx
    
    Co-authored-by: Stas Bekman <[email protected]>
    
    * add none check before jit
    
    * Update docs/source/en/perf_infer_cpu.mdx
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    * Update docs/source/en/perf_infer_cpu.mdx
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    Co-authored-by: Stas Bekman <[email protected]>
    Co-authored-by: Stas Bekman <[email protected]>
    4 people authored Jun 14, 2022
    Commit 3b29c9f
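    A minimal sketch of the new inference option; the flag name jit_mode_eval follows the description above and may differ across versions:

        from transformers import TrainingArguments

        # Wrap the model with TorchScript (jit tracing) during evaluation/prediction.
        args = TrainingArguments(output_dir="out", do_eval=True, jit_mode_eval=True)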
  3. Commit 53496ac
  4. Rag end2end new (huggingface#17650)

    * check
    
    * update the RAG-end2end with new PL and RAY
    
    * removed unwanted comments
    Shamane Siri authored Jun 14, 2022
    Commit 9068fa6
  5. Include a comment to reflect Amy's contributions (huggingface#17689)

    * Add note on amy's contribution.
    
    Co-authored-by: Amy Roberts <[email protected]>
    
    * remove non-tech comment.
    
    Co-authored by: Amy Roberts <[email protected]>
    
    Co-authored-by: Amy Roberts <[email protected]>
    sayakpaul and amyeroberts authored Jun 14, 2022
    Commit 3960ce9
  6. Swin main layer (huggingface#17693)

    * Swin models call TFSwinMainLayer
    
    * Tidy up
    amyeroberts authored Jun 14, 2022
    Commit bd43151
  7. Add BloomForSequenceClassification and `BloomForTokenClassification…

    …` classes (huggingface#17639)
    
    * add new bloom classes
    
    * (feat) add bloom classification tests; make style
    
    * style: change import in test
    
    * add some typehints to bloom classes
    
    * merge main into branch
    
    * fix: input checking in bloom seq classification
    
    * fix tests
    
    * change model class tests
    
    * fix few tests
    
    - more tests should pass
    - one test left
    
    * make token classifier return hidden states
    
    * style: make BLOOM typehints consistent
    
    Co-authored-by: Younes Belkada <[email protected]>
    
    Co-authored-by: younesbelkada <[email protected]>
    Co-authored-by: Younes Belkada <[email protected]>
    3 people authored Jun 14, 2022
    Commit edb672a
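    A usage sketch of the new sequence-classification head; the checkpoint name is an assumption and the head is randomly initialized until fine-tuned:

        import torch
        from transformers import AutoTokenizer, BloomForSequenceClassification

        tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
        model = BloomForSequenceClassification.from_pretrained(
            "bigscience/bloom-560m", num_labels=2
        )

        inputs = tokenizer("A short review to classify.", return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits
        print(logits.argmax(dim=-1))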
  8. FX function refactor (huggingface#17625)

    * Function refactor
    
    * Update src/transformers/utils/fx.py
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    michaelbenayoun and sgugger authored Jun 14, 2022
    Commit 7ec9128
  9. Commit 120649b
  10. Commit d453ea6

Commits on Jun 15, 2022

  1. Commit b76290f
  2. Documentation: RemBERT fixes (huggingface#17641)

    * rembert: fix python codeblock
    
    * rembert: use correct google/rembert checkpoint name in documentation
    
    * rembert: use correct google/rembert checkpoint name in TF documentation
    stefan-it authored Jun 15, 2022
    Commit 242cc6e
  3. [Wav2Vec2Conformer] Official release (huggingface#17709)

    * [Wav2Vec2Conformer] Official release
    
    * remove from not-in-readme
    patrickvonplaten authored Jun 15, 2022
    Commit 7f14839
  4. Commit 50415b8
  5. Commit 6ebeeee
  6. CLI: Add flag to push TF weights directly into main (huggingface#17720)

    * Add flag to push weights directly into main
    gante authored Jun 15, 2022
    Commit c3c62b5
  7. Commit 66f8933
  8. Commit 3981ee8

Commits on Jun 16, 2022

  1. Fix mask token in the example (huggingface#17725)

    VisualBert uses the bert-base-uncased tokenizer; therefore, instead of {mask}, the mask token should be [MASK] (see the sketch below)
    Jiayi-Pan authored Jun 16, 2022
    Commit 2eadb7e
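    A tiny sketch of the point above:

        from transformers import AutoTokenizer

        # VisualBert reuses the bert-base-uncased tokenizer, whose mask token is "[MASK]".
        tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
        prompt = f"The man is eating a {tokenizer.mask_token}."  # "... eating a [MASK]."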
  2. Fix tf shared embedding (huggingface#17730)

    * fix the naming
    
    * from pt in test for now
    
    * make style
    
    * slow test and removed from_pt
    ArthurZucker authored Jun 16, 2022
    Commit f44e2c2
  3. Refine Bf16 test for deepspeed (huggingface#17734)

    * Refine BF16 check in CPU/GPU
    
    * Fixes
    
    * Renames
    sgugger authored Jun 16, 2022
    Commit 36d4647
  4. v4.21.0.dev0

    sgugger committed Jun 16, 2022
    Commit 7c6ec19
  5. Remove needless file

    sgugger committed Jun 16, 2022
    Commit 3c7e56f

Commits on Jun 17, 2022

  1. Enable PyTorch nightly build CI (huggingface#17335)

    * nightly build pytorch CI
    
    * fix working dir
    
    * change time and event name
    
    Co-authored-by: ydshieh <[email protected]>
    ydshieh and ydshieh authored Jun 17, 2022
    Commit ca169db
  2. Commit 2d7c1bb
  3. Bump notebook in /examples/research_projects/visual_bert (huggingface…

    …#17742)
    
    Bumps [notebook](http://jupyter.org) from 6.4.10 to 6.4.12.
    
    ---
    updated-dependencies:
    - dependency-name: notebook
      dependency-type: direct:production
    ...
    
    Signed-off-by: dependabot[bot] <[email protected]>
    
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    dependabot[bot] authored Jun 17, 2022
    Commit 5089a2d
  4. Bump notebook in /examples/research_projects/lxmert (huggingface#17743)

    Bumps [notebook](http://jupyter.org) from 6.4.10 to 6.4.12.
    
    ---
    updated-dependencies:
    - dependency-name: notebook
      dependency-type: direct:production
    ...
    
    Signed-off-by: dependabot[bot] <[email protected]>
    
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    dependabot[bot] authored Jun 17, 2022
    Commit e44a569
  5. Migrate HFDeepSpeedConfig from trfrs to accelerate (huggingface#17623)

    * Migrate HFDeepSpeedConfig from trfrs to accelerate
    
    * add `accelerate` to testing dep
    
    * addressing comments
    
    * addressing comments
    
    Using `_shared_state` and avoiding object creation. This is necessary as `notebook_launcher` in `launchers.py` checks `len(AcceleratorState._shared_state)>0` to throw an error.
    
    * resolving comments
    
    1. Use simple API from accelerate to manage the deepspeed config integration
    2. Update the related documentation
    
    * reverting changes and addressing comments
    
    * docstring correction
    
    * addressing nits
    
    * addressing nits
    
    * addressing nits 3
    
    * bumping up the accelerate version to 0.10.0
    
    * resolving import
    
    * update setup.py to include deepspeed dependencies
    
    * Update dependency_versions_table.py
    
    * fixing imports
    
    * reverting changes to CI dependencies for "run_tests_pipelines_tf*" tests
    
    These changes didn't help with resolving the failures and I believe this needs to be addressed in another PR.
    
    * removing `accelerate` as hard dependency
    
    Resolves issues related to CI Tests
    
    * adding `accelerate` as dependency for building docs
    
    resolves failure in Build PR Documentation test
    
    * adding `accelerate` as dependency in "dev" to resolve doc build issue
    
    * resolving comments
    
    1. adding `accelerate` to extras["all"]
    2. Including check for accelerate too before import HFDeepSpeedConfig from there
    
    Co-Authored-By: Sylvain Gugger <[email protected]>
    
    * resolving comments
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    pacman100 and sgugger authored Jun 17, 2022
    Commit 21a7724
  6. Save huggingface checkpoint as artifact in mlflow callback (huggingfa…

    …ce#17686)
    
    * Fix eval to compute rouge correctly for rouge_score
    
    * styling
    
    * moving sentence tokenization to utils from run_eval
    
    * saving ckpt in mlflow
    
    * use existing format of args
    
    * fix documentation
    
    Co-authored-by: Swetha Mandava <[email protected]>
    swethmandava and Swetha Mandava authored Jun 17, 2022
    Commit 522a9ec

Commits on Jun 18, 2022

  1. Added translation of index.mdx to Portuguese Issue huggingface#16824 (h…

    …uggingface#17565)
    
    * Added translation of installation.mdx to Portuguese, as well
    as default templates of _toctree.yml and _config.py
    
    * [ build_documentation.yml ] - Updated doc_builder to build
    documentation in Portuguese.
    [ pipeline_tutorial.mdx ] - Created translation for the pipeline_tutorial.mdx.
    
    * [ build_pr_documentation.yml ] - Added pt language to pr_documentation builder.
    
    [ pipeline_tutorial.mdx ] - Grammar changes.
    
    * [ accelerate.mdx ] - Translated to Portuguese the acceleration tutorial.
    
    * [ multilingual.mdx ] - Added portuguese translation for multilingual tutorial.
    
    [ training.mdx ] - Added portuguese translation for training tutorial.
    
    * [ preprocessing.mdx ] - WIP
    
    * Update _toctree.yml
    
    * Adding Pré-processamento to _toctree.yml
    
    * Update accelerate.mdx
    
    * Nits and eliminate preprocessing file while it is ready
    
    * [ index.mdx ] - Translated to Portuguese the index apresentation page.
    
    * [ docs/source/pt ] - Updated _toctree.yml to match newest translations.
    
    * Fix build_pr_documentation.yml
    
    * Fix index nits
    
    * nits in _toctree
    
    Co-authored-by: Omar U. Espejel <[email protected]>
    rzimmerdev and omarespejel authored Jun 18, 2022
    Commit 0d92798
  2. Attempt to change Push CI to workflow_run (huggingface#17753)

    * Use workflow_run event for push CI
    
    * change to workflow_run
    
    * Add comments
    
    Co-authored-by: ydshieh <[email protected]>
    ydshieh and ydshieh authored Jun 18, 2022
    Commit 6589e51

Commits on Jun 20, 2022

  1. TF: BART compatible with XLA generation (huggingface#17479)

    * Also propagate changes to blenderbot, blenderbot_small, marian, mbart, and pegasus
    gante authored Jun 20, 2022
    Commit 132402d
  2. deprecate is_torch_bf16_available (huggingface#17738)

    * deprecate is_torch_bf16_available
    
    * address suggestions
    stas00 authored Jun 20, 2022
    Commit: a2d34b7
  3. Fix cache for GPT-Neo-X (huggingface#17764)

    * Fix cache for GPT-Neo-X
    
    * Add more tests
    sgugger authored Jun 20, 2022
    Commit: fdb1208
  4. Not use -1e4 as attn mask (huggingface#17306)

    * Use torch.finfo(self.dtype).min
    
    * for GPTNeoX
    
    * for Albert
    
    * For Splinter
    
    * Update src/transformers/models/data2vec/modeling_data2vec_audio.py
    
    Co-authored-by: Patrick von Platen <[email protected]>
    
    * fix -inf used in Bart-like models
    
    * Fix a few remaining -inf
    
    * more fix
    
    * clean up
    
    * For CLIP
    
    * For FSMT
    
    * clean up
    
    * fix test
    
    * Add dtype argument and use it for LayoutLMv3
    
    * update FlaxLongT5Attention
    
    Co-authored-by: ydshieh <[email protected]>
    Co-authored-by: Patrick von Platen <[email protected]>
    3 people authored Jun 20, 2022
    Commit: d3cb288
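The entry above swaps the hard-coded -1e4 masking constant for the smallest representable value of the compute dtype. A minimal sketch of that pattern, assuming a 1/0-style padding mask (the helper name is illustrative, not the actual modeling code):

```python
import torch

def to_additive_mask(attention_mask: torch.Tensor, dtype: torch.dtype) -> torch.Tensor:
    # 1 = keep token, 0 = padding; convert to an additive mask on the attention scores
    inverted = 1.0 - attention_mask.to(dtype)
    # torch.finfo(dtype).min instead of -1e4, so fp16/bf16 masking is not capped
    # at an arbitrary constant that can still leak attention mass
    return inverted * torch.finfo(dtype).min

print(to_additive_mask(torch.tensor([[1, 1, 0]]), torch.float16))
```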
  5. Update modeling_longt5.py (huggingface#17777)

    On line 180, `torch.tensor(-1.0, xxx)` gives the error "TypeError: 'float' object cannot be interpreted as an integer"
    because the dtype here is `int64`. For `dtype=int64`, this needs to simply be `-1` (a minimal sketch follows this entry).
    This impacts the long-t5-tglobal-x models. It does not impact the long-t5-local-x variants, which do not appear to call this line.
    bjascob authored Jun 20, 2022
    Commit: da27c4b
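A minimal sketch of the dtype-consistent fix described above; the tensor below is hypothetical and only illustrates using an integer fill value when the surrounding tensors are `int64`:

```python
import torch

relative_position = torch.arange(6)  # int64 by default
# an integer literal matches the int64 dtype; a float literal such as -1.0 can be
# rejected by integer-only code paths downstream
masked = torch.where(relative_position > 2, relative_position, torch.full_like(relative_position, -1))
print(masked.dtype)  # torch.int64
```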

Commits on Jun 21, 2022

  1. Add UL2 (just docs) (huggingface#17740)

    * Add UL2
    Co-authored-by: Daniel Hesslow <[email protected]>
    
    * Correct naming
    
    * sort better
    
    * up
    
    * apply sylvains suggestion
    patrickvonplaten authored Jun 21, 2022
    Commit: 8fcbe27
  2. add onnx support for deberta and debertav2 (huggingface#17617)

    * add onnx support for debertav2
    
    * debertav2 -> deberta-v2 in onnx features file
    
    * remove causal lm
    
    * add deberta-v2-xlarge to onnx tests
    
    * use self.type().dtype() in xsoftmax
    
    Co-authored-by: Jingya HUANG <[email protected]>
    
    * remove hack for deberta
    
    * remove unused imports
    
    * Update src/transformers/models/deberta_v2/configuration_deberta_v2.py
    
    Co-authored-by: Jingya HUANG <[email protected]>
    
    * use generate dummy inputs
    
    * linter
    
    * add imports
    
    * add support for deberta v1 as well
    
    * deberta does not support multiple choice
    
    * Update src/transformers/models/deberta/configuration_deberta.py
    
    Co-authored-by: Jingya HUANG <[email protected]>
    
    * Update src/transformers/models/deberta_v2/configuration_deberta_v2.py
    
    Co-authored-by: Jingya HUANG <[email protected]>
    
    * one line ordered dict
    
    * fire build
    
    Co-authored-by: Jingya HUANG <[email protected]>
    sam-h-bean and JingyaHuang authored Jun 21, 2022
    Commit: eb16be4
  3. [CodeParrot] Near-deduplication with jaccard similarity (huggingface#17054)
    
    * deduplication draft
    
    * update style
    
    * update style test
    
    * dummy test main
    
    * rename modules
    
    * rename functions
    
    * return extremes in deduplicate_clusters
    
    * update style
    
    * cast str for gzip
    
    * update doc string
    
    * time processing
    
    * use dataset map to compute minhash
    
    * fill value for short token
    
    * remove da map method
    
    * update style
    
    * use share object to multiprocess
    
    * update style
    
    * use f-string and minor fix
    
    Co-authored-by: Leandro von Werra <[email protected]>
    Co-authored-by: Loubna Ben Allal <[email protected]>
    
    * update style
    
    * use module parameters
    
    * change ds_dedup to ds_filter
    
    * save ds_dedup
    
    * mv test to script tests
    
    * make jaccard threshold a parameter of deduplicate_dataset
    
    * update style
    
    * add doc strings
    
    * update style
    
    * add doc string for DuplicationIndex
    
    * save files into data dir
    
    * update readme
    
    * Update examples/research_projects/codeparrot/README.md
    
    Co-authored-by: Loubna Ben Allal <[email protected]>
    
    * make near deduplication optional
    
    * move near deduplication in README
    
    * Update examples/research_projects/codeparrot/README.md
    
    Co-authored-by: Leandro von Werra <[email protected]>
    
    * use f string
    
    Co-authored-by: Leandro von Werra <[email protected]>
    Co-authored-by: Loubna Ben Allal <[email protected]>
    3 people authored Jun 21, 2022
    Commit: da2bd2a
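An illustrative sketch of the MinHash/Jaccard near-deduplication idea behind this commit, assuming the datasketch library used by the CodeParrot scripts; the 0.85 threshold is only an example value:

```python
from datasketch import MinHash

def minhash(tokens, num_perm=256):
    m = MinHash(num_perm=num_perm)
    for tok in set(tokens):
        m.update(tok.encode("utf-8"))
    return m

a = minhash("def add ( a , b ) : return a + b".split())
b = minhash("def add ( x , y ) : return x + y".split())
# pairs whose estimated Jaccard similarity exceeds the threshold are clustered,
# and only one representative per cluster is kept
print(a.jaccard(b) > 0.85)
```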
  4. Add link to notebook (huggingface#17791)

    Co-authored-by: Niels Rogge <[email protected]>
    NielsRogge and Niels Rogge authored Jun 21, 2022
    Commit: 3fab17f
  5. [ViTMAE] Fix docstrings and variable names (huggingface#17710)

    * Fix docstrings and variable names
    
    * Rename x to something better
    
    * Improve messages
    
    * Fix docstrings and add test for greyscale images
    
    Co-authored-by: Niels Rogge <[email protected]>
    NielsRogge and Niels Rogge authored Jun 21, 2022
    Commit: b681e12
  6. Fix Automatic Download of Pretrained Weights in DETR (huggingface#17712)

    * added use_backbone_pretrained
    
    * style fixes
    
    * update
    
    * Update detr.mdx
    
    * Update detr.mdx
    
    * Update detr.mdx
    
    * update using doc py
    
    * Update detr.mdx
    
    * Update src/transformers/models/detr/configuration_detr.py
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    AnugunjNaman and sgugger authored Jun 21, 2022
    Commit: 27e9073
  7. Commit: 7bc88c0
  8. Prepare transformers for v0.8.0 huggingface-hub release (huggingface#17716)
    
    * Prepare CI for v0.8.0
    
    * pin hfh (revert before merge)
    
    * Revert "pin hfh (revert before merge)"
    
    This reverts commit a010314.
    
    * Test rc3
    
    * Test latest rc
    
    * Unpin to the RC
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    LysandreJik and sgugger authored Jun 21, 2022
    Commit: 6a5272b
  9. Use 5e-5 For BigBird PT/Flax equivalence tests (huggingface#17780)

    * rename to check_pt_flax_outputs
    
    * update check_pt_flax_outputs
    
    * use 5e-5 for BigBird PT/Flax test
    
    Co-authored-by: ydshieh <[email protected]>
    ydshieh and ydshieh authored Jun 21, 2022
    Commit: f47afef
  10. TF Sharded (huggingface#17713)

    * initial commit
    
    * update modeling tf utils
    
    * quality
    
    * clean and update args
    
    * update
    
    * remove potential bug
    
    * code quality
    
    * update
    
    * update max shard
    
    * update tests for sharding from pretrained
    
    * fix remaining test
    
    * make style
    
    * h5py if tf available
    
    * update and fix test
    
    * fix test
    
    * style
    
    * modified push to hub to support shard for TF
    
    * quick fix
    
    * update code
    
    * merge branch main and style
    
    * Apply suggestions from code review
    
    Co-authored-by: Joao Gante <[email protected]>
    Co-authored-by: Patrick von Platen <[email protected]>
    
    * update based on reviews
    
    * update doc
    
    * update and style
    
    * Apply suggestions from code review
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    * Update based on reviews
    
    * fix typo
    
    * style
    
    Co-authored-by: Joao Gante <[email protected]>
    Co-authored-by: Patrick von Platen <[email protected]>
    Co-authored-by: Sylvain Gugger <[email protected]>
    4 people authored Jun 21, 2022
    Commit: 7cced02
  11. Commit: ef23fae
  12. Commit: 52404cb
  13. Add final_layer_norm to OPT model (huggingface#17785)

    * Add final_layer_norm to OPT model
    
    * Add JAX and TF version
    
    * Fix Keras name
    
    * Woops
    
    * Allow for non breaking change
    
    * Apply suggestions from code review
    
    * add tests
    
    Co-authored-by: Patrick von Platen <[email protected]>
    thomasw21 and patrickvonplaten authored Jun 21, 2022
    Commit: abc400b
  14. Improve error message Union not allowed (huggingface#17769)

    * Improve error message Union not allowed
    
    * make style
    
    * Update src/transformers/hf_argparser.py
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    Bram Vanroy and sgugger authored Jun 21, 2022
    Commit: 26a6a42
  15. Commit: 3ccff0d
  16. Fix top_k_top_p_filtering having unexpected behavior (huggingface#17744)
    
    - Fix `top_k_top_p_filtering` not passing `filter_value` to
       `TopPLogitsWarper` causing any top-p filtered logits to be -inf
       instead of specified value
    
     - Add corresponding test
    unifyh authored Jun 21, 2022
    Commit: 3b00b62
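A short sketch of the corrected behavior, assuming the `top_k_top_p_filtering` helper exported by transformers at the time: after the fix, positions removed by the top-p pass also receive the user-supplied `filter_value` instead of always -inf:

```python
import torch
from transformers import top_k_top_p_filtering

logits = torch.randn(1, 100)
filtered = top_k_top_p_filtering(logits.clone(), top_k=10, top_p=0.9, filter_value=-1e4)
print((filtered == -1e4).sum())  # filtered-out positions now carry the custom value
```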

Commits on Jun 22, 2022

  1. Commit: 16c6eb7
  2. Add logits_processor parameter, used by generate, to `Seq2SeqTrainer` methods `evaluate` and `predict` (huggingface#17805)
    
    * Add logits_processor parameter, used by `generate`, to `Seq2SeqTrainer` methods `evaluate` and `predict`
    
    * Add all generate parameters to `Seq2SeqTrainer`, and also to `QuestionAnsweringSeq2SeqTrainer` which overrides it
    
    * Remove `self._num_beams` from trainer classes
    
    * - Run fixup
    - Fix "Constraint" not exposed
    - Fix synced_gpus to actually read from param
    
    * Use kwargs
    
    * Copy kwargs before making changes to it
    
    * Fix style issues unused imports
    eranhirs authored Jun 22, 2022
    Commit: 1357038
  3. Commit: 56b83cf
  4. Bump numpy in /examples/research_projects/visual_bert (huggingface#17816)
    
    Bumps [numpy](https://github.com/numpy/numpy) from 1.21.0 to 1.22.0.
    - [Release notes](https://github.com/numpy/numpy/releases)
    - [Changelog](https://github.com/numpy/numpy/blob/main/doc/HOWTO_RELEASE.rst)
    - [Commits](numpy/numpy@v1.21.0...v1.22.0)
    
    ---
    updated-dependencies:
    - dependency-name: numpy
      dependency-type: direct:production
    ...
    
    Signed-off-by: dependabot[bot] <[email protected]>
    
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    dependabot[bot] authored Jun 22, 2022
    Commit: af0d21e
  5. Bump numpy from 1.21.0 to 1.22.0 in /examples/research_projects/lxmert (huggingface#17817)
    
    Bumps [numpy](https://github.com/numpy/numpy) from 1.21.0 to 1.22.0.
    - [Release notes](https://github.com/numpy/numpy/releases)
    - [Changelog](https://github.com/numpy/numpy/blob/main/doc/HOWTO_RELEASE.rst)
    - [Commits](numpy/numpy@v1.21.0...v1.22.0)
    
    ---
    updated-dependencies:
    - dependency-name: numpy
      dependency-type: direct:production
    ...
    
    Signed-off-by: dependabot[bot] <[email protected]>
    
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    dependabot[bot] authored Jun 22, 2022
    Commit: c366ce1
  6. CLI: use hub's create_commit (huggingface#17755)

    * use create_commit
    
    * better commit message and description
    
    * touch setup.py to trigger cache update
    
    * add hub version gating
    gante authored Jun 22, 2022
    Commit: 0d0c392
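For context, a hedged sketch of the huggingface_hub `create_commit` API the CLI now builds on (the repo id, file names, and messages are placeholders):

```python
from huggingface_hub import CommitOperationAdd, HfApi

api = HfApi()
api.create_commit(
    repo_id="user/my-model",  # placeholder
    operations=[CommitOperationAdd(path_in_repo="tf_model.h5", path_or_fileobj="./tf_model.h5")],
    commit_message="Add TF weights",
    commit_description="Converted with the transformers CLI",
)
```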
  7. Offload fixes (huggingface#17810)

    * Offload fixes
    
    * Add a test
    sgugger authored Jun 22, 2022
    Commit: df8e680

Commits on Jun 23, 2022

  1. Fix push CI artifact path (huggingface#17788)

    Co-authored-by: ydshieh <[email protected]>
    ydshieh and ydshieh authored Jun 23, 2022
    Commit: 8d634b7
  2. add doctests for DETR (huggingface#17786)

    * add: check labels for detr object detection doctests
    
    * add: check shapes
    
    * add: add detr to documentation_tests.py
    
    * fix: make fixup output
    
    * fix: add a comment
    qherreros authored Jun 23, 2022
    Commit: ab223fc
  3. Commit: 5cce307
  4. Update type hints modeling_yoso.py (huggingface#17827)

    * Update modeling_yoso.py
    
    * make fixup
    
    * Update modeling_yoso.py
    
    That should be it copied from previous PR
    F02934 authored Jun 23, 2022
    Commit: 4297f44
  5. Add missing type hints for QDQBertModel (huggingface#17783)

    * Feat: add missing type hints for QDQBertModel
    
    * fix: ran black and isort
    
    * feat: Add missing output type for QDQBertModel
    
    * feat: Add type hints for QDQBertLMHeadModel and models starting with QDQBertFor
    
    * fix: add missing return type for QDQBertModel
    
    * fix: remove wrong return type for QDQBertEmbeddings
    
    * fix: readded config argument to load_tf_weights_in_qdqbert
    
    * fix: add BertConfig type to BertEmbeddings config due to check error in CI
    
    * fix: removed config type hints to avoid copy checks
    willtai authored Jun 23, 2022
    Commit: d37a68e
  6. Commit: b2fdbac
  7. Commit: 3eed553
  8. Fix an error message in BigBird (huggingface#17840)

    Co-authored-by: ydshieh <[email protected]>
    ydshieh and ydshieh authored Jun 23, 2022
    Commit: 5bc779a
  9. Improve performance docs (huggingface#17750)

    * add skeleton files
    
    * fix cpu inference link
    
    * add hint to make clear that single gpu section contains general info
    
    * add new files to ToC
    
    * update toctree to have subsection for performance
    
    * add "coming soon" to the still empty sections
    
    * fix missing title
    
    * fix typo
    
    * add reference to empty documents
    
    * Apply suggestions from code review
    
    Co-authored-by: Stas Bekman <[email protected]>
    
    * Apply suggestions from code review
    
    Co-authored-by: Stas Bekman <[email protected]>
    
    Co-authored-by: Stas Bekman <[email protected]>
    lvwerra and stas00 authored Jun 23, 2022
    Commit: 6f29029
  10. BLOOM minor changes on tokenizer (huggingface#17823)

    * few fixes:
    
    - hardcode tokenizer padding side
    - remove unused args
    
    * few fixes:
    
    - added new attribute on TokenizerTesterMixin
    - added new slow test
    - remove unused arg on tokenizer class
    
    * make style
    
    * Update src/transformers/models/bloom/tokenization_bloom_fast.py
    
    Co-authored-by: SaulLu <[email protected]>
    
    * make quality
    
    * apply changes
    
    - remove new attribute
    - redefine test on the class
    
    * add comments
    
    Co-authored-by: SaulLu <[email protected]>
    younesbelkada and SaulLu authored Jun 23, 2022
    Commit: 18c263c
  11. Fix broken test for models with batchnorm (huggingface#17841)

    * Fix tests that broke when models used batchnorm
    
    * Initializing the model twice does not actually...
    ...give you the same weights each time.
    I am good at machine learning.
    
    * Fix speed regression
    Rocketknight1 authored Jun 23, 2022
    Commit: 1a7ef33
  12. Update modeling_cvt.py (huggingface#17846)

    As shown in the Colab notebook, I added the missing type hints for CvtForImageClassification and CvtModel.
    F02934 authored Jun 23, 2022
    Commit: e70abda
  13. Change no trainer image_classification test (huggingface#17635)

    * Adjust test arguments and use a new example test
    muellerzr authored Jun 23, 2022
    Commit: acb709d
  14. Nezha Pytorch implementation (huggingface#17776)

    * wip
    
    * rebase
    
    * all tests pass
    
    * rebase
    
    * ready for PR
    
    * address comments
    
    * fix styles
    
    * add require_torch to pipeline test
    
    * remove remote image to improve CI consistency
    
    * address comments; fix tf/flax tests
    
    * address comments; fix tf/flax tests
    
    * fix tests; add alias
    
    * repo consistency tests
    
    * Update src/transformers/pipelines/visual_question_answering.py
    
    Co-authored-by: NielsRogge <[email protected]>
    
    * address comments
    
    * Update src/transformers/pipelines/visual_question_answering.py
    
    Co-authored-by: NielsRogge <[email protected]>
    
    * merge
    
    * wip
    
    * wip
    
    * wip
    
    * most basic tests passes
    
    * all tests pass now
    
    * relative embedding
    
    * wip
    
    * running make fixup
    
    * remove bert changes
    
    * fix doc
    
    * fix doc
    
    * fix issues
    
    * fix doc
    
    * address comments
    
    * fix CI
    
    * remove redundant copied from
    
    * address comments
    
    * fix broken test
    
    Co-authored-by: Sijun He <[email protected]>
    Co-authored-by: NielsRogge <[email protected]>
    3 people authored Jun 23, 2022
    Commit: 7cf52a4
  15. Commit: 7c1b912
  16. Commit: 75259b4
  17. Auto-build Docker images before on-merge if setup.py was changed (huggingface#17573)
    
    * Auto-build on setup modification
    
    * Modify push-caller
    
    * Make adjustments based on code review
    muellerzr authored Jun 23, 2022
    Commit: 893ab12

Commits on Jun 24, 2022

  1. Improve vision models (huggingface#17731)

    * Improve vision models
    
    * Add a lot of improvements
    
    * Remove to_2tuple from swin tests
    
    * Fix TF Swin
    
    * Fix more tests
    
    * Fix copies
    
    * Improve more models
    
    * Fix ViTMAE test
    
    * Add channel check for TF models
    
    * Add proper channel check for TF models
    
    * Apply suggestion from code review
    
    * Apply suggestions from code review
    
    * Add channel check for Flax models, apply suggestion
    
    * Fix bug
    
    * Add tests for greyscale images
    
    * Add test for interpolation of pos encodigns
    
    Co-authored-by: Niels Rogge <[email protected]>
    NielsRogge and Niels Rogge authored Jun 24, 2022
    Commit: 0917870
  2. Improve encoder decoder model docs (huggingface#17815)

    * Copied all the changes from the last PR
    
    * added in documentation_tests.txt
    
    * Update docs/source/en/model_doc/encoder-decoder.mdx
    
    Co-authored-by: NielsRogge <[email protected]>
    
    * Update docs/source/en/model_doc/encoder-decoder.mdx
    
    Co-authored-by: NielsRogge <[email protected]>
    
    * Update docs/source/en/model_doc/encoder-decoder.mdx
    
    Co-authored-by: Yih-Dar <[email protected]>
    
    * Update docs/source/en/model_doc/encoder-decoder.mdx
    
    Co-authored-by: NielsRogge <[email protected]>
    
    * Update docs/source/en/model_doc/encoder-decoder.mdx
    
    Co-authored-by: NielsRogge <[email protected]>
    
    * Update docs/source/en/model_doc/encoder-decoder.mdx
    
    Co-authored-by: NielsRogge <[email protected]>
    
    * Update docs/source/en/model_doc/encoder-decoder.mdx
    
    Co-authored-by: NielsRogge <[email protected]>
    
    Co-authored-by: vishwaspai <[email protected]>
    Co-authored-by: NielsRogge <[email protected]>
    Co-authored-by: Yih-Dar <[email protected]>
    4 people authored Jun 24, 2022
    Commit: c2c0d9d
  3. Fix Constrained beam search duplication and weird output issue (huggingface#17814)
    
    * fix(ConstrainedBeamSearchScorer.step_sentence_constraint): avoid hypothesis duplication between topk and advance
    
    * fix(GenerationMixin.constrained_beam_search): appropriately assign beam scores instead of token scores
    boy2000-007man authored Jun 24, 2022
    Commit: bc7a6fd
  4. Commit: 73a0496
  5. Fix Splinter test (huggingface#17854)

    * fix
    
    Co-authored-by: ydshieh <[email protected]>
    ydshieh and ydshieh authored Jun 24, 2022
    Commit: 4474900
  6. Add CodeGen model (huggingface#17443)

    * Add CodeGen model
    
    * Add missing key and switch order of super()
    
    * Fix torch.ones init with uint8 instead of bool
    
    * Address comments: copy statements and doc
    
    * update tests
    
    * remove old model parallel
    
    * fix batch gen tests
    
    * fix batch gen test
    
    * update test_gpt2_sample_max_time
    
    * fix codgen test and revert gpt2 test change
    
    * Fix incorrect tie_word_embedding value, typo, URL
    
    * Fix model order in README and styling
    
    * Reorder model list alphabetically
    
    * Set tie_word_embedding to False by default
    
    * Apply suggestions from code review
    
    * Better attn mask name & remove attn masked_bias
    
    * add tokenizer for codegen
    
    * quality
    
    * doc tokenizer
    
    * fix-copies
    
    * add CodeGenTokenizer in converter
    
    * make truncation optional
    
    * add test for truncation
    
    * add copyright
    
    * fix-copies
    
    * fix fast tokenizer decode
    
    * Update src/transformers/models/codegen/tokenization_codegen.py
    
    Co-authored-by: Patrick von Platen <[email protected]>
    
    * increase vocab_size in tests
    
    Co-authored-by: patil-suraj <[email protected]>
    Co-authored-by: Patrick von Platen <[email protected]>
    3 people authored Jun 24, 2022
    Commit: d6b6fb9
  7. Commit: 061a73d
  8. Add type hints for gptneox models (huggingface#17858)

    * feat: Add type hints for GPTNeoxForCausalLM and GPTNeoXModel
    
    * fix: removed imported Dict type
    
    * fix: Removed unused List import
    willtai authored Jun 24, 2022
    Commit: ef28a40
  9. Commit: 2ef94ee
  10. Use higher value for hidden_size in Flax BigBird test (huggingface#17822)
    
    * Use higher value for hidden_size in Flax BigBird test
    
    * remove 5e-5
    
    Co-authored-by: ydshieh <[email protected]>
    ydshieh and ydshieh authored Jun 24, 2022
    Commit: 0e0f1f4
  11. Commit: 494aac6
  12. Commit: b03be78
  13. Properly get tests deps in test_fetcher (huggingface#17870)

    * Properly get tests deps in test_fetcher
    
    * Remove print
    sgugger authored Jun 24, 2022
    Commit: e8eb699

Commits on Jun 25, 2022

  1. Commit: cc5c061

Commits on Jun 27, 2022

  1. Commit: 401fcca
  2. Add a TF in-graph tokenizer for BERT (huggingface#17701)

    * Add a TF in-graph tokenizer for BERT
    
    * Add from_pretrained
    
    * Add proper truncation, option handling to match other tokenizers
    
    * Add proper imports and guards
    
    * Add test, fix all the bugs exposed by said test
    
    * Fix truncation of paired texts in graph mode, more test updates
    
    * Small fixes, add a (very careful) test for savedmodel
    
    * Add tensorflow-text dependency, make fixup
    
    * Update documentation
    
    * Update documentation
    
    * make fixup
    
    * Slight changes to tests
    
    * Add some docstring examples
    
    * Update tests
    
    * Update tests and add proper lowercasing/normalization
    
    * make fixup
    
    * Add docstring for padding!
    
    * Mark slow tests
    
    * make fixup
    
    * Fall back to BertTokenizerFast if BertTokenizer is unavailable
    
    * Fall back to BertTokenizerFast if BertTokenizer is unavailable
    
    * make fixup
    
    * Properly handle tensorflow-text dummies
    Rocketknight1 authored Jun 27, 2022
    Commit: ee0d001
  3. Commit: 3ec7d4c
  4. fix (huggingface#17890)

    Co-authored-by: ydshieh <[email protected]>
    ydshieh and ydshieh authored Jun 27, 2022
    Commit: 9a34538
  5. Commit: afb71b6
  6. Fix add new model like frameworks (huggingface#17869)

    * Add new model like adds only the selected frameworks object in init
    
    * Small fix
    sgugger authored Jun 27, 2022
    Commit: 9874282
  7. bert: add conversion script for BERT Token Dropping TF2 checkpoints (huggingface#17142)
    
    * bert: add conversion script for BERT Token Dropping TF2 checkpoints
    
    * bert: rename conversion script for BERT Token Dropping checkpoints
    
    * bert: fix flake errors in BERT Token Dropping conversion script
    
    * bert: make doc-builder happy!!1!11
    
    * bert: fix pytorch_dump_path of BERT Token Dropping conversion script
    stefan-it authored Jun 27, 2022
    Commit: 71b2839
  8. Commit: 6dd00f6
  9. Fix bug in gpt2's (from-scratch) special scaled weight initialization (huggingface#17877)
    
    * only special scale init each gpt2 c_proj weight once, on exact match
    
    * fix double quotes
    
    Co-authored-by: leandro <[email protected]>
    karpathy and leandro authored Jun 27, 2022
    Commit: e02037b
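A rough sketch of the GPT-2 "special scaled init" the fix above touches: residual projection weights are re-initialized with std / sqrt(2 * n_layer), and matching the exact parameter name suffix keeps each weight from being scaled more than once (the helper below is hypothetical, not the actual `_init_weights` code):

```python
import math

import torch.nn as nn

n_layer, initializer_range = 12, 0.02

def scale_residual_projections(model: nn.Module) -> None:
    for name, param in model.named_parameters():
        # exact suffix match: only residual projections named "c_proj.weight"
        # get the scaled init, and each of them exactly once
        if name.endswith("c_proj.weight"):
            param.data.normal_(mean=0.0, std=initializer_range / math.sqrt(2 * n_layer))
```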

Commits on Jun 28, 2022

  1. Commit: 0b0dd97
  2. Commit: f717d47
  3. Fix PyTorch/TF Auto tests (huggingface#17895)

    * add loading_info
    
    Co-authored-by: ydshieh <[email protected]>
    ydshieh and ydshieh authored Jun 28, 2022
    Commit: db2644b
  4. Commit: 9eec4e9
  5. Commit: 1dfa03f
  6. Commit: 0094565
  7. Move logic into pixelshuffle layer (huggingface#17899)

    * Move all pixelshuffle logic into layer
    
    * Rename layer
    
    * Use correct input to function
    amyeroberts authored Jun 28, 2022
    Commit: f71895a
  8. Commit: bfcd574
  9. Commit: 76d13de
  10. Commit: b424f0b
  11. Adding GroupViT Models (huggingface#17313)

    * add group vit and fixed test (except slow)
    
    * passing slow test
    
    * addressed some comments
    
    * fixed test
    
    * fixed style
    
    * fixed copy
    
    * fixed segmentation output
    
    * fixed test
    
    * fixed relative path
    
    * fixed copy
    
    * add ignore non auto configured
    
    * fixed docstring, add doc
    
    * fixed copies
    
    * Apply suggestions from code review
    
    merge suggestions
    
    Co-authored-by: NielsRogge <[email protected]>
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    * resolve comment, renaming model
    
    * delete unused attr
    
    * use fix copies
    
    * resolve comments
    
    * fixed attn
    
    * remove unused vars
    
    * refactor tests
    
    * resolve final comments
    
    * add demo notebook
    
    * fixed inconsistent default
    
    * Apply suggestions from code review
    
    Co-authored-by: NielsRogge <[email protected]>
    
    * Apply suggestions from code review
    
    Co-authored-by: NielsRogge <[email protected]>
    
    * rename stage->stages
    
    * Create single GroupViTEncoderLayer class
    
    * Update conversion script
    
    * Simplify conversion script
    
    * Remove cross-attention class in favor of GroupViTAttention
    
    * Convert other model as well, add processor to conversion script
    
    * addressing final comment
    
    * fixed args
    
    * Update src/transformers/models/groupvit/modeling_groupvit.py
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    Co-authored-by: NielsRogge <[email protected]>
    Co-authored-by: Sylvain Gugger <[email protected]>
    Co-authored-by: Niels Rogge <[email protected]>
    4 people authored Jun 28, 2022
    Commit: 6c8f4c9
  12. Commit: 5a3d0cb
  13. Commit: 5f1e67a
  14. Fixing a regression with return_all_scores introduced in huggingface#17606 (huggingface#17906)
    
    Fixing a regression with `return_all_scores` introduced in huggingface#17606
    
    - The legacy test actually tested `return_all_scores=False` (the actual
      default) instead of `return_all_scores=True` (the actual weird case).
    
    This commit adds the correct legacy test and fixes it.
    
    Tmp legacy tests.
    
    Actually fix the regression (also contains lists)
    
    Less diffed code.
    Narsil authored Jun 28, 2022
    Commit: 776855c

Commits on Jun 29, 2022

  1. Commit: 6aae59d
  2. Commit: babd7b1
  3. Fix the Conda package build (huggingface#16737)

    * Fix the Conda package build
    
    * Update build.sh
    
    * Update release-conda.yml
    bryant1410 authored Jun 29, 2022
    Commit: 9041547
  4. Remove render tags (huggingface#17897)

    Co-authored-by: Niels Rogge <[email protected]>
    NielsRogge and Niels Rogge authored Jun 29, 2022
    Commit: e113c5c
  5. Commit: b814275
  6. TF: XLA beam search + most generation-compatible models are now also XLA-generate-compatible (huggingface#17857)
    
    * working beam search 🎉
    
    * XLA generation compatible with ALL classes
    
    * add xla generation slow test
    gante authored Jun 29, 2022
    Commit: e6d27ca
  7. TF implementation of RegNets (huggingface#17554)

    * chore: initial commit
    
    Copied the torch implementation of regnets and porting the code to tf step by step. Also introduced an output layer which was needed for regnets.
    
    * chore: porting the rest of the modules to tensorflow
    
    did not change the documentation yet, yet to try the playground on the model
    
    * Fix initilizations (#1)
    
    * fix: code structure in few cases.
    
    * fix: code structure to align tf models.
    
    * fix: layer naming, bn layer still remains.
    
    * chore: change default epsilon and momentum in bn.
    
    * chore: styling nits.
    
    * fix: cross-loading bn params.
    
    * fix: regnet tf model, integration passing.
    
    * add: tests for TF regnet.
    
    * fix: code quality related issues.
    
    * chore: added rest of the files.
    
    * minor additions..
    
    * fix: repo consistency.
    
    * fix: regnet tf tests.
    
    * chore: reorganize dummy_tf_objects for regnet.
    
    * chore: remove checkpoint var.
    
    * chore: remov unnecessary files.
    
    * chore: run make style.
    
    * Update docs/source/en/model_doc/regnet.mdx
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    * chore: PR feedback I.
    
    * fix: pt test. thanks to @ydshieh.
    
    * New adaptive pooler (huggingface#3)
    
    * feat: new adaptive pooler
    
    Co-authored-by: @Rocketknight1
    
    * chore: remove image_size argument.
    
    Co-authored-by: matt <[email protected]>
    
    Co-authored-by: matt <[email protected]>
    
    * Empty-Commit
    
    * chore: remove image_size comment.
    
    * chore: remove playground_tf.py
    
    * chore: minor changes related to spacing.
    
    * chore: make style.
    
    * Update src/transformers/models/regnet/modeling_tf_regnet.py
    
    Co-authored-by: amyeroberts <[email protected]>
    
    * Update src/transformers/models/regnet/modeling_tf_regnet.py
    
    Co-authored-by: amyeroberts <[email protected]>
    
    * chore: refactored __init__.
    
    * chore: copied from -> taken from./g
    
    * adaptive pool -> global avg pool, channel check.
    
    * chore: move channel check to stem.
    
    * pr comments - minor refactor and add regnets to doc tests.
    
    * Update src/transformers/models/regnet/modeling_tf_regnet.py
    
    Co-authored-by: NielsRogge <[email protected]>
    
    * minor fix in the xlayer.
    
    * Empty-Commit
    
    * chore: removed from_pt=True.
    
    Co-authored-by: Sayak Paul <[email protected]>
    Co-authored-by: Sylvain Gugger <[email protected]>
    Co-authored-by: matt <[email protected]>
    Co-authored-by: amyeroberts <[email protected]>
    Co-authored-by: NielsRogge <[email protected]>
    6 people authored Jun 29, 2022
    Commit: a7eba83
  8. Fix job links in Slack report (huggingface#17892)

    Co-authored-by: ydshieh <[email protected]>
    ydshieh and ydshieh authored Jun 29, 2022
    Commit: 5cdfff5
  9. Commit: 47b9165
  10. Commit: 8f40077
  11. Add MVP model (huggingface#17787)

    * Add MVP model
    
    * Update README
    
    * Remove useless module
    
    * Update docs
    
    * Fix bugs in tokenizer
    
    * Remove useless test
    
    * Remove useless module
    
    * Update vocab
    
    * Remove specifying
    
    * Remove specifying
    
    * Add #Copied ... statement
    
    * Update paper link
    
    * Remove useless TFMvp
    
    * Add #Copied ... statement
    
    * Fix style in test mvp model
    
    * Fix some typos
    
    * Fix properties of unset special tokens in non verbose mode
    
    * Update paper link
    
    * Update MVP doc
    
    * Update MVP doc
    
    * Fix README
    
    * Fix typos in docs
    
    * Update docs
    StevenTang1998 authored Jun 29, 2022
    Commit: 3cff4cc
  12. Fix img seg tests (load checkpoints from hf-internal-testing) (huggingface#17939)
    
    * Revert "Skip failing test until they are fixed."
    
    This reverts commit 8f40077.
    
    * Use `tiny-detr` checkpts from `hf-internal-testing`
    mishig25 authored Jun 29, 2022
    Commit: 77b7667
  13. Fix all is_torch_tpu_available issues (huggingface#17936)

    * Fix all is_torch_tpu_available
    muellerzr authored Jun 29, 2022
    Commit: 7c4c6f6
  14. Commit: 4c722e9
  15. Use explicit torch version in deepspeed CI (huggingface#17942)

    * use explicit torch version
    
    Co-authored-by: ydshieh <[email protected]>
    ydshieh and ydshieh authored Jun 29, 2022
    Commit: 9fe2403
  16. Commit: d444edb
  17. PyTorch 1.12.0 for scheduled CI (huggingface#17949)

    Co-authored-by: ydshieh <[email protected]>
    ydshieh and ydshieh authored Jun 29, 2022
    Commit: b089cca
  18. ExplicitEnum subclass str (JSON dump compatible) (huggingface#17933)

    * ExplicitEnum subclass str (JSON dump compatible)
    
    * allow union if one of the types is str
    Bram Vanroy authored Jun 29, 2022
    Commit: bc019b0
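A small example of why subclassing str matters here: a plain Enum member is not JSON-serializable, while a str-subclassing member dumps as its value (the enum below is illustrative, not the actual transformers class):

```python
import json
from enum import Enum

class SaveStrategy(str, Enum):
    NO = "no"
    STEPS = "steps"
    EPOCH = "epoch"

# works because every member is also a str instance
print(json.dumps({"save_strategy": SaveStrategy.STEPS}))  # {"save_strategy": "steps"}
```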
  19. Commit: 5feac3d
  20. add MobileViT model (huggingface#17354)

    * add MobileViT
    
    * fixup
    
    * Update README.md
    
    Co-authored-by: NielsRogge <[email protected]>
    
    * remove empty line
    
    Co-authored-by: NielsRogge <[email protected]>
    
    * use clearer variable names
    
    * rename to MobileViTTransformerLayer
    
    * no longer inherit from nn.Sequential
    
    * fixup
    
    * fixup
    
    * not sure why this got added twice
    
    * rename organization for checkpoints
    
    * fix it up
    
    * Update src/transformers/models/mobilevit/__init__.py
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    * Update src/transformers/models/mobilevit/configuration_mobilevit.py
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    * Update src/transformers/models/mobilevit/configuration_mobilevit.py
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    * Update src/transformers/models/mobilevit/configuration_mobilevit.py
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    * Update tests/models/mobilevit/test_modeling_mobilevit.py
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    * Update src/transformers/models/mobilevit/modeling_mobilevit.py
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    * Update src/transformers/models/mobilevit/modeling_mobilevit.py
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    * Update src/transformers/models/mobilevit/modeling_mobilevit.py
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    * Update src/transformers/models/mobilevit/modeling_mobilevit.py
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    * code style improvements
    
    * fixup
    
    * Update docs/source/en/model_doc/mobilevit.mdx
    
    Co-authored-by: NielsRogge <[email protected]>
    
    * Update docs/source/en/model_doc/mobilevit.mdx
    
    Co-authored-by: NielsRogge <[email protected]>
    
    * Update src/transformers/models/mobilevit/configuration_mobilevit.py
    
    Co-authored-by: NielsRogge <[email protected]>
    
    * Update src/transformers/models/mobilevit/configuration_mobilevit.py
    
    Co-authored-by: NielsRogge <[email protected]>
    
    * download labels from hub
    
    * rename layers
    
    * rename more layers
    
    * don't compute loss in separate function
    
    * remove some nn.Sequential
    
    * replace nn.Sequential with new MobileViTTransformer class
    
    * replace nn.Sequential with MobileViTMobileNetLayer
    
    * fix pruning since model structure changed
    
    * fixup
    
    * fix doc comment
    
    * remove custom resize from feature extractor
    
    * fix ONNX import
    
    * add to doc tests
    
    * use center_crop from image_utils
    
    * move RGB->BGR flipping into image_utils
    
    * fix broken tests
    
    * wrong type hint
    
    * small tweaks
    
    Co-authored-by: NielsRogge <[email protected]>
    Co-authored-by: Sylvain Gugger <[email protected]>
    3 people authored Jun 29, 2022
    Commit: fbc7598
  21. Fix huggingface#17893, removed dead code (huggingface#17917)

    * Removed dead position_id code, fix huggingface#17893
    
    * Removed unused var
    
    * Now ignores removed (dead) dict key for backward comp
    clefourrier authored Jun 29, 2022
    Commit: eb1493b
  22. Flax t5 Encoder (huggingface#17784)

    * first draft adding Flax-t5-encoder and Flax-mt5-encoder
    
    * imports
    
    * after make fixup
    
    * flax t5 encoder test
    
    * black on test
    
    * make fix-copies
    
    * clean
    
    * all_model_classes -> tuple
    
    * clean test
    
    * is_encoder_decoder=False in t5-enc tester
    
    * remove file docstring before FlaxT5Encoder
    
    * black
    
    * isort
    
    * commit suggestions on src/transformers/models/t5/modeling_flax_t5.py
    
    Co-authored-by: Suraj Patil <[email protected]>
    
    * commit suggestions on src/transformers/models/t5/modeling_flax_t5.py
    
    Co-authored-by: Suraj Patil <[email protected]>
    
    * Apply suggestions from code review
    
    Co-authored-by: Suraj Patil <[email protected]>
    
    * remove _get_encoder_module
    
    * self.decoder_seq_length -> self.encoder_seq_length as t5-enc does not have decoder
    
    * bugfix - self.module_class is class itself, not instance;
    
    * docs for mt5 and t5
    
    * call -> __call__ in t5 doc
    
    * FlaxMT5EncoderModel to TYPE_HINT
    
    * run doc-builder to allow change the files
    
    Co-authored-by: Suraj Patil <[email protected]>
    crystina-z and patil-suraj authored Jun 29, 2022
    Commit: 692e61e

Commits on Jun 30, 2022

  1. Fix GPT-NeoX-20B past handling, attention computation (huggingface#17811)
    
    * Fix GPT-NeoX-20B past handling, swap attention computation to hopefully avoid NaN, update docs
    
    * 20B tests
    zphang authored Jun 30, 2022
    Commit: 205bc41
  2. Unifying training argument type annotations (huggingface#17934)

    * doc: Unify training arg type annotations
    
    * wip: extracting enum type from Union
    
    * blackening
    jannisborn authored Jun 30, 2022
    Commit: 4f8361a
  3. [Pipelines] Add revision tag to all default pipelines (huggingface#17667)
    
    * trigger test failure
    
    * upload revision poc
    
    * Update src/transformers/pipelines/base.py
    
    Co-authored-by: Julien Chaumond <[email protected]>
    
    * up
    
    * add test
    
    * correct some stuff
    
    * Update src/transformers/pipelines/__init__.py
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    * correct require flag
    
    Co-authored-by: Julien Chaumond <[email protected]>
    Co-authored-by: Sylvain Gugger <[email protected]>
    3 people authored Jun 30, 2022
    Commit: e4d2588
  4. Commit: f25457b
  5. CLI: convert sharded PT models (huggingface#17959)

    * sharded conversion; add flag to control max hidden error
    
    * better hidden name matching
    
    * Add test: load TF from PT shards
    
    * fix test (PT data must be local)
    gante authored Jun 30, 2022
    Commit: 91e1f24
  6. Commit: fe14046
  7. Add ONNX support for LayoutLMv3 (huggingface#17953)

    * Add ONNX support for LayoutLMv3
    
    * Update docstrings
    
    * Update empty description in docstring
    
    * Fix imports and type hints
    regisss authored Jun 30, 2022
    Commit: 9cb7cef
  8. feat: add pipeline registry abstraction (huggingface#17905)

    * feat: add pipeline registry abstraction
    
    - added `PipelineRegistry` abstraction
    - updates `add_new_pipeline.mdx` (english docs) to reflect the api addition
    - migrate `check_task` and `get_supported_tasks` from
      transformers/pipelines/__init__.py to
      transformers/pipelines/base.py#PipelineRegistry.{check_task,get_supported_tasks}
    
    Signed-off-by: Aaron Pham <[email protected]>
    
    * fix: update with upstream/main
    
    chore: Apply suggestions from sgugger's code review
    
    Signed-off-by: Aaron Pham <[email protected]>
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    * chore: PR updates
    
    - revert src/transformers/dependency_versions_table.py from upstream/main
    - updates pipeline registry to use global variables
    
    Signed-off-by: Aaron Pham <[email protected]>
    
    * tests: add tests for pipeline registry
    
    Signed-off-by: Aaron Pham <[email protected]>
    
    * tests: add test for output warning.
    
    Signed-off-by: Aaron Pham <[email protected]>
    
    * chore: fmt and cleanup unused imports
    
    Signed-off-by: Aaron Pham <[email protected]>
    
    * fix: change imports to top of the file and address comments
    
    Signed-off-by: Aaron Pham <[email protected]>
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    aarnphm and sgugger authored Jun 30, 2022
    Commit: 49cd736
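A brief usage sketch, assuming the module-level `PIPELINE_REGISTRY` object this PR introduces in `transformers.pipelines`:

```python
from transformers.pipelines import PIPELINE_REGISTRY

# the registry now owns task discovery and validation
print(PIPELINE_REGISTRY.get_supported_tasks()[:5])
print(PIPELINE_REGISTRY.check_task("text-classification"))
```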

Commits on Jul 1, 2022

  1. skip some gpt_neox tests that require 80G RAM (huggingface#17923)

    * skip some gpt_neox tests that require 80G RAM
    
    * remove tests
    
    * fix quality
    
    Co-authored-by: ydshieh <[email protected]>
    ydshieh and ydshieh authored Jul 1, 2022
    Commit: 14fb8a6
  2. Commit: cb42502
  3. Commit: 569b679
  4. Commit: 3a064bd
  5. fixing fsdp autowrap functionality (huggingface#17922)

    * fixing fsdp autowrap functionality
    
    * update version and quality
    
    * update torch version to latest stable version
    pacman100 authored Jul 1, 2022
    Commit: 462b7f3
  6. add ONNX support for BLOOM (huggingface#17961)

    * add onnx support for BLOOM
    
    * use TYPE_CHECKING for type annotations
    
    * fix past_shape for bloom (different from gpt2)
    
    * use logical_or instead of `+` for onnx support
    
    * bigger `atol_for_validation` for larger bloom models
    
    * copied -> taken because it's no longer an exact copy
    
    * remove "copied from" comment
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    NouamaneTazi and sgugger authored Jul 1, 2022
    Commit: b68d408
  7. Fix FlaxBigBirdEmbeddings (huggingface#17842)

    Co-authored-by: ydshieh <[email protected]>
    ydshieh and ydshieh authored Jul 1, 2022
    Commit: 8bb2c38
  8. Commit: 664688b
  9. [Flax] Add remat (gradient checkpointing) (huggingface#17843)

    * [Flax] Add remat (gradient checkpointing)
    
    * fix variable naming in test
    
    * flip: checkpoint using a method
    
    * fix naming
    
    * fix class naming
    
    * apply PVP's suggestions from code review
    
    * make fix-copies
    
    * fix big-bird, electra, roberta
    
    * cookie-cutter
    
    * fix flax big-bird
    
    * move test to common
    sanchit-gandhi authored Jul 1, 2022
    Commit 485bbe7
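    A minimal sketch of gradient checkpointing via flax.linen.remat, the feature named in this commit; the Block and Encoder modules here are made up for illustration and are not the transformers implementation.

        import jax
        import jax.numpy as jnp
        import flax.linen as nn

        class Block(nn.Module):
            @nn.compact
            def __call__(self, x):
                return nn.relu(nn.Dense(256)(x))

        class Encoder(nn.Module):
            use_remat: bool = True

            @nn.compact
            def __call__(self, x):
                # nn.remat recomputes Block's activations in the backward pass
                # instead of storing them, trading compute for memory.
                block_cls = nn.remat(Block) if self.use_remat else Block
                for _ in range(4):
                    x = block_cls()(x)
                return x

        params = Encoder().init(jax.random.PRNGKey(0), jnp.ones((2, 256)))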
  10. XLA train step fixes (huggingface#17973)

    * Copy inputs to train and test step before modifying them, as this breaks things
    
    * Add XLA tests, fix our loss functions to be XLA-compatible
    
    * make fixup
    
    * Update loss computation test to expect vector of per-sample losses
    
    * Patch loss for TFLED
    
    * Patch loss for TFAlbert
    
    * Add a tf_legacy_loss config flag that enables old loss functions
    
    * Stop using config.get() because it's not a dict
    
    * Skip loss computation test for RAG because its loss is very strange and I'm afraid to rewrite it
    
    * make fixup
    
    * Add XLA-compatible RAG loss
    
    * Fix dtype of loss mask for TFAlbert
    
    * Fix test for XLNet too because it overrides the default one
    
    * make fixup
    
    * Fix config test
    
    * No more depending on GPU NaN behaviour
    
    * Add test, avoid potential zero division
    
    * Fix test item assignment
    
    * Fix loss computation masking test
    
    * make fixup
    
    * Fix dtype bugs
    Rocketknight1 authored Jul 1, 2022
    Commit d6cec45
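    The recurring theme in this commit - XLA-compatible losses, per-sample loss vectors, and avoiding NaN from zero division - usually comes down to keeping tensor shapes static and guarding the denominator. A hedged sketch of a masked loss written that way (not the library's hf_compute_loss):

        import tensorflow as tf

        def masked_sparse_ce(labels, logits, ignore_index=-100):
            # Keep shapes static for XLA: instead of tf.boolean_mask (dynamic shape),
            # zero out ignored positions and divide by the count of active labels.
            loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(
                from_logits=True, reduction=tf.keras.losses.Reduction.NONE
            )
            active = tf.cast(labels != ignore_index, tf.float32)
            safe_labels = tf.where(labels == ignore_index, tf.zeros_like(labels), labels)
            per_token = loss_fn(safe_labels, logits) * active
            # Guard the denominator so an all-masked batch yields 0 instead of NaN.
            return tf.reduce_sum(per_token) / tf.maximum(tf.reduce_sum(active), 1.0)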
  11. Commit 009171d
  12. Commit 6f0723a
  13. Shifting labels for causal LM when using label smoother (huggingface#17987)
    
    * Shifting labels for causal LM when using label smoother
    
    When training a causal LM, the loss is computed inside the model's forward() function and
    the labels are shifted internally. However, if label smoothing is applied, the loss is
    computed in the trainer's compute_loss function and the labels are not shifted,
    which leaves the labels misaligned with their corresponding inputs. This commit
    resolves that misalignment.
    
    Resolves huggingface#17960
    
    On branch shift_labels_for_causalLM
    Changes to be committed:
    	modified:   src/transformers/trainer.py
    	modified:   src/transformers/trainer_pt_utils.py
    
    * Update trainer.py
    
    * Update src/transformers/trainer.py
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    seungeunrho and sgugger authored Jul 1, 2022
    Commit 6890d19
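    The entry above boils down to one alignment rule: for a causal LM, the logits at position t predict the token at t+1, so labels must be shifted before any externally computed loss, label smoothing included. A hedged sketch of that computation as a standalone function, not the Trainer's exact code:

        import torch
        import torch.nn.functional as F

        def label_smoothed_causal_lm_loss(logits, labels, epsilon=0.1, ignore_index=-100):
            # Align logits with the *next* token: drop the last logit and the first label.
            logits = logits[..., :-1, :].contiguous()
            labels = labels[..., 1:].contiguous()

            log_probs = F.log_softmax(logits, dim=-1)
            nll = -log_probs.gather(-1, labels.clamp_min(0).unsqueeze(-1)).squeeze(-1)
            smooth = -log_probs.mean(dim=-1)

            pad = labels.eq(ignore_index)
            nll = nll.masked_fill(pad, 0.0)
            smooth = smooth.masked_fill(pad, 0.0)
            n_active = (~pad).sum().clamp_min(1)
            return ((1.0 - epsilon) * nll.sum() + epsilon * smooth.sum()) / n_active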
  14. Exclude Databricks from notebook env only if the runtime is below 11.0 (huggingface#17988)
    
    * Exclude Databricks from notebook env only if the runtime is below 11.0
    
    * Dummy commit to trigger CI
    
    * Empty commit to trigger CI
    
    * Empty commit to trigger CI
    
    * Empty commit to trigger CI
    
    * Empty commit to trigger CI
    
    * Empty commit to trigger CI
    
    * Empty commit to trigger CI
    
    * Empty commit to trigger CI
    davidheryanto authored Jul 1, 2022
    Commit 49c8c67

Commits on Jul 4, 2022

  1. Commit a045cbd
  2. Commit 7b18702
  3. Add TF ResNet model (huggingface#17427)

    * Rough TF conversion outline
    
    * Tidy up
    
    * Fix padding differences between layers
    
    * Add back embedder - whoops
    
    * Match test file to main
    
    * Match upstream test file
    
    * Correctly pass and assign image_size parameter
    
    Co-authored-by: Sayak Paul <[email protected]>
    
    * Add in MainLayer
    
    * Correctly name layer
    
    * Tidy up AdaptivePooler
    
    * Small tidy-up
    
    More accurate type hints and remove whitespaces
    
    * Change AdaptiveAvgPool
    
    Use the AdaptiveAvgPool implementation by @Rocketknight1, which pools correctly when the input shape is not evenly divisible by the output shape, cf. https://github.com/huggingface/transformers/pull/17554/files/9e26607e22aa8d069c86b50196656012ff0ce62a#r900109509
    
    Co-authored-by: From: matt <[email protected]>
    Co-authored-by: Sayak Paul <[email protected]>
    
    * Use updated AdaptiveAvgPool
    
    Co-authored-by: matt <[email protected]>
    
    * Make AdaptiveAvgPool compatible with CPU
    
    * Remove image_size from configuration
    
    * Fixup
    
    * Tensorflow -> TensorFlow
    
    * Fix pt references in tests
    
    * Apply suggestions from code review - grammar and wording
    
    Co-authored-by: NielsRogge <[email protected]>
    
    Co-authored-by: NielsRogge <[email protected]>
    
    * Add TFResNet to doc tests
    
    * PR comments - GlobalAveragePooling and clearer comments
    
    * Remove unused import
    
    * Add in keepdims argument
    
    * Add num_channels check
    
    * grammar fix: by -> of
    
    Co-authored-by: matt <[email protected]>
    
    Co-authored-by: Matt <[email protected]>
    
    * Remove transposes - keep NHWC throughout forward pass
    
    * Fixup look sharp
    
    * Add missing layer names
    
    * Final tidy up - remove from_pt now that the weights are on the Hub
    
    Co-authored-by: Sayak Paul <[email protected]>
    Co-authored-by: matt <[email protected]>
    Co-authored-by: NielsRogge <[email protected]>
    Co-authored-by: Matt <[email protected]>
    5 people authored Jul 4, 2022
    Commit 77ea513
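    The pooling discussion in this entry resolves to a familiar Keras pattern: when the target output size is (1, 1), adaptive average pooling is just a global average pool, and keepdims preserves the NHWC layout downstream layers expect. A small sketch of that pattern (not the merged TFResNet code):

        import tensorflow as tf

        # Global average pool with keepdims=True keeps the (batch, 1, 1, channels)
        # NHWC shape instead of collapsing to (batch, channels).
        pooler = tf.keras.layers.GlobalAveragePooling2D(keepdims=True)

        features = tf.random.normal((2, 7, 7, 2048))  # NHWC feature map
        print(pooler(features).shape)                 # (2, 1, 1, 2048)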
  4. Refactor to inherit from nn.Module instead of nn.ModuleList (huggingface#17501)
    
    * Refactor to inherit from nn.Module instead of nn.ModuleList
    
    * Fix typo
    
    * Empty to trigger CI re-run
    
    Blender Bot tests are failing (this should be unrelated to this PR; they pass locally). I don't have sufficient permissions to re-run the CI workflow (in full or from failed jobs).
    amyeroberts authored Jul 4, 2022
    Commit cf2578a
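    The refactor named in the title follows a common PyTorch pattern: instead of subclassing nn.ModuleList, a plain nn.Module owns the list as an attribute. A generic sketch (the class names are placeholders, not the modules touched by this PR):

        import torch.nn as nn

        # Before: the container itself subclassed nn.ModuleList.
        class EncoderAsList(nn.ModuleList):
            def forward(self, x):
                for layer in self:
                    x = layer(x)
                return x

        # After: a plain nn.Module that owns an nn.ModuleList, keeping the public
        # interface explicit while preserving parameter registration.
        class Encoder(nn.Module):
            def __init__(self, num_layers=4, dim=128):
                super().__init__()
                self.layers = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_layers))

            def forward(self, x):
                for layer in self.layers:
                    x = layer(x)
                return x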
  5. Commit 3cfdefa
  6. Commit 7498db0
  7. Commit 6cb1954
  8. Return scalar losses instead of per-sample means (huggingface#18013)

    * Return scalar losses instead of per-sample means
    
    * Make loss shape (1,) instead of scalar
    
    * Allow scalar losses in test_loss_computation
    
    * Allow scalar losses in test_loss_computation
    
    * Allow scalar losses in test_loss_computation
    
    * Remove XLA loss function for RAG
    Rocketknight1 authored Jul 4, 2022
    Commit 96d833b
  9. Commit e3139ad
  10. TF: T5 can now handle a padded past (i.e. XLA generation) (huggingface#17969)
    
    * get the right slicing index for position_bias
    gante authored Jul 4, 2022
    Commit f098268

Commits on Jul 5, 2022

  1. Commit 97db5b4
  2. Commit ec07ecc
  3. Commit 5ae087c
  4. Enable Past CI (huggingface#17919)

    Co-authored-by: ydshieh <[email protected]>
    ydshieh and ydshieh authored Jul 5, 2022
    Commit f681437

Commits on Jul 6, 2022

  1. Squash commits (huggingface#17981)

    Co-authored-by: Niels Rogge <[email protected]>
    NielsRogge and Niels Rogge authored Jul 6, 2022
    Commit 22edb68
  2. Fix T5 incorrect weight decay in Trainer and official summarization example (huggingface#18002)
    
    * Add ALL_LAYERNORM_LAYERS for LayerNorm
    
    * fix bug of appending layer norm
    ADAning authored Jul 6, 2022
    Commit bf37e5c
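    The underlying issue is that T5 uses its own layer-norm class, which is missed when weight-decay exclusions are matched on nn.LayerNorm alone; that is what an ALL_LAYERNORM_LAYERS list addresses. A hedged sketch of the usual two-group optimizer setup; the helper below is illustrative, not the Trainer's internals, and in practice norm_types would include every layer-norm variant (e.g. T5LayerNorm).

        import torch
        import torch.nn as nn

        def build_optimizer(model, lr=1e-4, weight_decay=0.01, norm_types=(nn.LayerNorm,)):
            # Parameters inside normalization layers and all biases should not be decayed.
            no_decay = set()
            for mod_name, mod in model.named_modules():
                if isinstance(mod, norm_types):
                    no_decay.update(f"{mod_name}.{p}" for p, _ in mod.named_parameters())
            no_decay.update(n for n, _ in model.named_parameters() if n.endswith(".bias"))

            grouped = [
                {"params": [p for n, p in model.named_parameters() if n not in no_decay],
                 "weight_decay": weight_decay},
                {"params": [p for n, p in model.named_parameters() if n in no_decay],
                 "weight_decay": 0.0},
            ]
            return torch.optim.AdamW(grouped, lr=lr)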
  3. Commit 360719a
  4. Commit be79cd7
  5. Doc to dataset (huggingface#18037)

    * Link to the Datasets doc
    
    * Remove unwanted file
    sgugger authored Jul 6, 2022
    Commit 2e90c3d
  6. Commit 870ff9e

Commits on Jul 7, 2022

  1. Commit 1b5ea74
  2. Sort doc toc (huggingface#18034)

    * Add script to sort doc ToC
    
    * Style and fixes
    
    * Add check to quality job
    sgugger authored Jul 7, 2022
    Commit 1b749a7
  3. Added Command for windows VENV activation in installation docs (huggingface#18008)
    
    * Added command for Windows venv activation
    
    * Changed Linux and macOS specification
    darthvader2 authored Jul 7, 2022
    Commit 91c4a3a
  4. Commit 2544c14
  5. Drop columns after loading samples in prepare_tf_dataset (huggingface#17967)
    
    * Drop columns after loading samples, rather than before, to avoid breaking transforms
    
    * make fixup
    
    * Add workaround so this PR can work with current datasets version
    Rocketknight1 authored Jul 7, 2022
    Commit de46cde
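    The ordering matters because an on-the-fly transform may need columns that are not model inputs; if those columns are dropped before the samples are loaded, the transform breaks. A generic `datasets` illustration of the same idea (not prepare_tf_dataset's internals; the toy tokenizer is made up):

        from datasets import Dataset

        ds = Dataset.from_dict({"text": ["a b c", "d e"], "label": [0, 1]})

        def tokenize_on_the_fly(batch):
            # Needs the raw "text" column, so it must still exist when rows are fetched.
            batch["input_ids"] = [[len(tok) for tok in t.split()] for t in batch["text"]]
            return batch

        ds = ds.with_transform(tokenize_on_the_fly)

        sample = ds[0]                                                  # transform runs on the full row
        model_inputs = {k: sample[k] for k in ("input_ids", "label")}   # drop extra columns afterwards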

Commits on Jul 8, 2022

  1. Fix slow CI by pinning resampy (huggingface#18077)

    * Fix slow CI by pinning resampy
    
    * Actually put it in the speech dependencies
    sgugger authored Jul 8, 2022
    Commit 9bd3968
  2. Fix type issue in using bucketing with Trainer (huggingface#18051)

    * Fix type issue in using bucketing with Trainer
    
    - Fix type issues in LengthGroupedSampler,
      DistributedLengthGroupedSampler
    
    refs: huggingface#18003
    
    * Change logging type in LengthGroupedSampler
    
    - Change `logger.warning` to `logger.info`
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    * Change logging type in DistributedLengthGroupedSampler
    
    - Change `logger.warning` to `logger.info`
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    * Remove redundant clause in LengthGroupedSampler
    
    - Use `elif`
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    * Remove redundant clause in DistributedLengthGroupedSampler
    
    - Use `elif`
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    
    * Apply black, isort to modified codes in the script
    
    Co-authored-by: Sylvain Gugger <[email protected]>
    seopbo and sgugger authored Jul 8, 2022
    Commit 94ca7d2
  3. Commit 7c046c5
  4. Make predict() close progress bars after finishing (huggingface#17952) (huggingface#18078)
    
    * Make Trainer.predict call on_evaluate (huggingface#17952)
    
    * Add on_predict
    
    * Small fix
    
    * Small and different fix
    
    * Add tests
    neverix authored Jul 8, 2022
    Commit 8b332a6
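    With the on_predict event added above, a callback can react when Trainer.predict finishes. A hedged sketch, assuming the hook is dispatched like the existing on_evaluate (older releases will not have it):

        from transformers import TrainerCallback

        class ReportOnPredict(TrainerCallback):
            def on_predict(self, args, state, control, metrics=None, **kwargs):
                # Called once prediction is done; `metrics` holds the computed test metrics.
                print(f"predict() finished at step {state.global_step}: {metrics}")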

Commits on Jul 9, 2022

  1. Remove onnx conflicts.

    fadi212 committed Jul 9, 2022
    Commit b63183f
  2. fix: remove the bug.

    fadi212 committed Jul 9, 2022
    Commit 6bb0faa

Commits on Sep 9, 2022

  1. wip

    albertoandreottiATgmail committed Sep 9, 2022
    Commit cb2e4aa

Commits on Sep 10, 2022

  1. WIP

    fadi212 authored Sep 10, 2022
    Commit eff30e0
  2. Made export() compatible

    fadi212 authored Sep 10, 2022
    Commit ceeb731