run_squad questions #3

ZhaoyueCheng · 2018-11-05T21:35:51Z

Thanks a lot for the port! I have some minor questions about the run_squad file. I see two options for accumulating gradients, accumulate_gradients and gradient_accumulation_steps, but it seems to me that they could be combined into one. The other question is about the global_step variable: it seems we only increment it and never use it in gradient accumulation. Thanks again!

ZhaoyueCheng · 2018-11-05T22:01:47Z

It also seems to me that the SQuAD 1.1 example cannot reproduce the performance of the Google TensorFlow version.

abeljim · 2018-11-05T23:20:46Z

> It also seems to me that the SQuAD 1.1 can not reproduce the google tensorflow version performance.

What batch size are you running?

ZhaoyueCheng · 2018-11-06T00:05:32Z

I'm running on 4 GPUs with a batch size of 48; the result is {"exact_match": 21.551561021759696, "f1": 41.785968963154055}

abeljim · 2018-11-06T02:05:22Z

Just ran on 1 GPU with a batch size of 10; the result is {"exact_match": 21.778618732261116, "f1": 41.83593185416649}
Actually, it might be an issue with the eval code. I'll look into it.

ZhaoyueCheng · 2018-11-06T03:48:21Z

Sure, thanks. I'm looking for the cause too and will report back if I find anything.

abeljim · 2018-11-06T03:56:34Z

The predictions file is only outputting one word. I need to find out whether the bug is in the model itself or in the write_predictions function in run_squad.py. The correct answer always seems to be in the nbest_predictions, but it's never selected.

ethanjperez · 2018-11-06T05:33:16Z

What performance does Hugging Face get on SQuAD using this reimplementation?

thomwolf · 2018-11-06T07:47:08Z

Hi all,
We were not able to try SQuAD on a multi-GPU setup with the correct batch_size until recently, so we relied on the standard deviations computed in the notebooks to compare the predicted hidden states and losses for the SQuAD script. I was able to try on a multi-GPU setup today and there is indeed a strong difference.
We got about the same results as you: F1 of 41.8 and exact match of 21.7.
I am investigating that right now. My personal guess is that this may be related to things outside the model itself, like the optimizer or the post-processing in SQuAD, as these were not compared between the TF and PT models.
I will keep you guys updated in this issue, and I will add a mention in the readme that the SQuAD example doesn't work yet.
If you have some insights, feel free to participate in the discussion.

ethanjperez · 2018-11-06T13:17:08Z

If you're comparing activations, it may be worth comparing gradients as well, to see whether you get similarly low standard deviations of the gradients for identical batches. You might see that the gradients already differ at the last layer (e.g. because PyTorch handles weight decay / optimization differently); you may also see that the gradients only stop being comparable after a particular point in backpropagation, which would suggest that the backward pass for a particular function differs between PyTorch and TensorFlow.
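A minimal sketch of that comparison, assuming the gradients from both frameworks have already been dumped into plain Python lists keyed by parameter name and ordered as backpropagation visits them. The helper names, the layer names and the tolerance are hypothetical and not part of either code base.

```python
import statistics

def grad_diff_std(grads_a, grads_b):
    """Per-parameter standard deviation of the elementwise gradient difference."""
    report = {}
    for name, ga in grads_a.items():
        gb = grads_b[name]
        diffs = [a - b for a, b in zip(ga, gb)]
        report[name] = statistics.pstdev(diffs)
    return report

def first_divergence(grads_a, grads_b, tol=1e-6):
    """Return the first parameter (in backprop order) whose gradients differ, or None."""
    for name, std in grad_diff_std(grads_a, grads_b).items():
        if std > tol:
            return name
    return None

# Toy gradients for one identical batch, listed from the output layer backwards.
pt_grads = {"qa_outputs": [0.5, -0.2], "layer_11": [0.1, 0.3], "layer_10": [0.1, 0.2, 0.3]}
tf_grads = {"qa_outputs": [0.5, -0.2], "layer_11": [0.1, 0.3], "layer_10": [0.1, 0.25, 0.1]}

diverging = first_divergence(pt_grads, tf_grads)
print("first diverging parameter:", diverging)
```

If the first diverging parameter is the output layer, the optimizer or loss is the likelier suspect; if it is deeper, the backward pass of whatever sits between the last matching layer and that one is worth a look.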

thomwolf · 2018-11-07T22:06:12Z

Ok guys thanks for waiting, we've nailed down the culprit which was in fact a bug in the pre-processing logic (more exactly this dumb typo https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/run_squad.py#L865).

I took the occasion to clean up a few things I noticed while walking through the code:

- the weight initialization was not optimal: tf.truncated_normal_initializer(stddev=0.02) was translated to weight.data.normal_(0.02) instead of weight.data.normal_(mean=0.0, std=0.02), which likely also affected the performance of run_classifier.py.
- the gradient accumulation loss was not averaged over the accumulation steps, so using accumulation would have required changing the hyper-parameters.
- the evaluation was not run under torch.no_grad() and was thus sub-optimal in terms of speed/memory.

These fixes are pushed on the develop branch right now.
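To illustrate why the initialization typo matters: the first positional parameter of `normal_` is the mean, so `normal_(0.02)` samples with mean 0.02 and the default standard deviation of 1.0. The sketch below only uses the standard library's `random.gauss` as a stand-in for the tensor call (an assumption made for illustration, not the repository's code) to compare the spread of the two distributions.

```python
import random
import statistics

random.seed(0)
n = 100_000

# What normal_(0.02) effectively samples: mean=0.02, std=1.0 (the default).
buggy = [random.gauss(0.02, 1.0) for _ in range(n)]
# What was intended: mean=0.0, std=0.02.
fixed = [random.gauss(0.0, 0.02) for _ in range(n)]

buggy_std = statistics.pstdev(buggy)
fixed_std = statistics.pstdev(fixed)

# By construction the buggy weights are spread about 50 times wider.
print(f"buggy std: {buggy_std:.3f}, fixed std: {fixed_std:.4f}")
```

Note that this ignores the truncation applied by the TensorFlow initializer; it only shows the difference in scale.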

All in all I think we are pretty good now, and none of these issues affected the core PyTorch model (the BERT Transformer itself), so if you only used extract_features.py you were good from the beginning. And run_classifier.py was ok apart from the sub-optimal initialization of the additional weights.

I will merge the develop branch as soon as we get the final results confirmed (currently it's been training for 20 minutes (0.3 epoch) on 4 GPUs with a batch size of 56 and we are already above 85 on F1 on SQuAD and 77 in exact match, so I'm rather confident and I think you guys can play with it too now).

I am also cleaning up the code base to prepare for a first release that we will put on pip for easier access.

ethanjperez · 2018-11-08T02:08:05Z

@thomwolf This is awesome - thank you! Do you know what the final SQuAD results were from the training run you started?

thomwolf · 2018-11-08T09:47:25Z

I got {"exact_match": 80.07568590350047, "f1": 87.6494485519583} with slightly sub-optimal parameters (max_seq 300 instead of 384, which means more answers are truncated, and a batch_size of 56 for 2 epochs of training, which is probably too large a batch size and/or 1 epoch should suffice).

It trains in about 1h/epoch on 4 GPUs with such a big batch size and truncated examples.

thomwolf · 2018-11-09T11:20:01Z

Using the same HP as the TensorFlow version we are actually slightly better on F1 than the original implementation (on the default random seed we used):
{"f1": 88.52381567990474, "exact_match": 81.22043519394512}
versus TF: {"f1": 88.41249612335034, "exact_match": 81.2488174077578}

I am trying BERT-large on SQuAD now, which is totally doable on a 4 GPU server with the recommended batch size of 24 (about 16h of expected training time using the --optimize_on_cpu option and 2 steps of gradient accumulation). I will update the readme with the results.
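For reference, here is a small sketch of why the accumulated loss has to be divided by the number of accumulation steps. It uses a hypothetical one-parameter linear model with an analytic gradient rather than the actual training loop, and assumes equally sized micro-batches.

```python
import math

def grad_mse(w, batch):
    """Gradient w.r.t. w of the batch mean of 0.5 * (w * x - y) ** 2."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)

data = [(1.0, 2.0), (2.0, 3.0), (3.0, 7.0), (4.0, 9.0)]
w = 0.5
steps = 2  # plays the role of gradient_accumulation_steps
size = len(data) // steps
micro_batches = [data[i * size:(i + 1) * size] for i in range(steps)]

# Gradient of one large batch, which accumulation is meant to reproduce.
full = grad_mse(w, data)
# Accumulating without averaging: the gradient comes out `steps` times too large.
summed = sum(grad_mse(w, mb) for mb in micro_batches)
# Accumulating with each micro-batch loss divided by `steps`: matches the large batch.
averaged = sum(grad_mse(w, mb) / steps for mb in micro_batches)

print(full, summed, averaged)
```

Without the division, the effective learning rate scales with the number of accumulation steps, which is why the hyper-parameters would otherwise have had to change.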

ethanjperez · 2018-11-12T01:18:21Z

Great, I saw the BERT-large ones as well - thank you for sharing these results! How long did the BERT-base SQuAD training take on a single GPU when you tried it? I saw BERT-large took ~18 hours over 4 K-80's

thomwolf · 2018-11-12T07:34:28Z

Hi Ethan, I didn't try SQuAD on a single-GPU. On four k-80 (not k40), BERT-base took 5h to train on SQuAD.


VictorSanh self-assigned this Nov 5, 2018

ZhaoyueCheng changed the title ~~Minor run_squad questions~~ run_squad questions Nov 6, 2018

thomwolf self-assigned this Nov 6, 2018

thomwolf closed this as completed Nov 7, 2018

maeotaku mentioned this issue May 23, 2019

bert->onnx ->caffe2 weird error #633

Closed

AndreasFdev mentioned this issue Jun 6, 2019

MRPC / SQuAD stuck in "Running training" #662

Closed

thomwolf pushed a commit that referenced this issue Jun 28, 2019

Merge pull request #3 from huggingface/master

06716d7

Catch up with main repo

thomwolf pushed a commit that referenced this issue Sep 18, 2019

Merge pull request #3 from erenup/run_multiple_choice_merge

b57bfb5

Run multiple choice merge

HongyanJiao mentioned this issue Sep 19, 2019

traced_model #1291

Closed

devroy73 mentioned this issue Nov 10, 2019

Multi GPU dataparallel crash #1779

Closed

sshleifer mentioned this issue Mar 13, 2020

Complete merge Seq-2-Seq generation into default generation #3225

Merged

patrickvonplaten added a commit to patrickvonplaten/transformers that referenced this issue Jun 7, 2020

Merge pull request huggingface#3 from patrickvonplaten/fix_attentions

63380d4

Rebase to master

sbrody18 mentioned this issue Jun 17, 2020

Modify BERT/BERT-descendants to be TorchScript-able (not just traceable) #5067

Closed

miyu386 pushed a commit to miyu386/transformers that referenced this issue Feb 9, 2023

Typos/fixes to link syntax (huggingface#21450)

5184a2c

* Typos/fixes to link syntax
* Trying section headers
* Add header formatting for Rule huggingface#3

ArthurZucker referenced this issue in ArthurZucker/transformers Mar 2, 2023

Typos/fixes to link syntax (huggingface#21450)

1f80802

* Typos/fixes to link syntax
* Trying section headers
* Add header formatting for Rule #3

jameshennessytempus pushed a commit to jameshennessytempus/transformers that referenced this issue Jun 1, 2023

Merge pull request huggingface#3 from huggingface/main

8427edb

pop

arikanev mentioned this issue Aug 8, 2023

Discrepancy in Model Inference: Local vs. Hugging Face Model Hub #25362

Closed

lwmlyy mentioned this issue Aug 15, 2023

add util for ram efficient loading of model when using fsdp #25107

Merged

nikolaJovisic added a commit to nikolaJovisic/transformers that referenced this issue Aug 22, 2023

fix huggingface#3

621b91e

Rocketknight1 added a commit that referenced this issue Dec 4, 2023

Attempt #3

b9df7a0

Rocketknight1 added a commit that referenced this issue Dec 4, 2023

Revert "Attempt #3"

048e304

This reverts commit b9df7a0.

Rocketknight1 added a commit that referenced this issue Dec 6, 2023

Attempt #3

420c597

Rocketknight1 added a commit that referenced this issue Dec 6, 2023

Revert "Attempt #3"

75954b6

This reverts commit b9df7a0.

Rocketknight1 added a commit that referenced this issue Dec 7, 2023

Attempt #3

f226cb8

Rocketknight1 added a commit that referenced this issue Dec 7, 2023

Revert "Attempt #3"

b85e78e

This reverts commit b9df7a0.

ArthurZucker pushed a commit that referenced this issue Feb 21, 2024

Fix bos and eos token ids in the model configuration (#3)

ac7ac87

LysandreJik pushed a commit to LysandreJik/transformers that referenced this issue Apr 10, 2024

Delete cohere-model directory (huggingface#3)

0bc7cf9

aymeric-roucher added a commit that referenced this issue Apr 23, 2024

Improve logging (#3) (#30254)

91e50ef

* Add custom errors, improve logging

renovate bot mentioned this issue May 18, 2024

Update dependency transformers to v4.41.1 aleksanderbl29/ML#53

Merged

leloykun referenced this issue in leloykun/transformers Aug 15, 2024

fix exotic models tests attempt #3

2326691

ArthurZucker pushed a commit that referenced this issue Sep 25, 2024

Merge pull request #3 from huggingface/update_mllama_processing

375b5ce

Update mllama processing

gante added a commit that referenced this issue Oct 23, 2024

Merge pull request #3 from sumedhghaisas2/synthid_text

76fc84a

Further changes
