-
Notifications
You must be signed in to change notification settings - Fork 27.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
run_squad questions #3
Comments
It also seems to me that the SQuAD 1.1 can not reproduce the google tensorflow version performance. |
What batch size are you running? |
I'm running on 4 GPU with a batch size of 48, the result is {"exact_match": 21.551561021759696, "f1": 41.785968963154055} |
Just ran on 1 GPU batch size of 10, the result is {"exact_match": 21.778618732261116, "f1": 41.83593185416649} |
Sure, Thanks, I'm checking for the reason too, will report if find anything. |
The predictions file is only outputting one word. Need to find out if the bug is in the model itself or write predictions function in run_squad.py. The correct answer always seems to be in the nbest_predictions, but its never selected. |
What performance does Hugging Face get on SQuAD using this reimplementation? |
Hi all, |
If you're comparing activations, it may be worth comparing gradients as well to see if you receive similarly low gradients standard deviations for identical batches. You might see that the gradient is not comparable from the last layer itself (due to e.g. difference in how PyTorch may handle weight decay / optimization differently); you may also see that gradients only become not comparable only after a particular point in backpropagation, and that would show perhaps that the backward pass for a particular function differs between PyTorch and Tensorflow |
Ok guys thanks for waiting, we've nailed down the culprit which was in fact a bug in the pre-processing logic (more exactly this dumb typo https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/run_squad.py#L865). I took the occasion to clean up a few things I noticed while walking through the code:
These fixes are pushed on the All in all I think we are pretty good now and none of these issues affected the core PyTorch model (the BERT Transformer it-self) so if you only used I will merge the develop branch as soon as we got the final results confirmed (currently it's been training for 20 minutes (0.3 epoch) on 4GPU with a batch size of 56 and we are already above 85 on F1 on SQuAD and 77 in exact match so I'm rather confident and I think you guys can play with it too now). I am also cleaning up the code base to prepare for a first release that we will put on pip for easier access. |
@thomwolf This is awesome - thank you! Do you know what the final SQuAD results were from the training run you started? |
I got It trains in about 1h/epoch on 4 GPUs with such a big batch size and truncated examples. |
Using the same HP as the TensorFlow version we are actually slightly better on F1 than the original implementation (on the default random seed we used): I am trying |
Great, I saw the BERT-large ones as well - thank you for sharing these results! How long did the BERT-base SQuAD training take on a single GPU when you tried it? I saw BERT-large took ~18 hours over 4 K-80's |
Hi Ethan, I didn't try SQuAD on a single-GPU. On four k-80 (not k40), BERT-base took 5h to train on SQuAD. |
* Initial commit to get BERT + run_glue.py on TPU * Add README section for TPU and address comments. * Cleanup TPU bits from run_glue.py (#3) TPU runner is currently implemented in: https://github.com/pytorch-tpu/transformers/blob/tpu/examples/run_glue_tpu.py. We plan to upstream this directly into `huggingface/transformers` (either `master` or `tpu`) branch once it's been more thoroughly tested. * Cleanup TPU bits from run_glue.py TPU runner is currently implemented in: https://github.com/pytorch-tpu/transformers/blob/tpu/examples/run_glue_tpu.py. We plan to upstream this directly into `huggingface/transformers` (either `master` or `tpu`) branch once it's been more thoroughly tested. * No need to call `xm.mark_step()` explicitly (#4) Since for gradient accumulation we're accumulating on batches from `ParallelLoader` instance which on next() marks the step itself. * Resolve R/W conflicts from multiprocessing (#5) * Add XLNet in list of models for `run_glue_tpu.py` (#6) * Add RoBERTa to list of models in TPU GLUE (#7) * Add RoBERTa and DistilBert to list of models in TPU GLUE (#8) * Use barriers to reduce duplicate work/resources (#9) * Shard eval dataset and aggregate eval metrics (#10) * Shard eval dataset and aggregate eval metrics Also, instead of calling `eval_loss.item()` every time do summation with tensors on device. * Change defaultdict to float * Reduce the pred, label tensors instead of metrics As brought up during review some metrics like f1 cannot be aggregated via averaging. GLUE task metrics depends largely on the dataset, so instead we sync the prediction and label tensors so that the metrics can be computed accurately on those instead. * Only use tb_writer from master (#11) * Apply huggingface black code formatting * Style * Remove `--do_lower_case` as example uses cased * Add option to specify tensorboard logdir This is needed for our testing framework which checks regressions against key metrics writtern by the summary writer. * Using configuration for `xla_device` * Prefix TPU specific comments. * num_cores clarification and namespace eval metrics * Cache features file under `args.cache_dir` Instead of under `args.data_dir`. This is needed as our test infra uses data_dir with a read-only filesystem. * Rename `run_glue_tpu` to `run_tpu_glue` Co-authored-by: LysandreJik <[email protected]>
Rebase to master
* Typos/fixes to link syntax * Trying section headers * Add header formatting for Rule huggingface#3
* Typos/fixes to link syntax * Trying section headers * Add header formatting for Rule #3
# This is the 1st commit message: Update docs/source/ko/tasks/summarization.mdx Co-authored-by: Wonhyeong Seo <[email protected]> # This is the commit message huggingface#2: Update docs/source/ko/tasks/summarization.mdx Co-authored-by: Wonhyeong Seo <[email protected]> # This is the commit message huggingface#3: Update docs/source/ko/tasks/summarization.mdx Co-authored-by: Wonhyeong Seo <[email protected]> # This is the commit message huggingface#4: Update docs/source/ko/tasks/summarization.mdx Co-authored-by: Wonhyeong Seo <[email protected]> # This is the commit message huggingface#5: Update docs/source/ko/tasks/summarization.mdx Co-authored-by: Wonhyeong Seo <[email protected]> # This is the commit message huggingface#6: Update docs/source/ko/tasks/summarization.mdx Co-authored-by: Wonhyeong Seo <[email protected]> # This is the commit message huggingface#7: Update docs/source/ko/tasks/summarization.mdx Co-authored-by: Wonhyeong Seo <[email protected]> # This is the commit message huggingface#8: Update docs/source/ko/tasks/summarization.mdx Co-authored-by: Wonhyeong Seo <[email protected]> # This is the commit message huggingface#9: Update docs/source/ko/tasks/summarization.mdx Co-authored-by: Wonhyeong Seo <[email protected]> # This is the commit message huggingface#10: Update docs/source/ko/tasks/summarization.mdx Co-authored-by: Wonhyeong Seo <[email protected]> # This is the commit message huggingface#11: Update docs/source/ko/tasks/summarization.mdx
* added flash attention for opt * added to list * fix use cache (huggingface#3) * style fix * fix text * test fix2 * reverted until 689f599 * torch fx tests are working now! * small fix * added TODO docstring * changes * comments and .md file modification --------- Co-authored-by: Younes Belkada <[email protected]>
* Add a convenience method for building in your own name scope * Second attempt at auto layer building * Revert "Second attempt at auto layer building" This reverts commit e03a3aa. * Attempt #3 * Revert "Attempt #3" This reverts commit b9df7a0. * Add missing attributes that we're going to need later * Add some attributes we're going to need later * A fourth attempt! Feel the power flow through you! * Revert "A fourth attempt! Feel the power flow through you!" This reverts commit 6bf4aaf. * Add more values we'll need later * TF refactor that we'll need later * Revert "TF refactor that we'll need later" This reverts commit ca07202. * Revert "Revert "TF refactor that we'll need later"" This reverts commit 1beb0f3. * make fixup * Attempt five! * Revert "Attempt five!" This reverts commit 3302207. * Attempt six - this time don't add empty methods * Revert "Attempt six - this time don't add empty methods" This reverts commit 67d6012. * Attempt seven - better base model class detection! * Revert "Attempt seven - better base model class detection!" This reverts commit 5f14845. * Another attribute we'll need later * Try again with the missing attribute! * Revert "Try again with the missing attribute!" This reverts commit 760c6f3. * This is the attempt that will pierce the heavens! * Revert "This is the attempt that will pierce the heavens!" This reverts commit c868bb6. * Attempt seven - snag list is steadily decreasing * Revert "Attempt seven - snag list is steadily decreasing" This reverts commit 46fbd97. * Attempt eight - will an empty snag list do it? * Revert "Attempt eight - will an empty snag list do it?" This reverts commit 7c8a3c2. * Fixes to Hubert issues that cause problems later * Trying again with Conv1D/SeparableConv fixes * Revert "Trying again with Conv1D/SeparableConv fixes" This reverts commit 55092bc. * Apply the build shape fixes to Wav2Vec2 as well * One more attempt! * Revert "One more attempt!" This reverts commit 5ac3e4c. * Another attempt! * Revert "Another attempt!" This reverts commit ea16d89. * Let's see how many failures we get without the internal build method * Fix OpenAI * Fix MobileBERT * (Mostly) fix GroupVIT * Fix BLIP * One more BLIP fix * One more BLIP fix! * Fix Regnet * Finally fully fix GroupViT * Fix Data2Vec and add the new AdaptivePool * Fix Segformer * Fix Albert * Fix Deberta/DebertaV2 * Fix XLM * Actually fix XLM * Fix Flaubert * Fix lxmert * Fix Resnet * Fix ConvBERT * Fix ESM * Fix Convnext / ConvnextV2 * Fix SAM * Fix Efficientformer * Fix LayoutLMv3 * Fix speech_to_text * Fix mpnet and mobilevit * Fix Swin * Fix CTRL * Fix CVT * Fix DPR * Fix Wav2Vec2 * Fix T5 * Fix Hubert * Fix GPT2 * Fix Whisper * Fix DeiT * Fix the encoder-decoder / dual-encoder classes * make fix-copies * build in name scope * Fix summarization test * Fix tied weight names for BART + Blenderbot * Fix tied weight name building * Fix to TFESM weight building * Update TF SAM * Expand all the shapes out into Big Boy Shapes
* Add a convenience method for building in your own name scope * Second attempt at auto layer building * Revert "Second attempt at auto layer building" This reverts commit e03a3aa. * Attempt poedator#3 * Revert "Attempt poedator#3" This reverts commit b9df7a0. * Add missing attributes that we're going to need later * Add some attributes we're going to need later * A fourth attempt! Feel the power flow through you! * Revert "A fourth attempt! Feel the power flow through you!" This reverts commit 6bf4aaf. * Add more values we'll need later * TF refactor that we'll need later * Revert "TF refactor that we'll need later" This reverts commit ca07202. * Revert "Revert "TF refactor that we'll need later"" This reverts commit 1beb0f3. * make fixup * Attempt five! * Revert "Attempt five!" This reverts commit 3302207. * Attempt six - this time don't add empty methods * Revert "Attempt six - this time don't add empty methods" This reverts commit 67d6012. * Attempt seven - better base model class detection! * Revert "Attempt seven - better base model class detection!" This reverts commit 5f14845. * Another attribute we'll need later * Try again with the missing attribute! * Revert "Try again with the missing attribute!" This reverts commit 760c6f3. * This is the attempt that will pierce the heavens! * Revert "This is the attempt that will pierce the heavens!" This reverts commit c868bb6. * Attempt seven - snag list is steadily decreasing * Revert "Attempt seven - snag list is steadily decreasing" This reverts commit 46fbd97. * Attempt eight - will an empty snag list do it? * Revert "Attempt eight - will an empty snag list do it?" This reverts commit 7c8a3c2. * Fixes to Hubert issues that cause problems later * Trying again with Conv1D/SeparableConv fixes * Revert "Trying again with Conv1D/SeparableConv fixes" This reverts commit 55092bc. * Apply the build shape fixes to Wav2Vec2 as well * One more attempt! * Revert "One more attempt!" This reverts commit 5ac3e4c. * Another attempt! * Revert "Another attempt!" This reverts commit ea16d89. * Let's see how many failures we get without the internal build method * Fix OpenAI * Fix MobileBERT * (Mostly) fix GroupVIT * Fix BLIP * One more BLIP fix * One more BLIP fix! * Fix Regnet * Finally fully fix GroupViT * Fix Data2Vec and add the new AdaptivePool * Fix Segformer * Fix Albert * Fix Deberta/DebertaV2 * Fix XLM * Actually fix XLM * Fix Flaubert * Fix lxmert * Fix Resnet * Fix ConvBERT * Fix ESM * Fix Convnext / ConvnextV2 * Fix SAM * Fix Efficientformer * Fix LayoutLMv3 * Fix speech_to_text * Fix mpnet and mobilevit * Fix Swin * Fix CTRL * Fix CVT * Fix DPR * Fix Wav2Vec2 * Fix T5 * Fix Hubert * Fix GPT2 * Fix Whisper * Fix DeiT * Fix the encoder-decoder / dual-encoder classes * make fix-copies * build in name scope * Fix summarization test * Fix tied weight names for BART + Blenderbot * Fix tied weight name building * Fix to TFESM weight building * Update TF SAM * Expand all the shapes out into Big Boy Shapes
* Add a convenience method for building in your own name scope * Second attempt at auto layer building * Revert "Second attempt at auto layer building" This reverts commit e03a3aa. * Attempt huggingface#3 * Revert "Attempt huggingface#3" This reverts commit b9df7a0. * Add missing attributes that we're going to need later * Add some attributes we're going to need later * A fourth attempt! Feel the power flow through you! * Revert "A fourth attempt! Feel the power flow through you!" This reverts commit 6bf4aaf. * Add more values we'll need later * TF refactor that we'll need later * Revert "TF refactor that we'll need later" This reverts commit ca07202. * Revert "Revert "TF refactor that we'll need later"" This reverts commit 1beb0f3. * make fixup * Attempt five! * Revert "Attempt five!" This reverts commit 3302207. * Attempt six - this time don't add empty methods * Revert "Attempt six - this time don't add empty methods" This reverts commit 67d6012. * Attempt seven - better base model class detection! * Revert "Attempt seven - better base model class detection!" This reverts commit 5f14845. * Another attribute we'll need later * Try again with the missing attribute! * Revert "Try again with the missing attribute!" This reverts commit 760c6f3. * This is the attempt that will pierce the heavens! * Revert "This is the attempt that will pierce the heavens!" This reverts commit c868bb6. * Attempt seven - snag list is steadily decreasing * Revert "Attempt seven - snag list is steadily decreasing" This reverts commit 46fbd97. * Attempt eight - will an empty snag list do it? * Revert "Attempt eight - will an empty snag list do it?" This reverts commit 7c8a3c2. * Fixes to Hubert issues that cause problems later * Trying again with Conv1D/SeparableConv fixes * Revert "Trying again with Conv1D/SeparableConv fixes" This reverts commit 55092bc. * Apply the build shape fixes to Wav2Vec2 as well * One more attempt! * Revert "One more attempt!" This reverts commit 5ac3e4c. * Another attempt! * Revert "Another attempt!" This reverts commit ea16d89. * Let's see how many failures we get without the internal build method * Fix OpenAI * Fix MobileBERT * (Mostly) fix GroupVIT * Fix BLIP * One more BLIP fix * One more BLIP fix! * Fix Regnet * Finally fully fix GroupViT * Fix Data2Vec and add the new AdaptivePool * Fix Segformer * Fix Albert * Fix Deberta/DebertaV2 * Fix XLM * Actually fix XLM * Fix Flaubert * Fix lxmert * Fix Resnet * Fix ConvBERT * Fix ESM * Fix Convnext / ConvnextV2 * Fix SAM * Fix Efficientformer * Fix LayoutLMv3 * Fix speech_to_text * Fix mpnet and mobilevit * Fix Swin * Fix CTRL * Fix CVT * Fix DPR * Fix Wav2Vec2 * Fix T5 * Fix Hubert * Fix GPT2 * Fix Whisper * Fix DeiT * Fix the encoder-decoder / dual-encoder classes * make fix-copies * build in name scope * Fix summarization test * Fix tied weight names for BART + Blenderbot * Fix tied weight name building * Fix to TFESM weight building * Update TF SAM * Expand all the shapes out into Big Boy Shapes
* inital commit * update * update conversion checkpoint * update conversion script * nits * some fixes * nits * merge * fix permute * nits * fix * nits * nits * nits * fix rope * fix both rope * nites * style * make sure flax works * fix flax init code * fix foward * nits * print flax generation out * current code * nits * SIIIIIIIIIIIIIIIIIII * update * add new tokenizer * correct fast tokenizer * fix conversion * more comments * fix modeling and conversion * nits and nits * nits testing * add some tokenization tests * add some edge cases * add slow tests and fix them * fixup * fix copies for modeling * fix copies * add 7B slow tests * fix * fix * fix tests * make tokenizer cis go green * styling * last tokenizer nits * update jax tests * fix flax for 7b * add jit testing 🤗 * cleanups * isolated nit, inv_freq for rotary_emb.inv_freq * propagate to jax * Apply suggestions from code review Co-authored-by: Sanchit Gandhi <[email protected]> * adjust test * fix conversion script * change name * correct file names * update conversion script * Fix bos and eos token ids in the model configuration (#3) * update modelling * update conversion script * add static cache for gemma * fix sdpa generate * fix batched * multiple fixes * fix FA2 * final fix * Rename a few missing strings and filenames (#4) * merge with upstream main * fix copies * fix copies * fix fixup * fix fixup * fix * fix * final tests * fix fx gemma tests * fix fx bf16/fp16 tests * update slow fx tests * fx slow tests: one logits, one generation * move jit test standalone * Apply suggestions from code review * nits * tokenizer updates * more tokenization updates: custom GemmaSentencepieceExtrator * style * Update src/transformers/cache_utils.py * Update src/transformers/models/gemma/__init__.py * Update tests/models/gemma/test_modeling_flax_gemma.py * small nits * style * update tokenization test * fix the rotary embedding * with style * fix slow tests * WARNING this commit might be very important for precisions * Update tests/models/gemma/test_modeling_flax_gemma.py * Update src/transformers/models/gemma/configuration_gemma.py Co-authored-by: Lysandre Debut <[email protected]> * Update src/transformers/models/gemma/modeling_flax_gemma.py Co-authored-by: Lysandre Debut <[email protected]> * small nits here and there! * forgotten nit * remove on the fly computation of inv_freq * revert previous change, let's be safe and for now re-compute freq cis to make sure it's in float * Apply suggestions from code review Co-authored-by: Pedro Cuenca <[email protected]> * Update src/transformers/models/gemma/convert_gemma_weights_to_hf.py Co-authored-by: Pedro Cuenca <[email protected]> * Update src/transformers/models/gemma/convert_gemma_weights_to_hf.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_modeling_flax_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_tokenization_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_tokenization_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_tokenization_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_tokenization_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * nit conversion script link * fix some tests * add not doctest and pr doctest * repo consistency * fix last CIs 🚀 * update all readmes --------- Co-authored-by: younesbelkada <[email protected]> Co-authored-by: Sanchit Gandhi <[email protected]> Co-authored-by: Pedro Cuenca <[email protected]> Co-authored-by: Younes Belkada <[email protected]> Co-authored-by: sanchit-gandhi <[email protected]> Co-authored-by: Lysandre Debut <[email protected]>
* inital commit * update * update conversion checkpoint * update conversion script * nits * some fixes * nits * merge * fix permute * nits * fix * nits * nits * nits * fix rope * fix both rope * nites * style * make sure flax works * fix flax init code * fix foward * nits * print flax generation out * current code * nits * SIIIIIIIIIIIIIIIIIII * update * add new tokenizer * correct fast tokenizer * fix conversion * more comments * fix modeling and conversion * nits and nits * nits testing * add some tokenization tests * add some edge cases * add slow tests and fix them * fixup * fix copies for modeling * fix copies * add 7B slow tests * fix * fix * fix tests * make tokenizer cis go green * styling * last tokenizer nits * update jax tests * fix flax for 7b * add jit testing 🤗 * cleanups * isolated nit, inv_freq for rotary_emb.inv_freq * propagate to jax * Apply suggestions from code review Co-authored-by: Sanchit Gandhi <[email protected]> * adjust test * fix conversion script * change name * correct file names * update conversion script * Fix bos and eos token ids in the model configuration (#3) * update modelling * update conversion script * add static cache for gemma * fix sdpa generate * fix batched * multiple fixes * fix FA2 * final fix * Rename a few missing strings and filenames (#4) * merge with upstream main * fix copies * fix copies * fix fixup * fix fixup * fix * fix * final tests * fix fx gemma tests * fix fx bf16/fp16 tests * update slow fx tests * fx slow tests: one logits, one generation * move jit test standalone * Apply suggestions from code review * nits * tokenizer updates * more tokenization updates: custom GemmaSentencepieceExtrator * style * Update src/transformers/cache_utils.py * Update src/transformers/models/gemma/__init__.py * Update tests/models/gemma/test_modeling_flax_gemma.py * small nits * style * update tokenization test * fix the rotary embedding * with style * fix slow tests * WARNING this commit might be very important for precisions * Update tests/models/gemma/test_modeling_flax_gemma.py * Update src/transformers/models/gemma/configuration_gemma.py Co-authored-by: Lysandre Debut <[email protected]> * Update src/transformers/models/gemma/modeling_flax_gemma.py Co-authored-by: Lysandre Debut <[email protected]> * small nits here and there! * forgotten nit * remove on the fly computation of inv_freq * revert previous change, let's be safe and for now re-compute freq cis to make sure it's in float * Apply suggestions from code review Co-authored-by: Pedro Cuenca <[email protected]> * Update src/transformers/models/gemma/convert_gemma_weights_to_hf.py Co-authored-by: Pedro Cuenca <[email protected]> * Update src/transformers/models/gemma/convert_gemma_weights_to_hf.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_modeling_flax_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_tokenization_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_tokenization_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_tokenization_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_tokenization_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * nit conversion script link * fix some tests * add not doctest and pr doctest * repo consistency * fix last CIs 🚀 * update all readmes --------- Co-authored-by: younesbelkada <[email protected]> Co-authored-by: Sanchit Gandhi <[email protected]> Co-authored-by: Pedro Cuenca <[email protected]> Co-authored-by: Younes Belkada <[email protected]> Co-authored-by: sanchit-gandhi <[email protected]> Co-authored-by: Lysandre Debut <[email protected]>
* Cohere Model Release (#1) Cohere Model Release * Remove unnecessary files and code (#2) Some cleanup * Delete cohere-model directory (#3) * Make Fix (#5) * Pr fixes (#6) * fixes for pr * pr fixes for the format * pr fixes for the format * src/transformers/models/auto/tokenization_auto.py * Tokenizer test (#8) * tokenizer test * format fix * Adding Docs and other minor changes (#7) * Add modeling tests (#9) * Smol Fix (#11) * tokenization tests are fixed * format fixes * fix pr doc tests * fix pr doc tests * fix pr doc tests * fix pr style check * small changes in cohere.md * FIX: Address final comments for transformers integration (#13) * fix modeling final nits and add proper test file * for now leave empty tests * add integration test * push new test * fix modeling cohere (#14) * Update chat templates to use the new API (#15) --------- Co-authored-by: ahmetustun <[email protected]> Co-authored-by: Younes Belkada <[email protected]> Co-authored-by: Matt <[email protected]>
* inital commit * update * update conversion checkpoint * update conversion script * nits * some fixes * nits * merge * fix permute * nits * fix * nits * nits * nits * fix rope * fix both rope * nites * style * make sure flax works * fix flax init code * fix foward * nits * print flax generation out * current code * nits * SIIIIIIIIIIIIIIIIIII * update * add new tokenizer * correct fast tokenizer * fix conversion * more comments * fix modeling and conversion * nits and nits * nits testing * add some tokenization tests * add some edge cases * add slow tests and fix them * fixup * fix copies for modeling * fix copies * add 7B slow tests * fix * fix * fix tests * make tokenizer cis go green * styling * last tokenizer nits * update jax tests * fix flax for 7b * add jit testing 🤗 * cleanups * isolated nit, inv_freq for rotary_emb.inv_freq * propagate to jax * Apply suggestions from code review Co-authored-by: Sanchit Gandhi <[email protected]> * adjust test * fix conversion script * change name * correct file names * update conversion script * Fix bos and eos token ids in the model configuration (#3) * update modelling * update conversion script * add static cache for gemma * fix sdpa generate * fix batched * multiple fixes * fix FA2 * final fix * Rename a few missing strings and filenames (#4) * merge with upstream main * fix copies * fix copies * fix fixup * fix fixup * fix * fix * final tests * fix fx gemma tests * fix fx bf16/fp16 tests * update slow fx tests * fx slow tests: one logits, one generation * move jit test standalone * Apply suggestions from code review * nits * tokenizer updates * more tokenization updates: custom GemmaSentencepieceExtrator * style * Update src/transformers/cache_utils.py * Update src/transformers/models/gemma/__init__.py * Update tests/models/gemma/test_modeling_flax_gemma.py * small nits * style * update tokenization test * fix the rotary embedding * with style * fix slow tests * WARNING this commit might be very important for precisions * Update tests/models/gemma/test_modeling_flax_gemma.py * Update src/transformers/models/gemma/configuration_gemma.py Co-authored-by: Lysandre Debut <[email protected]> * Update src/transformers/models/gemma/modeling_flax_gemma.py Co-authored-by: Lysandre Debut <[email protected]> * small nits here and there! * forgotten nit * remove on the fly computation of inv_freq * revert previous change, let's be safe and for now re-compute freq cis to make sure it's in float * Apply suggestions from code review Co-authored-by: Pedro Cuenca <[email protected]> * Update src/transformers/models/gemma/convert_gemma_weights_to_hf.py Co-authored-by: Pedro Cuenca <[email protected]> * Update src/transformers/models/gemma/convert_gemma_weights_to_hf.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_modeling_flax_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_tokenization_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_tokenization_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_tokenization_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_tokenization_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <[email protected]> * nit conversion script link * fix some tests * add not doctest and pr doctest * repo consistency * fix last CIs 🚀 * update all readmes --------- Co-authored-by: younesbelkada <[email protected]> Co-authored-by: Sanchit Gandhi <[email protected]> Co-authored-by: Pedro Cuenca <[email protected]> Co-authored-by: Younes Belkada <[email protected]> Co-authored-by: sanchit-gandhi <[email protected]> Co-authored-by: Lysandre Debut <[email protected]>
* Cohere Model Release (#1) Cohere Model Release * Remove unnecessary files and code (#2) Some cleanup * Delete cohere-model directory (#3) * Make Fix (#5) * Pr fixes (#6) * fixes for pr * pr fixes for the format * pr fixes for the format * src/transformers/models/auto/tokenization_auto.py * Tokenizer test (#8) * tokenizer test * format fix * Adding Docs and other minor changes (#7) * Add modeling tests (#9) * Smol Fix (#11) * tokenization tests are fixed * format fixes * fix pr doc tests * fix pr doc tests * fix pr doc tests * fix pr style check * small changes in cohere.md * FIX: Address final comments for transformers integration (#13) * fix modeling final nits and add proper test file * for now leave empty tests * add integration test * push new test * fix modeling cohere (#14) * Update chat templates to use the new API (#15) --------- Co-authored-by: ahmetustun <[email protected]> Co-authored-by: Younes Belkada <[email protected]> Co-authored-by: Matt <[email protected]>
* update table's style * fix spell * add run tests action * use sudo * update work flows * remove apt * update container * actions/checkout@v4 * fix path * remove cd * nvidia-smi * remove nvidia-smi * Update run_tests.yml * Update run_tests.yml * Update run_tests.yml * Update run_tests.yml * Update run_tests.yml * Update run_tests.yml * Update run_tests.yml * Update run_tests.yml * Update run_tests.yml * Update run_tests.yml * Update run_tests.yml * Update run_tests.yml * Update run_tests.yml * Create run_tests_2 * Rename run_tests_2 to run_tests_2.yml * Update run_tests_2.yml * Update run_tests_2.yml * Update run_tests_2.yml * test a small model * Update run_tests_2.yml * Update run_tests_2.yml * Update run_tests_2.yml * remove cache * Update run_tests_2.yml * Update run_tests_2.yml * move to requirements.txt * Delete .github/workflows/run_tests_2.yml * Update run_tests.yml * Update run_tests.yml * Update requirements.txt * Update run_tests.yml * use modelcloud/gptqmodel:github-ci-v1 * use pytest * Update run_tests.yml * Update run_tests.yml * Update run_tests.yml * use pytorch/pytorch:2.3.1-cuda12.1-cudnn8-devel * Update run_tests.yml * Update run_tests.yml * add run-test need: build * Update run_tests.yml * install dep before test * use corca-ai/local-cache@v2 * Update run_tests.yml * Update run_tests.yml * Update run_tests.yml * test cache build * test build is exist * disable compile for test * list after cache * base: /tmp/gptqmodel * Update run_tests.yml * Update run_tests.yml * run test one by one * Update run_tests.yml * parameterized * model has been fixed * use gpu 0 * remove gptqmodel_cuda_available * show which file is running * fix quant desc_act * in steps * update format * fix dir --------- Co-authored-by: Qubitium-ModelCloud <[email protected]>
Thanks a lot for the port! I have some minor questions, for the run_squad file, I see two options for accumulating gradients, accumulate_gradients and gradient_accumulation_steps but it seems to me that it can be combined into one. The other one is for the global_step variable, seems we are only counting but not using this variable in gradient accumulating. Thanks again!
The text was updated successfully, but these errors were encountered: