MRPC hyperparameters question #5

ethanjperez · 2018-11-06T05:30:36Z

When describing how you reproduced the MRPC results, you say:
"Our test ran on a few seeds with the original implementation hyper-parameters gave evaluation results between 82 and 87."
and you link to the SQuAD hyperparameters (https://github.com/google-research/bert#squad).

Is the link a mistake? Or did you use the SQuAD hyperparameters for tuning on MRPC? More generally, I'm wondering if there's a reason the MRPC dev set accuracy is slightly lower (in [82, 87] vs. [84, 88] reported by Google)

thomwolf · 2018-11-06T07:36:42Z

Hi Ethan,
Thanks we used the MRPC hyper-parameters indeed, I corrected the README.
Regarding the dev set accuracy, I am not really surprised there is a slightly lower accuracy with the PyTorch version (even though the variance is high so it's hard to get something significant). That is something that is generally observed (see for example the work of Remi Cadene) and we also experienced that with our TF->PT port of the OpenAI GPT model.
My personal feeling is that there are slight differences in the way the backends of TensorFlow and PyTorch handle the operations and these differences make the pre-trained weights sub-optimal for PyTorch.

ethanjperez · 2018-11-06T13:11:49Z

Great, thanks for clarifying that. Regarding the slightly lower accuracy, that makes sense. Thanks for your help and for releasing this!

ethanjperez · 2018-11-06T13:19:40Z

Maybe it would help to train the Tensorflow pre-trained weights for e.g. one epoch in PyTorch (using the MLM and next-sentence objective)? That may help transfer to other tasks, depending on what the issue is

thomwolf · 2018-11-07T23:42:51Z

Hi @ethanjperez, actually the weight initialization fix (tf. truncated_normal_initializer(stddev=0.02) was translated in weight.data.normal_(0.02) instead of weight.data.normal_(mean=0.0, std=0.02) fixed in 2a97fe2) has brought us back to the TensorFlow results on MRPC (between 84 and 88%).
I am closing this issue.

ethanjperez · 2018-11-08T02:04:37Z

@thomwolf Great to hear - thanks for working to fix it!

* add default quac runner based on the latest squad runner * add default quac runner based on the latest squad runner (cont.) * add default quac runner based on the latest squad runner (cont.) * add default quac runner based on the latest squad runner (cont.) * add default quac runner based on the latest squad runner (cont.) * add default quac runner based on the latest squad runner (cont.) * update data pipeline for quac runner * update data pipeline for quac runner (cont.) * update data pipeline for quac runner (cont.) * update data pipeline for quac runner (cont.) * update data pipeline for quac runner (cont.) * update data pipeline for quac runner (cont.) * update data pipeline for quac runner (cont.) * update predict output & evaluation generation for quac runner * update predict output & evaluation generation for quac runner (cont.)

* Initial commit to get BERT + run_glue.py on TPU * Add README section for TPU and address comments. * Cleanup TPU bits from run_glue.py (#3) TPU runner is currently implemented in: https://github.com/pytorch-tpu/transformers/blob/tpu/examples/run_glue_tpu.py. We plan to upstream this directly into `huggingface/transformers` (either `master` or `tpu`) branch once it's been more thoroughly tested. * Cleanup TPU bits from run_glue.py TPU runner is currently implemented in: https://github.com/pytorch-tpu/transformers/blob/tpu/examples/run_glue_tpu.py. We plan to upstream this directly into `huggingface/transformers` (either `master` or `tpu`) branch once it's been more thoroughly tested. * No need to call `xm.mark_step()` explicitly (#4) Since for gradient accumulation we're accumulating on batches from `ParallelLoader` instance which on next() marks the step itself. * Resolve R/W conflicts from multiprocessing (#5) * Add XLNet in list of models for `run_glue_tpu.py` (#6) * Add RoBERTa to list of models in TPU GLUE (#7) * Add RoBERTa and DistilBert to list of models in TPU GLUE (#8) * Use barriers to reduce duplicate work/resources (#9) * Shard eval dataset and aggregate eval metrics (#10) * Shard eval dataset and aggregate eval metrics Also, instead of calling `eval_loss.item()` every time do summation with tensors on device. * Change defaultdict to float * Reduce the pred, label tensors instead of metrics As brought up during review some metrics like f1 cannot be aggregated via averaging. GLUE task metrics depends largely on the dataset, so instead we sync the prediction and label tensors so that the metrics can be computed accurately on those instead. * Only use tb_writer from master (#11) * Apply huggingface black code formatting * Style * Remove `--do_lower_case` as example uses cased * Add option to specify tensorboard logdir This is needed for our testing framework which checks regressions against key metrics writtern by the summary writer. * Using configuration for `xla_device` * Prefix TPU specific comments. * num_cores clarification and namespace eval metrics * Cache features file under `args.cache_dir` Instead of under `args.data_dir`. This is needed as our test infra uses data_dir with a read-only filesystem. * Rename `run_glue_tpu` to `run_tpu_glue` Co-authored-by: LysandreJik <[email protected]>

Raviskolli/ort

# This is the 1st commit message: Update docs/source/ko/tasks/summarization.mdx Co-authored-by: Wonhyeong Seo <[email protected]> # This is the commit message huggingface#2: Update docs/source/ko/tasks/summarization.mdx Co-authored-by: Wonhyeong Seo <[email protected]> # This is the commit message huggingface#3: Update docs/source/ko/tasks/summarization.mdx Co-authored-by: Wonhyeong Seo <[email protected]> # This is the commit message huggingface#4: Update docs/source/ko/tasks/summarization.mdx Co-authored-by: Wonhyeong Seo <[email protected]> # This is the commit message huggingface#5: Update docs/source/ko/tasks/summarization.mdx Co-authored-by: Wonhyeong Seo <[email protected]> # This is the commit message huggingface#6: Update docs/source/ko/tasks/summarization.mdx Co-authored-by: Wonhyeong Seo <[email protected]> # This is the commit message huggingface#7: Update docs/source/ko/tasks/summarization.mdx Co-authored-by: Wonhyeong Seo <[email protected]> # This is the commit message huggingface#8: Update docs/source/ko/tasks/summarization.mdx Co-authored-by: Wonhyeong Seo <[email protected]> # This is the commit message huggingface#9: Update docs/source/ko/tasks/summarization.mdx Co-authored-by: Wonhyeong Seo <[email protected]> # This is the commit message huggingface#10: Update docs/source/ko/tasks/summarization.mdx Co-authored-by: Wonhyeong Seo <[email protected]> # This is the commit message huggingface#11: Update docs/source/ko/tasks/summarization.mdx

[p

This reverts commit 15b5160.

fix binary classification for tensorflow segformer fix binary classification for tf segformer huggingface#2 fix huggingface#5 Revert "fix huggingface#5" This reverts commit 15b5160. Revert "fix huggingface#4" This reverts commit 0b534e6. fix huggingface#5 fix fix fix

* Cohere Model Release (#1) Cohere Model Release * Remove unnecessary files and code (#2) Some cleanup * Delete cohere-model directory (#3) * Make Fix (#5) * Pr fixes (#6) * fixes for pr * pr fixes for the format * pr fixes for the format * src/transformers/models/auto/tokenization_auto.py * Tokenizer test (#8) * tokenizer test * format fix * Adding Docs and other minor changes (#7) * Add modeling tests (#9) * Smol Fix (#11) * tokenization tests are fixed * format fixes * fix pr doc tests * fix pr doc tests * fix pr doc tests * fix pr style check * small changes in cohere.md * FIX: Address final comments for transformers integration (#13) * fix modeling final nits and add proper test file * for now leave empty tests * add integration test * push new test * fix modeling cohere (#14) * Update chat templates to use the new API (#15) --------- Co-authored-by: ahmetustun <[email protected]> Co-authored-by: Younes Belkada <[email protected]> Co-authored-by: Matt <[email protected]>

Add message passing format Co-authored-by: Cyril Kondratenko <[email protected]>

* Cohere Model Release (#1) Cohere Model Release * Remove unnecessary files and code (#2) Some cleanup * Delete cohere-model directory (#3) * Make Fix (#5) * Pr fixes (#6) * fixes for pr * pr fixes for the format * pr fixes for the format * src/transformers/models/auto/tokenization_auto.py * Tokenizer test (#8) * tokenizer test * format fix * Adding Docs and other minor changes (#7) * Add modeling tests (#9) * Smol Fix (#11) * tokenization tests are fixed * format fixes * fix pr doc tests * fix pr doc tests * fix pr doc tests * fix pr style check * small changes in cohere.md * FIX: Address final comments for transformers integration (#13) * fix modeling final nits and add proper test file * for now leave empty tests * add integration test * push new test * fix modeling cohere (#14) * Update chat templates to use the new API (#15) --------- Co-authored-by: ahmetustun <[email protected]> Co-authored-by: Younes Belkada <[email protected]> Co-authored-by: Matt <[email protected]>

[WIP]Evaluation and benchmark

* Fix desc_act check and refractor marlin check code. * clean

thomwolf closed this as completed Nov 7, 2018

maeotaku mentioned this issue May 23, 2019

bert->onnx ->caffe2 weird error #633

Closed

HongyanJiao mentioned this issue Sep 19, 2019

traced_model #1291

Closed

sbrody18 mentioned this issue Jun 16, 2020

Modify BERT/BERT-descendants to be TorchScript-able (not just traceable) #5067

Closed

krrishdholakia mentioned this issue Sep 20, 2020

[Bug/Question] Write With Transformers Implementation vs. Custom Implementation #7273

Closed

manchandasahil mentioned this issue Mar 22, 2021

Longformer training : CUDA error: device-side assert triggered #10852

Closed

2 tasks

Guillaume-slize mentioned this issue May 30, 2021

Encoding/decoding NLP model in tensorflow lite (fine-tuned GPT2) #11947

Closed

wamartin-aml pushed a commit to wamartin-aml/transformers that referenced this issue Nov 1, 2021

Merge pull request huggingface#5 from microsoft/raviskolli/ort

2322d22

Raviskolli/ort

manchandasahil mentioned this issue Dec 16, 2021

CUDA error: device-side assert triggered while training Marian MT #14798

Closed

4 tasks

rraminen pushed a commit to rraminen/transformers that referenced this issue Jun 3, 2022

upda (huggingface#5)

f0c7a14

jameshennessytempus pushed a commit to jameshennessytempus/transformers that referenced this issue Jun 1, 2023

Merge pull request huggingface#5 from huggingface/main

0985bdd

[p

lwmlyy mentioned this issue Aug 15, 2023

add util for ram efficient loading of model when using fsdp #25107

Merged

1 task

nikolaJovisic added a commit to nikolaJovisic/transformers that referenced this issue Aug 22, 2023

fix huggingface#5

15b5160

nikolaJovisic added a commit to nikolaJovisic/transformers that referenced this issue Aug 22, 2023

Revert "fix huggingface#5"

98b0663

This reverts commit 15b5160.

nikolaJovisic added a commit to nikolaJovisic/transformers that referenced this issue Aug 23, 2023

fix huggingface#5

6879c8a

LysandreJik pushed a commit to LysandreJik/transformers that referenced this issue Apr 10, 2024

Make Fix (huggingface#5)

f964504

LysandreJik pushed a commit that referenced this issue Apr 24, 2024

Add messages to agents (#5) (#30337)

90f6949

Add message passing format Co-authored-by: Cyril Kondratenko <[email protected]>

ZYC-ModelCloud pushed a commit to ZYC-ModelCloud/transformers that referenced this issue Nov 14, 2024

Merge pull request huggingface#5 from PanQiWei/evaluation_and_benchmark

4f84d21

[WIP]Evaluation and benchmark

ZYC-ModelCloud pushed a commit to ZYC-ModelCloud/transformers that referenced this issue Nov 14, 2024

Fix desc_act check and refractor marlin check code. (huggingface#5)

da3a7d1

* Fix desc_act check and refractor marlin check code. * clean

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MRPC hyperparameters question #5

MRPC hyperparameters question #5

ethanjperez commented Nov 6, 2018 •

edited

Loading

thomwolf commented Nov 6, 2018

ethanjperez commented Nov 6, 2018

ethanjperez commented Nov 6, 2018 •

edited

Loading

thomwolf commented Nov 7, 2018

ethanjperez commented Nov 8, 2018

MRPC hyperparameters question #5

MRPC hyperparameters question #5

Comments

ethanjperez commented Nov 6, 2018 • edited Loading

thomwolf commented Nov 6, 2018

ethanjperez commented Nov 6, 2018

ethanjperez commented Nov 6, 2018 • edited Loading

thomwolf commented Nov 7, 2018

ethanjperez commented Nov 8, 2018

ethanjperez commented Nov 6, 2018 •

edited

Loading

ethanjperez commented Nov 6, 2018 •

edited

Loading