truncated normal initializer #38

ruotianluo · 2018-11-19T16:35:08Z

I have a reasonable approximation of a truncated normal initializer. (It is actually what TensorFlow does.)
https://discuss.pytorch.org/t/implementing-truncated-normal-initializer/4778/16?u=ruotianluo
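For readers who do not follow the link: TensorFlow's truncated normal, as its documentation describes it, redraws any value that falls more than two standard deviations from the mean. The snippet below is a minimal plain-Python sketch of that resample-until-in-range idea. It is not the code from the linked post; the function name `truncated_normal`, its parameters, and the `std=0.02` value are illustrative assumptions.

```python
import random


def truncated_normal(n, mean=0.0, std=1.0, bound=2.0, seed=None):
    """Draw n samples from N(mean, std^2), redrawing any sample that lands
    more than `bound` standard deviations from the mean.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        z = rng.gauss(0.0, 1.0)  # standard normal draw
        if abs(z) <= bound:      # keep only draws inside the window
            samples.append(mean + std * z)
    return samples


# Example: 1000 weights with an assumed std of 0.02.
weights = truncated_normal(1000, mean=0.0, std=0.02, seed=0)
```

Every returned value lies within `mean ± bound * std`. A tensor version would apply the same accept-or-redraw rule elementwise; PyTorch also provides `torch.nn.init.trunc_normal_`, which may be a simpler choice where it is available.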

thomwolf · 2018-11-20T09:09:23Z

We could try that. Not sure how important it is though. Did you try it?

thomwolf · 2018-11-26T09:42:42Z

OK, I think we will stick with the normal_initializer for now. Thanks for pointing out this option!

thomwolf closed this as completed Nov 26, 2018

maeotaku mentioned this issue May 23, 2019

bert->onnx ->caffe2 weird error #633

Closed

stevezheng23 added a commit to stevezheng23/transformers that referenced this issue Mar 24, 2020

Merge pull request huggingface#38 from stevezheng23/dev/zheng/coqa

67a9836

add coqa runner as basic mt-coqa runner

jameshennessytempus pushed a commit to jameshennessytempus/transformers that referenced this issue Jun 1, 2023

Merge pull request huggingface#38 from jamesthesnake/nah

b7ecfb6

Nah

lwmlyy mentioned this issue Aug 15, 2023

add util for ram efficient loading of model when using fsdp #25107

Merged

1 task

ZYC-ModelCloud pushed a commit to ZYC-ModelCloud/transformers that referenced this issue Nov 14, 2024

Merge pull request huggingface#38 from PanQiWei/faster-cuda-no-actorder

771b650

Faster cuda no actorder

ZYC-ModelCloud pushed a commit to ZYC-ModelCloud/transformers that referenced this issue Nov 14, 2024

Fix model loader code is using bad default of float16 (huggingface#38)

380c76f
