truncated normal initializer #38

Closed
ruotianluo opened this issue Nov 19, 2018 · 2 comments

Comments

@ruotianluo

I have a reasonable approximation of a truncated normal initializer. (This is actually what TensorFlow does.)
https://discuss.pytorch.org/t/implementing-truncated-normal-initializer/4778/16?u=ruotianluo
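
As a rough illustration of the idea (this is not the code from the linked post): a truncated normal in the TensorFlow sense redraws any sample that lies more than two standard deviations from the mean. Below is a minimal plain-Python sketch; the function name `truncated_normal`, its parameters, and the `std=0.02` value are illustrative assumptions, and a real initializer would fill a weight tensor instead of returning a list.

```python
import random


def truncated_normal(n, mean=0.0, std=1.0, bound=2.0, rng=None):
    """Draw n samples from N(mean, std^2), redrawing any sample that
    falls more than `bound` standard deviations from the mean.

    Hypothetical helper for illustration only; not a library API.
    """
    rng = rng or random.Random()
    samples = []
    while len(samples) < n:
        z = rng.gauss(0.0, 1.0)      # standard normal draw
        if abs(z) <= bound:          # keep it only if within the bound
            samples.append(mean + std * z)
    return samples


# Example: 1000 "weights" with an assumed std of 0.02, seeded for repeatability.
weights = truncated_normal(1000, mean=0.0, std=0.02, rng=random.Random(0))
```

Every returned value lies within `mean ± bound * std`, which is the property that separates this from a plain normal initializer.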

@thomwolf
Member

thomwolf commented Nov 20, 2018

We could try that. Not sure how important it is though. Did you try it?

@thomwolf
Member

OK, I think we will stick to the normal_initializer for now. Thanks for pointing out this option!
