truncated normal initializer #38
Comments
We could try that. Not sure how important it is though. Did you try it?
Ok I think we will stick to the normal_initializer for now. Thanks for indicating this option!
I have a reasonable truncated normal approximation. (Actually, that is what TF does.)
https://discuss.pytorch.org/t/implementing-truncated-normal-initializer/4778/16?u=ruotianluo
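
For readers who want the idea without following the link, here is a minimal sketch of a truncated normal sampler in plain Python. It is not the code from the linked post. It assumes the TensorFlow-style rule of redrawing any value that falls more than two standard deviations from the mean. The function name `truncated_normal`, the `bound` parameter, and the default `std=0.02` are illustrative choices, not part of this repository.

```python
import random


def truncated_normal(n, mean=0.0, std=0.02, bound=2.0, seed=None):
    """Draw n samples from N(mean, std^2), redrawing any sample that lies
    more than `bound` standard deviations from the mean.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        z = rng.gauss(0.0, 1.0)      # standard normal draw
        if abs(z) <= bound:          # keep only draws inside the cutoff
            samples.append(mean + std * z)
    return samples


# Example: 1000 initial weights, all within 2 * std of zero.
weights = truncated_normal(1000, std=0.02, seed=0)
```

Rejection sampling like this keeps roughly 95% of draws when the cutoff is two standard deviations, so the loop ends quickly. Recent PyTorch releases also ship `torch.nn.init.trunc_normal_`. Note that its `a` and `b` arguments are absolute cutoffs rather than multiples of `std`.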