Add ability for force bos id for mbart #22

sfc-gh-zhwang · 2023-10-05T03:58:14Z

* Update beam_search_topk_kernels.cu
  fix: fix bug of beam search
* fix: change int of some kernels to int64_t to prevent overflow
* fix: gpt tensor shapes inconsistency (NVIDIA#505)
  Signed-off-by: AkiyamaYummy <[email protected]>
* Update gpt_guide.md (NVIDIA#529)
* fix: fix bug of gpt buffer and gpt gemm overflow
* Update T5DecodingWeight.cc
  fix: fix loading bug of t5
* [Enhancement]add pytorch backend support for gptneox (NVIDIA#550)
  * add pytorch backend support for gptneox
    Signed-off-by: AkiyamaYummy <[email protected]>
  * fix early stopping invalid
  * 1) Some unused parameters and logic have been removed. 2) Revisions that would affect pipeline parallelism have been reverted. 3) The code has been made capable of direct validation on TabbyML/NeoX-1.3B.
    Signed-off-by: AkiyamaYummy <[email protected]>
  * Change the names of classes, removing 'parallel' from their names
    Signed-off-by: AkiyamaYummy <[email protected]>
  * Format the code.
    Signed-off-by: AkiyamaYummy <[email protected]>
  * Only print results when rank is 0.
    Signed-off-by: AkiyamaYummy <[email protected]>
  * Add dist.init_process_group().
    Signed-off-by: AkiyamaYummy <[email protected]>
  * update docs
    Signed-off-by: AkiyamaYummy <[email protected]>
  ---------
  Signed-off-by: AkiyamaYummy <[email protected]>
* Update cublasMMWrapper.cc
  Fix the CUBLAS_VERSION checking of cublasMMWrapper
* Update cublasMMWrapper.cc
* fix overflow in softmax_kernel when process long seqlen and big batch_size (NVIDIA#524)
* Update unfused_attention_kernels.cu
  fix bug of softmax kernel
* [Enhancement]create huggingface_gptneox_convert.py (NVIDIA#569)
  * create huggingface_gptneox_convert.py
    Signed-off-by: AkiyamaYummy <[email protected]>
  * adjust HF's multi bin files
    Signed-off-by: AkiyamaYummy <[email protected]>
  * update gptneox_guide.md
    Signed-off-by: AkiyamaYummy <[email protected]>
  ---------
  Signed-off-by: AkiyamaYummy <[email protected]>
* perf(bloom): improve performance of huggingface_bloom_convert.py, decrease the time cost and the mem using (NVIDIA#568)
  Co-authored-by: r.yang <[email protected]>
* Fix/gpt early stop (NVIDIA#584)
  * fix: fix bug of early stopping of gpt
* [bugfix] Fix 2-shot All Reduce correctness issue (indexing bug). (NVIDIA#672)
  FasterTransformer 2-shot all reduce is implemented as a reduce-scatter + all-gather. There is an indexing bug in the all-gather step. Prior to this change, 2-shot all reduce was only producing correct results on device 0. Now, all devices have the correct results.
* fix: swap tensor bug (NVIDIA#683)
* Support size_per_head=112 (NVIDIA#660)
  * fix multi-gpu build
  * add support for size_per_head=112 for gpt decoder
* remove mpi_cxx from multi-gpu build for now (NVIDIA#705)

---------

Signed-off-by: AkiyamaYummy <[email protected]>
Co-authored-by: byshiue <[email protected]>
Co-authored-by: _yummy_ <[email protected]>
Co-authored-by: Ying Sheng <[email protected]>
Co-authored-by: zhangxin81 <[email protected]>
Co-authored-by: 杨睿 <[email protected]>
Co-authored-by: r.yang <[email protected]>
Co-authored-by: Rahul Kindi <[email protected]>
Co-authored-by: Perkz Zheng <[email protected]>
Co-authored-by: Daya Khudia <[email protected]>
Co-authored-by: Dean Wyatte <[email protected]>
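The NVIDIA#672 entry in the commit message above describes 2-shot all reduce as a reduce-scatter followed by an all-gather. The following is a minimal single-process sketch of that decomposition in Python, not the FasterTransformer kernel. The function name `two_shot_all_reduce` and the list-of-lists buffer layout are assumptions made for illustration, and the commit message does not say what the exact indexing error was. The sketch only shows that every device has to place each gathered chunk at the offset of the rank that owns it.

```python
def two_shot_all_reduce(buffers):
    """Simulate a sum all-reduce over per-device buffers in two shots.

    `buffers` is a list with one equally sized list of numbers per device.
    Hypothetical helper for illustration only.
    """
    num_devices = len(buffers)
    size = len(buffers[0])
    assert all(len(buf) == size for buf in buffers), "buffers must match in size"
    assert size % num_devices == 0, "size must divide evenly into chunks"
    chunk = size // num_devices

    # Shot 1 (reduce-scatter): rank r reduces only chunk r across all devices.
    reduced_chunks = []
    for rank in range(num_devices):
        lo, hi = rank * chunk, (rank + 1) * chunk
        reduced_chunks.append(
            [sum(buf[i] for buf in buffers) for i in range(lo, hi)]
        )

    # Shot 2 (all-gather): every device copies each owner's reduced chunk
    # into the slot indexed by the owner's rank, not by its own rank.
    results = []
    for _device in range(num_devices):
        out = [0] * size
        for owner in range(num_devices):
            lo = owner * chunk
            out[lo:lo + chunk] = reduced_chunks[owner]
        results.append(out)
    return results


if __name__ == "__main__":
    per_device = [[1, 2, 3, 4], [10, 20, 30, 40]]
    for device, result in enumerate(two_shot_all_reduce(per_device)):
        print(device, result)
```

A quick correctness check for any such implementation is that the result on every device, not only device 0, equals the element-wise sum of the inputs.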

Merge remote-tracking branch 'origin' into zhwang/mbart

sfc-gh-ashankar and others added 30 commits July 10, 2023 19:24

commit e095d10
commit 0141b94
commit 4553c67
commit dec08a8
commit a9d7564
commit dddb699
commit 62a99c2
commit a469c03
commit 4e115cb
commit fb0cb6c
commit fa580c3
commit adebee7
commit 933199a
commit 18a666d
commit 5b3df49
commit cac44d6
commit 8356c50
commit b7b1c67
commit ec6d344
commit 359227c
commit 8bdd5d3
commit 4a283f3
commit ad70082
commit 810c4a6
commit f2f3292
commit b13b755
commit 3345d4b
commit adc510c
commit d3c0325

sfc-gh-zhwang added 28 commits October 4, 2023 20:50

commit 7396c9b
commit 59cef67
commit 1e36c9b
commit b97107f
commit 5671a23
commit ea8c5b8
commit 5b1b3ea
commit 1323a79
commit b3c8f26
commit 4eee2a5
commit 63e1586
commit 0794d49
commit 4333b24
commit c9cd870
commit 043661a
commit c1384a0
commit dfba6e5
commit 6ac30f1
commit 37ccba5
commit ff1966a
commit 0353f25
commit 1150022
commit d88b1dc
commit cebd483

/opt/tritonserver/bin/tritonserver --model-repository=/models

79f1ca5

Merge remote-tracking branch 'origin' into zhwang/mbart

commit 07bba5d
commit b474c78
commit e177bd4

sfc-gh-zhwang changed the title ~~Zhwang/mbart~~ Add ability for force bos id for mbart Oct 5, 2023

sfc-gh-zhwang merged commit e0b124a into corvo Oct 5, 2023
