Optim - added quantization code. #968

sgsharma2000 · 2023-12-14T22:43:05Z

Added quantization code, mainly in generator.py and model.py, but it shows only very marginal timing improvements across batch sizes.
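For context, here is a minimal, library-free sketch of what symmetric per-tensor int8 quantization does to a list of weights. It is not the code in this PR: the helper names `quantize_int8` and `dequantize` and the sample weights are hypothetical and chosen only for illustration. A real implementation would more likely rely on the framework's own quantization tooling.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map the integer codes back to approximate float values."""
    return [v * scale for v in q]


# Hypothetical sample weights, for illustration only.
weights = [0.5, -1.27, 0.031, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The round-trip error per weight is bounded by half a quantization step.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_err)
```

The sketch only shows the storage-side effect (smaller integer codes plus one scale). Whether this speeds up inference depends on whether the matrix multiplications actually run on integer kernels, which may be one reason timing gains can be small.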

facebook-github-bot · 2023-12-14T22:43:12Z

Hi @sgsharma2000!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

subramen · 2023-12-20T20:55:46Z

Hi @sgsharma2000, first of all, thank you for submitting this (fairly large) PR!

Reviewing your proposed changes will be much easier if you could split them across multiple smaller, narrowly scoped PRs. Please also include information on what changes you've made, what you are trying to achieve, and the rationale for your approach.

gtamer2 and others added 30 commits December 12, 2023 21:38

Add inference benchmark, delete extra files

236b696

more deletions

ed186f5

Merge pull request #1 from gtamer2/add_benchmark

64ab5a8

Add benchmark

Changes

62f242d

import fire

257962f

git ignore

a89488a

fixes

5a84145

Push changes

cf2b652

comment out model.to

47400da

print x for debugging

4c4d439

inference_benchmamrk

2071539

inference_benchmamrk

58eee14

Try large batch

86ca1c7

Get working for 1 batch

f146895

Get working for 1 batch

fd8a6cf

Add torch profiler

04cb363

Indent

40abad2

Indent

d6336fc

Profile cpu and cuda

f80cf22

move profiler down

8436d96

Try using simplified llama

60bc868

revert

84416a4

Move outside

7dfe9e9

one more yolo

e5f0408

rerecord mem

d7da6e1

Merge pull request #2 from gtamer2/benchmark2

b988f0e

Benchmark2

added torch.jit.script to model in generation.py

431ebab

Script to run benchmarks

67d5a14

Fix missing param

dcda8a1

tried changing .jit.script to be around the llama object call in inference benchmark

816b895

gtamer2 and others added 6 commits December 14, 2023 16:22

Add benchmarks

5534bbf

get quantization working

44feb11

Merge pull request #5 from gtamer2/torchscript

ae535a2

Torchscript

add some more quantization lines

8e34266

got rid of fuse_model()

94c70fb

adding convert to quantization model

3920ee5

gtamer2 and others added 22 commits December 14, 2023 18:57

quantize script

adb0190

fire launch

d71ffa0

Inplace

748bc32

access the transformer

775475a

New quant logic

165cd54

Merge branch 'main' of https://github.com/gtamer2/hpml_llama into optim

005d8c5

fix quant sample inf

c995e43

Fix args

575c197

move h=quant(h) to after first layer. It was operating on tokens which are Half, not float

787efd5

added a sample for pruning model

d0b8b39

maybe it prunes now?

fab7b44

added attention

1a85f36

add torch.nn.parameter to this

0bf73c4

added some changes for including torch.nn.Parameter into the pruning, created my own attribute

ac4fb51

get rid of quantization modifications in model.py

caf7482

checking sparsity

b790b37

trying to find sparsity again

1c13854

we got rid of sparsity

bf0ebe0

giving sparsity another try

cb43151

trying newer things with prune_model.py

6779933

got rid of print statements, left them until after it was done

660e231

trying to make sure layer is being modified in place

651f8fb
