torchao: add int8; quanto: add NF4; torch compile fixes + ability to compile optim #986

bghira · 2024-09-24T03:14:16Z

No description provided.

bghira · 2024-09-25T14:05:29Z

@sayakpaul i can't get this to work on NVIDIA devices, but it does work on Apple MPS. have you had any luck training with torchao? on cuda it complains that it cannot cast (tensor, tensor) to tensor.

bghira · 2024-09-25T14:07:31Z

currently the main benefit of torchao i can see is that we get torch compile support, but torch compile doesn't even work on mps.

on MPS, torchao is 3 seconds per step slower than quanto, which runs at roughly the same speed as bf16 (tested on Flux Dev).

so if there's no nvidia support for training and it runs more slowly on MPS, i'm not sure there is a point in merging this PR.

bghira · 2024-09-27T03:27:57Z

i changed my mind: i got it to work with int8 on nvidia after messing around with the internals on the GPUMODE discord.
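For readers unfamiliar with what "int8" means here, the following is a minimal, library-free sketch of symmetric per-tensor int8 weight quantisation. It is not torchao's implementation (its kernels, scaling granularity, and training path are not described in this thread); the function names `quantise_int8` and `dequantise_int8` are hypothetical and exist only for illustration.

```python
def quantise_int8(weights):
    """Symmetric per-tensor int8 quantisation (illustrative only).

    Returns the integer codes and the scale needed to map them back.
    """
    absmax = max((abs(w) for w in weights), default=0.0)
    # Guard against an all-zero tensor so we never divide by zero.
    scale = absmax / 127.0 if absmax > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantise_int8(codes, scale):
    """Map int8 codes back to approximate float values."""
    return [c * scale for c in codes]


weights = [0.5, -1.0, 0.25, 0.0]
codes, scale = quantise_int8(weights)
restored = dequantise_int8(codes, scale)
```

Each restored value differs from the original by at most half a quantisation step (`scale / 2`), which is the precision traded away for the smaller memory footprint.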

bghira added 8 commits September 23, 2024 21:06

torchao: fp8/autoquant

c38c109

update deps

721ce10

gc_collect should be called before clear on torch and after for mps

0d8b7cf

mps: disable gpu quantisation since it does not work

44c82ff

add int8-torchao level for mps support

6118768

update to use the newer ao api and move cuda restriction to fp8

9fcd748

allow training the full model in a quantised state

ed35338

return the modified model

296e3dd
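The commit "gc_collect should be called before clear on torch and after for mps" describes an ordering rule that can be sketched as below. This is a hedged illustration, not the code from the PR: `reclaim_memory` and the backend labels are hypothetical, and reading "torch" as the CUDA path is an assumption. The cache-clearing function is passed in so the sketch runs without PyTorch installed; in practice it would be something like `torch.cuda.empty_cache` or `torch.mps.empty_cache`.

```python
import gc


def reclaim_memory(backend, empty_cache, collect=gc.collect):
    """Free cached accelerator memory, ordering the GC pass per backend.

    backend: "cuda" or "mps" (hypothetical labels for this sketch).
    empty_cache: callable that clears the backend's allocator cache.
    collect: garbage-collection callable, defaults to gc.collect.
    """
    if backend == "mps":
        # MPS: clear the cache first, then run the garbage collector.
        empty_cache()
        collect()
    else:
        # Assumed CUDA path: collect garbage first so freed tensors
        # can actually be released when the cache is cleared.
        collect()
        empty_cache()
```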

bghira changed the title ~~torchao: fp8/autoquant~~ torchao: fp8/int8 Sep 25, 2024

This comment was marked as outdated.

Merge branch 'main' into feature/torchao

50f59d6

bghira changed the title ~~torchao: fp8/int8~~ (wip, does not work) torchao: fp8/int8 Sep 26, 2024

bghira marked this pull request as draft September 26, 2024 17:23

bghira added 2 commits September 26, 2024 13:20

torchao: low-precision optims need fp32 gradients

26ee784

torchao: cpu optimiser offload, which also does not work

de4b563

bghira closed this Sep 26, 2024

bghira added 2 commits September 26, 2024 20:12

torchao: fix int8 training by monkeypatching the broken method

979d298

update with int8 nvidia fix

b989ff3

bghira reopened this Sep 27, 2024

nvidia lock file update

cfb6e62

bghira changed the title ~~(wip, does not work) torchao: fp8/int8~~ (wip, int8 only) torchao: fp8/int8 Sep 27, 2024

bghira added 6 commits September 27, 2024 21:04

fix torch compile validation arg

a1413a1

update error msg

6124933

fix int8 again, as we cannot use filter_fn on the whole model

a4d3e6b

remove fp8 and auto

4f28ebb

update message for loading module

8bf7107

torchao: rename quantoise -> quantise_model

f40d67c

bghira added 3 commits September 27, 2024 23:16

quanto: add nf4 support

3def7c5

update optimum-quanto for nf4 support

3aa1925

update options doc contents, adding quantisation notes

1017e8a

bghira changed the title ~~(wip, int8 only) torchao: fp8/int8~~ int8-torchao, nf4-quanto Sep 28, 2024

bghira commented Sep 28, 2024

helpers/training/custom_schedule.py (outdated, resolved)

bghira and others added 7 commits September 27, 2024 23:44

Update helpers/training/custom_schedule.py

f6d770c

update quanto fp8 for marlin gemm kernel and auto switch from fp8 to int8 when dynamo is enabled

284af19

reformat files that were missed earlier

509a25e

reorganise options doc

34ad1fd

disable cpu offloaded optim

e97183c

dynamo optimisation for flux transformer, always use fp32 rope

a43ff90

remove old optimiser init

9bf32b8
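One commit in this group switches from fp8 to int8 automatically when dynamo is enabled. A minimal sketch of that kind of fallback is shown below. The function name `resolve_precision` and the level strings `"fp8-quanto"` and `"int8-quanto"` are assumptions made for illustration; the thread does not show the actual option names or where the check lives.

```python
def resolve_precision(requested, dynamo_enabled):
    """Pick the quantisation level to use (illustrative only).

    requested: the level the user asked for (hypothetical strings).
    dynamo_enabled: whether torch compile / dynamo is turned on.
    """
    if dynamo_enabled and requested == "fp8-quanto":
        # Fall back to int8 when compiling, as the commit message describes.
        return "int8-quanto"
    return requested
```

Doing the substitution in one place keeps the rest of the training setup unaware of the fallback; a real implementation would likely also log a warning so the user knows the level was changed.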

bghira changed the title ~~int8-torchao, nf4-quanto~~ torchao: add int8; quanto: add NF4; torch compile fixes + ability to compile optim Sep 28, 2024

bghira marked this pull request as ready for review September 28, 2024 17:36

bghira merged commit 8298530 into main Sep 28, 2024
1 check passed

bghira deleted the feature/torchao branch September 28, 2024 17:36
