Working around new int4wo weight packing #1389

Open
Jack-Khuu opened this issue Dec 7, 2024 · 4 comments · May be fixed by pytorch/torchchat#1404

Comments

@Jack-Khuu (Contributor)

Given the change in output shape/behavior from pytorch/pytorch#139611 + #1278:

Question: What is the recommended way of migrating to the new CPU implementations of

  • _weight_int4pack_mm_for_cpu
  • _convert_weight_to_int4pack_for_cpu

while maintaining the previous behavior?


Specifically, _convert_weight_to_int4pack:

        q, s, z = Q4_0.unpack(t)
        scales_and_zeros = pack_scales_and_zeros(s, z)
        q_uint8 = (q[::, ::2] << 4 | q[::, 1::2]).to(torch.uint8)
        weight_int4pack = torch.ops.aten._convert_weight_to_int4pack(
            q_uint8, inner_k_tiles
        )
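
For reference, my guess at the migrated packing call (an assumption based on the new op name and the errors below, not a confirmed recipe): the CPU packer appears to take the unpacked int4 values as an int32 [N, K] tensor rather than the nibble-packed uint8 [N, K/2] tensor, and to return a 2D packed layout.

        q, s, z = Q4_0.unpack(t)
        scales_and_zeros = pack_scales_and_zeros(s, z)
        # Assumption: the CPU variant wants unpacked int4 values (0..15) as int32
        # of shape [N, K], so the manual uint8 nibble packing is no longer needed.
        weight_int4pack = torch.ops.aten._convert_weight_to_int4pack_for_cpu(
            q.to(torch.int32), inner_k_tiles
        )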

and _weight_int4pack_mm:

        c = torch.ops.aten._weight_int4pack_mm(
            input,
            weight_int4pack,
            groupsize,
            scales_and_zeros,
        )
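
On the matmul side, the call appears to keep the same argument order and only change the op name plus the expected packed-weight layout (again a sketch, not verified here):

        c = torch.ops.aten._weight_int4pack_mm_for_cpu(
            input,
            weight_int4pack,  # assumption: now the 2D uint8 [N, K/2] CPU layout
            groupsize,
            scales_and_zeros,
        )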

Tested: With no code changes

The following error is encountered:

Could not run 'aten::_convert_weight_to_int4pack' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend

Tested: Naive (just appending *_for_cpu)

A size mismatch was encountered (expected, since the signatures differ):

size mismatch for model.layers.0.attention.wq.weight: copying a param with shape torch.Size([2048, 1024]) from checkpoint, the shape in current model is torch.Size([256, 16, 32, 4]).
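
For context, the two shapes in that error are consistent with the new 2D CPU layout (in the checkpoint) versus the old 4D tensor-core-tiled layout (in the model), assuming N = K = 2048 and inner_k_tiles = 8 (my back-of-the-envelope check, not from the thread):

        N, K, inner_k_tiles = 2048, 2048, 8
        # Old layout produced by _convert_weight_to_int4pack (tensor-core tiled):
        old_shape = (N // 8, K // (inner_k_tiles * 16), 32, inner_k_tiles // 2)  # (256, 16, 32, 4)
        # New CPU layout, two int4 values packed per uint8 byte:
        new_shape = (N, K // 2)  # (2048, 1024)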

cc @yanbing-j @jerryzh168, who worked on the changes

@jerryzh168 (Contributor)

There is no change to the input shape, I believe, so the old code should work after you add _for_cpu.

size mismatch for model.layers.0.attention.wq.weight: copying a param with shape torch.Size([2048, 1024]) from checkpoint, the shape in current model is torch.Size([256, 16, 32, 4]).

This seems to be an error from loading an unquantized model state dict into a quantized model?

@jerryzh168 (Contributor)

@yanbing-j can you make the corresponding changes in torchchat (https://github.com/pytorch/torchchat/blob/main/torchchat/utils/gguf_loader.py#L609C17-L614C18) as well? It would also be helpful to add some docs for https://github.com/pytorch/pytorch/blob/7939b5f5f9b073984c26adef1446fa250a20bceb/aten/src/ATen/native/LinearAlgebra.cpp#L3457 and friends so that the expected input and output dimensions are clear.

@yanbing-j (Contributor)

@Jack-Khuu @jerryzh168

I followed https://github.com/pytorch/torchchat/blob/main/.github/workflows/pull.yml#L830-L874 to reproduce this issue.
With pytorch/torchchat#1404, it can now run on CPU. The root cause is that the weight in WeightOnlyInt4Linear needs to be updated.
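
Roughly, the kind of change that implies (a sketch of my understanding, not code from the PR; the flag and variable names here are assumptions): the packed-weight buffer that WeightOnlyInt4Linear registers has to use the 2D CPU layout instead of the old 4D tiled layout, so that a checkpoint packed with _convert_weight_to_int4pack_for_cpu no longer hits the size mismatch above.

        # Hypothetical sketch inside WeightOnlyInt4Linear.__init__; `use_cpu_kernel`
        # is an assumed flag selecting the *_for_cpu path.
        if use_cpu_kernel:
            packed = torch.empty(out_features, in_features // 2, dtype=torch.uint8)
        else:
            packed = torch.empty(
                out_features // 8,
                in_features // (inner_k_tiles * 16),
                32,
                inner_k_tiles // 2,
                dtype=torch.int32,
            )
        self.register_buffer("weight", packed)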

@Jack-Khuu (Contributor, Author)

Thanks @yanbing-j, I'll follow up in the other PR

there is no change of the input shape I believe

There is a change in input type and output shape, I believe?
