Update state dict and model together #573

Merged
merged 27 commits into from
May 1, 2024

Conversation

mikekgfb
Contributor

Update state dict and model together

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Apr 30, 2024
Contributor

@ali-khosh ali-khosh left a comment

approved

Comment on lines +81 to +82
weight_scale=scales.float(),
weight_zero_point=0,
Contributor

These two have to either both be tensors or both be scalars. It should work if you do scales.float().item(), I think.

Contributor Author

Wait... This is a vector of 32000 elements

Contributor

Oh, but _qdq_dynamic_quantized_linear only supports per-tensor quantization, so you may want to call a different op in that function.

To align the sizes, you could do: weight_zero_point = torch.zeros(scales.shape)
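
For illustration, the shape rule discussed in this thread can be sketched as follows. This is a minimal sketch under stated assumptions, not the code from this PR: `dequantize_per_channel` is a hypothetical helper, the weight is assumed to be quantized per output row, and the toy tensors stand in for the 32000-element scale vector mentioned above. Note that scales.float().item() only applies when scales holds a single element, which is why it does not fit the per-channel case.

```python
import torch


def dequantize_per_channel(q_weight, scales, zero_points):
    """Hypothetical helper (not an op from this PR): dequantize a weight
    that has one scale and one zero point per output row."""
    # Scale and zero point must agree in shape: both per-row vectors here.
    if scales.shape != zero_points.shape:
        raise ValueError("scales and zero_points must have the same shape")
    # Broadcast the per-row parameters across the columns of the weight.
    shifted = q_weight.float() - zero_points.float().unsqueeze(1)
    return shifted * scales.float().unsqueeze(1)


# Toy stand-ins: 2 output rows instead of 32000.
q_weight = torch.tensor([[1, -2], [3, 4]], dtype=torch.int8)
scales = torch.tensor([0.5, 0.25])

# The reviewer's suggestion: a zero point tensor sized like the scales.
weight_zero_point = torch.zeros(scales.shape)

dequantized = dequantize_per_channel(q_weight, scales.float(), weight_zero_point)
```

With a symmetric scheme like this one (all zero points equal to 0), the result is just each row of the integer weight multiplied by its own scale. The zero point tensor exists only so that both parameters have matching shapes.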

@mikekgfb mikekgfb merged commit 60616bf into main May 1, 2024
32 checks passed
malfet pushed a commit that referenced this pull request Jul 17, 2024
* code beautification

* code beautification, move functions together

* rewrite model rewriter

* rewrite quantizers

* weights is none check

* typo

* not weight -> weight is not None

* fix dimensions for parallel prefill

* test

* typo

* bfloat16 on ARM with MacOS 14

* precision for a8w4

* sdpa_kv

* fixes

* inline qlq definition

* trial and error

* qdq not working

* ci

* not so fast with bf16=fast

* typo, and handle fast across maxcos version...

* typo

* type cast
5 participants