Add RoPE scaling support for transformers (including dynamic NTK)
Showing 5 changed files with 16 additions and 9 deletions.
What does this do to GPTQ-for-LLaMa and AutoGPTQ, since they use part of transformers? I just tried alpha and it didn't work; the output started repeating. Compressed embedding might work, since people used to use the monkeypatch, but I don't have a model like that to test here.
I ran a test with alpha = 2 yesterday on llama-2-7b-hf, and it generated coherent output with a 5200-token context. The dynamic RoPE scaling here is supposed to be better than both the llama.cpp and ExLlama NTK implementations available at the moment.
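For context, here is a minimal sketch of how the dynamic NTK RoPE scaling merged into transformers can be enabled when loading a Llama model. The model name and factor value are placeholders (the factor is not necessarily identical to the "alpha" knob discussed above), and the exact `rope_scaling` schema should be checked against the installed transformers version:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder model

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Dynamic NTK-aware RoPE scaling: the RoPE base frequency is rescaled on the
# fly as the sequence grows past the original training context length.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    rope_scaling={"type": "dynamic", "factor": 2.0},  # illustrative factor
    device_map="auto",
)
```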
HF should work since they merged it, but I need to see if the GPTQ loaders can handle it too. So far alpha hasn't worked, but compressed embedding may yet work. I got to around 2100 tokens before I hit repetition on vanilla GPTQ models with alpha. I need to try a model with compressed embedding and GPTQ to see if the patch is still necessary (it was a monkeypatch to transformers) or if the native functionality can go into the loaders.
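For comparison, a sketch (again with a placeholder model and factor) of the linear "compressed embedding" variant that the old monkeypatch implemented, expressed through the same native config field, which is what would let the loaders drop the patch:

```python
from transformers import AutoModelForCausalLM

# Linear position interpolation ("compress_pos_emb" style): position ids are
# compressed by the given factor instead of rescaling the RoPE base.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",                       # placeholder model
    rope_scaling={"type": "linear", "factor": 4.0},   # e.g. 4x compression
    device_map="auto",
)
```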