forked from vllm-project/vllm
Port flat PA from habana_next to habana_main #169
Merged
kzawora-intel merged 15 commits into HabanaAI:habana_main from dolszewska:dev/dolszewska/flat-PA on Sep 10, 2024
+330
−379
Commits on Sep 6, 2024
-
Use flat block layout for PA (HabanaAI#92)
* Cleanup AttentionMetadata on HPU
* Flat PA - POC
* Decode warmup overhaul
* Fix input_hash calculation
* Block bucket size 32 -> 16
* Improve host time
* Skip UTs
* Add GQA/MQA
* Add mask instead of filling
* 2d block mapping
* Optional flipping in PA
* Runner updated for 2d block mapping
* Eliminate physical transposes
* POC: build block_bias on device
* Cleanup
* Fix seq_len calculation
* Experimental profiling
* Add missing call to kv_matmul_op
* Fix block_usage calculation
* Change default block bucket step for decode to 128
* Fix max decode block bucket calculation
* Fix block_usage calculations
* Cleanup
* Print values for bucketing vars
* Pass block size to HpuModelAdapter

---------

Co-authored-by: barak goldberg <[email protected]>
Commit 52b1451
-
Commit 0112aa3
-
Commit 965f25e
-
Commit e66fc0b
-
Commit 8525b22
-
Commit b97d844
-
Commit f8d9048
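The "flat block layout", "2d block mapping" and "block_usage" entries in the first commit message are terse, so here is a minimal sketch of one plausible reading: per-sequence block tables are flattened into a single list, with a parallel mapping from each block back to its sequence and a count of valid tokens per block. The function name `flatten_block_tables` and the exact shapes are assumptions made for illustration; they are not taken from the PR's code.

```python
from typing import List, Tuple


def flatten_block_tables(
    block_tables: List[List[int]],
    seq_lens: List[int],
    block_size: int,
) -> Tuple[List[int], List[int], List[int]]:
    """Hypothetical helper: flatten per-sequence KV-cache block tables.

    Returns three parallel lists:
      block_list    - physical block ids, all sequences concatenated
      block_mapping - index of the sequence that owns each block
      block_usage   - number of valid tokens stored in each block
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")

    block_list: List[int] = []
    block_mapping: List[int] = []
    block_usage: List[int] = []

    for seq_idx, (table, seq_len) in enumerate(zip(block_tables, seq_lens)):
        # Number of blocks actually needed to hold seq_len tokens (ceil division).
        needed = -(-seq_len // block_size)
        if needed > len(table):
            raise ValueError(f"sequence {seq_idx} has too few blocks")
        for i, block_id in enumerate(table[:needed]):
            block_list.append(block_id)
            block_mapping.append(seq_idx)
            # Every block is full except possibly the last one.
            block_usage.append(min(block_size, seq_len - i * block_size))

    return block_list, block_mapping, block_usage


blocks, mapping, usage = flatten_block_tables([[3, 7], [5]], [20, 4], 16)
print(blocks, mapping, usage)
```

With a layout like this, attention over the cache can iterate over one flat list of blocks instead of a padded `[batch, max_blocks]` table, and `block_usage` tells a mask which token slots in the last block of each sequence are padding.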
Commits on Sep 9, 2024
-
Commit 1397980
-
Commit 0440fb2
-
Commit 9916b6b
-
Commit 4e6bfa1
Commits on Sep 10, 2024
-
Commit 0632846
-
Commit 793f54b
-
Commit 7468aab
-
Hardcode default values for max prompt and decode seq
The default value for both the max prompt and max decode sequence length should be max model len, but that causes a graph compilation error for longer sequences. To be fixed.
Commit 36fc84e
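Several commit messages here concern bucketing (block bucket step, max decode block bucket, hardcoded max prompt and decode seq). As a rough illustration of why the maximum matters, the sketch below rounds a requested size up to a bucket and clamps it to a configured range; each distinct bucket would correspond to one precompiled graph shape. The names `round_up_to_bucket` and `pick_bucket`, and the numbers in the usage lines, are hypothetical and not taken from the PR.

```python
def round_up_to_bucket(value: int, step: int) -> int:
    """Round value up to the nearest positive multiple of step."""
    if step <= 0:
        raise ValueError("step must be positive")
    # Ceil division with integers only, then never return less than one step.
    return max(step, -(-value // step) * step)


def pick_bucket(value: int, bucket_min: int, step: int, bucket_max: int) -> int:
    """Hypothetical helper: choose the bucket for value within [bucket_min, bucket_max].

    Raises if value does not fit under bucket_max, which is the situation a
    too-small hardcoded maximum would produce for long sequences.
    """
    if value > bucket_max:
        raise ValueError(f"{value} exceeds the largest bucket {bucket_max}")
    bucket = round_up_to_bucket(value, step)
    return min(max(bucket, bucket_min), bucket_max)


# Illustrative numbers only: a decode block bucket step of 128.
print(pick_bucket(1, 128, 128, 1024))
print(pick_bucket(129, 128, 128, 1024))
```

A larger maximum admits longer sequences but also means more bucket shapes to warm up, which is presumably the trade-off behind hardcoding the default until the compilation error is fixed.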