Port flat PA from habana_next to habana_main #169

Merged

kzawora-intel merged 15 commits into HabanaAI:habana_main from dolszewska:dev/dolszewska/flat-PA

Sep 10, 2024

+330 −379

Commits on Sep 6, 2024

Use flat block layout for PA (HabanaAI#92)

* Cleanup AttentionMetadata on HPU

* Flat PA - POC

* Decode warmup overhaul

* Fix input_hash calculation

* Block bucket size 32 -> 16

* Improve host time

* Skip UTs

* Add GQA/MQA

* Add mask instead of filling

* 2d block mapping

* Optional flipping in PA

* Runner updated for 2d block mapping

* Eliminate physical transposes

* POC: build block_bias on device

* Cleanup

* Fix seq_len calculation

* Experimental profiling

* Add missing call to kv_matmul_op

* Fix block_usage calculation

* Change default block bucket step for decode to 128

* Fix max decode block bucket calculation

* Fix block_usage calculations

* Cleanup

* Print values for bucketing vars

* Pass block size to HpuModelAdapter

---------

Co-authored-by: barak goldberg <[email protected]>

2 people authored and dolszewska committed Sep 6, 2024

52b1451

Fix block_usage calculation (HabanaAI#96)

madamczykhabana authored and dolszewska committed Sep 6, 2024

0112aa3

WA for numerically unstable block_softmax (HabanaAI#104)

szutenberg authored and dolszewska committed Sep 6, 2024

965f25e

Fix finding proper block buckets (HabanaAI#119)

jkaniecki authored and dolszewska committed Sep 6, 2024

e66fc0b

Apply formatting

dolszewska committed Sep 6, 2024

8525b22

Uncomment LoRA lines

dolszewska committed Sep 6, 2024

b97d844

Cast block_mapping to long for one_hot

dolszewska committed Sep 6, 2024

f8d9048

Commits on Sep 9, 2024

Adjust log messages and README to new flat-PA

dolszewska committed Sep 9, 2024

1397980

Set warmup_mode to False

dolszewska committed Sep 9, 2024

0440fb2

Remove unsqueeze

dolszewska committed Sep 9, 2024

9916b6b

Merge branch 'habana_main' into dev/dolszewska/flat-PA

dolszewska authored Sep 9, 2024

4e6bfa1

Commits on Sep 10, 2024

Fix formatting, re-add comment

dolszewska committed Sep 10, 2024

0632846

Use max_num_batched_tokens in profile_run

dolszewska committed Sep 10, 2024

793f54b

Fix logging warmup

dolszewska committed Sep 10, 2024

7468aab

Hardcode default values for max prompt and decode seq
```
The default value for both max prompt and decode seq
should be max model len, but that causes a graph compilation
error for longer sequences. To be fixed.
```
dolszewska committed Sep 10, 2024

36fc84e
