Use flat block layout for PA #92

madamczykhabana · 2024-07-08T05:54:42Z

No description provided.

Flat block table

* Cleanup AttentionMetadata on HPU
* Flat PA - POC
* Decode warmup overhaul
* Fix input_hash calculation
* Block bucket size 32 -> 16
* Improve host time
* Skip UTs
* Add GQA/MQA
* Add mask instead of filling
* 2d block mapping
* Optional flipping in PA
* Runner updated for 2d block mapping
* Eliminate physical transposes
* POC: build block_bias on device
* Cleanup
* Fix seq_len calculation
* Experimental profiling
* Add missing call to kv_matmul_op
* Fix block_usage calculation
* Change default block bucket step for decode to 128
* Fix max decode block bucket calculation
* Fix block_usage calculations
* Cleanup
* Print values for bucketing vars
* Pass block size do HpuModelAdapter

---------

Co-authored-by: barak goldberg <[email protected]>
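Since the PR has no description, here is a rough sketch of what a "flat block layout" with block mapping, block usage, and bucketing could look like. It is an illustration only, not the code from this PR: the helper name `flatten_block_tables`, its parameters, the padding values, and the returned list names are all assumptions based on the commit titles above.

```python
from typing import List, Tuple


def flatten_block_tables(
    block_tables: List[List[int]],
    seq_lens: List[int],
    block_size: int,
    bucket_step: int,
    pad_block: int = 0,
) -> Tuple[List[int], List[int], List[int]]:
    """Hypothetical helper: flatten per-sequence block tables into flat lists.

    Returns (block_list, block_mapping, block_usage), where
    block_mapping[i] is the index of the sequence that owns block_list[i]
    and block_usage[i] is the number of tokens stored in that block.
    """
    block_list: List[int] = []
    block_mapping: List[int] = []
    block_usage: List[int] = []
    for seq_idx, (blocks, seq_len) in enumerate(zip(block_tables, seq_lens)):
        for i, block_id in enumerate(blocks):
            block_list.append(block_id)
            block_mapping.append(seq_idx)
            # Every block is full except possibly the last one of a sequence.
            used = max(0, min(block_size, seq_len - i * block_size))
            block_usage.append(used)
    # Assumed bucketing: round the block count up to a multiple of bucket_step
    # so that only a limited set of input shapes is ever produced.
    padded_len = -(-len(block_list) // bucket_step) * bucket_step
    pad = padded_len - len(block_list)
    block_list += [pad_block] * pad   # padding entries point at a dummy block
    block_mapping += [-1] * pad       # -1 marks "belongs to no sequence"
    block_usage += [0] * pad          # padding blocks hold no tokens
    return block_list, block_mapping, block_usage
```

For example, two sequences of 20 and 9 tokens with `block_size=16` and `bucket_step=4` occupy three blocks in total, which this sketch pads to four entries.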

madamczykhabana and others added 30 commits June 18, 2024 16:39

Cleanup AttentionMetadata on HPU

0a79353

Flat PA - POC

3a5f212

Decode warmup overhaul

a456188

Debugging OOM

65f6666

Experimental profiling

1e7e023

Fix input_hash calculation

2a28443

Block bucket size 32 -> 16

97f223d

Improve host time

8e44ef6

Skip UTs

65530f1

Add GQA/MQA

1fd516d

Add mask instead of filling

19976c7

2d block mapping

0146baa

Optional flipping in PA

052a2cc

Runner updated for 2d block mapping

7da0bbc

Restore mark_step

f4bdfe5

Eliminate physical transposes

7679ad4

Merge remote-tracking branch 'origin/habana_next' into flat_block_table

60cdf14

Disable warmup_mode

d4711a5

Revert changes to test_attention.py

088aea9

POC: build block_bias on device

65489f0

Cleanup

1c332e2

Merge remote-tracking branch 'origin/habana_next' into flat_block_table

c4d84cc

Fix seq_len calculation

d157387

Experimental profiling

46fb67c

Merge branch 'experimental_profiling' into flat_block_table

707e14b

Add missing call to kv_matmul_op

7aeb218

Merge pull request #91 from madamczykhabana/flat_block_table

33b3f41

Flat block table

Fix block_usage calculation

63eeef7

Change default block bucket step for decode to 128

f11dcba

Fix max decode block bucket calculation

64dd19b

madamczykhabana added 5 commits July 9, 2024 15:23

Fix block_usage calculations

907e091

Cleanup

359425c

Cleanup profiler code

a25c01e

Print values for bucketing vars

dbffad4

Pass block size do HpuModelAdapter

6676653

madamczykhabana marked this pull request as ready for review July 9, 2024 15:12

madamczykhabana changed the title ~~[WIP] flat PA~~ Use flat block layout for PA Jul 10, 2024

Merge remote-tracking branch 'origin/habana_next' into flat_pa

e5b786f

madamczykhabana merged commit 81a23a7 into habana_next Jul 10, 2024

madamczykhabana deleted the flat_pa branch July 19, 2024 15:42

iboiko-habana mentioned this pull request Aug 19, 2024

[Bug]: Unexpected decode graph compilation after preemption #158

Closed

kzawora-intel added the habana label (Issues or PRs submitted by Habana Labs) Nov 8, 2024
