Port flat PA from habana_next to habana_main #169

Merged
15 commits merged into habana_main on Sep 10, 2024

Commits on Sep 6, 2024

  1. Use flat block layout for PA (HabanaAI#92)

    * Cleanup AttentionMetadata on HPU
    
    * Flat PA - POC
    
    * Decode warmup overhaul
    
    * Fix input_hash calculation
    
    * Block bucket size 32 -> 16
    
    * Improve host time
    
    * Skip UTs
    
    * Add GQA/MQA
    
    * Add mask instead of filling
    
    * 2d block mapping
    
    * Optional flipping in PA
    
    * Runner updated for 2d block mapping
    
    * Eliminate physical transposes
    
    * POC: build block_bias on device
    
    * Cleanup
    
    * Fix seq_len calculation
    
    * Experimental profiling
    
    * Add missing call to kv_matmul_op
    
    * Fix block_usage calculation
    
    * Change default block bucket step for decode to 128
    
    * Fix max decode block bucket calculation
    
    * Fix block_usage calculations
    
    * Cleanup
    
    * Print values for bucketing vars
    
    * Pass block size to HpuModelAdapter
    
    ---------
    
    Co-authored-by: barak goldberg <[email protected]>
    2 people authored and dolszewska committed Sep 6, 2024
    52b1451
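
    The squashed messages above outline the flat PA idea: instead of padded per-sequence block tables, the KV-cache blocks of all decoding sequences are flattened into one dimension, a block mapping ties each block back to its owner sequence, and a block_usage mask marks the valid slots in each block instead of filling the cache ("Add mask instead of filling"). Below is a minimal, illustrative PyTorch sketch of that layout for a single head; the names block_mapping and block_usage follow the commit wording, but every shape, helper, and signature here is an assumption for readability, not the PR's actual kernel code.

    ```python
    # Hypothetical sketch of a flat paged-attention decode step, single head.
    # Shapes and names are illustrative assumptions, not the PR's real code.
    import torch

    def flat_pa_decode(query, key_cache, value_cache, block_mapping, block_usage):
        """query:         [batch, head_dim], one decode token per sequence
        key_cache:     [num_blocks, block_size, head_dim]
        value_cache:   [num_blocks, block_size, head_dim]
        block_mapping: [num_blocks] (long), owner sequence of each block
        block_usage:   [num_blocks] (long), number of valid slots per block
        """
        num_blocks, block_size, head_dim = key_cache.shape
        batch = query.shape[0]
        # Flat layout: gather each block's query through the 1D mapping,
        # so there is no per-sequence padding dimension at all.
        q_per_block = query[block_mapping]                       # [num_blocks, head_dim]
        scores = torch.einsum('bd,bsd->bs', q_per_block, key_cache) / head_dim ** 0.5
        # Mask unused slots instead of filling the cache with dummy values.
        slot_ids = torch.arange(block_size).unsqueeze(0)         # [1, block_size]
        scores = scores.masked_fill(slot_ids >= block_usage.unsqueeze(1), float('-inf'))
        # Numerically stable softmax across all blocks of the same sequence:
        # per-sequence max, exponentiate, then reduce by owner sequence.
        seq_max = torch.full((batch,), float('-inf')).scatter_reduce(
            0, block_mapping, scores.amax(dim=1), reduce='amax')
        exp = torch.exp(scores - seq_max[block_mapping].unsqueeze(1))
        num = torch.zeros(batch, head_dim).index_add_(
            0, block_mapping, torch.einsum('bs,bsd->bd', exp, value_cache))
        den = torch.zeros(batch).index_add_(0, block_mapping, exp.sum(dim=1))
        return num / den.unsqueeze(1)
    ```

    The later messages in the list ("2d block mapping", "Eliminate physical transposes") suggest the scatter-based reduction above was eventually replaced with a 2d block mapping that is friendlier to HPU graph compilation; the indexed version here is only for clarity.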
  2. 0112aa3
  3. 965f25e
  4. e66fc0b
  5. Apply formatting

    dolszewska committed Sep 6, 2024
    8525b22
  6. Uncomment LoRA lines

    dolszewska committed Sep 6, 2024
    b97d844
  7. f8d9048

Commits on Sep 9, 2024

  1. 1397980
  2. Set warmup_mode to False

    dolszewska committed Sep 9, 2024
    0440fb2
  3. Remove unsqueeze

    dolszewska committed Sep 9, 2024
    9916b6b
  4. 4e6bfa1

Commits on Sep 10, 2024

  1. 0632846
  2. 793f54b
  3. Fix logging warmup

    dolszewska committed Sep 10, 2024
    7468aab
  4. Hardcode default values for max prompt and decode seq

    The default value for both max prompt and decode seq
    should be the max model len, but that currently causes a graph
    compilation error for longer seqs; to be fixed.
    dolszewska committed Sep 10, 2024
    36fc84e
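
    Several messages in this PR concern warmup bucketing ("Block bucket size 32 -> 16", "Change default block bucket step for decode to 128", and this commit's hardcoded max prompt/decode seq defaults). A tiny hypothetical sketch of how such bucket ranges are typically generated during warmup: the step of 128 and minimum of 16 come from the commit messages, but the helper name bucket_range and the cap of 512 are illustrative assumptions.

    ```python
    # Hypothetical bucket-range generator; only step=128 and min=16 are
    # taken from the commit messages, everything else is illustrative.
    def bucket_range(minimum, step, maximum):
        """Return warmup bucket sizes: min, then multiples of step up to max."""
        buckets = [minimum]
        value = step
        while value < maximum:
            buckets.append(value)
            value += step
        buckets.append(maximum)
        return sorted(set(buckets))

    # e.g. decode block buckets with step 128 up to a hardcoded cap:
    print(bucket_range(16, 128, 512))  # [16, 128, 256, 384, 512]
    ```

    Hardcoding the cap (instead of deriving it from max model len, as this commit message notes) bounds how large the biggest warmed-up graph gets, at the cost of not covering the longest sequences yet.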