
upgrade liger to 0.3.1 #1973

Open · wants to merge 8 commits into main
Conversation

winglian (Collaborator)

Description

Motivation and Context

How has this been tested?

Screenshots (if appropriate)

Types of changes

Social Handles (Optional)

Review threads (resolved):
- src/axolotl/integrations/liger/args.py (outdated)
- tests/integrations/liger.py
- src/axolotl/integrations/liger/__init__.py
- README.md
@@ -34,7 +34,7 @@ tensorboard
 python-dotenv==1.0.1
 autoawq>=0.2.5
 triton>=2.3.0
-liger-kernel==0.3.0
+liger-kernel==0.3.1
Collaborator (review comment on the liger-kernel pin):
Do we need to wait for their next release, or point to this commit for the GA fix? linkedin/Liger-Kernel#333
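
If we do end up needing the unreleased fix, one option is a direct git reference in requirements.txt instead of a version pin. This is only a sketch: <commit-sha> is a placeholder, not a pin taken from this PR.

# hypothetical: pin liger-kernel to a specific upstream commit until the fix ships in a release
liger-kernel @ git+https://github.com/linkedin/Liger-Kernel.git@<commit-sha>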

bursteratom (Collaborator) commented Nov 1, 2024

@NanoCode012 @winglian I tried to run this particular branch just now but ran into this error:

File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/triton/compiler/code_generator.py", line 1066, in visit_Attribute                                  [114/1839]
             return getattr(lhs, node.attr)                                                                                                                                   [113/1839]
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                             ]
[rank0]: AttributeError: 'tensor' object has no attribute 'cast'

which happens during

File "/workspace/axolotl/src/axolotl/core/trainer_builder.py", line 678, in compute_loss                                                                                     
[rank0]:     return super().compute_loss(model, inputs, return_outputs=return_outputs)                                                                                                  
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
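
For anyone reproducing this, a quick environment check is to print the installed package versions and compare them against the pins in requirements.txt (triton>=2.3.0, liger-kernel==0.3.1). This is a generic sketch, assuming the AttributeError stems from a triton / liger-kernel version mismatch rather than this branch itself:

from importlib.metadata import PackageNotFoundError, version

# Print the versions that the Triton-compiled Liger kernels actually see at runtime.
for pkg in ("triton", "liger-kernel", "torch", "transformers"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")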

I'm using the default axolotl template on runpod and made sure to install the dependencies associated with this branch

And my yaml is as follows:

base_model: NousResearch/Meta-Llama-3.1-8B

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_glu_activation: true
liger_fused_linear_cross_entropy: true

strict: false

datasets:
    - path: tatsu-lab/alpaca
      type: alpaca
dataset_prepared_path: last_run_prepared
val_set_size: 0.02
output_dir: ./outputs/out

sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 1
optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 2e-5

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 100
evals_per_epoch: 2
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
  - full_shard
  - auto_wrap
fsdp_config:
  fsdp_limit_all_gathers: true
  fsdp_sync_module_states: true
  fsdp_offload_params: true
  fsdp_use_orig_params: false
  fsdp_cpu_ram_efficient_loading: true
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_backward_prefetch: BACKWARD_PRE
special_tokens:
  pad_token: <|finetune_right_pad_id|>
  eos_token: <|eot_id|>
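
For reference, the liger_* flags above correspond to Liger-Kernel's Llama patching API. The sketch below can confirm that 0.3.1 imports and patches cleanly outside axolotl; the kwarg names follow the upstream Liger-Kernel README and are an assumption here, not the exact code path axolotl's LigerPlugin takes.

from liger_kernel.transformers import apply_liger_kernel_to_llama

# Monkey-patches transformers' Llama modules in place; call before loading the model.
apply_liger_kernel_to_llama(
    rope=True,                        # liger_rope: true
    rms_norm=True,                    # liger_rms_norm: true
    swiglu=True,                      # liger_glu_activation: true
    fused_linear_cross_entropy=True,  # liger_fused_linear_cross_entropy: true
    cross_entropy=False,              # fused linear CE replaces the plain CE patch
)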
