[Audio LLM] Support audiollm for ASR, based on Whisper and Llama 3 #2532

Open
wants to merge 45 commits into main

Conversation

@Zth9730 (Contributor) commented May 20, 2024

Conducted an experiment on the LibriSpeech dataset for several steps:

[screenshot of training results]

@thsxbw commented May 23, 2024

Can you provide the config.yaml or experimental results?

@Zth9730 (Contributor, Author) commented May 23, 2024

Can you provide the config.yaml or experimental results?

Yes, there will be new commits later.

@fclearner (Contributor) commented:

Does this branch support Qwen?

@Zth9730 (Contributor, Author) commented Aug 7, 2024

audiollm.yaml

The hyperparameters, such as the learning rate and warmup steps, may not be optimal.

accum_grad: 1
cmvn: null
cmvn_conf:
  cmvn_file: null
  is_json_cmvn: null
dataset: audio_llm
dataset_conf:
  batch_conf:
    batch_type: static
    batch_size: 4
  cycle: 1
  data_style: audiosft
  data_style_conf:
    add_bos: true
    add_eos: true
    template: audio_llama3
  feats_type: log_mel_spectrogram
  filter_audio_conf:
    max_length: 3000
    min_length: 0
  filter_conf:
    token_max_length: 8192
    token_min_length: 1
  log_mel_spectrogram_conf:
    hop_length: 160
    n_fft: 400
    num_mel_bins: 128
    pad_or_trim: true
    padding: 0
  resample_conf:
    resample_rate: 16000
  shift: true
  shuffle: true
  shuffle_conf:
    shuffle_size: 1500
  shuffle_list: true
  shuffle_list_conf:
    shuffle_size: 15000
  sort: true
  sort_conf:
    sort_size: 500
  spec_aug: true
  spec_aug_conf:
    max_f: 10
    max_t: 50
    num_f_mask: 0
    num_t_mask: 0
  spec_sub: false
  spec_sub_conf:
    max_t: 30
    num_t_sub: 3
  spec_trim: false
  speed_perturb: false
decoder: decoder_only
decoder_conf:
  activation_type: swish
  attention_dropout_rate: 0.0
  attention_heads: 32
  dropout_rate: 0.0
  gelu_approximate: null
  gradient_checkpointing: true
  head_dim: 128
  hidden_size: 4096
  linear_units: 14336
  max_position_embeding: 8192
  n_kv_head: 8
  norm_eps: 1.0e-05
  normalize_before: true
  num_blocks: 32
  positional_dropout_rate: 0.0
  rms_norm_offset: false
  rope_style: llama
  rope_theta: 500000.0
  scale_embed: false
  use_sdpa: true
encoder: transformer
encoder_conf:
  activation_type: gelu
  attention_dropout_rate: 0.0
  attention_heads: 20
  dropout_rate: 0.1
  gradient_checkpointing: true
  input_layer: conv1d2
  key_bias: false
  linear_units: 5120
  normalize_before: true
  num_blocks: 32
  output_size: 1280
  pos_enc_layer_type: abs_pos_whisper
  positional_dropout_rate: 0.1
  static_chunk_size: -1
  use_dynamic_chunk: false
  use_dynamic_left_chunk: false
  use_sdpa: true
grad_clip: 1
input_dim: 128
log_interval: 40
max_epoch: 3
save_limited: 1
save_best_ckpt: true
model: audio_llm
model_conf:
  bottleneck_mid_dim: 512
  bottleneck_type: conv-linear
  conv_kernel_sizes:
  - 3
  - 3
  - 3
  length_normalized_loss: false
  linear_bias: false
  lsm_weight: 0.1
  tie_word_embedding: false
  freeze_decoder: true
  freeze_encoder: true
  freeze_llm_embed: false
optim: adamw
optim_conf:
  lr: 4.0e-05
  weight_decay: 0.01
output_dim: 128256
save_interval: 2000
save_states: model_only
scheduler: warmuplr
scheduler_conf:
  warmup_steps: 1000
tokenizer: huggingface
tokenizer_conf:
  model: meta-llama/Meta-Llama-3-8B
  special_tokens:
    <|begin_of_text|>: 128000
    <|end_header_id|>: 128007
    <|end_of_text|>: 128001
    <|eot_id|>: 128009
    <|start_header_id|>: 128006
vocab_size: 128256
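
The tokenizer_conf above uses the stock Llama-3 tokenizer from Hugging Face, and the special-token IDs listed there can be sanity-checked directly. A minimal sketch, assuming the transformers library and access to the gated meta-llama/Meta-Llama-3-8B repository:

from transformers import AutoTokenizer

# Requires an authenticated Hugging Face login with access to the gated repo.
tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
for t in ["<|begin_of_text|>", "<|end_of_text|>",
          "<|start_header_id|>", "<|end_header_id|>", "<|eot_id|>"]:
    print(t, tok.convert_tokens_to_ids(t))
# Expected: 128000, 128001, 128006, 128007, 128009 (matching special_tokens above)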

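For reference, model_conf wires the frozen 1280-dim Whisper-style encoder into the frozen 4096-dim Llama-3 decoder through a conv-linear bottleneck (bottleneck_mid_dim: 512, conv_kernel_sizes: [3, 3, 3], linear_bias: false). A minimal PyTorch sketch of what such a projector could look like; the stride-2 convolutions, GELU activations, and all class and argument names are assumptions for illustration, not the PR's actual module:

import torch
import torch.nn as nn

class ConvLinearBottleneck(nn.Module):
    """Hypothetical conv-linear projector from encoder frames to LLM embeddings."""

    def __init__(self, enc_dim=1280, mid_dim=512, llm_dim=4096,
                 kernel_sizes=(3, 3, 3)):
        super().__init__()
        convs, in_dim = [], enc_dim
        for k in kernel_sizes:
            # Each stride-2 Conv1d halves the number of frames (an assumption).
            convs += [nn.Conv1d(in_dim, mid_dim, k, stride=2, padding=k // 2),
                      nn.GELU()]
            in_dim = mid_dim
        self.convs = nn.Sequential(*convs)
        self.proj = nn.Linear(mid_dim, llm_dim, bias=False)  # linear_bias: false

    def forward(self, x):                    # x: (batch, time, enc_dim)
        x = self.convs(x.transpose(1, 2))    # Conv1d wants (batch, channels, time)
        return self.proj(x.transpose(1, 2))  # (batch, time', llm_dim)

# 30 s of Whisper features -> 1500 encoder frames -> 188 frames fed to the LLM.
feats = torch.randn(2, 1500, 1280)
print(ConvLinearBottleneck()(feats).shape)   # torch.Size([2, 188, 4096])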
Decode script:

# $recog_set, $dir, $wave_data and $decode_checkpoint are assumed to be
# defined by the surrounding run script.
temperature=1.0
top_p=1.0
top_k=1
for test in $recog_set; do
  result_dir=$dir/${test}
  python wenet/bin/audiollm_recognize.py --gpu 0 \
    --config $dir/train.yaml \
    --data_type raw \
    --dtype bf16 \
    --test_data $wave_data/$test/data.list \
    --checkpoint $decode_checkpoint \
    --output_len 256 \
    --temperature $temperature \
    --top_p $top_p \
    --top_k $top_k \
    --result_dir $result_dir
  test_dir=$result_dir/temp${temperature}_topk${top_k}_topp${top_p}
  python tools/compute-wer.py --char=1 --v=1 \
    $wave_data/$test/text $test_dir/text > $test_dir/wer
done
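
Note that with top_k=1 the decoding above is effectively greedy; temperature and top_p only change anything once top_k is widened. A generic sketch of how the three knobs interact when sampling the next token from the LLM logits (an illustration, not the code in wenet/bin/audiollm_recognize.py):

import torch

def sample_next_token(logits, temperature=1.0, top_k=1, top_p=1.0):
    """Generic temperature / top-k / top-p sampling over a 1-D logits vector."""
    logits = logits / max(temperature, 1e-6)
    if top_k > 0:
        kth = torch.topk(logits, top_k).values[-1]
        logits = logits.masked_fill(logits < kth, float("-inf"))  # keep k best
    probs = torch.softmax(logits, dim=-1)
    if top_p < 1.0:
        sorted_p, idx = torch.sort(probs, descending=True)
        keep = torch.cumsum(sorted_p, dim=-1) <= top_p
        keep[0] = True                              # always keep the best token
        probs = torch.zeros_like(probs).scatter(0, idx[keep], sorted_p[keep])
        probs = probs / probs.sum()
    return torch.multinomial(probs, 1).item()

# With top_k=1 this reduces to greedy decoding (always the argmax token).
print(sample_next_token(torch.randn(128256), temperature=1.0, top_k=1, top_p=1.0))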

@Zth9730 (Contributor, Author) commented Aug 7, 2024

Does this branch support Qwen?

You can refer to Zhou's code to convert the Qwen weights into wenet's format, and then it will be supported 🐶
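
This PR itself does not ship a conversion script; the idea is to load the Hugging Face Qwen checkpoint and rename its state_dict keys to whatever the wenet decoder_only module expects. A rough sketch of that kind of remapping, where every wenet-side key prefix in name_map is a placeholder and the real mapping has to come from the referenced conversion code:

import torch
from transformers import AutoModelForCausalLM

# Placeholder mapping: the left side follows the Hugging Face Qwen2 key layout,
# the right side is purely illustrative and must be replaced by the names the
# wenet decoder_only model actually uses.
name_map = {
    "model.embed_tokens.": "decoder.embed.",
    "model.layers.":       "decoder.decoders.",
    "model.norm.":         "decoder.final_norm.",
    "lm_head.":            "decoder.output_layer.",
}

hf = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-7B-Instruct",
                                          torch_dtype="auto")
converted = {}
for key, tensor in hf.state_dict().items():
    for src, dst in name_map.items():
        if key.startswith(src):
            key = dst + key[len(src):]
            break
    converted[key] = tensor
torch.save(converted, "qwen_as_wenet_init.pt")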

@Mddct (Collaborator) commented Aug 7, 2024

Please rebase onto main; part of the LLM code has already been merged into main.
