
support and optimize qwen2-audio #11809

Merged
Merged 1 commit into intel-analytics:main on Aug 15, 2024

Conversation

MeouSker77
Contributor

@MeouSker77 MeouSker77 commented Aug 15, 2024

Description

Support and optimize Qwen2-Audio-7B: https://huggingface.co/Qwen/Qwen2-Audio-7B-Instruct

1. Why the change?

2. User API changes

The following example is almost the same as the official example; it only changes 'cuda'-related code to 'xpu' and calls optimize_model to perform quantization and optimization.

Remember to use the development version of transformers:

pip install git+https://github.com/huggingface/transformers
from io import BytesIO
from urllib.request import urlopen
import librosa
import torch
from transformers import Qwen2AudioForConditionalGeneration, AutoProcessor
from ipex_llm import optimize_model

model_path = "Qwen2-Audio-7B-Instruct"

processor = AutoProcessor.from_pretrained(model_path)
model = Qwen2AudioForConditionalGeneration.from_pretrained(model_path)
# apply ipex-llm low-bit (INT4) quantization and optimizations
model = optimize_model(model, low_bit='sym_int4', optimize_llm=True)
model = model.half()
model = model.to('xpu')

conversation = [
    {"role": "user", "content": [
        {"type": "audio", "audio_url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2-Audio/audio/guess_age_gender.wav"},
    ]},
    {"role": "assistant", "content": "Yes, the speaker is female and in her twenties."},
    {"role": "user", "content": [
        {"type": "audio", "audio_url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2-Audio/audio/translate_to_chinese.wav"},
    ]},
]
text = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
audios = []
for message in conversation:
    if isinstance(message["content"], list):
        for ele in message["content"]:
            if ele["type"] == "audio":
                audios.append(librosa.load(
                    BytesIO(urlopen(ele['audio_url']).read()),
                    sr=processor.feature_extractor.sampling_rate)[0]
                )

inputs = processor(text=text, audios=audios, return_tensors="pt", padding=True)
inputs = inputs.to('xpu')

import time

with torch.inference_mode():
    # run generation 3 times (first run includes warm-up) and time each run
    for i in range(3):
        st = time.time()
        generate_ids = model.generate(**inputs, max_length=256)
        generate_ids = generate_ids[:, inputs.input_ids.size(1):]
        et = time.time()
        print(et - st)

response = processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
print(response)
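For reference, the audio-gathering loop in the example above can be factored into a small standalone helper. This is only an illustrative sketch (the `collect_audio_urls` name is hypothetical, not part of this PR); it shows the conversation format the processor expects without requiring the model, librosa, or network access:

```python
def collect_audio_urls(conversation):
    """Collect audio URLs from a Qwen2-Audio style conversation list.

    Each message's "content" is either a plain string (text-only turn)
    or a list of dicts, where audio entries have type == "audio".
    """
    urls = []
    for message in conversation:
        content = message.get("content")
        if isinstance(content, list):
            for ele in content:
                if ele.get("type") == "audio":
                    urls.append(ele["audio_url"])
    return urls

# minimal example conversation in the same format as the PR description
conversation = [
    {"role": "user", "content": [
        {"type": "audio", "audio_url": "a.wav"},
    ]},
    {"role": "assistant", "content": "Yes, the speaker is female."},
    {"role": "user", "content": [
        {"type": "audio", "audio_url": "b.wav"},
    ]},
]

print(collect_audio_urls(conversation))  # → ['a.wav', 'b.wav']
```

In the full example, each collected URL is fetched and decoded with librosa at the processor's sampling rate before being passed to the processor.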

3. Summary of the change

4. How to test?

  • N/A
  • Unit test: Please manually trigger the PR validation workflow by inputting the PR number (e.g., 1234), and paste your action link here once it has finished successfully.
  • Application test
  • Document test
  • ...


@MeouSker77 MeouSker77 requested a review from rnwang04 August 15, 2024 06:34
@MeouSker77 MeouSker77 merged commit 07b7f13 into intel-analytics:main Aug 15, 2024
1 check passed
@MeouSker77 MeouSker77 deleted the support-qwen2-audio branch August 15, 2024 06:59