
support and optimize qwen2-audio #11809

Merged
Merged 1 commit into intel-analytics:main on Aug 15, 2024

Conversation

MeouSker77
Contributor

@MeouSker77 MeouSker77 commented Aug 15, 2024

Description

Support and optimize Qwen2-Audio-7B: https://huggingface.co/Qwen/Qwen2-Audio-7B-Instruct

1. Why the change?

2. User API changes

The following example is almost the same as the official example; it only changes 'cuda'-related code to 'xpu' and calls optimize_model to perform quantization and optimization.

Remember to use the development version of transformers:

pip install git+https://github.com/huggingface/transformers
from io import BytesIO
from urllib.request import urlopen
import librosa
import torch
from transformers import Qwen2AudioForConditionalGeneration, AutoProcessor
from ipex_llm import optimize_model

model_path = "Qwen2-Audio-7B-Instruct"

processor = AutoProcessor.from_pretrained(model_path)
model = Qwen2AudioForConditionalGeneration.from_pretrained(model_path)
# apply ipex-llm low-bit (INT4) quantization and optimizations
model = optimize_model(model, low_bit='sym_int4', optimize_llm=True)
model = model.half()
model = model.to('xpu')

conversation = [
    {"role": "user", "content": [
        {"type": "audio", "audio_url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2-Audio/audio/guess_age_gender.wav"},
    ]},
    {"role": "assistant", "content": "Yes, the speaker is female and in her twenties."},
    {"role": "user", "content": [
        {"type": "audio", "audio_url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2-Audio/audio/translate_to_chinese.wav"},
    ]},
]
text = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
audios = []
for message in conversation:
    if isinstance(message["content"], list):
        for ele in message["content"]:
            if ele["type"] == "audio":
                audios.append(librosa.load(
                    BytesIO(urlopen(ele['audio_url']).read()),
                    sr=processor.feature_extractor.sampling_rate)[0]
                )

inputs = processor(text=text, audios=audios, return_tensors="pt", padding=True)
inputs = inputs.to('xpu')

import time

with torch.inference_mode():
    # run generation 3 times (first run includes warm-up) and time each run
    for i in range(3):
        st = time.time()
        generate_ids = model.generate(**inputs, max_length=256)
        generate_ids = generate_ids[:, inputs.input_ids.size(1):]
        et = time.time()
        print(et - st)

response = processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
print(response)
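For reference, the audio-gathering loop in the example above can be factored into a small standalone helper. This is only an illustrative sketch (the `collect_audio_urls` name is hypothetical, not part of this PR); it shows the conversation format the processor expects without requiring the model, librosa, or network access:

```python
def collect_audio_urls(conversation):
    """Collect audio URLs from a Qwen2-Audio style conversation list.

    Each message's "content" is either a plain string (text-only turn)
    or a list of dicts, where audio entries have type == "audio".
    """
    urls = []
    for message in conversation:
        content = message.get("content")
        if isinstance(content, list):
            for ele in content:
                if ele.get("type") == "audio":
                    urls.append(ele["audio_url"])
    return urls

# minimal example conversation in the same format as the PR description
conversation = [
    {"role": "user", "content": [
        {"type": "audio", "audio_url": "a.wav"},
    ]},
    {"role": "assistant", "content": "Yes, the speaker is female."},
    {"role": "user", "content": [
        {"type": "audio", "audio_url": "b.wav"},
    ]},
]

print(collect_audio_urls(conversation))  # → ['a.wav', 'b.wav']
```

In the full example, each collected URL is fetched and decoded with librosa at the processor's sampling rate before being passed to the processor.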

3. Summary of the change

4. How to test?

  • N/A
  • Unit test: Please manually trigger the PR validation workflow by inputting the PR number (e.g., 1234), and paste your action link here once it has finished successfully.
  • Application test
  • Document test
  • ...


@MeouSker77 MeouSker77 requested a review from rnwang04 August 15, 2024 06:34
@MeouSker77 MeouSker77 merged commit 07b7f13 into intel-analytics:main Aug 15, 2024
1 check passed
@MeouSker77 MeouSker77 deleted the support-qwen2-audio branch August 15, 2024 06:59