
Add missing arguments in pipeline parallel generate method #12142

Merged · 1 commit · Nov 18, 2024

Conversation

notsyncing (Contributor)

Description

Add two arguments, negative_prompt_ids and negative_prompt_attention_mask, to the generate method in pipeline_parallel.py. These arguments have been available since transformers 4.32.0.

1. Why the change?

Hello, I'm using both OpenVINO and ipex-llm in one project and found that the generate method in ipex-llm's pipeline_parallel.py takes two fewer arguments than the generate method in transformers, which causes an unexpected error even in an OpenVINO-only code path:

Traceback (most recent call last):
  File "/mnt/data/podman/test.py", line 34, in <module>
    outputs = model.generate(**params)
  File "/mnt/data/podman/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/data/podman/.local/lib/python3.10/site-packages/optimum/intel/openvino/modeling_decoder.py", line 673, in generate
    result = super().generate(
  File "/mnt/data/podman/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
TypeError: generate() takes from 1 to 9 positional arguments but 11 were given

optimum's OVModelForCausalLM.generate forwards its arguments positionally to super().generate(), which, once ipex_llm.transformers is imported, resolves to the pipeline-parallel generate; with only 9 positional parameters, that method cannot accept the 11 values being passed. With this PR, the error is gone.

2. User API changes

No change

3. Summary of the change

Add the two missing arguments to the generate method in pipeline_parallel.py; a sketch of the resulting signature is shown below.
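
For reference, this is a minimal sketch of the updated signature only. The parameters up to streamer follow transformers' GenerationMixin.generate; the decorators and body of the real method in pipeline_parallel.py are unchanged and only hinted at here.

def generate(
    self,
    inputs=None,
    generation_config=None,
    logits_processor=None,
    stopping_criteria=None,
    prefix_allowed_tokens_fn=None,
    synced_gpus=None,
    assistant_model=None,
    streamer=None,
    negative_prompt_ids=None,             # added, available since transformers 4.32.0
    negative_prompt_attention_mask=None,  # added, available since transformers 4.32.0
    **kwargs,
):
    # The existing pipeline-parallel logic stays as-is; the method now simply accepts
    # the two extra arguments, so callers that pass them positionally (such as
    # optimum's OVModelForCausalLM.generate) no longer overflow the signature.
    ...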

4. How to test?

Install optimum[openvino]==1.22.0, then use the following script to reproduce the error and verify the fix:

import torch
from transformers import AutoTokenizer
from optimum.intel import OVModelForCausalLM
import ipex_llm.transformers   # This line will lead to the error above

device = "GPU.0" # the device to load the model onto

model = OVModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-7B-Instruct",  # or any smaller model
    export=True,
    use_cache=False,
    device=device
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")

messages = [
    {"role": "user", "content": "你好"}
]

inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

params = {
    "inputs": inputs,
    "max_new_tokens": 512,
    "do_sample": True,
    "temperature": 1,
    "top_p": 0.95
}

outputs = model.generate(**params)
res = tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True)
print(res)

The error does not occur with optimum[openvino]<=1.20.0.
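
To double-check the fix without running a full generation, the patched signature can also be inspected directly. This is a quick check, assuming that importing ipex_llm.transformers patches transformers' GenerationMixin.generate, as the traceback above suggests:

import inspect

import ipex_llm.transformers  # applies the patch on import
from transformers import GenerationMixin

# With this PR merged, the patched generate exposes the two new parameters.
params = inspect.signature(GenerationMixin.generate).parameters
assert "negative_prompt_ids" in params
assert "negative_prompt_attention_mask" in params
print("patched generate accepts the two new arguments")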

sgwhat merged commit d2c821d into intel-analytics:main on Nov 18, 2024
1 check passed