Update tests for transformers 4.36 #10858

Merged: 63 commits, May 24, 2024
Changes shown are from 56 commits.

Commits
f9d0107
update unit test
jenniew Apr 23, 2024
a86c35f
update
jenniew Apr 23, 2024
8a6e0f2
update
jenniew Apr 23, 2024
d658968
update
jenniew Apr 23, 2024
66639dc
update
jenniew Apr 23, 2024
e77cee4
update
jenniew Apr 23, 2024
f1d6944
fix gpu attention test
jenniew Apr 24, 2024
c2fa88b
update
jenniew Apr 24, 2024
b255ac5
update
jenniew Apr 24, 2024
a82199a
update
jenniew Apr 24, 2024
7e7d09c
update
jenniew Apr 25, 2024
8f1c355
update
jenniew Apr 25, 2024
0e7f73a
Merge branch 'main' of https://github.com/intel-analytics/ipex-llm in…
jenniew Apr 25, 2024
e0c4407
update
jenniew Apr 25, 2024
c51b7ea
update example test
jenniew Apr 26, 2024
a442768
replace replit code
jenniew Apr 26, 2024
5563f28
update
jenniew Apr 26, 2024
b575c48
update
jenniew Apr 26, 2024
cc0ed30
update
jenniew Apr 26, 2024
04333ae
update
jenniew Apr 26, 2024
8ecdeac
set safe_serialization false
jenniew Apr 27, 2024
49a6933
perf test
jenniew Apr 30, 2024
e52180c
merge
jenniew Apr 30, 2024
9217662
update
jenniew Apr 30, 2024
3ad25b7
update
jenniew May 1, 2024
8ee92d2
update
jenniew May 1, 2024
45d2383
update
jenniew May 1, 2024
e968252
update
jenniew May 1, 2024
d59f68c
update
jenniew May 1, 2024
f44e9a4
update
jenniew May 1, 2024
f9ece00
update
jenniew May 1, 2024
bf8aece
update
jenniew May 2, 2024
98789db
update
jenniew May 2, 2024
d459a82
update
jenniew May 2, 2024
51134d4
update
jenniew May 3, 2024
39c104b
update
jenniew May 3, 2024
5d32b59
update
jenniew May 3, 2024
da72111
update
jenniew May 4, 2024
687ba8b
update
jenniew May 4, 2024
8099a2c
update
jenniew May 4, 2024
270ecb8
update
jenniew May 5, 2024
bc847bf
update
jenniew May 5, 2024
0fcaa40
update
jenniew May 5, 2024
26aa194
update
jenniew May 6, 2024
22d0bf6
update
jenniew May 6, 2024
65ea875
update
jenniew May 7, 2024
4f98a38
update
jenniew May 7, 2024
c64ec33
update
jenniew May 7, 2024
9c9e92d
update
jenniew May 8, 2024
4b04c45
update
jenniew May 8, 2024
8638cea
merge
jenniew May 8, 2024
a533ae8
delete
jenniew May 8, 2024
4af1445
update
jenniew May 8, 2024
1f91353
update
jenniew May 8, 2024
0696491
update
jenniew May 8, 2024
6922dc7
update
jenniew May 8, 2024
6417726
update
jenniew May 14, 2024
e30a397
merge
jenniew May 15, 2024
ec2cd5e
update
jenniew May 15, 2024
bc1fec0
merge
jenniew May 17, 2024
55fee3b
Merge branch 'main' of https://github.com/intel-analytics/ipex-llm in…
jenniew May 22, 2024
dcd8115
revert
jenniew May 23, 2024
936fafe
update
jenniew May 24, 2024
6 changes: 0 additions & 6 deletions .github/workflows/llm-harness-evaluation.yml
@@ -164,12 +164,6 @@ jobs:
shell: bash
run: |
pip install --upgrade datasets==2.14.6
if [ "${{ matrix.model_name }}" = "Mistral-7B-v0.1" ]; then
pip install --upgrade transformers==4.36
else
pip install --upgrade transformers==4.31
fi


- name: Run harness
shell: bash
9 changes: 2 additions & 7 deletions .github/workflows/llm-ppl-evaluation.yml
@@ -144,16 +144,11 @@ jobs:
echo "MODEL_PATH=${ORIGIN_DIR}/${{ matrix.model_name }}/" >> "$GITHUB_ENV"
MODEL_PATH=${ORIGIN_DIR}/${{ matrix.model_name }}/
wget -r -nH -nc --no-verbose --cut-dirs=1 ${LLM_FTP_URL}/llm/${{ matrix.model_name }} -P ${ORIGIN_DIR}

- name: Upgrade packages
shell: bash
run: |
pip install --upgrade datasets==2.14.6
if [ "${{ matrix.model_name }}" = "Mistral-7B-v0.1" ]; then
pip install --upgrade transformers==4.36
else
pip install --upgrade transformers==4.31
fi
pip install --upgrade datasets==2.14.6

- name: Run perplexity
shell: bash
2 changes: 1 addition & 1 deletion .github/workflows/llm-whisper-evaluation.yml
@@ -75,7 +75,7 @@ jobs:
echo "runner=$runner" >> $GITHUB_OUTPUT

llm-whisper-evaluation:
# if: ${{ github.event.schedule || github.event.inputs.artifact == 'llm-whisper-evaluation' || github.event.inputs.artifact == 'all' }} # please comment it for PR tests
Contributor: Revert this change to make this PR cleaner.

Contributor Author: Reverted.

#if: ${{ github.event.schedule || github.event.inputs.artifact == 'llm-whisper-evaluation' || github.event.inputs.artifact == 'all' }} # please comment it for PR tests
needs: [llm-cpp-build, set-matrix] # please uncomment it for PR tests
# needs: [set-matrix] # please comment it for PR tests
strategy:
372 changes: 119 additions & 253 deletions .github/workflows/llm_performance_tests.yml

Large diffs are not rendered by default.

12 changes: 6 additions & 6 deletions .github/workflows/llm_tests_for_stable_version_on_arc.yml
@@ -10,12 +10,12 @@ permissions:

# Controls when the action will run.
on:
# pull_request:
# branches: [main]
# paths:
# - ".github/workflows/llm_performance_tests.yml"
# - "python/llm/test/benchmark/**"
# - "python/llm/dev/benchmark/all-in-one/**"
pull_request:
branches: [main]
paths:
- ".github/workflows/llm_performance_tests.yml"
- "python/llm/test/benchmark/**"
- "python/llm/dev/benchmark/all-in-one/**"
workflow_dispatch:
workflow_call:

12 changes: 6 additions & 6 deletions .github/workflows/llm_tests_for_stable_version_on_spr.yml
@@ -10,12 +10,12 @@ permissions:

# Controls when the action will run.
on:
# pull_request:
# branches: [main]
# paths:
# - ".github/workflows/llm_performance_tests.yml"
# - "python/llm/test/benchmark/**"
# - "python/llm/dev/benchmark/all-in-one/**"
pull_request:
branches: [main]
paths:
- ".github/workflows/llm_performance_tests.yml"
- "python/llm/test/benchmark/**"
- "python/llm/dev/benchmark/all-in-one/**"
workflow_dispatch:
workflow_call:

19 changes: 9 additions & 10 deletions .github/workflows/llm_unit_tests.yml
@@ -99,7 +99,7 @@ jobs:
echo "LLAMA_ORIGIN_PATH=${ORIGIN_DIR}/llama-7b-hf" >> "$GITHUB_ENV"
echo "BLOOM_ORIGIN_PATH=${ORIGIN_DIR}/bloom-7b1" >> "$GITHUB_ENV"
echo "ORIGINAL_CHATGLM2_6B_PATH=${ORIGIN_DIR}/chatglm2-6b" >> "$GITHUB_ENV"
echo "ORIGINAL_REPLIT_CODE_PATH=${ORIGIN_DIR}/replit-code-v1-3b" >> "$GITHUB_ENV"
echo "ORIGINAL_CODESHELL_7B_PATH=${ORIGIN_DIR}/CodeShell-7B-Chat" >> "$GITHUB_ENV"
Contributor: What is the reason for this change?

Contributor Author: replit-code-v1-3b cannot run with transformers 4.36, so another code generation model is used here instead.

echo "ORIGINAL_WHISPER_TINY_PATH=${ORIGIN_DIR}/whisper-tiny" >> "$GITHUB_ENV"
echo "MISTRAL_ORIGIN_PATH=${ORIGIN_DIR}/Mistral-7B-v0.1" >> "$GITHUB_ENV"
echo "LLAMA2_7B_ORIGIN_PATH=${ORIGIN_DIR}/Llama-2-7b-chat-hf" >> "$GITHUB_ENV"
@@ -157,13 +157,13 @@ jobs:
# fi
if [ ! -d $ORIGINAL_CHATGLM2_6B_PATH ]; then
echo "Directory $ORIGINAL_CHATGLM2_6B_PATH not found. Downloading from FTP server..."
echo "wget -r -nH --no-verbose --cut-dirs=1 $LLM_FTP_URL/llm/chatglm2-6b -P $ORIGIN_DIR"
echo "wget -r -nH --no-verbose --cut-dirs=1 $LLM_FTP_URL/llm/chatglm2-6b -P $ORIGIN_DIR"
wget -r -nH --no-verbose --cut-dirs=1 $LLM_FTP_URL/llm/chatglm2-6b -P $ORIGIN_DIR
fi
if [ ! -d $ORIGINAL_REPLIT_CODE_PATH ]; then
echo "Directory $ORIGINAL_REPLIT_CODE_PATH not found. Downloading from FTP server..."
echo "wget -r -nH --no-verbose --cut-dirs=1 $LLM_FTP_URL/llm/replit-code-v1-3b -P $ORIGIN_DIR"
wget -r -nH --no-verbose --cut-dirs=1 $LLM_FTP_URL/llm/replit-code-v1-3b -P $ORIGIN_DIR
if [ ! -d $ORIGINAL_CODESHELL_7B_PATH ]; then
echo "Directory $ORIGINAL_CODESHELL_7B_PATH not found. Downloading from FTP server..."
echo "wget -r -nH --no-verbose --cut-dirs=1 $LLM_FTP_URL/llm/CodeShell-7B-Chat -P $ORIGIN_DIR"
wget -r -nH --no-verbose --cut-dirs=1 $LLM_FTP_URL/llm/CodeShell-7B-Chat -P $ORIGIN_DIR
fi
if [ ! -d $ORIGINAL_WHISPER_TINY_PATH ]; then
echo "Directory $ORIGINAL_WHISPER_TINY_PATH not found. Downloading from FTP server..."
@@ -226,14 +226,15 @@ jobs:
shell: bash
run: |
pip install llama-index-readers-file llama-index-vector-stores-postgres llama-index-embeddings-huggingface
pip install transformers==4.36.0
pip install transformers==4.36.2
pip install "pydantic>=2.0.0"
bash python/llm/test/run-llm-llamaindex-tests.sh
- name: Run sentence-transformers uninstallation
if: ${{ always() }}
shell: bash
run: |
pip uninstall sentence-transformers -y || true

llm-unit-test-on-arc:
needs: [setup-python-version, llm-cpp-build]
strategy:
@@ -363,8 +364,6 @@ jobs:
fi
python -m pip install datasets librosa soundfile einops tiktoken transformers_stream_generator
bash python/llm/test/run-llm-inference-tests-gpu.sh
python -m pip install transformers==4.34.0
bash python/llm/test/run-llm-inference-tests-gpu-434.sh

- name: Run LLM example tests
shell: bash
@@ -410,7 +409,7 @@ jobs:
pip install --pre --upgrade ipex-llm[xpu_2.0] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/
source /home/arda/intel/oneapi/setvars.sh
fi
pip install transformers==4.36.0
pip install transformers==4.36.2
pip install "pydantic>=2.0.0"
bash python/llm/test/run-llm-llamaindex-tests-gpu.sh
- name: Run sentence-transformers uninstallation
5 changes: 2 additions & 3 deletions python/llm/setup.py
@@ -52,7 +52,7 @@
libs_dir = os.path.join(llm_home, "ipex_llm", "libs")
CONVERT_DEP = ['numpy == 1.26.4', # lastet 2.0.0b1 will cause error
'torch',
'transformers == 4.31.0', 'sentencepiece', 'tokenizers == 0.13.3',
'transformers == 4.36.2', 'sentencepiece', 'tokenizers == 0.15.2',
# TODO: Support accelerate 0.22.0
'accelerate == 0.21.0', 'tabulate']
SERVING_DEP = ['fschat[model_worker, webui] == 0.2.36', 'protobuf']
@@ -277,10 +277,9 @@ def setup_package():

# Add internal requires for llama-index
llama_index_requires = copy.deepcopy(all_requires)
for exclude_require in ['torch', 'transformers == 4.31.0', 'tokenizers == 0.13.3']:
for exclude_require in ['torch']:
llama_index_requires.remove(exclude_require)
llama_index_requires += ["torch<2.2.0",
"transformers>=4.34.0,<4.39.0",
"sentence-transformers~=2.6.1"]


3 changes: 2 additions & 1 deletion python/llm/src/ipex_llm/optimize.py
@@ -47,7 +47,8 @@ def _save_low_bit(self, save_dir, *args, **kwargs):
if isinstance(self, PreTrainedModel):
# We borrowed this method to adapt to Transformer model cases
# as much as possible, and later we may merge these two situations
self.save_pretrained(save_dir)
kwargs['safe_serialization'] = False
self.save_pretrained(save_dir, *args, **kwargs)
else:
# TODO: For the lowbit model still larger than 8GB,
# save it into shards.
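As a rough illustration of the change above, a minimal sketch of the save/reload round trip that now goes through safe_serialization=False. The model path is a placeholder, and the assumption that safetensors is avoided because of the custom low-bit weight tensors is mine, not stated in the PR.

```python
import tempfile

from ipex_llm.transformers import AutoModelForCausalLM

# Placeholder model path; any model supported by load_in_4bit would do.
model = AutoModelForCausalLM.from_pretrained("path/to/model",
                                             load_in_4bit=True,
                                             trust_remote_code=True)

with tempfile.TemporaryDirectory() as tmpdir:
    # save_low_bit() ends up in the patched _save_low_bit above, which now
    # forwards safe_serialization=False to save_pretrained(), so the checkpoint
    # is written as pytorch_model.bin rather than model.safetensors.
    model.save_low_bit(tmpdir)
    reloaded = AutoModelForCausalLM.load_low_bit(tmpdir, trust_remote_code=True)
```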
9 changes: 5 additions & 4 deletions python/llm/test/benchmark/arc-perf-test.yaml
@@ -10,13 +10,14 @@ repo_id:
- 'databricks/dolly-v1-6b'
- 'databricks/dolly-v2-7b'
- 'databricks/dolly-v2-12b'
- 'internlm/internlm-chat-7b-8k'
- 'internlm/internlm-chat-7b'
- 'Qwen/Qwen-7B-Chat'
- 'BAAI/AquilaChat-7B'
- 'baichuan-inc/Baichuan2-7B-Chat'
- 'baichuan-inc/Baichuan2-13B-Chat-4bit'
- 'bigscience/bloomz-7b1'
- 'fnlp/moss-moon-003-sft-4bit'
# - 'fnlp/moss-moon-003-sft-4bit' # moss-moon-003-sft cannot work on transformers 4.34+
Contributor: Is this supposed to be fixed?

Contributor Author: This is a tokenizer issue in the moss-moon-003-sft model. moss-moon-003-sft has not fixed its tokenizer to be compatible with transformers 4.34+, so we cannot fix it on our side. See https://github.com/analytics-zoo/nano/issues/1145

Contributor: Then shall we keep transformers 4.31 in the test as well for this model?

Contributor Author: I'm not sure whether we need to keep a test that only works on transformers 4.31, since ipex-llm is being updated to support transformers 4.36. @jason-dai Do we need to keep tests that only work on transformers 4.31?

- 'mistralai/Mistral-7B-v0.1'
local_model_hub: '/mnt/disk1/models'
warm_up: 1
num_trials: 3
@@ -31,7 +32,7 @@ test_api:
- "transformer_int4_gpu" # on Intel GPU
cpu_embedding: False # whether put embedding to CPU (only avaiable now for gpu win related test_api)
exclude:
- 'fnlp/moss-moon-003-sft-4bit:1024'
- 'fnlp/moss-moon-003-sft-4bit:2048'
# - 'fnlp/moss-moon-003-sft-4bit:1024'
# - 'fnlp/moss-moon-003-sft-4bit:2048'
- 'baichuan-inc/Baichuan2-13B-Chat-4bit:2048'
- 'bigscience/bloomz-7b1:2048'
16 changes: 0 additions & 16 deletions python/llm/test/benchmark/arc-perf-transformers-434.yaml

This file was deleted.

3 changes: 2 additions & 1 deletion python/llm/test/benchmark/igpu-perf/1024-128.yaml
@@ -12,10 +12,11 @@ repo_id:
- 'WisdomShell/CodeShell-7B-Chat'
- 'tiiuae/falcon-7b-instruct-with-patch'
- 'mosaicml/mpt-7b-chat'
- 'liuhaotian/llava-v1.5-7b'
# - 'liuhaotian/llava-v1.5-7b' # Cannot load using AutoModelForCausalLM in 4.36+
- 'RWKV/rwkv-4-world-7b'
- 'RWKV/rwkv-5-world-7b'
- 'IEITYuan/Yuan2-2B-hf'
- 'mistralai/Mistral-7B-Instruct-v0.1'
local_model_hub: 'path to your local model hub'
warm_up: 1
num_trials: 3
3 changes: 2 additions & 1 deletion python/llm/test/benchmark/igpu-perf/1024-128_int4_fp16.yaml
@@ -12,10 +12,11 @@ repo_id:
- 'WisdomShell/CodeShell-7B-Chat'
- 'tiiuae/falcon-7b-instruct-with-patch'
- 'mosaicml/mpt-7b-chat'
- 'liuhaotian/llava-v1.5-7b'
# - 'liuhaotian/llava-v1.5-7b' # Cannot load using AutoModelForCausalLM in 4.36+
# - 'RWKV/rwkv-4-world-7b'
# - 'RWKV/rwkv-5-world-7b'
- 'IEITYuan/Yuan2-2B-hf'
- 'mistralai/Mistral-7B-Instruct-v0.1'
local_model_hub: 'path to your local model hub'
warm_up: 1
num_trials: 3
3 changes: 2 additions & 1 deletion python/llm/test/benchmark/igpu-perf/1024-128_loadlowbit.yaml
@@ -12,10 +12,11 @@ repo_id:
- 'WisdomShell/CodeShell-7B-Chat'
- 'tiiuae/falcon-7b-instruct-with-patch'
- 'mosaicml/mpt-7b-chat'
- 'liuhaotian/llava-v1.5-7b'
# - 'liuhaotian/llava-v1.5-7b' # Cannot load using AutoModelForCausalLM in 4.36+
- 'RWKV/rwkv-4-world-7b'
- 'RWKV/rwkv-5-world-7b'
- 'IEITYuan/Yuan2-2B-hf'
- 'mistralai/Mistral-7B-Instruct-v0.1'
local_model_hub: 'path to your local model hub'
warm_up: 1
num_trials: 3
3 changes: 2 additions & 1 deletion python/llm/test/benchmark/igpu-perf/2048-256.yaml
@@ -12,10 +12,11 @@ repo_id:
- 'WisdomShell/CodeShell-7B-Chat'
- 'tiiuae/falcon-7b-instruct-with-patch'
- 'mosaicml/mpt-7b-chat'
- 'liuhaotian/llava-v1.5-7b'
# - 'liuhaotian/llava-v1.5-7b' # Cannot load using AutoModelForCausalLM in 4.36+
- 'RWKV/rwkv-4-world-7b'
- 'RWKV/rwkv-5-world-7b'
- 'IEITYuan/Yuan2-2B-hf'
- 'mistralai/Mistral-7B-Instruct-v0.1'
local_model_hub: 'path to your local model hub'
warm_up: 1
num_trials: 3
3 changes: 2 additions & 1 deletion python/llm/test/benchmark/igpu-perf/32-32.yaml
@@ -12,10 +12,11 @@ repo_id:
- 'WisdomShell/CodeShell-7B-Chat'
- 'tiiuae/falcon-7b-instruct-with-patch'
- 'mosaicml/mpt-7b-chat'
- 'liuhaotian/llava-v1.5-7b'
# - 'liuhaotian/llava-v1.5-7b' # Cannot load using AutoModelForCausalLM in 4.36+
- 'RWKV/rwkv-4-world-7b'
- 'RWKV/rwkv-5-world-7b'
- 'IEITYuan/Yuan2-2B-hf'
- 'mistralai/Mistral-7B-Instruct-v0.1'
local_model_hub: 'path to your local model hub'
warm_up: 3
num_trials: 5
@@ -6,7 +6,7 @@ repo_id:
- 'baichuan-inc/Baichuan2-7B-Chat'
- 'baichuan-inc/Baichuan2-13B-Chat'
- 'Qwen/Qwen-14B-Chat'
local_model_hub: '/models'
local_model_hub: '/mnt/disk1/models'
warm_up: 1
num_trials: 3
num_beams: 1 # default to greedy search
@@ -6,7 +6,7 @@ repo_id:
- 'baichuan-inc/Baichuan2-7B-Chat'
- 'baichuan-inc/Baichuan2-13B-Chat'
- 'Qwen/Qwen-14B-Chat'
local_model_hub: '/models'
local_model_hub: '/mnt/disk1/models'
warm_up: 3
num_trials: 50
num_beams: 1 # default to greedy search
20 changes: 11 additions & 9 deletions python/llm/test/inference/test_transformers_api.py
@@ -49,16 +49,16 @@ def test_transformers_auto_model_int4(self):
print('Prompt:', input_str)
print('Output:', output_str)
print(f'Inference time: {end-st} s')
res = 'Paris' in output_str
res = 'Paris' in output_str
self.assertTrue(res)

def test_transformers_auto_model_for_causal_lm_int4(self):
model_path = os.environ.get('ORIGINAL_REPLIT_CODE_PATH')
model_path = os.environ.get('ORIGINAL_CODESHELL_7B_PATH')
Contributor: What is the reason for this change?

Contributor Author: replit-code-v1-3b cannot run with transformers 4.36, so another code generation model is used here instead.

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
input_str = 'def hello():\n print("hello world")\n'
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True, load_in_4bit=True)
with torch.inference_mode():

st = time.time()
input_ids = tokenizer.encode(input_str, return_tensors="pt")
output = model.generate(input_ids, do_sample=False, max_new_tokens=32)
@@ -67,7 +67,7 @@ def test_transformers_auto_model_for_causal_lm_int4(self):
print('Prompt:', input_str)
print('Output:', output_str)
print(f'Inference time: {end-st} s')
res = '\nhello()' in output_str
res = '\nhello()' in output_str
self.assertTrue(res)


@@ -86,7 +86,7 @@ def test_transformers_auto_model_for_speech_seq2seq_int4(self):
predicted_ids = model.generate(input_features)
# decode token ids to text
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=False)
end = time.time()
end = time.time()
print('Output:', transcription)
print(f'Inference time: {end-st} s')
res = 'Mr. Quilter is the apostle of the middle classes and we are glad to welcome his gospel.' in transcription[0]
@@ -108,22 +108,23 @@ def test_transformers_chatglm_for_causallm(self):
print('Prompt:', input_str)
print('Output:', output_str)
print(f'Inference time: {end-st} s')
res = 'Paris' in output_str
res = 'Paris' in output_str
self.assertTrue(res)

@pytest.mark.parametrize('prompt, answer', [
('What is the capital of France?\n\n', 'Paris')
])
@pytest.mark.parametrize('Model, Tokenizer, model_path',[
(AutoModel, AutoTokenizer, os.environ.get('ORIGINAL_CHATGLM2_6B_PATH')),
(AutoModelForCausalLM, AutoTokenizer, os.environ.get('MISTRAL_ORIGIN_PATH')),
])
def test_load_low_bit_completion(Model, Tokenizer, model_path, prompt, answer):
tokenizer = Tokenizer.from_pretrained(model_path, trust_remote_code=True)
model = Model.from_pretrained(model_path,
load_in_4bit=True,
optimize_model=True,
trust_remote_code=True)

with tempfile.TemporaryDirectory() as tempdir:
model.save_low_bit(tempdir)
loaded_model = Model.load_low_bit(tempdir,
@@ -143,9 +144,10 @@ def test_load_low_bit_completion(Model, Tokenizer, model_path, prompt, answer):
(AutoModelForCausalLM, LlamaTokenizer, os.environ.get('LLAMA_ORIGIN_PATH'), prompt),
(AutoModelForCausalLM, AutoTokenizer, os.environ.get('BLOOM_ORIGIN_PATH'), prompt),
(AutoModel, AutoTokenizer, os.environ.get('ORIGINAL_CHATGLM2_6B_PATH'), prompt),
(AutoModelForCausalLM, AutoTokenizer, os.environ.get('ORIGINAL_REPLIT_CODE_PATH'), prompt)
(AutoModelForCausalLM, AutoTokenizer, os.environ.get('ORIGINAL_CODESHELL_7B_PATH'), prompt),
(AutoModelForCausalLM, AutoTokenizer, os.environ.get('MISTRAL_ORIGIN_PATH'), prompt)
])

def test_optimize_model(Model, Tokenizer, model_path, prompt):
tokenizer = Tokenizer.from_pretrained(model_path, trust_remote_code=True)
input_ids = tokenizer.encode(prompt, return_tensors="pt")
@@ -104,8 +104,8 @@ def replace_forward_hook(module, input, output, layer_name):
if isinstance(t1, torch.Tensor) and isinstance(t2, torch.Tensor):
# 'attn_output' is of type torch.Tensor.
attn_output_diff.append(t1 - t2)
else:
# 'past_key_value'is of type tuple as default.
elif isinstance(t1, tuple) and isinstance(t2, tuple):
# if 'past_key_value'is of type tuple
for i, (t3, t4) in enumerate(zip(t1, t2)):
if model.config.architectures[0] == "ChatGLMModel" and \
hasattr(model.config, 'padded_vocab_size') and \
@@ -114,6 +114,10 @@ def replace_forward_hook(module, input, output, layer_name):
# We need to narrow it here.
t4 = t4[:, :, 15:17, :]
attn_output_diff.append(t3 - t4)
else:
# if 'past_key_value'is of type Cache, get last layer cache pair (key, value)
attn_output_diff.append(t1[-1][0] - t2[-1][0])
attn_output_diff.append(t1[-1][1] - t2[-1][1])

max_diff_tensor = [torch.max(item).item() for item in attn_output_diff]
print(max_diff_tensor)
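For reference, a small sketch (assuming transformers 4.36's DynamicCache API) of the two past_key_value formats the hook above now has to handle, and of why indexing the cache with [-1] yields the last layer's (key, value) pair.

```python
import torch
from transformers.cache_utils import DynamicCache  # introduced in transformers 4.36

# Legacy format: a tuple of per-layer (key, value) tensor pairs.
legacy = tuple(
    (torch.randn(1, 2, 4, 8), torch.randn(1, 2, 4, 8))
    for _ in range(2)
)

# New format: a Cache object wrapping the same tensors.
cache = DynamicCache.from_legacy_cache(legacy)

# Indexing a DynamicCache by layer returns that layer's (key, value) pair,
# so cache[-1][0] and cache[-1][1] are the last layer's key and value,
# which is what the updated test compares when past_key_value is a Cache.
last_key, last_value = cache[-1]
assert torch.equal(last_key, legacy[-1][0])
assert torch.equal(last_value, legacy[-1][1])
```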