non-singleton dimension errors when run Deepspeed-AutoTP #11392

jianweimama · 2024-06-21T09:13:44Z

Installation steps on the host
conda create -n llm python=3.11
conda activate llm

The command below installs intel_extension_for_pytorch==2.1.10+xpu by default

pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install transformers==4.37.0
pip install oneccl_bind_pt==2.1.100 --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/

Configure the oneAPI environment variables

source /opt/intel/oneapi/setvars.sh
pip install git+https://github.com/microsoft/DeepSpeed.git@ed8aed5
pip install git+https://github.com/intel/intel-extension-for-deepspeed.git@0eb734b
pip install mpi4py
conda install -c conda-forge -y gperftools=2.10 # to enable tcmalloc
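
Before launching the example, it can help to confirm that the environment really contains the versions pinned in the steps above. The following is a minimal sketch using only the standard library; the helper name `find_mismatches` and the `EXPECTED` table are hypothetical and not part of ipex-llm. It compares only the public part of each version, on the assumption that a local tag such as `+xpu` on the installed wheel should not count as a mismatch.

```python
from importlib import metadata

# Pins copied from the install commands above; the helper itself is hypothetical.
EXPECTED = {
    "transformers": "4.37.0",
    "oneccl_bind_pt": "2.1.100",
    "intel_extension_for_pytorch": "2.1.10+xpu",
}


def find_mismatches(expected, get_version=metadata.version):
    """Return {package: (wanted, found)} for every pin that is not satisfied."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        # Assumption: ignore local tags like "+xpu" when comparing versions.
        if found is None or found.split("+")[0] != wanted.split("+")[0]:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(EXPECTED).items():
        print(f"{name}: wanted {wanted}, found {found}")
```

Run inside the activated conda environment; no output means every pin was found at the expected version.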

Installed pip packages
(llm-deepspeed) root@test-server:~/test/ipex-llm/python/llm/example/GPU/Deepspeed-AutoTP# pip3 freeze
accelerate==0.23.0
annotated-types==0.7.0
bigdl-core-xe-21==2.5.0b20240620
bigdl-core-xe-addons-21==2.5.0b20240620
bigdl-core-xe-batch-21==2.5.0b20240620
certifi==2024.6.2
charset-normalizer==3.3.2
deepspeed @ git+https://github.com/microsoft/DeepSpeed.git@ed8aed5703d97b6e52d0fca3e4be285e21c005f2
filelock==3.15.3
fsspec==2024.6.0
hjson==3.1.0
huggingface-hub==0.23.4
idna==3.7
intel-cmplr-lib-ur==2024.2.0
intel-extension-for-pytorch==2.1.10+xpu
intel-openmp==2024.2.0
intel_extension_for_deepspeed @ file:///root/intel-extension-for-deepspeed
ipex-llm==2.1.0b20240620
Jinja2==3.1.4
MarkupSafe==2.1.5
mpi4py==3.1.6
mpmath==1.3.0
networkx==3.3
ninja==1.11.1.1
numpy==1.26.4
oneccl-bind-pt==2.1.100+xpu
packaging==24.1
pillow==10.3.0
protobuf==5.27.1
psutil==6.0.0
py-cpuinfo==9.0.0
pydantic==2.7.4
pydantic_core==2.18.4
pynvml==11.5.0
PyYAML==6.0.2rc1
regex==2024.5.15
requests==2.32.3
safetensors==0.4.3
sentencepiece==0.2.0
sympy==1.13.0rc2
tabulate==0.9.0
tokenizers==0.15.2
torch==2.1.0a0+cxx11.abi
torchvision==0.16.0a0+cxx11.abi
tqdm==4.66.4
transformers==4.37.0
typing_extensions==4.12.2
urllib3==2.2.2

(llm-deepspeed) root@test-server:~/test/ipex-llm/python/llm/example/GPU/Deepspeed-AutoTP# bash run_qwen_14b_arc_2_card.sh

:: initializing oneAPI environment ...
run_qwen_14b_arc_2_card.sh: BASH_VERSION = 5.1.16(1)-release
args: Using "$@" for setvars.sh arguments: --force
:: ccl -- latest
:: compiler -- latest
:: dal -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: dnnl -- latest
:: dpcpp-ct -- latest
:: dpl -- latest
:: ipp -- latest
:: ippcp -- latest
:: mkl -- latest
:: mpi -- latest
:: tbb -- latest
:: oneAPI environment initialized ::

[0] /root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
[0] warn(
[1] /root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
[1] warn(
[0] [2024-06-21 23:13:11,872] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[1] [2024-06-21 23:13:11,951] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[0] [2024-06-21 23:13:12,241] [INFO] [real_accelerator.py:211:set_accelerator] Setting ds_accelerator to cpu (model specified)
[1] [2024-06-21 23:13:12,325] [INFO] [real_accelerator.py:211:set_accelerator] Setting ds_accelerator to cpu (model specified)
[1] Loading checkpoint shards: 100%|██████████| 8/8 [00:16<00:00, 2.04s/it]
[1] [2024-06-21 23:13:29,421] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.14.1+ed8aed57, git-hash=ed8aed57, git-branch=HEAD
[1] [2024-06-21 23:13:29,422] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter replace_method is deprecated. This parameter is no longer needed, please remove from your call to DeepSpeed-inference
[1] [2024-06-21 23:13:29,422] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[1] [2024-06-21 23:13:29,422] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
[0] Loading checkpoint shards: 100%|██████████| 8/8 [00:17<00:00, 2.21s/it]
[0] [2024-06-21 23:13:30,640] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.14.1+ed8aed57, git-hash=ed8aed57, git-branch=HEAD
[0] [2024-06-21 23:13:30,640] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter replace_method is deprecated. This parameter is no longer needed, please remove from your call to DeepSpeed-inference
[0] [2024-06-21 23:13:30,640] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[0] [2024-06-21 23:13:30,640] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
[1] Using /root/.cache/torch_extensions/py311_cpu as PyTorch extensions root...
[1] Emitting ninja build file /root/.cache/torch_extensions/py311_cpu/deepspeed_ccl_comm/build.ninja...
[1] Building extension module deepspeed_ccl_comm...
[1] Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1] ninja: no work to do.
[1] Loading extension module deepspeed_ccl_comm...
[0] Using /root/.cache/torch_extensions/py311_cpu as PyTorch extensions root...
[0] Emitting ninja build file /root/.cache/torch_extensions/py311_cpu/deepspeed_ccl_comm/build.ninja...
[0] Building extension module deepspeed_ccl_comm...
[0] Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[0] ninja: no work to do.
[0] Loading extension module deepspeed_ccl_comm...
[0] My guessed rank = 0
[0] 2024:06:21-23:13:40:(1676750) |CCL_WARN| sockets exchange mode is set. It may cause potential problem of 'Too many open file descriptors'
[1] My guessed rank = 1
[1] 2024:06:21-23:13:40:(1676751) |CCL_WARN| sockets exchange mode is set. It may cause potential problem of 'Too many open file descriptors'
[0] Time to load deepspeed_ccl_comm op: 0.11093568801879883 seconds
[0] DeepSpeed deepspeed.ops.comm.deepspeed_ccl_comm_op built successfully
[0] [2024-06-21 23:13:41,150] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[1] Time to load deepspeed_ccl_comm op: 0.10797476768493652 seconds
[1] DeepSpeed deepspeed.ops.comm.deepspeed_ccl_comm_op built successfully
[1] [2024-06-21 23:13:41,150] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[1] [2024-06-21 23:13:41,150] [INFO] [comm.py:637:init_distributed] cdb=<deepspeed.comm.ccl.CCLBackend object at 0x7fa20a3d3d90>
[0] [2024-06-21 23:13:41,150] [INFO] [comm.py:637:init_distributed] cdb=<deepspeed.comm.ccl.CCLBackend object at 0x7b8ba0ce0510>
[1] [2024-06-21 23:13:41,150] [INFO] [comm.py:652:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[0] [2024-06-21 23:13:41,150] [INFO] [comm.py:652:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[1] [2024-06-21 23:13:41,485] [INFO] [comm.py:702:mpi_discovery] Discovered MPI settings of world_rank=1, local_rank=1, world_size=2, master_addr=172.16.182.230, master_port=29500
[0] [2024-06-21 23:13:41,485] [INFO] [comm.py:702:mpi_discovery] Discovered MPI settings of world_rank=0, local_rank=0, world_size=2, master_addr=172.16.182.230, master_port=29500
[0] [2024-06-21 23:13:41,485] [INFO] [comm.py:662:init_distributed] Distributed backend already initialized
[0] 2024-06-21 23:13:44,774 - ipex_llm.transformers.utils - INFO - Converting the current model to sym_int4 format......
[1] 2024-06-21 23:13:44,774 - ipex_llm.transformers.utils - INFO - Converting the current model to sym_int4 format......
[1] /root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/init.py:412: UserWarning: Initializing zero-element tensors is a no-op
[1] warnings.warn("Initializing zero-element tensors is a no-op")
[0] /root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/init.py:412: UserWarning: Initializing zero-element tensors is a no-op
[0] warnings.warn("Initializing zero-element tensors is a no-op")
[1] AutoTP: [(<class 'transformers.models.qwen2.modeling_qwen2.Qwen2DecoderLayer'>, ['self_attn.o_proj', 'mlp.down_proj'])]
[1] Traceback (most recent call last):
[1] File "/root/test/ipex-llm/python/llm/example/GPU/Deepspeed-AutoTP/deepspeed_autotp.py", line 85, in
[1] model = optimize_model(model.module.to(f'cpu'), low_bit=low_bit).to(torch.float16)
[1] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/optimize.py", line 253, in optimize_model
[1] model = ggml_convert_low_bit(model,
[1] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/transformers/convert.py", line 790, in ggml_convert_low_bit
[1] model = _optimize_pre(model)
[1] ^^^^^^^^^^^^^^^^^^^^
[1] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/transformers/convert.py", line 739, in _optimize_pre
[1] model.apply(padding_mlp)
[1] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 897, in apply
[1] module.apply(fn)
[1] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 897, in apply
[1] module.apply(fn)
[1] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 897, in apply
[1] module.apply(fn)
[1] [Previous line repeated 1 more time]
[1] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 898, in apply
[1] fn(self)
[1] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/transformers/models/qwen2.py", line 304, in padding_mlp
[1] new_gate_weight[:intermediate_size, :] = gate_weight
[1] ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
[1] RuntimeError: The expanded size of the tensor (2560) must match the existing size (5120) at non-singleton dimension 1. Target sizes: [13696, 2560]. Tensor sizes: [6848, 5120]
[0] AutoTP: [(<class 'transformers.models.qwen2.modeling_qwen2.Qwen2DecoderLayer'>, ['self_attn.o_proj', 'mlp.down_proj'])]
[0] Traceback (most recent call last):
[0] File "/root/test/ipex-llm/python/llm/example/GPU/Deepspeed-AutoTP/deepspeed_autotp.py", line 85, in
[0] model = optimize_model(model.module.to(f'cpu'), low_bit=low_bit).to(torch.float16)
[0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[0] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/optimize.py", line 253, in optimize_model
[0] model = ggml_convert_low_bit(model,
[0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[0] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/transformers/convert.py", line 790, in ggml_convert_low_bit
[0] model = _optimize_pre(model)
[0] ^^^^^^^^^^^^^^^^^^^^
[0] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/transformers/convert.py", line 739, in _optimize_pre
[0] model.apply(padding_mlp)
[0] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 897, in apply
[0] module.apply(fn)
[0] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 897, in apply
[0] module.apply(fn)
[0] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 897, in apply
[0] module.apply(fn)
[0] [Previous line repeated 1 more time]
[0] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 898, in apply
[0] fn(self)
[0] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/transformers/models/qwen2.py", line 304, in padding_mlp
[0] new_gate_weight[:intermediate_size, :] = gate_weight
[0] ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
[0] RuntimeError: The expanded size of the tensor (2560) must match the existing size (5120) at non-singleton dimension 1. Target sizes: [13696, 2560]. Tensor sizes: [6848, 5120]
[0] free(): invalid pointer
[0]
[0] LIBXSMM_VERSION: main_stable-1.17-3651 (25693763)
[0] LIBXSMM_TARGET: spr [Intel(R) Xeon(R) Gold 6438N]
[0] Registry and code: 13 MB
[0] Command: python deepspeed_autotp.py --repo-id-or-model-path /root/ipex-llm/Qwen1.5-14B-Chat --low-bit sym_int4
[0] Uptime: 35.240733 s
[1] free(): invalid size
[1]
[1] LIBXSMM_VERSION: main_stable-1.17-3651 (25693763)
[1] LIBXSMM_TARGET: spr [Intel(R) Xeon(R) Gold 6438N]
[1] Registry and code: 13 MB
[1] Command: python deepspeed_autotp.py --repo-id-or-model-path /root/ipex-llm/Qwen1.5-14B-Chat --low-bit sym_int4
[1] Uptime: 35.150173 s

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 1676750 RUNNING AT test-server
= KILLED BY SIGNAL: 6 (Aborted)

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 1 PID 1676751 RUNNING AT test-server
= KILLED BY SIGNAL: 6 (Aborted)
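
The two shapes in the RuntimeError can be compared mechanically. The sketch below is a hypothetical helper, not part of ipex-llm or PyTorch: it parses the message and lists every dimension where the destination slice and the source tensor disagree. PyTorch reports only dimension 1, but dimension 0 differs as well, and both differ by a factor of 2, which matches the `world_size=2` shown in the log. One plausible reading, which the log alone does not confirm, is that the padding step and the AutoTP sharding disagree about which axis of the weight has been split.

```python
import re

# Error text copied from the traceback above.
MESSAGE = (
    "The expanded size of the tensor (2560) must match the existing size (5120) "
    "at non-singleton dimension 1. Target sizes: [13696, 2560]. "
    "Tensor sizes: [6848, 5120]"
)


def parse_sizes(message):
    """Extract the target and source shapes from the RuntimeError text."""
    def shape(label):
        match = re.search(rf"{label} sizes: \[([\d, ]+)\]", message)
        if match is None:
            raise ValueError(f"no '{label} sizes' field in message")
        return [int(n) for n in match.group(1).split(",")]
    return shape("Target"), shape("Tensor")


def mismatched_dims(target, tensor):
    """List (dim, target_size, tensor_size) where a plain copy cannot work.

    A source dimension of size 1 would broadcast, so it is not reported.
    """
    return [
        (dim, t, s)
        for dim, (t, s) in enumerate(zip(target, tensor))
        if t != s and s != 1
    ]


target, tensor = parse_sizes(MESSAGE)
for dim, t, s in mismatched_dims(target, tensor):
    print(f"dim {dim}: destination {t}, source {s}")
```

Here the destination is 13696 x 2560 while the source is 6848 x 5120, so the assignment fails on both axes.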


jianweimama · 2024-06-21T09:33:46Z

After rolling back BigDL and IPEX-LLM to the 0619 version, the problem disappeared.

plusbang · 2024-06-24T05:24:06Z

Hi @jianweimama, we will let you know as soon as the bug is fixed.

plusbang · 2024-06-25T05:53:00Z

Hi @jianweimama, this bug is fixed. Please try a new nightly version of ipex-llm (later than 2.1.0b20240625).
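
To check whether an installed nightly is newer than the one named above, the date stamp in the version string can be compared directly. This is a minimal sketch with hypothetical helper names; it assumes nightlies follow the `X.Y.ZbYYYYMMDD` pattern seen in the `pip3 freeze` output and it ignores the base version, so it is only meaningful within the same release line.

```python
import re

# From the comment above: the fix is in nightlies later than this build.
FIX_BUILD = "2.1.0b20240625"


def nightly_date(version):
    """Return the YYYYMMDD stamp of a nightly such as '2.1.0b20240620', else None."""
    match = re.fullmatch(r"\d+\.\d+\.\d+b(\d{8})", version)
    return int(match.group(1)) if match else None


def is_later_nightly(version, reference=FIX_BUILD):
    """True if `version` is stamped after `reference`; None if it is not a nightly."""
    stamp = nightly_date(version)
    if stamp is None:
        return None
    return stamp > nightly_date(reference)


# The build from the original report predates the fix.
print(is_later_nightly("2.1.0b20240620"))  # False
```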

jianweimama · 2024-06-27T03:30:08Z

Thanks a lot, I will try it soon.

qiuxin2012 assigned plusbang Jun 24, 2024

qiuxin2012 added the user issue label Jun 24, 2024

plusbang mentioned this issue Jun 25, 2024

Fix shape error when run qwen1.5-14b using deepspeed autotp #11420

Merged

2 tasks
