non-singleton dimension errors when running Deepspeed-AutoTP #11392

Open
jianweimama opened this issue Jun 21, 2024 · 4 comments

@jianweimama

Installation steps on the host:
conda create -n llm python=3.11
conda activate llm

The command below installs intel_extension_for_pytorch==2.1.10+xpu by default:

pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install transformers==4.37.0
pip install oneccl_bind_pt==2.1.100 --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/

Configure the oneAPI environment variables:

source /opt/intel/oneapi/setvars.sh
pip install git+https://github.com/microsoft/DeepSpeed.git@ed8aed5
pip install git+https://github.com/intel/intel-extension-for-deepspeed.git@0eb734b
pip install mpi4py
conda install -c conda-forge -y gperftools=2.10 # to enable tcmalloc

Installed pip packages:
(llm-deepspeed) root@test-server:~/test/ipex-llm/python/llm/example/GPU/Deepspeed-AutoTP# pip3 freeze
accelerate==0.23.0
annotated-types==0.7.0
bigdl-core-xe-21==2.5.0b20240620
bigdl-core-xe-addons-21==2.5.0b20240620
bigdl-core-xe-batch-21==2.5.0b20240620
certifi==2024.6.2
charset-normalizer==3.3.2
deepspeed @ git+https://github.com/microsoft/DeepSpeed.git@ed8aed5703d97b6e52d0fca3e4be285e21c005f2
filelock==3.15.3
fsspec==2024.6.0
hjson==3.1.0
huggingface-hub==0.23.4
idna==3.7
intel-cmplr-lib-ur==2024.2.0
intel-extension-for-pytorch==2.1.10+xpu
intel-openmp==2024.2.0
intel_extension_for_deepspeed @ file:///root/intel-extension-for-deepspeed
ipex-llm==2.1.0b20240620
Jinja2==3.1.4
MarkupSafe==2.1.5
mpi4py==3.1.6
mpmath==1.3.0
networkx==3.3
ninja==1.11.1.1
numpy==1.26.4
oneccl-bind-pt==2.1.100+xpu
packaging==24.1
pillow==10.3.0
protobuf==5.27.1
psutil==6.0.0
py-cpuinfo==9.0.0
pydantic==2.7.4
pydantic_core==2.18.4
pynvml==11.5.0
PyYAML==6.0.2rc1
regex==2024.5.15
requests==2.32.3
safetensors==0.4.3
sentencepiece==0.2.0
sympy==1.13.0rc2
tabulate==0.9.0
tokenizers==0.15.2
torch==2.1.0a0+cxx11.abi
torchvision==0.16.0a0+cxx11.abi
tqdm==4.66.4
transformers==4.37.0
typing_extensions==4.12.2
urllib3==2.2.2
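
For a report like this, it can help to print only the packages on the failing code path instead of the full freeze. A minimal sketch using the standard-library `importlib.metadata`; the package list is copied from the freeze above, and `collect_versions` / `KEY_PACKAGES` are hypothetical names used only for illustration.

```python
# Hypothetical helper: report the versions of the packages involved in the
# DeepSpeed-AutoTP + ipex-llm path, e.g. for pasting into a bug report.
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the `pip3 freeze` output above.
KEY_PACKAGES = [
    "ipex-llm",
    "bigdl-core-xe-21",
    "intel-extension-for-pytorch",
    "oneccl-bind-pt",
    "deepspeed",
    "torch",
    "transformers",
]


def collect_versions(packages=KEY_PACKAGES):
    """Return {package: installed version, or None if it is not installed}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in collect_versions().items():
        print(f"{name}=={ver}" if ver else f"{name}: not installed")
```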

(llm-deepspeed) root@test-server:~/test/ipex-llm/python/llm/example/GPU/Deepspeed-AutoTP# bash run_qwen_14b_arc_2_card.sh

:: initializing oneAPI environment ...
run_qwen_14b_arc_2_card.sh: BASH_VERSION = 5.1.16(1)-release
args: Using "$@" for setvars.sh arguments: --force
:: ccl -- latest
:: compiler -- latest
:: dal -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: dnnl -- latest
:: dpcpp-ct -- latest
:: dpl -- latest
:: ipp -- latest
:: ippcp -- latest
:: mkl -- latest
:: mpi -- latest
:: tbb -- latest
:: oneAPI environment initialized ::

[0] /root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
[0] warn(
[1] /root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
[1] warn(
[0] [2024-06-21 23:13:11,872] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[1] [2024-06-21 23:13:11,951] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[0] [2024-06-21 23:13:12,241] [INFO] [real_accelerator.py:211:set_accelerator] Setting ds_accelerator to cpu (model specified)
[1] [2024-06-21 23:13:12,325] [INFO] [real_accelerator.py:211:set_accelerator] Setting ds_accelerator to cpu (model specified)
[1] Loading checkpoint shards: 100%|██████████| 8/8 [00:16<00:00, 2.04s/it]
[1] [2024-06-21 23:13:29,421] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.14.1+ed8aed57, git-hash=ed8aed57, git-branch=HEAD
[1] [2024-06-21 23:13:29,422] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter replace_method is deprecated. This parameter is no longer needed, please remove from your call to DeepSpeed-inference
[1] [2024-06-21 23:13:29,422] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[1] [2024-06-21 23:13:29,422] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
[0] Loading checkpoint shards: 100%|██████████| 8/8 [00:17<00:00, 2.21s/it]
[0] [2024-06-21 23:13:30,640] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.14.1+ed8aed57, git-hash=ed8aed57, git-branch=HEAD
[0] [2024-06-21 23:13:30,640] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter replace_method is deprecated. This parameter is no longer needed, please remove from your call to DeepSpeed-inference
[0] [2024-06-21 23:13:30,640] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[0] [2024-06-21 23:13:30,640] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
[1] Using /root/.cache/torch_extensions/py311_cpu as PyTorch extensions root...
[1] Emitting ninja build file /root/.cache/torch_extensions/py311_cpu/deepspeed_ccl_comm/build.ninja...
[1] Building extension module deepspeed_ccl_comm...
[1] Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1] ninja: no work to do.
[1] Loading extension module deepspeed_ccl_comm...
[0] Using /root/.cache/torch_extensions/py311_cpu as PyTorch extensions root...
[0] Emitting ninja build file /root/.cache/torch_extensions/py311_cpu/deepspeed_ccl_comm/build.ninja...
[0] Building extension module deepspeed_ccl_comm...
[0] Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[0] ninja: no work to do.
[0] Loading extension module deepspeed_ccl_comm...
[0] My guessed rank = 0
[0] 2024:06:21-23:13:40:(1676750) |CCL_WARN| sockets exchange mode is set. It may cause potential problem of 'Too many open file descriptors'
[1] My guessed rank = 1
[1] 2024:06:21-23:13:40:(1676751) |CCL_WARN| sockets exchange mode is set. It may cause potential problem of 'Too many open file descriptors'
[0] Time to load deepspeed_ccl_comm op: 0.11093568801879883 seconds
[0] DeepSpeed deepspeed.ops.comm.deepspeed_ccl_comm_op built successfully
[0] [2024-06-21 23:13:41,150] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[1] Time to load deepspeed_ccl_comm op: 0.10797476768493652 seconds
[1] DeepSpeed deepspeed.ops.comm.deepspeed_ccl_comm_op built successfully
[1] [2024-06-21 23:13:41,150] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[1] [2024-06-21 23:13:41,150] [INFO] [comm.py:637:init_distributed] cdb=<deepspeed.comm.ccl.CCLBackend object at 0x7fa20a3d3d90>
[0] [2024-06-21 23:13:41,150] [INFO] [comm.py:637:init_distributed] cdb=<deepspeed.comm.ccl.CCLBackend object at 0x7b8ba0ce0510>
[1] [2024-06-21 23:13:41,150] [INFO] [comm.py:652:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[0] [2024-06-21 23:13:41,150] [INFO] [comm.py:652:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[1] [2024-06-21 23:13:41,485] [INFO] [comm.py:702:mpi_discovery] Discovered MPI settings of world_rank=1, local_rank=1, world_size=2, master_addr=172.16.182.230, master_port=29500
[0] [2024-06-21 23:13:41,485] [INFO] [comm.py:702:mpi_discovery] Discovered MPI settings of world_rank=0, local_rank=0, world_size=2, master_addr=172.16.182.230, master_port=29500
[0] [2024-06-21 23:13:41,485] [INFO] [comm.py:662:init_distributed] Distributed backend already initialized
[0] 2024-06-21 23:13:44,774 - ipex_llm.transformers.utils - INFO - Converting the current model to sym_int4 format......
[1] 2024-06-21 23:13:44,774 - ipex_llm.transformers.utils - INFO - Converting the current model to sym_int4 format......
[1] /root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/init.py:412: UserWarning: Initializing zero-element tensors is a no-op
[1] warnings.warn("Initializing zero-element tensors is a no-op")
[0] /root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/init.py:412: UserWarning: Initializing zero-element tensors is a no-op
[0] warnings.warn("Initializing zero-element tensors is a no-op")
[1] AutoTP: [(<class 'transformers.models.qwen2.modeling_qwen2.Qwen2DecoderLayer'>, ['self_attn.o_proj', 'mlp.down_proj'])]
[1] Traceback (most recent call last):
[1] File "/root/test/ipex-llm/python/llm/example/GPU/Deepspeed-AutoTP/deepspeed_autotp.py", line 85, in <module>
[1] model = optimize_model(model.module.to(f'cpu'), low_bit=low_bit).to(torch.float16)
[1] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/optimize.py", line 253, in optimize_model
[1] model = ggml_convert_low_bit(model,
[1] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/transformers/convert.py", line 790, in ggml_convert_low_bit
[1] model = _optimize_pre(model)
[1] ^^^^^^^^^^^^^^^^^^^^
[1] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/transformers/convert.py", line 739, in _optimize_pre
[1] model.apply(padding_mlp)
[1] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 897, in apply
[1] module.apply(fn)
[1] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 897, in apply
[1] module.apply(fn)
[1] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 897, in apply
[1] module.apply(fn)
[1] [Previous line repeated 1 more time]
[1] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 898, in apply
[1] fn(self)
[1] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/transformers/models/qwen2.py", line 304, in padding_mlp
[1] new_gate_weight[:intermediate_size, :] = gate_weight
[1] ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
[1] RuntimeError: The expanded size of the tensor (2560) must match the existing size (5120) at non-singleton dimension 1. Target sizes: [13696, 2560]. Tensor sizes: [6848, 5120]
[0] AutoTP: [(<class 'transformers.models.qwen2.modeling_qwen2.Qwen2DecoderLayer'>, ['self_attn.o_proj', 'mlp.down_proj'])]
[0] Traceback (most recent call last):
[0] File "/root/test/ipex-llm/python/llm/example/GPU/Deepspeed-AutoTP/deepspeed_autotp.py", line 85, in <module>
[0] model = optimize_model(model.module.to(f'cpu'), low_bit=low_bit).to(torch.float16)
[0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[0] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/optimize.py", line 253, in optimize_model
[0] model = ggml_convert_low_bit(model,
[0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[0] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/transformers/convert.py", line 790, in ggml_convert_low_bit
[0] model = _optimize_pre(model)
[0] ^^^^^^^^^^^^^^^^^^^^
[0] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/transformers/convert.py", line 739, in _optimize_pre
[0] model.apply(padding_mlp)
[0] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 897, in apply
[0] module.apply(fn)
[0] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 897, in apply
[0] module.apply(fn)
[0] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 897, in apply
[0] module.apply(fn)
[0] [Previous line repeated 1 more time]
[0] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 898, in apply
[0] fn(self)
[0] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/transformers/models/qwen2.py", line 304, in padding_mlp
[0] new_gate_weight[:intermediate_size, :] = gate_weight
[0] ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
[0] RuntimeError: The expanded size of the tensor (2560) must match the existing size (5120) at non-singleton dimension 1. Target sizes: [13696, 2560]. Tensor sizes: [6848, 5120]
[0] free(): invalid pointer
[0]
[0] LIBXSMM_VERSION: main_stable-1.17-3651 (25693763)
[0] LIBXSMM_TARGET: spr [Intel(R) Xeon(R) Gold 6438N]
[0] Registry and code: 13 MB
[0] Command: python deepspeed_autotp.py --repo-id-or-model-path /root/ipex-llm/Qwen1.5-14B-Chat --low-bit sym_int4
[0] Uptime: 35.240733 s
[1] free(): invalid size
[1]
[1] LIBXSMM_VERSION: main_stable-1.17-3651 (25693763)
[1] LIBXSMM_TARGET: spr [Intel(R) Xeon(R) Gold 6438N]
[1] Registry and code: 13 MB
[1] Command: python deepspeed_autotp.py --repo-id-or-model-path /root/ipex-llm/Qwen1.5-14B-Chat --low-bit sym_int4
[1] Uptime: 35.150173 s

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 1676750 RUNNING AT test-server
= KILLED BY SIGNAL: 6 (Aborted)

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 1 PID 1676751 RUNNING AT test-server
= KILLED BY SIGNAL: 6 (Aborted)
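
The `RuntimeError` comes from PyTorch's expand (broadcasting) check on the slice assignment in `padding_mlp`: a source of shape `[6848, 5120]` cannot be expanded into a target of shape `[13696, 2560]`, and because the check scans from the last dimension, dimension 1 is the one reported. The sizes are also consistent with two-way tensor-parallel sharding (6848 = 13696 / 2, 2560 = 5120 / 2), which suggests per-rank and full-model shapes being mixed after AutoTP splits the model; that is an inference from the log, not a confirmed root cause. The pure-Python sketch below mimics only the shape rule; `first_broadcast_mismatch` is a hypothetical helper, not part of PyTorch or ipex-llm.

```python
def first_broadcast_mismatch(target, source):
    """Mimic the expand check behind `dst[...] = src` (hypothetical helper).

    Shapes are right-aligned and compared from the last dimension backwards.
    Returns (dim, target_size, source_size) for the first dimension where
    `source` cannot be expanded to `target`, or None if the copy broadcasts.
    """
    if len(source) > len(target):
        raise ValueError("source has more dimensions than target")
    offset = len(target) - len(source)
    for i in range(len(source) - 1, -1, -1):
        t, s = target[offset + i], source[i]
        # A source dimension must either match the target or be 1.
        if s != t and s != 1:
            return offset + i, t, s
    return None


if __name__ == "__main__":
    # Shapes copied from the RuntimeError in the log above.
    print(first_broadcast_mismatch((13696, 2560), (6848, 5120)))  # (1, 2560, 5120)
```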

@jianweimama
Author

After rolling back BigDL and IPEX-LLM to the 0619 version, the problem disappeared.

@plusbang
Contributor

Hi @jianweimama, we will let you know as soon as the bug is fixed.

@plusbang
Contributor

Hi @jianweimama, this bug has been fixed. Please try a nightly version of ipex-llm later than 2.1.0b20240625.
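
Re-running the `pip install --pre --upgrade ipex-llm[xpu] ...` command from the setup steps should pull the newest nightly. To check whether a given build is past the fix, here is a small sketch assuming ipex-llm nightlies follow the `X.Y.ZbYYYYMMDD` pattern seen in this issue and reading "later than" as strictly newer. `has_fix` is a hypothetical helper; for full PEP 440 ordering, `packaging.version.Version` (already in the freeze above) is the more general tool.

```python
import re

# Threshold from the maintainer's comment: builds later than this contain the fix.
FIXED_AFTER = "2.1.0b20240625"

_NIGHTLY = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:b(\d+))?$")


def _key(ver):
    """Turn 'X.Y.Z' or 'X.Y.ZbNNNNNNNN' into a sortable tuple.

    As in PEP 440, a final release sorts after all of its beta builds.
    Only these two forms are handled; anything else raises ValueError.
    """
    m = _NIGHTLY.match(ver)
    if m is None:
        raise ValueError(f"unsupported version string: {ver!r}")
    major, minor, patch, beta = m.groups()
    release = (int(major), int(minor), int(patch))
    # (0, n) for a beta/nightly build, (1, 0) for the final release.
    pre = (0, int(beta)) if beta is not None else (1, 0)
    return release + pre


def has_fix(installed, fixed_after=FIXED_AFTER):
    """Return True if `installed` is strictly later than `fixed_after`."""
    return _key(installed) > _key(fixed_after)


if __name__ == "__main__":
    # Version from the failing environment's `pip3 freeze` output.
    print(has_fix("2.1.0b20240620"))  # False
```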

@jianweimama
Author

Thanks a lot, I will try it soon.
