[BUG] Cannot free parameter with ZeRO3 + offload parameter in Pytorch1.9 #3646

Andy666G · 2023-05-31T10:18:54Z

Describe the bug
When tranining llama 13B(https://github.com/ymcui/Chinese-LLaMA-Alpaca), I observed it cannot free parameter memory using ZeRO3 + Offload strategy parameter in pytorch1.9, but parameter memory can be freed in pytorch1.13 with the same deepspeed strategy. This issue(#3002) cannot solve this bug.

To Reproduce
deepspeed0.9.2 + pytorch1.9 + peft 0.3 + transformers4.28.1
ds_config

  1 {
  2     "zero_optimization": {
  3         "stage": 3,
  4         "offload_optimizer": {
  5             "device": "cpu",
  6             "pin_memory": true
  7         },
  8         "offload_param": {
  9             "device": "cpu",
 10             "pin_memory": true
 11         },
 12         "overlap_comm": true,
 13         "contiguous_gradients": true,
 14         "sub_group_size": 1e9,
 15         "reduce_bucket_size": "auto",
 16         "stage3_prefetch_bucket_size": "auto",
 17         "stage3_param_persistence_threshold": "auto",
 18         "stage3_max_live_parameters": 1e9,
 19         "stage3_max_reuse_distance": 1e9,
 20         "stage3_gather_16bit_weights_on_model_save": true                                                                                                                                                
 21     },
 22   "train_batch_size": 1,
 23   "train_micro_batch_size_per_gpu": 1,
 24   "fp16": {
 25         "enabled": "auto",
 26         "loss_scale": 0,
 27         "loss_scale_window": 1000,
 28         "initial_scale_power": 16,
 29         "hysteresis": 2,
 30         "min_loss_scale": 1
 31   },
 32    "optimizer": {
 33        "type": "Adam",
 34        "params": {
 35          "lr": "auto",
 36          "betas": "auto",
 37          "eps": "auto",
 38          "weight_decay": "auto"
 39        }
 40    }
 41 }  
Run llama 13B model.

Expected behavior
A clear and concise description of what you expected to happen.
Pytorch 1.13, deepspeed 0.9.2

Pytorch 1.9, deepspeed 0.9.2

ds_report output
Please run ds_report to give us details about your setup.

DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
 [WARNING]  please install triton==1.0.0 if you want to use sparse attention
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/opt/conda/lib/python3.8/site-packages/torch']
torch version .................... 1.9.0a0+c3d40fd
deepspeed install path ........... ['/opt/conda/lib/python3.8/site-packages/deepspeed']
deepspeed info ................... 0.9.2, unknown, unknown
torch cuda version ............... 11.3
torch hip version ................ None
nvcc version ..................... 11.3
deepspeed wheel compiled w. ...... torch 1.9, cuda 11.3

Screenshots
when calling post_forward_hook, parameters can be freed in pytorch1.13, but cannot be freed in pytorch1.9
Pytorch1.13

Pytorch1.9

System info (please complete the following information):

OS: [e.g. Ubuntu 18.04]
GPU count and types [e.g. one machines with x1 A100 each]
Interconnects (if applicable) [e.g., two machines connected with 100 Gbps IB]
Python version 3.8
Any other relevant info about your setup

Launcher context
Are you launching your experiment with the deepspeed launcher, MPI, or something else?
deepspeed launcher
Docker context
Are you using a specific docker image that you can share?
NGC22.07(Pytorch1.13) and NGC21.06(Pytorch1.9)
Additional context
Add any other context about the problem here.

The text was updated successfully, but these errors were encountered:

jomayeri · 2023-06-06T17:09:05Z

Hi @Andy666G can you provide a script that reproduces this error?

Andy666G · 2023-06-07T09:08:35Z

Sure, here is a script @jomayeri
The pretrained_model is vicuna-13b, and any “.txt” file can be a dataset.
I have provided the deepspeed config above.

# export CUDA_VISIBLE_DEVICES=0
lora_rank=8
lora_trainable="q_proj,v_proj,k_proj,o_proj,gate_proj,down_proj,up_proj"
#modules_to_save="embed_tokens,lm_head"
# modules_to_save="lm_head"
modules_to_save=""
lora_dropout=0.1
pretrained_model="models/vicuna-13b-all-v1.1/"
chinese_tokenizer_path="models/vicuna-13b-all-v1.1/tokenizer.model"
dataset_dir="/doc"
data_cache="$PWD/cache"
per_device_batch_size=1 # 1024 ,from https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/%E8%AE%AD%E7%BB%83%E7%BB%86%E8%8A%82
training_steps=7000 # 6000, from https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/%E8%AE%AD%E7%BB%83%E7%BB%86%E8%8A%82
lr=2.34e-06     # from https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/%E8%AE%AD%E7%BB%83%E7%BB%86%E8%8A%82
gradient_accumulation_steps=1
output_dir="output"
max_train_samples=${per_device_batch_size}
max_eval_samples=${per_device_batch_size}

#TODO: deepspeed
deepspeed --include localhost:0 --master_port 12688 scripts/run_clm_pt_with_peft.py \
    --model_name_or_path ${pretrained_model} \
    --tokenizer_name_or_path ${chinese_tokenizer_path} \
    --dataset_dir ${dataset_dir} \
    --data_cache_dir $data_cache \
    --validation_split_percentage 0.001 \
    --per_device_train_batch_size ${per_device_batch_size} \
    --per_device_eval_batch_size ${per_device_batch_size} \
    --do_train \
    --debug_mode \
    --torch_dtype float16 \
    --seed $RANDOM \
    --max_steps ${training_steps} \
    --lr_scheduler_type cosine \
    --learning_rate ${lr} \
    --warmup_ratio 0.05 \
    --weight_decay 0.01 \
    --logging_strategy steps \
    --logging_steps 10 \
    --save_strategy steps \
    --save_total_limit 3 \
    --save_steps 1000 \
    --gradient_accumulation_steps ${gradient_accumulation_steps} \
    --preprocessing_num_workers 8 \
    --block_size 512 \
    --output_dir ${output_dir} \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --lora_rank ${lora_rank} \
    --trainable ${lora_trainable} \
    --lora_dropout ${lora_dropout} \
    --deepspeed deepspeed_config.json \
    --fp16  \
    --overwrite_output_dir \

GuWei007 · 2023-06-13T08:08:36Z

I had the same problem and was very confused

jomayeri · 2023-07-10T20:50:13Z

@Andy666G Sorry I cannot repro this issue. In the description you say "when calling post_forward_hook" is this a hook you added? Have you raised the issue with Pytorch?

jomayeri · 2023-08-11T03:53:51Z

Closing for now, please reopen if needed.

Andy666G added bug Something isn't working training labels May 31, 2023

jomayeri self-assigned this Jun 2, 2023

jomayeri closed this as completed Aug 11, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] Cannot free parameter with ZeRO3 + offload parameter in Pytorch1.9 #3646

[BUG] Cannot free parameter with ZeRO3 + offload parameter in Pytorch1.9 #3646

Andy666G commented May 31, 2023 •

edited

Loading

jomayeri commented Jun 6, 2023

Andy666G commented Jun 7, 2023

GuWei007 commented Jun 13, 2023

jomayeri commented Jul 10, 2023 •

edited

Loading

jomayeri commented Aug 11, 2023

[BUG] Cannot free parameter with ZeRO3 + offload parameter in Pytorch1.9 #3646

[BUG] Cannot free parameter with ZeRO3 + offload parameter in Pytorch1.9 #3646

Comments

Andy666G commented May 31, 2023 • edited Loading

jomayeri commented Jun 6, 2023

Andy666G commented Jun 7, 2023

GuWei007 commented Jun 13, 2023

jomayeri commented Jul 10, 2023 • edited Loading

jomayeri commented Aug 11, 2023

Andy666G commented May 31, 2023 •

edited

Loading

jomayeri commented Jul 10, 2023 •

edited

Loading