
Multi-GPU fine-tuning of an LLM with accelerate and deepspeed hangs #1683

Closed

Aitejiu opened this issue Nov 30, 2023 · 6 comments
Labels
good first issue (Good for newcomers) · solved (This problem has been already solved)

Comments

Aitejiu commented Nov 30, 2023

Reminder

  • I have read the README and searched the existing issues.

Reproduction

DeepSpeed launch command

deepspeed --num_gpus 2 --master_port=9901 src/train_bash.py \
    --deepspeed ds_config.json \
    --stage sft \
    --model_name_or_path /home/zhmao/model/Baichuan-13B-chat \
    --do_train \
    --dataset alpaca_gpt4_zh \
    --template baichuan \
    --finetuning_type lora \
    --lora_rank 32 \
    --lora_target all \
    --output_dir /home/zhmao/model/Baichuan-13B-QLoRA \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --preprocessing_num_workers 16 \
    --cutoff_len 1024 \
    --optim paged_adamw_32bit \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 100 \
    --eval_steps 100 \
    --warmup_steps 100 \
    --learning_rate 3e-5 \
    --max_grad_norm 0.5 \
    --num_train_epochs 2.0 \
    --quantization_bit 4 \
    --plot_loss \
    --fp16

DeepSpeed config (ds_config.json)

{
    "train_micro_batch_size_per_gpu": "auto",
    "zero_allow_untested_optimizer": true,
    "fp16": {
      "enabled": "auto",
      "loss_scale": 0,
      "initial_scale_power": 16, 
      "loss_scale_window": 1000,
      "hysteresis": 2,
      "min_loss_scale": 1
    },  
    "zero_optimization": {
      "stage": 2,
      "allgather_partitions": true,
      "allgather_bucket_size": 5e8,
      "overlap_comm": false,
      "reduce_scatter": true,
      "reduce_bucket_size": 5e8,
      "contiguous_gradients" : true
    }
  }

accelerate launch command

accelerate launch /home/zhmao/.cache/huggingface/accelerate/default_config.yaml \
    src/train_bash.py \
    --stage sft \
    --model_name_or_path /home/zhmao/model/Baichuan-13B-chat \
    --do_train \
    --dataset alpaca_gpt4_zh \
    --template baichuan \
    --finetuning_type lora \
    --lora_rank 32 \
    --lora_target all \
    --output_dir /home/zhmao/model/Baichuan-13B-QLoRA \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --preprocessing_num_workers 16 \
    --cutoff_len 1024 \
    --optim paged_adamw_32bit \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 100 \
    --eval_steps 100 \
    --warmup_steps 100 \
    --learning_rate 3e-5 \
    --max_grad_norm 0.5 \
    --num_train_epochs 2.0 \
    --quantization_bit 4 \
    --plot_loss \
    --fp16
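
A note on the accelerate invocation above: in the accelerate releases I am familiar with, the first positional argument of accelerate launch is treated as the script to run, and the config file is normally supplied via the --config_file option. A minimal sketch of that form, reusing the same paths and keeping the remaining training arguments unchanged (abbreviated here):

accelerate launch \
    --config_file /home/zhmao/.cache/huggingface/accelerate/default_config.yaml \
    src/train_bash.py \
    --stage sft \
    ...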

The example above uses Baichuan-13B-chat, but I also tried ChatGLM2-6B and it hangs in the same way.
Issues I have already checked:
#74
#1651

Expected behavior

Multi-GPU training should start correctly and accelerate fine-tuning.

System Info

  • transformers version: 4.33.2
  • Platform: Linux-5.4.0-150-generic-x86_64-with-glibc2.27
  • Python version: 3.9.0
  • Huggingface_hub version: 0.19.4
  • Safetensors version: 0.4.0
  • Accelerate version: 0.24.1
  • Accelerate config: - compute_environment: LOCAL_MACHINE
    - distributed_type: MULTI_GPU
    - mixed_precision: fp16
    - use_cpu: False
    - debug: False
    - num_processes: 2
    - machine_rank: 0
    - num_machines: 1
    - gpu_ids: all
    - rdzv_backend: static
    - same_network: True
    - main_training_function: main
    - downcast_bf16: no
    - tpu_use_cluster: False
    - tpu_use_sudo: False
    - tpu_env: []
  • PyTorch version (GPU?): 2.1.1+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

GPU: 2 × A6000 (40 GB)

Others

Full terminal output

[2023-11-30 11:24:07,852] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-11-30 11:24:09,226] [WARNING] [runner.py:203:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-11-30 11:24:09,226] [INFO] [runner.py:570:main] cmd = /home/zhmao/anaconda3/envs/LLaMa-factory/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMV19 --master_addr=127.0.0.1 --master_port=9901 --enable_each_rank_log=None src/train_bash.py --deepspeed ds_config.json --stage sft --model_name_or_path /home/zhmao/model/Baichuan-13B-chat --do_train --dataset alpaca_gpt4_zh --template baichuan --finetuning_type lora --lora_rank 32 --lora_target all --output_dir /home/zhmao/model/Baichuan-13B-QLoRA --per_device_train_batch_size 4 --gradient_accumulation_steps 8 --preprocessing_num_workers 16 --cutoff_len 1024 --optim paged_adamw_32bit --lr_scheduler_type cosine --logging_steps 10 --save_steps 100 --eval_steps 100 --warmup_steps 100 --learning_rate 3e-5 --max_grad_norm 0.5 --num_train_epochs 2.0 --quantization_bit 4 --plot_loss --fp16
[2023-11-30 11:24:11,215] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-11-30 11:24:12,559] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0, 1]}
[2023-11-30 11:24:12,559] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=2, node_rank=0
[2023-11-30 11:24:12,559] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1]})
[2023-11-30 11:24:12,559] [INFO] [launch.py:163:main] dist_world_size=2
[2023-11-30 11:24:12,559] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0,1
[2023-11-30 11:24:16,369] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-11-30 11:24:16,393] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/home/zhmao/anaconda3/envs/LLaMa-factory/lib/python3.9/site-packages/trl/trainer/ppo_config.py:141: UserWarning: The `optimize_cuda_cache` arguement will be deprecated soon, please use `optimize_device_cache` instead.
  warnings.warn(
/home/zhmao/anaconda3/envs/LLaMa-factory/lib/python3.9/site-packages/trl/trainer/ppo_config.py:141: UserWarning: The `optimize_cuda_cache` arguement will be deprecated soon, please use `optimize_device_cache` instead.
  warnings.warn(
[2023-11-30 11:24:18,123] [INFO] [comm.py:637:init_distributed] cdb=None
[2023-11-30 11:24:18,315] [INFO] [comm.py:637:init_distributed] cdb=None
[2023-11-30 11:24:18,315] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
11/30/2023 11:24:19 - WARNING - llmtuner.model.parser - We recommend enable `upcast_layernorm` in quantized training.
11/30/2023 11:24:19 - WARNING - llmtuner.model.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
[INFO|training_args.py:1332] 2023-11-30 11:24:19,136 >> Found safetensors installation, but --save_safetensors=False. Safetensors should be a preferred weights saving format due to security and performance reasons. If your model cannot be saved by safetensors please feel free to open an issue at https://github.com/huggingface/safetensors!
[INFO|training_args.py:1764] 2023-11-30 11:24:19,136 >> PyTorch: setting up devices
/home/zhmao/anaconda3/envs/LLaMa-factory/lib/python3.9/site-packages/transformers/training_args.py:1677: FutureWarning: `--push_to_hub_token` is deprecated and will be removed in version 5 of 🤗 Transformers. Use `--hub_token` instead.
  warnings.warn(
11/30/2023 11:24:19 - INFO - llmtuner.model.parser - Process rank: 0, device: cuda:0, n_gpu: 1
  distributed training: True, compute dtype: torch.float16
11/30/2023 11:24:19 - INFO - llmtuner.model.parser - Training/evaluation parameters Seq2SeqTrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=False,
ddp_timeout=1800,
debug=[],
deepspeed=ds_config.json,
disable_tqdm=False,
dispatch_batches=None,
do_eval=False,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=100.0,
evaluation_strategy=no,
fp16=True,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
generation_config=None,
generation_max_length=None,
generation_num_beams=None,
gradient_accumulation_steps=8,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=3e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=/home/zhmao/model/Baichuan-13B-QLoRA/runs/Nov30_11-24-18_qlb-AS-4124GS-TNR,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=10,
logging_strategy=steps,
lr_scheduler_type=cosine,
max_grad_norm=0.5,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
no_cuda=False,
num_train_epochs=2.0,
optim=paged_adamw_32bit,
optim_args=None,
output_dir=/home/zhmao/model/Baichuan-13B-QLoRA,
overwrite_output_dir=False,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=4,
predict_with_generate=False,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=[],
resume_from_checkpoint=None,
run_name=/home/zhmao/model/Baichuan-13B-QLoRA,
save_on_each_node=False,
save_safetensors=False,
save_steps=100,
save_strategy=steps,
save_total_limit=None,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
sortish_sampler=False,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=100,
weight_decay=0.0,
)
11/30/2023 11:24:19 - INFO - llmtuner.data.loader - Loading dataset alpaca_gpt4_data_zh.json...
11/30/2023 11:24:19 - WARNING - llmtuner.model.parser - We recommend enable `upcast_layernorm` in quantized training.
11/30/2023 11:24:19 - WARNING - llmtuner.model.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
/home/zhmao/anaconda3/envs/LLaMa-factory/lib/python3.9/site-packages/transformers/training_args.py:1677: FutureWarning: `--push_to_hub_token` is deprecated and will be removed in version 5 of 🤗 Transformers. Use `--hub_token` instead.
  warnings.warn(
11/30/2023 11:24:19 - INFO - llmtuner.model.parser - Process rank: 1, device: cuda:1, n_gpu: 1
  distributed training: True, compute dtype: torch.float16
11/30/2023 11:24:19 - INFO - llmtuner.model.parser - Training/evaluation parameters Seq2SeqTrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=False,
ddp_timeout=1800,
debug=[],
deepspeed=ds_config.json,
disable_tqdm=False,
dispatch_batches=None,
do_eval=False,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=100.0,
evaluation_strategy=no,
fp16=True,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
generation_config=None,
generation_max_length=None,
generation_num_beams=None,
gradient_accumulation_steps=8,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=3e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=1,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=/home/zhmao/model/Baichuan-13B-QLoRA/runs/Nov30_11-24-18_qlb-AS-4124GS-TNR,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=10,
logging_strategy=steps,
lr_scheduler_type=cosine,
max_grad_norm=0.5,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
no_cuda=False,
num_train_epochs=2.0,
optim=paged_adamw_32bit,
optim_args=None,
output_dir=/home/zhmao/model/Baichuan-13B-QLoRA,
overwrite_output_dir=False,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=4,
predict_with_generate=False,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=[],
resume_from_checkpoint=None,
run_name=/home/zhmao/model/Baichuan-13B-QLoRA,
save_on_each_node=False,
save_safetensors=False,
save_steps=100,
save_strategy=steps,
save_total_limit=None,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
sortish_sampler=False,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=100,
weight_decay=0.0,
)
11/30/2023 11:24:19 - INFO - llmtuner.data.loader - Loading dataset alpaca_gpt4_data_zh.json...
Using custom data configuration default-4f195a63697fe826
Loading Dataset Infos from /home/zhmao/anaconda3/envs/LLaMa-factory/lib/python3.9/site-packages/datasets/packaged_modules/json
Overwrite dataset info from restored data version if exists.
Loading Dataset info from /home/zhmao/.cache/huggingface/datasets/json/default-4f195a63697fe826/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96
Found cached dataset json (/home/zhmao/.cache/huggingface/datasets/json/default-4f195a63697fe826/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
Loading Dataset info from /home/zhmao/.cache/huggingface/datasets/json/default-4f195a63697fe826/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96
[INFO|tokenization_utils_base.py:1850] 2023-11-30 11:24:20,331 >> loading file tokenizer.model
[INFO|tokenization_utils_base.py:1850] 2023-11-30 11:24:20,331 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:1850] 2023-11-30 11:24:20,331 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:1850] 2023-11-30 11:24:20,331 >> loading file tokenizer_config.json
[INFO|configuration_utils.py:713] 2023-11-30 11:24:20,356 >> loading configuration file /home/zhmao/model/Baichuan-13B-chat/config.json
[INFO|configuration_utils.py:713] 2023-11-30 11:24:20,357 >> loading configuration file /home/zhmao/model/Baichuan-13B-chat/config.json
[INFO|configuration_utils.py:775] 2023-11-30 11:24:20,358 >> Model config BaichuanConfig {
  "_from_model_config": true,
  "_name_or_path": "/home/zhmao/model/Baichuan-13B-chat",
  "architectures": [
    "BaichuanForCausalLM"
  ],
  "auto_map": {
    "AutoConfig": "configuration_baichuan.BaichuanConfig",
    "AutoModelForCausalLM": "modeling_baichuan.BaichuanForCausalLM"
  },
  "bos_token_id": 1,
  "eos_token_id": 2,
  "gradient_checkpointing": [
    false
  ],
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 13696,
  "model_max_length": 4096,
  "model_type": "baichuan",
  "num_attention_heads": 40,
  "num_hidden_layers": 40,
  "pad_token_id": 0,
  "rms_norm_eps": 1e-06,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.33.2",
  "use_cache": true,
  "vocab_size": 64000
}

Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
11/30/2023 11:24:20 - INFO - llmtuner.model.loader - Quantizing model to 4 bit.
[INFO|modeling_utils.py:2866] 2023-11-30 11:24:20,382 >> loading weights file /home/zhmao/model/Baichuan-13B-chat/pytorch_model.bin.index.json
[INFO|modeling_utils.py:1200] 2023-11-30 11:24:20,382 >> Instantiating BaichuanForCausalLM model under default dtype torch.float16.
[INFO|configuration_utils.py:768] 2023-11-30 11:24:20,382 >> Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 0,
  "transformers_version": "4.33.2"
}

[INFO|modeling_utils.py:2983] 2023-11-30 11:24:20,415 >> Detected 4-bit loading: activating 4-bit loading for this model
11/30/2023 11:24:20 - INFO - llmtuner.model.loader - Quantizing model to 4 bit.
Loading checkpoint shards: 100%|█████████████████████████████| 3/3 [00:14<00:00,  4.77s/it]
11/30/2023 11:24:35 - INFO - llmtuner.model.utils - Gradient checkpointing enabled.
11/30/2023 11:24:35 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
11/30/2023 11:24:35 - INFO - llmtuner.model.utils - Found linear modules: down_proj,W_pack,up_proj,o_proj,gate_proj
11/30/2023 11:24:36 - INFO - llmtuner.model.loader - trainable params: 111575040 || all params: 13376476160 || trainable%: 0.8341
Loading checkpoint shards: 100%|█████████████████████████████| 3/3 [00:15<00:00,  5.33s/it]
[INFO|modeling_utils.py:3655] 2023-11-30 11:24:36,546 >> All model checkpoint weights were used when initializing BaichuanForCausalLM.

[INFO|modeling_utils.py:3663] 2023-11-30 11:24:36,546 >> All the weights of BaichuanForCausalLM were initialized from the model checkpoint at /home/zhmao/model/Baichuan-13B-chat.
If your task is similar to the task the model of the checkpoint was trained on, you can already use BaichuanForCausalLM for predictions without further training.
[INFO|configuration_utils.py:728] 2023-11-30 11:24:36,550 >> loading configuration file /home/zhmao/model/Baichuan-13B-chat/generation_config.json
[INFO|configuration_utils.py:768] 2023-11-30 11:24:36,550 >> Generate config GenerationConfig {
  "assistant_token_id": 196,
  "bos_token_id": 1,
  "do_sample": true,
  "eos_token_id": 2,
  "max_new_tokens": 2048,
  "pad_token_id": 0,
  "repetition_penalty": 1.1,
  "temperature": 0.3,
  "top_k": 5,
  "top_p": 0.85,
  "transformers_version": "4.33.2",
  "user_token_id": 195
}

11/30/2023 11:24:36 - INFO - llmtuner.model.utils - Gradient checkpointing enabled.
11/30/2023 11:24:36 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
11/30/2023 11:24:36 - INFO - llmtuner.model.utils - Found linear modules: down_proj,up_proj,o_proj,gate_proj,W_pack
11/30/2023 11:24:37 - INFO - llmtuner.model.loader - trainable params: 111575040 || all params: 13376476160 || trainable%: 0.8341
[INFO|tokenization_utils_base.py:926] 2023-11-30 11:24:37,912 >> Assigning [] to the additional_special_tokens key of the tokenizer
Process #0 will write at /home/zhmao/.cache/huggingface/datasets/json/default-4f195a63697fe826/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96/cache-87dc526b91464992_00000_of_00016.arrow
Process #1 will write at /home/zhmao/.cache/huggingface/datasets/json/default-4f195a63697fe826/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96/cache-87dc526b91464992_00001_of_00016.arrow
Process #2 will write at /home/zhmao/.cache/huggingface/datasets/json/default-4f195a63697fe826/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96/cache-87dc526b91464992_00002_of_00016.arrow
Process #3 will write at /home/zhmao/.cache/huggingface/datasets/json/default-4f195a63697fe826/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96/cache-87dc526b91464992_00003_of_00016.arrow
Process #4 will write at /home/zhmao/.cache/huggingface/datasets/json/default-4f195a63697fe826/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96/cache-87dc526b91464992_00004_of_00016.arrow
Process #5 will write at /home/zhmao/.cache/huggingface/datasets/json/default-4f195a63697fe826/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96/cache-87dc526b91464992_00005_of_00016.arrow
Process #6 will write at /home/zhmao/.cache/huggingface/datasets/json/default-4f195a63697fe826/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96/cache-87dc526b91464992_00006_of_00016.arrow
Process #7 will write at /home/zhmao/.cache/huggingface/datasets/json/default-4f195a63697fe826/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96/cache-87dc526b91464992_00007_of_00016.arrow
Process #8 will write at /home/zhmao/.cache/huggingface/datasets/json/default-4f195a63697fe826/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96/cache-87dc526b91464992_00008_of_00016.arrow
Process #9 will write at /home/zhmao/.cache/huggingface/datasets/json/default-4f195a63697fe826/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96/cache-87dc526b91464992_00009_of_00016.arrow
Process #10 will write at /home/zhmao/.cache/huggingface/datasets/json/default-4f195a63697fe826/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96/cache-87dc526b91464992_00010_of_00016.arrow
Process #11 will write at /home/zhmao/.cache/huggingface/datasets/json/default-4f195a63697fe826/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96/cache-87dc526b91464992_00011_of_00016.arrow
Process #12 will write at /home/zhmao/.cache/huggingface/datasets/json/default-4f195a63697fe826/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96/cache-87dc526b91464992_00012_of_00016.arrow
Process #13 will write at /home/zhmao/.cache/huggingface/datasets/json/default-4f195a63697fe826/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96/cache-87dc526b91464992_00013_of_00016.arrow
Process #14 will write at /home/zhmao/.cache/huggingface/datasets/json/default-4f195a63697fe826/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96/cache-87dc526b91464992_00014_of_00016.arrow
Process #15 will write at /home/zhmao/.cache/huggingface/datasets/json/default-4f195a63697fe826/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96/cache-87dc526b91464992_00015_of_00016.arrow
Loading cached processed dataset at /home/zhmao/.cache/huggingface/datasets/json/default-4f195a63697fe826/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96/cache-87dc526b91464992_*_of_00016.arrow
Concatenating 16 shards
input_ids:
[195, 31106, 4550, 19463, 7841, 7868, 73, 196, 31106, 4567, 31161, 4550, 19463, 7841, 7868, 77, 5, 5, 53, 79, 31106, 4550, 3606, 2148, 73, 3526, 31345, 11886, 31135, 3606, 4467, 72, 31248, 32188, 31583, 76, 21332, 31399, 17268, 72, 31196, 6520, 28165, 2337, 72, 7552, 12421, 6029, 72, 31404, 20387, 5972, 16573, 73, 5, 5, 54, 79, 31106, 24691, 9945, 73, 3526, 11164, 12420, 31135, 11748, 76, 11603, 76, 31233, 32570, 31368, 31188, 12019, 13443, 31664, 31135, 18085, 6768, 72, 6076, 31229, 32242, 76, 31229, 12019, 31188, 10523, 6186, 72, 31187, 4550, 19463, 9945, 6269, 73, 5, 5, 55, 79, 31106, 11923, 15932, 73, 11923, 31209, 7776, 2337, 31475, 31262, 2462, 72, 17951, 3526, 31363, 6196, 31106, 59, 31136, 60, 31106, 4237, 31135, 11923, 73, 9636, 11923, 20387, 17832, 6550, 72, 6520, 3606, 6691, 72, 31404, 3806, 3300, 22645, 9684, 31258, 73, 2]
inputs:
 <reserved_102> 保持健康的三个提示。<reserved_103> 以下是保持健康的三个提示:

1. 保持身体活动。每天做适当的身体运动,如散步、跑步或游泳,能促进心血管健康,增强肌肉力量,并有助于减少体重。

2. 均衡饮食。每天食用新鲜的蔬菜、水果、全谷物和脂肪含量低的蛋白质食物,避免高糖、高脂肪和加工食品,以保持健康的饮食习惯。

3. 睡眠充足。睡眠对人体健康至关重要,成年人每天应保证 7-8 小时的睡眠。良好的睡眠有助于减轻压力,促进身体恢复,并提高注意力和记忆力。</s>
label_ids:
[-100, -100, -100, -100, -100, -100, -100, -100, 31106, 4567, 31161, 4550, 19463, 7841, 7868, 77, 5, 5, 53, 79, 31106, 4550, 3606, 2148, 73, 3526, 31345, 11886, 31135, 3606, 4467, 72, 31248, 32188, 31583, 76, 21332, 31399, 17268, 72, 31196, 6520, 28165, 2337, 72, 7552, 12421, 6029, 72, 31404, 20387, 5972, 16573, 73, 5, 5, 54, 79, 31106, 24691, 9945, 73, 3526, 11164, 12420, 31135, 11748, 76, 11603, 76, 31233, 32570, 31368, 31188, 12019, 13443, 31664, 31135, 18085, 6768, 72, 6076, 31229, 32242, 76, 31229, 12019, 31188, 10523, 6186, 72, 31187, 4550, 19463, 9945, 6269, 73, 5, 5, 55, 79, 31106, 11923, 15932, 73, 11923, 31209, 7776, 2337, 31475, 31262, 2462, 72, 17951, 3526, 31363, 6196, 31106, 59, 31136, 60, 31106, 4237, 31135, 11923, 73, 9636, 11923, 20387, 17832, 6550, 72, 6520, 3606, 6691, 72, 31404, 3806, 3300, 22645, 9684, 31258, 73, 2]
labels:
 以下是保持健康的三个提示:

1. 保持身体活动。每天做适当的身体运动,如散步、跑步或游泳,能促进心血管健康,增强肌肉力量,并有助于减少体重。

2. 均衡饮食。每天食用新鲜的蔬菜、水果、全谷物和脂肪含量低的蛋白质食物,避免高糖、高脂肪和加工食品,以保持健康的饮食习惯。

3. 睡眠充足。睡眠对人体健康至关重要,成年人每天应保证 7-8 小时的睡眠。良好的睡眠有助于减轻压力,促进身体恢复,并提高注意力和记忆力。</s>

hiyouga (Owner) commented Nov 30, 2023

#1651 (comment)

hiyouga added the pending (This problem is yet to be addressed) label on Nov 30, 2023

Aitejiu (Author) commented Dec 1, 2023

#1651 (comment)

I tried that, and I also tried the DeepSpeed suggestions; what finally worked was running one extra command before fine-tuning:
export NCCL_P2P_LEVEL=NVL
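
For reference, a minimal sketch of how this combines with the DeepSpeed launch from the reproduction above (same arguments as before, abbreviated here):

export NCCL_P2P_LEVEL=NVL
deepspeed --num_gpus 2 --master_port=9901 src/train_bash.py \
    --deepspeed ds_config.json \
    --stage sft \
    ...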

Aitejiu closed this as completed on Dec 1, 2023
hiyouga added the good first issue (Good for newcomers) and solved (This problem has been already solved) labels and removed the pending (This problem is yet to be addressed) label on Dec 1, 2023

Aitejiu (Author) commented Dec 2, 2023

#1651 (comment)

I tried that, and I also tried the DeepSpeed suggestions; what finally worked was running one extra command before fine-tuning: export NCCL_P2P_LEVEL=NVL

The command above raises the transfer bandwidth level used between the GPUs.
My guess at the root cause is that the inter-GPU transfer level was too low, so the processes kept hanging.
For a more thorough fix, you can try:
huggingface/accelerate#934
https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/troubleshooting.html#pci-access-control-services-acs
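
For anyone who wants to confirm whether the interconnect is the culprit before changing NCCL settings, two generic checks can help (standard NVIDIA/NCCL tooling, not specific to this repository; shown as a rough sketch):

nvidia-smi topo -m        # prints how the two GPUs are connected (NV#, PIX, PHB, SYS, ...)
NCCL_DEBUG=INFO deepspeed --num_gpus 2 src/train_bash.py ...   # makes NCCL log how it sets up peer-to-peer transport at startup

If peer-to-peer over PCIe is what hangs, NCCL_P2P_DISABLE=1 is another commonly used workaround that disables P2P entirely, at the cost of some communication bandwidth.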

HUAFOR commented Dec 10, 2023

For export NCCL_P2P_LEVEL=NVL, is it enough to just run that command before launching training?

Len-Li commented Dec 23, 2023

For export NCCL_P2P_LEVEL=NVL, is it enough to just run that command before launching training?

Adding that line fixed the hang I was seeing with NCCL; once it was set, multi-GPU fine-tuning worked.
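
Since it is just an environment variable, it can also be scoped to a single run by prefixing the launch command instead of exporting it for the whole shell (plain shell behaviour, nothing specific to this project):

NCCL_P2P_LEVEL=NVL deepspeed --num_gpus 2 --master_port=9901 src/train_bash.py --deepspeed ds_config.json ...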

lonelydancer commented

Without NVLink, will training simply hang once the data volume gets large?
