torch.distributed.elastic.multiprocessing.errors.ChildFailedError #7

Closed
WangRongsheng opened this issue Jun 4, 2023 · 3 comments
Labels: solved (This problem has been already solved)

@WangRongsheng

Training command:

accelerate launch src/train_sft.py \
    --model_name_or_path llama-hf/llama-13b-hf \
    --do_train \
    --dataset ChangChunTeng \
    --finetuning_type lora \
    --output_dir CCT/sft \
    --overwrite_cache \
    --per_device_train_batch_size 8 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --resume_lora_training False \
    --plot_loss \
    --fp16
@WangRongsheng (Author)

Running tokenizer on dataset:   0%|                                                                | 0/226042 [00:00<?, ? examples/s]
06/04/2023 11:06:14 - INFO - datasets.arrow_dataset - Caching processed dataset at /root/.cache/huggingface/datasets/wangrongsheng___json/wangrongsheng--ChangChunTeng-220k-d576ed39544bf546/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4/cache-7900d18e31cbb541.arrow
Traceback (most recent call last):
  File "/tmp/CCT/src/train_sft.py", line 97, in <module>
    main()
  File "/tmp/CCT/src/train_sft.py", line 27, in main
    dataset = preprocess_data(dataset, tokenizer, data_args, training_args, stage="sft")
  File "/tmp/CCT/src/utils/common.py", line 475, in preprocess_data
    print_supervised_dataset_example(dataset[0])
  File "/root/miniconda3/envs/xray/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 2778, in __getitem__
    return self._getitem(key)
  File "/root/miniconda3/envs/xray/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 2762, in _getitem
    pa_subtable = query_table(self._data, key, indices=self._indices if self._indices is not None else None)
  File "/root/miniconda3/envs/xray/lib/python3.10/site-packages/datasets/formatting/formatting.py", line 578, in query_table
    _check_valid_index_key(key, size)
  File "/root/miniconda3/envs/xray/lib/python3.10/site-packages/datasets/formatting/formatting.py", line 521, in _check_valid_index_key
    raise IndexError(f"Invalid key: {key} is out of bounds for size {size}")
IndexError: Invalid key: 0 is out of bounds for size 0
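
What the traceback shows: preprocessing produced an empty dataset (size 0), so indexing example 0 fails. A minimal sketch reproducing the same error with the datasets library (the column names are illustrative):

from datasets import Dataset

# An empty dataset, as you get when every example is dropped during preprocessing.
ds = Dataset.from_dict({"instruction": [], "input": [], "output": []})
print(len(ds))  # 0
ds[0]           # IndexError: Invalid key: 0 is out of bounds for size 0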

@WangRongsheng (Author)

Please follow the data format below, and note that the prompt must not be empty:

"数据集名称": {
    "hf_hub_url": "HuggingFace上的项目地址(若指定,则忽略下列三个参数)",
    "script_url": "包含数据加载脚本的本地文件夹名称(若指定,则忽略下列两个参数)",
    "file_name": "该目录下数据集文件的名称(若上述参数未指定,则此项必需)",
    "file_sha1": "数据集文件的SHA-1哈希值(可选)",
    "columns": {
        "prompt": "数据集代表提示词的表头名称(默认:instruction)",
        "query": "数据集代表请求的表头名称(默认:input)",
        "response": "数据集代表回答的表头名称(默认:output)",
        "history": "数据集代表历史对话的表头名称(默认:None)"
    }
}
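
For example, a concrete entry for the dataset used in the command above might look like the following (the file name and column mapping are assumptions for illustration, not the issue author's actual files):

"ChangChunTeng": {
    "file_name": "changchunteng.json",
    "columns": {
        "prompt": "instruction",
        "query": "input",
        "response": "output"
    }
}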

@hiyouga added the solved label Jun 4, 2023
@hiyouga (Owner) commented Jun 4, 2023

This kind of error usually means the columns definition in dataset_info.json is wrong; you need to check the dataset definition.
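
As a quick sanity check (a hypothetical snippet, not part of this repo), you can load the raw file directly and confirm that it is non-empty and that the mapped columns actually exist. The path, the JSON-array layout, and the column names below are assumptions:

import json

# Assumptions: the dataset is a local JSON array of records at this path,
# and dataset_info.json maps prompt/response to these column names.
with open("data/changchunteng.json", encoding="utf-8") as f:
    examples = json.load(f)

assert len(examples) > 0, "dataset file is empty"
missing = {"instruction", "output"} - set(examples[0])
assert not missing, f"first example is missing columns: {missing}"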
