NonMatchingSplitsSizesError #13

Open
Zhuqln opened this issue May 10, 2024 · 1 comment

Comments

Zhuqln commented May 10, 2024

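Hello, while using your notebook code and running the code that converts the Chinese 54k dataset to the messages format, I ran into an error that is hard to pin down. It seems to be caused by a dataset mismatch. I would appreciate any help.
Error message: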
NonMatchingSplitsSizesError: [{'expected': SplitInfo(name='train', num_bytes=132085878, num_examples=11558, shard_lengths=None, dataset_name='chat_haruhi-role_playing'), 'recorded': SplitInfo(name='train', num_bytes=483687, num_examples=44, shard_lengths=None, dataset_name='chat_haruhi-role_playing')}]
Traceback:
/content/Haruhi-2-Dev/ChatHaruhi/ChatHaruhi.py in __init__(self, system_prompt, role_name, role_from_hf, role_from_jsonl, story_db, story_text_folder, llm, embedding, max_len_story, max_len_history, verbose, db_type)
156
157 fname = split_name + '.jsonl'
--> 158 dataset = load_dataset(dataset_name,data_files={'train':fname})
159 datas = dataset["train"]

LC1332 (Owner) commented May 13, 2024

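Yes. I noticed earlier that this bug appeared after Hugging Face updated the datasets library, and I have not had time to fix it yet. For now, if you download the jsonl file, loading it through role_from_jsonl works. I will fix it when I have time (I have been busy with other things lately, QAQ).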
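
To check a downloaded role file before handing it to role_from_jsonl, something like the following may help. It is only a minimal sketch using the Python standard library: the helper name read_jsonl, the file name, and the "text" field are hypothetical, and the actual record layout of the role files is not shown in this thread.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines.

    Hypothetical helper for inspecting a locally downloaded role file;
    it is not part of ChatHaruhi.
    """
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate empty lines between records
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}: invalid JSON on line {line_no}") from exc
    return records


# Untested alternative (not verified against this repository): recent
# versions of the datasets library accept verification_mode="no_checks"
# in load_dataset, which skips the split-size check that raises
# NonMatchingSplitsSizesError.
```

If the file parses and the record count matches the 'recorded' num_examples in the error (44 in the report above), the local file is likely intact and can be passed to the role_from_jsonl parameter that appears in the __init__ signature. That the parameter takes a local file path is an assumption here.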
