Hello, while using your notebook code to run the script that converts the Chinese 54k dataset into the messages format, I hit an error that is hard to pin down. It seems to be caused by a dataset mismatch. Could you please take a look?

Error message:

```
NonMatchingSplitsSizesError: [{'expected': SplitInfo(name='train', num_bytes=132085878, num_examples=11558, shard_lengths=None, dataset_name='chat_haruhi-role_playing'), 'recorded': SplitInfo(name='train', num_bytes=483687, num_examples=44, shard_lengths=None, dataset_name='chat_haruhi-role_playing')}]
```

Traceback:

```
/content/Haruhi-2-Dev/ChatHaruhi/ChatHaruhi.py in __init__(self, system_prompt, role_name, role_from_hf, role_from_jsonl, story_db, story_text_folder, llm, embedding, max_len_story, max_len_history, verbose, db_type)
    156
    157     fname = split_name + '.jsonl'
--> 158     dataset = load_dataset(dataset_name, data_files={'train': fname})
    159     datas = dataset["train"]
```
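The mismatch appears to be between the split sizes recorded in the dataset's metadata (11558 examples for the full train split) and the single jsonl selected via `data_files` (44 examples), so `datasets` fails its split-size verification. A minimal sketch of a workaround, assuming a recent `datasets` release; `dataset_name` and `fname` stand for the same variables used at ChatHaruhi.py line 158:

```python
from datasets import load_dataset

# Skip the split-size check that raises NonMatchingSplitsSizesError.
# datasets >= 2.9.0 accepts verification_mode; older releases used
# ignore_verifications=True instead.
dataset = load_dataset(
    dataset_name,
    data_files={'train': fname},
    verification_mode="no_checks",
)
datas = dataset["train"]
```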
Yes, I noticed this bug after huggingface updated datasets, and I haven't had time to fix it yet. For now, if you download the jsonl file yourself, loading it via role_from_jsonl works. I'll fix it when I get a chance (been busy with other things lately QAQ).
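A minimal sketch of that interim workaround, assuming the role jsonl files live in the same Hugging Face dataset repo; the repo id, filename, and llm value below are placeholders for illustration, not the actual values:

```python
from huggingface_hub import hf_hub_download
from ChatHaruhi import ChatHaruhi

# Fetch one role's jsonl directly, bypassing load_dataset entirely.
local_jsonl = hf_hub_download(
    repo_id="silk-road/chat_haruhi-role_playing",  # placeholder repo id
    filename="haruhi.jsonl",                       # placeholder role file
    repo_type="dataset",
)

# role_from_jsonl is the constructor argument visible in the traceback above;
# llm="openai" is a guessed value for the llm argument.
chatbot = ChatHaruhi(role_from_jsonl=local_jsonl, llm="openai")
```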