-
-
Notifications
You must be signed in to change notification settings - Fork 899
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Preprocess with debug flag fails. #1544
Labels
bug
Something isn't working
Comments
+1 |
Update: downgrading datasets to |
+1 |
See my change in #1548, you can wrap |
It worked for me as well! |
qiyuangong
added a commit
to qiyuangong/ipex-llm
that referenced
this issue
Apr 23, 2024
7 tasks
qiyuangong
added a commit
to intel-analytics/ipex-llm
that referenced
this issue
Apr 23, 2024
* Downgrade datasets to 2.15.0 to address axolotl prepare issue axolotl-ai-cloud/axolotl#1544 Tks to @kwaa for providing the solution in #10821 (comment)
Work for me. :) |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Please check that this issue hasn't been reported before.
Expected Behavior
Preprocess with debug flag should work.
python -m axolotl.cli.preprocess /content/test_axolotl.yaml --debug
Current behaviour
Gives error. Have json file with each example in the json file is with {"text": <text_str>}.
I am doing Pretraining with Lora for Non-Eng lang.
[2024-04-19 09:05:02,918] [DEBUG] [axolotl.log:61] [PID:2346] [RANK:0] max_input_len: 600
Dropping Long Sequences (num_proc=2): 100% 17/17 [00:00<00:00, 99.19 examples/s]
Add position_id column (Sample Packing) (num_proc=2): 100% 17/17 [00:00<00:00, 70.88 examples/s]
[2024-04-19 09:05:03,502] [INFO] [axolotl.load_tokenized_prepared_datasets:423] [PID:2346] [RANK:0] Saving merged prepared dataset to disk... /content/d538aae6e42c7df428d20d3ff2685ad0
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/content/src/axolotl/src/axolotl/cli/preprocess.py", line 70, in
fire.Fire(do_cli)
File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 143, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 477, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 693, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/content/src/axolotl/src/axolotl/cli/preprocess.py", line 60, in do_cli
load_datasets(cfg=parsed_cfg, cli_args=parsed_cli_args)
File "/content/src/axolotl/src/axolotl/cli/init.py", line 397, in load_datasets
train_dataset, eval_dataset, total_num_steps, prompters = prepare_dataset(
File "/content/src/axolotl/src/axolotl/utils/data/sft.py", line 66, in prepare_dataset
train_dataset, eval_dataset, prompters = load_prepare_datasets(
File "/content/src/axolotl/src/axolotl/utils/data/sft.py", line 460, in load_prepare_datasets
dataset, prompters = load_tokenized_prepared_datasets(
File "/content/src/axolotl/src/axolotl/utils/data/sft.py", line 424, in load_tokenized_prepared_datasets
dataset.save_to_disk(prepared_ds_path)
File "/usr/local/lib/python3.10/dist-packages/datasets/arrow_dataset.py", line 1515, in save_to_disk
fs, _ = url_to_fs(dataset_path, **(storage_options or {}))
File "/usr/local/lib/python3.10/dist-packages/fsspec/core.py", line 363, in url_to_fs
chain = _un_chain(url, kwargs)
File "/usr/local/lib/python3.10/dist-packages/fsspec/core.py", line 316, in _un_chain
if "::" in path
TypeError: argument of type 'PosixPath' is not iterable
Steps to reproduce
Use json with with each example in the json file is with {"text": <text_str>}.
Preprocess with debug flag.
python -m axolotl.cli.preprocess /content/test_axolotl.yaml --debug
But i get the error.
Config yaml
Possible solution
No response
Which Operating Systems are you using?
Python Version
3.10
axolotl branch-commit
Latest
Acknowledgements
The text was updated successfully, but these errors were encountered: