from transformers import TrainingArguments
import torch

# get the number of gpus
num_gpus = torch.cuda.device_count()
if num_gpus > 1:
    from parallelformers import parallelize
    parallelize(model, num_gpus=num_gpus, fp16=True, verbose="detail")
gives
RuntimeError: Timed out initializing process group in store based barrier on rank: 7, for key: store_based_barrier_key:1 (world_size=8, worker_count=9, timeout=0:30:00)
WARNING No nodes ran. Repeat the previous command to attempt a new run. (runner.py:213)
[10/15/23 12:57:26] ERROR Node 'sort_using_baal: preprocess_and_sort([baal.reed_textkernel_labeled,params:reed.pretrained_model_name,reed.aimwel_labeled.finetuned_pre_trained_isco_classifier]) -> [reed.textkernel_labeled.sorted_jobs,baal.reed_textkernel_labeled_parquet]' failed with error: Timed out initializing process group in store based barrier on rank: 7, for key: store_based_barrier_key:1 (world_size=8, worker_count=9, timeout=0:30:00) (node.py:356)
Environment
python 3.10.1
parallelformers latest
os: ubuntu
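
One detail in the error message that may help with triage: it reports worker_count=9 against world_size=8, so one more worker registered at the barrier than the group expected. The sketch below only pulls those two numbers out of the message so the mismatch is easy to spot in logs. The helper name `barrier_counts` is made up for illustration, and the idea that an extra process or a leftover registration from an earlier run caused the extra count is an unconfirmed assumption, not a diagnosis.

```python
import re

# Error text copied from the report above, shortened to the relevant part.
message = (
    "Timed out initializing process group in store based barrier on rank: 7, "
    "for key: store_based_barrier_key:1 "
    "(world_size=8, worker_count=9, timeout=0:30:00)"
)


def barrier_counts(text):
    """Hypothetical helper: return (world_size, worker_count) from the message."""
    match = re.search(r"world_size=(\d+), worker_count=(\d+)", text)
    if match is None:
        raise ValueError("no world_size/worker_count pair found")
    return int(match.group(1)), int(match.group(2))


world_size, worker_count = barrier_counts(message)
if worker_count > world_size:
    # More workers registered at the barrier than the group was created for.
    print(f"{worker_count - world_size} more worker(s) registered than expected")
```

If the counts differ like this on every run, checking for stray Python processes still holding GPUs before starting the pipeline seems like a reasonable first step.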