
generate_data_parallel.py hangs with pool.apply_async(), but not when called in the main thread. #1

Closed
djl11 opened this issue Apr 10, 2021 · 6 comments

Comments

@djl11

djl11 commented Apr 10, 2021

By naively following the installation instructions, I'm finding that this line hangs when running with multiprocessing, but not when running in the main thread.

Would it be possible to list the full conda environment you used, by running conda list -e > requirements.txt?
For example, I would be interested to know which versions of python, Open3D and multiprocessing you used.

Any help appreciated :)

@Steve-Tod
Collaborator

Hi, I added the conda environment description here. BTW, I suggest debugging the data generation process with the --num-proc 1 argument to set it to a single process.

@djl11
Author

djl11 commented Apr 12, 2021

Thanks for sending it over! Unfortunately using your conda environment didn't help, and setting --num-proc 1 still causes it to hang. The only way I can get it to run is to remove mp.Pool() entirely and call main() directly. I'll leave this issue open while I investigate further, and then close it with the solution once I work out what's going on.

@Steve-Tod
Collaborator

Sorry about that. What script are you running? generate_data_parallel.py or construct_dataset_parallel.py?

@djl11
Author

djl11 commented Apr 12, 2021

generate_data_parallel.py. Also, sorry: setting --num-proc to 1 does indeed work for debugging. I mistakenly thought I had added the if args.num_proc > 1 check myself.

It still might just be something silly on my end; I'll post my findings either way!

@djl11
Author

djl11 commented Apr 12, 2021

Multiprocessing issues with Open3D seem relatively common: a, b, c.

I took a look at how Open3D applies parallelism for batch processing in their own examples, and they use joblib.

Replacing lines 195-202 with:

from joblib import Parallel, delayed
Parallel(n_jobs=args.num_proc)(delayed(main)(args, i) for i in range(args.num_proc))

Fixed the problem for me. Hopefully this helps others who may get stuck with similar issues.
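For anyone who wants to try this in isolation first, here is a self-contained version of the joblib pattern (main() is again a hypothetical stand-in for the real worker; the two lines above are the only change needed in the script itself):

```python
from joblib import Parallel, delayed

def main(args, rank):
    # Stand-in for the per-process data-generation worker.
    return rank ** 2

if __name__ == "__main__":
    num_proc = 4
    # joblib's default loky backend spawns fresh worker processes,
    # which avoids the fork-related hangs seen with mp.Pool().
    results = Parallel(n_jobs=num_proc)(
        delayed(main)(None, i) for i in range(num_proc)
    )
    print(results)  # [0, 1, 4, 9]
```

My understanding is that the loky backend starting clean worker processes (rather than forking the parent with Open3D state already loaded) is what makes the difference, though I haven't confirmed the root cause.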

@djl11 djl11 closed this as completed Apr 12, 2021
@Andyyoung0507

Hi, I also tried python scripts/generate_data_parallel.py --scene pile --object-set pile/train --num-grasps 4000000 --num-proc 40 --save-scene ./data/pile/data_pile_train_random_raw_4M to generate the data, but the terminal shows nothing but pybullet build time:……. I also set --num-proc to 1 to debug, but it didn't work. Then I used vscode to debug and found that it is maybe caused by multiprocessing. I tried @djl11 's solution, but nothing changed. So I reinstalled the conda env. Now the script runs, but gets stuck somewhere after generating some data. May I have your help please?
Thanks for the help. @djl11 @Steve-Tod
