Replacing dask with srun and concurrent. #11

Merged
merged 8 commits into from
Mar 5, 2021

vianamp (Collaborator) commented Mar 5, 2021

Because the combination of Dask and Slurm was very unstable, we replaced the Dask-based parallel computation with srun and concurrent.futures. The dataframe is split into multiple chunks, and each chunk is run by its own job on Slurm. Each job runs multiple processes in parallel using ProcessPoolExecutor from concurrent.futures.
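For readers who want a picture of the pattern described above, here is a minimal sketch, not the code from this PR. It assumes the dataframe can be treated as a list of row dicts (so the example needs only the standard library), and every name in it (`split_into_chunks`, `compute_features`, `run_chunk`, `build_srun_command`, `worker.py`) is hypothetical. The srun command is only built, never executed, and its options are an illustrative choice rather than the ones used here.

```python
from concurrent.futures import ProcessPoolExecutor


def split_into_chunks(rows, n_chunks):
    """Split rows into at most n_chunks contiguous, near-equal chunks."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    size, extra = divmod(len(rows), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional row each.
        stop = start + size + (1 if i < extra else 0)
        if stop > start:  # skip empty chunks when rows < n_chunks
            chunks.append(rows[start:stop])
        start = stop
    return chunks


def compute_features(row):
    """Hypothetical per-row work; stands in for the real feature code."""
    return {"id": row["id"], "feature": row["value"] ** 2}


def run_chunk(chunk, n_workers, executor_cls=ProcessPoolExecutor):
    """Work done inside one job: process the chunk's rows in parallel."""
    with executor_cls(max_workers=n_workers) as executor:
        # Executor.map returns results in the order of the inputs.
        return list(executor.map(compute_features, chunk))


def build_srun_command(chunk_path, n_workers):
    """Build (but do not run) a command that would launch one chunk.

    worker.py is a hypothetical script that would load the chunk file
    and call run_chunk on it.
    """
    return [
        "srun",
        "--ntasks=1",
        f"--cpus-per-task={n_workers}",
        "python",
        "worker.py",
        chunk_path,
    ]


if __name__ == "__main__":
    rows = [{"id": i, "value": i} for i in range(10)]
    for chunk in split_into_chunks(rows, n_chunks=3):
        print(run_chunk(chunk, n_workers=2))
```

The executor class is a parameter of `run_chunk` only so the per-chunk logic can be exercised with a thread pool in tests. In the setup described above, each job would use ProcessPoolExecutor.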

vianamp (Collaborator, Author) commented Mar 5, 2021

We used this version to compute the current set of features for the whole variance dataset. The run takes about 3 hours.

vianamp merged commit 37afd94 into master Mar 5, 2021