Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added cluster config files and comments to main.nf #79

Closed
wants to merge 1 commit into from

Conversation

DoaneAS
Copy link

@DoaneAS DoaneAS commented Dec 13, 2019

Notes

  • I made changes to main.nf in a private branch to deal with issues below, as my solutions may break on other systems. I added comments to main.nf in this PR, but otherwise this PR is only adding the cluster config files, conf/panda.config and conf/nygc.config. See comments below for changes I made on my private branch.

Comments and suggestions for revision

  • Some processes that use pipes are using more than the requested resources. See comments in main.nf for process MapReads. Commands that are piped together will run concurrently, and the resources used will be the sum of the resources for each piped command. So bwa mem -t 2 ... | samtools sort -@2 ... will use 4 cpus, not 2. The same applies for memory. I made a private branch with specific cpu and memory values as needed for my use case (WGS), but you likely need a more generalizable solution.
  • Feature request: option to set temp directory path or environment variable. Several processes make use of a temp directory using /tmp. The compute nodes at Weill Cornell and New York Genome Center (NYGC) have very limited space in /tmp. For temp directory, we use the env variable $TMPDIR, which specifies a local scratch disc.
  • BaseRecalibrator: reduce memory resource request. The bam files are spllit up for parallel processing, so using max memory was way overkill on my system and caused jobs to delay in queue. Something like 10G was more than enough for WGS data with ~1 billion reads/sample. In a private branch, I changed it to "memory_singleCPU_2_task" and this worked well for WGS data.
  • Exit code 140 for Univa grid engine could be added to base.config to resubmit job.
  • Issue: some ApplyBQSR jobs are failing with exit code 141. Cause: In base.config, setting pipefail option causes jobs to sometimes fail with exit code 141. Known issue with pipefail and exit 141. See https://gitter.im/nextflow-io/nextflow/archives/2016/04/12
    • Solution I found was changing process.shell = ['/bin/bash', '-euo', 'pipefail'] to process.shell = ['/bin/bash', '-eu']. This change is not in PR.

Many thanks to the nf-core team for nf-core/sarek!

@DoaneAS DoaneAS requested a review from maxulysse as a code owner December 13, 2019 00:00
@maxulysse
Copy link
Member

Hi @DoaneAS that's an very good PR, but we don't have any Institutional configs here in the sarek repository.
You should include these configs in the nf-core/configs repository.
Here is the documentation to help you with an institutional profile: https://github.com/nf-core/configs/#adding-a-new-config
And here is the documentation to help you with a pipeline-specific institutional profile: https://github.com/nf-core/configs/#adding-a-new-pipeline-specific-config

Don't hesitate to ping me or ask me on slack if you have any question.

@maxulysse
Copy link
Member

Regarding your other comments and suggestions:
-MapReads definitively needs some improvement in resource requirements, and we are currently exploring some options (#57, #59).

  • Memory requirements for BaseRecalibrator are already fixed (Remove label memory_max from BaseRecalibrator process #73) and we noticed a gain of around 30 hours... I must have kept the previous memory requirement when parallelizing it.
  • For the temp directory path, I think you can already set that up by specifying where the work directory is. But if you ave more suggestion, I'll welcome any.
  • Exit code 140 could be added either in your institutional config.
  • process.shell could also be changed in your institutional config.

Regarding the last two comments, I'll notify @nf-core/core team as we could move these generic setting from the pipeline repo to the nf-core/configs repo

@apeltzer
Copy link
Member

Yup, agree that this here:

Regarding the last two comments, I'll notify @nf-core/core team as we could move these generic setting from the pipeline repo to the nf-core/configs repo

should be in the central configs ideally.

@maxulysse maxulysse closed this Jan 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants