
new parameter --setup to preload all DBs and ref files, closes #102 #161

Merged 32 commits on Feb 1, 2022


fischer-hub
Collaborator

@fischer-hub fischer-hub commented Jan 3, 2022

Running rnaflow with --setup will now start the setup sub-workflow and only download the necessary DBs and reference files to run the pipeline.
I'm also working on preloading all the docker/singularity images and conda envs, but at least for singularity it's not working yet because of some parallel pulling issues. Let's hope there's a solution for that.

Just got the singularity/docker image pull to work!

@fischer-hub fischer-hub linked an issue Jan 3, 2022 that may be closed by this pull request
@fischer-hub fischer-hub added the enhancement New feature or request label Jan 3, 2022
@hoelzer hoelzer changed the base branch from read_auto_detection to master January 29, 2022 13:04
@hoelzer
Contributor

hoelzer commented Jan 29, 2022

Great!

Btw, how did you solve the docker/singularity pull issue @fischer-hub? I remember that we often recommend running a pipeline initially w/ only 1 CPU so that singularity images in particular are not pulled in parallel, which apparently can cause trouble.

Can you please resolve the conflicts w/ master? I re-branched this to merge directly into master.

@fischer-hub
Collaborator Author

> Btw, how did you solve the docker/singularity pull issue @fischer-hub? I remember that we often recommend running a pipeline initially w/ only 1 CPU so that singularity images in particular are not pulled in parallel, which apparently can cause trouble.

The containerGet process just pulls each image listed in the container.config file manually, one at a time, and --setup sets max_cores to 1, so containerGet is never executed in parallel. nf-core/rnaseq has something similar, but they download the images with an external bash script without the singularity pull command. While that works too, it probably uses more time and storage, since you can't reuse identical image layers and have to download the whole image each time.
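
The sequential-pull idea described here can be sketched roughly as follows. This is a minimal POSIX shell illustration, not the actual containerGet process: the function names `list_images` and `pull_all`, the assumption that every container line has the shape `withLabel: x { container = 'name' }`, and the cache directory argument are all hypothetical. It extracts each unique image name from the config and pulls the images one after another, so no two pulls ever run at the same time.

```shell
#!/bin/sh
# Hypothetical sketch: pull every image named in a Nextflow container config
# strictly one at a time. Not the pipeline's actual containerGet process.

# Print each unique image name found in lines such as
#   withLabel: fastp { container = 'nanozoo/fastp:0.23.1--9f2e255' }
list_images() {
    sed -n "s/.*container *= *'\([^']*\)'.*/\1/p" "$1" | sort -u
}

# Pull the images sequentially into the directory given as $2.
# A plain for-loop starts the next pull only after the previous one finished.
pull_all() {
    config=$1
    cache_dir=$2
    mkdir -p "$cache_dir" || return 1
    for image in $(list_images "$config"); do
        echo "Pulling $image"
        # Output file naming is left to singularity here (assumption).
        (cd "$cache_dir" && singularity pull "docker://$image") || return 1
    done
}

# Usage (assumed paths): pull_all configs/container.config singularity_cache
```

Deduplicating with `sort -u` matters because several labels can share one image (dammit and dammitDB in the config under review), and pulling the same image twice would only waste time.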

Conflict is resolved now!

withLabel: dammit { container = 'nanozoo/dammit:1.2--b47259e' }
withLabel: dammitDB { container = 'nanozoo/dammit:1.2--b47259e' }
withLabel: basic_tools { container = 'nanozoo/python_rnaseq:3.8--7a7808c' }
withLabel: rattle { container = 'huxleys/rattle:1.0.0--24021329c8b365f21959c56ee1cfb693c768c14e' }
Contributor


Hm @fischer-hub, will this now remove rattle from the master branch? I think we don't want this :) Sorry if the rebase to master is causing confusion now.

@hoelzer
Contributor

hoelzer commented Jan 29, 2022

smart idea w/ the containerGet process!

I tried

nextflow run main.nf --setup -profile local,singularity 

Which fails with:

You need to set a genome for mapping and an annotation for counting: with --autodownload [hsa, eco, mmu, mau] are provided and automatically downloaded; with --genome and --annotation set csv files for custom input.

I think this is intended... bc/ the workflow also needs to know which species is targeted. However, what if I don't know yet and just want to install all singularity images (or condas, ...)?

But then I tried

nextflow run main.nf --setup -profile local,singularity --autodownload eco

and got:

Running in setup mode. Only necessary database and reference files will be downloaded.
 (No such file or directory)

 -- Check script 'main.nf' at line: 163 or see '.nextflow.log' file for more details

So three things:

  1. why does the last command not work? I would think this is the way you implemented --setup currently?
  2. do we also want to allow something like my first command? So the pipeline can be configured even w/o providing any --autodownload or --genome/annotation. For me, this would make sense bc/ then I can install everything and then even start offline via providing my own genome/gtf files, right?
  3. Adding short information about this --setup option to the README.md would be good

@fischer-hub
Collaborator Author

> -- Check script 'main.nf' at line: 163 or see '.nextflow.log' file for more details

Ah yes that must have happened when I fixed the read mode and strandedness log info :/ Should be a quick fix tho..

> So three things:
>
> 1. why does the last command not work? I would think this is the way you implemented `--setup` currently?
>
> 2. do we also want to allow something like my first command? So the pipeline can be configured even w/o providing any `--autodownload` or `--genome/annotation`. For me, this would make sense bc/ then I can install everything and then even start offline via providing my own genome/gtf files, right?

Yes, that's actually how I thought of the setup flag too :D Must have missed the --autodownload check!

> 3. Adding short information about this `--setup` option to the README.md would be good

Yep good idea

@hoelzer
Contributor

hoelzer commented Jan 29, 2022

alright, thx, let me know here when I should give it another test!

@fischer-hub
Collaborator Author

> alright, thx, let me know here when I should give it another test!

Works for me now :) !

nextflow main.nf --setup -profile singularity,slurm

Also, I think I left the autodownload check in because the referenceGet process depends on that information to get the species-specific reference files. But the user can now decide whether to download these right away or later.

@hoelzer
Contributor

hoelzer commented Jan 29, 2022

> nextflow main.nf --setup -profile singularity,slurm

Nice, yes

nextflow main.nf --setup -profile singularity,local 

works now, but after all singularity images were pulled I tried:

nextflow main.nf --setup -profile singularity,local,test 

and again some images were pulled, see for example:

nanozoo-fastp-0.23.1--9f2e255.img
nanozoo-fastp:0.23.1--9f2e255.img
nanozoo-python_rnaseq-3.8--7a7808c.img
nanozoo-python_rnaseq:3.8--7a7808c.img

The ones with the : are pulled by --setup, while during the actual run nf seems to look for the images with - in the middle :)

@fischer-hub
Collaborator Author

Just replaced the ':' with '-' now in the image file name:

nextflow main.nf -profile singularity,slurm,test

runs through for me with the pulled containers from --setup now :)
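
The file-name mismatch discussed here comes down to how an image reference is turned into a cache file name. The sketch below, in POSIX shell, follows the naming that the listing in this thread suggests Nextflow looks for: strip any `docker://`-style prefix, replace every `:` and `/` with `-`, and append `.img`. The function name `cache_name` is hypothetical, and the rule is inferred from the file names shown in this thread rather than taken from Nextflow's source or from the actual fix in this PR.

```shell
#!/bin/sh
# Hypothetical helper: derive the cache file name for a container image
# reference, following the naming pattern seen in this thread.
cache_name() {
    # Drop a leading URI scheme such as docker:// if one is present.
    name=${1#*://}
    # Replace every ':' and '/' with '-' and append the .img extension.
    printf '%s.img\n' "$(printf '%s' "$name" | sed 's|[:/]|-|g')"
}

cache_name 'nanozoo/fastp:0.23.1--9f2e255'
cache_name 'docker://nanozoo/python_rnaseq:3.8--7a7808c'
```

For the two references above this yields the dash-separated names from the listing (`nanozoo-fastp-0.23.1--9f2e255.img` and `nanozoo-python_rnaseq-3.8--7a7808c.img`), so images stored under these names during setup would be found again on a later run instead of being pulled a second time.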

@hoelzer
Contributor

hoelzer commented Jan 31, 2022

Works on my end too, great @fischer-hub!

Did you also add something about the --setup option to the README? If so, please feel free to merge this PR!

@fischer-hub fischer-hub merged commit 6dae385 into master Feb 1, 2022
Successfully merging this pull request may close these issues.

--setup/offline preperation...
2 participants