Make conda environment.yml #2

Closed
7 tasks done
ewels opened this issue Apr 24, 2018 · 12 comments
Labels
feature-request Request to add new functionality

Comments

@ewels
Member

ewels commented Apr 24, 2018

It would be great if we could manage the pipeline software requirements with a single conda environment.yml file. Then we can build the Docker and Singularity containers from this.

I have started a branch to work on this together: https://github.com/nf-core/ChIPseq/tree/bioconda

Note that not all packages are available on conda yet, so we may need to package some ourselves.

  • Find all packages in conda
  • Complete environment.yml file and test it installs
  • Check whether the Docker image builds properly
  • Check whether the pipeline still works (almost certainly not)
    • Adjust pipeline commands / scripts as necessary
  • Strip out all previous custom R installation stuff / environment module stuff
  • Think more about whether we want to be installing reference genomes into the containers
    • Hint: I would prefer not to. So we need to look into how we can use regular references (eg. iGenomes) and adjust scripts accordingly.
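
The "test it installs" step above could be checked with something like the sketch below. It assumes conda is on the PATH and that the environment is named nfcore-chipseq-1.4dev (the name that appears in the path quoted later in this thread). The `check_tools` helper and the tool names passed to it are hypothetical and only illustrate the idea; they are not the pipeline's actual dependency list.

```shell
#!/usr/bin/env bash
# Report which of the given commands are missing from PATH.
# Returns 0 if every command is found, 1 otherwise.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "MISSING: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Intended usage (commented out so the block has no side effects):
#   conda env create -f environment.yml
#   conda activate nfcore-chipseq-1.4dev
#   check_tools bwa samtools macs2   # hypothetical tool list
```

Running the same check inside the built Docker image would cover the "Docker image builds properly" item as well, since it confirms the tools are actually reachable from the container's PATH.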

ewels added the feature-request label Apr 24, 2018
@apeltzer
Member

apeltzer commented May 16, 2018

I found 99% of the packages (I hope, at least) in bioconda or conda-forge (see the bioconda branch, which I committed to).

The missing one is "phantompeakqualtools", but if I'm not mistaken (please check this, @ewels), it is now also packaged as the CRAN R package "spp", and is therefore available in bioconda:

https://bioconda.github.io/recipes/r-spp/README.html

If this is the case, we just need to add:

  - bioconda::r-spp=1.15.2

and we would be done with the new environment.yml.

@apeltzer
Member

See also here for the last missing package: http://compbio.med.harvard.edu/Supplements/ChIP-seq/

@ewels
Member Author

ewels commented May 17, 2018

Super cool! Yes, phantompeakqualtools is part of spp, so just installing that should do the job. Great work @apeltzer - thanks! That's quite a relief that we don't need to package stuff ourselves.

There are a few other issues that need testing / thinking about now... I've updated the original comment with some checkboxes.

@apeltzer
Member

apeltzer commented May 17, 2018

Only two very small tools are missing:

  • run_spp.R
  • ngs.plot.R

These are highlighted in the bioconda branch. The Docker image builds fine, but I don't know where to get the run_spp.R script from, as the bioconda package unfortunately doesn't install it. The spp package itself is installed here:

/opt/conda/envs/nfcore-chipseq-1.4dev/lib/R/library/spp/

@ewels
Member Author

ewels commented May 17, 2018

run_spp.R is from phantompeakqualtools. We were downloading this from https://code.google.com/archive/p/phantompeakqualtools/ but it looks like the latest version of the code is now at https://github.com/kundajelab/phantompeakqualtools

So my saying that "phantompeakqualtools is part of spp" was probably wrong, sorry.

ngs.plot.R is found at https://github.com/shenlab-sinai/ngsplot

@ewels
Member Author

ewels commented May 30, 2018

Installation script for these tools by @tiagochst in #10:

ENV NGSPLOT_VERSION="2.63"
RUN curl -fsSL https://github.com/shenlab-sinai/ngsplot/archive/${NGSPLOT_VERSION}.tar.gz -o /opt/ngsplot_${NGSPLOT_VERSION}.tar.gz && \
    tar xvzf /opt/ngsplot_${NGSPLOT_VERSION}.tar.gz -C /opt/ && \
    rm /opt/ngsplot_${NGSPLOT_VERSION}.tar.gz
ENV PATH=${PATH}:/opt/ngsplot-${NGSPLOT_VERSION}/bin
ENV NGSPLOT=/opt/ngsplot-${NGSPLOT_VERSION}/

RUN wget "https://drive.google.com/uc?export=download&id=0B5hDZ2BucCI6SURYWW5XdUxnbW8" -O ngsplotdb_hg19_75_3.00.tar.gz && \
    echo y | ngsplotdb.py install ngsplotdb_hg19_75_3.00.tar.gz && \
    rm -rf  ngsplotdb_hg19_75_3.00.tar.gz && \
    wget "https://drive.google.com/uc?export=download&id=0B5hDZ2BucCI6S3E4dVprdlF2YW8" -O ngsplotdb_hg38_76_3.00.tar.gz && \
    echo y | ngsplotdb.py install ngsplotdb_hg38_76_3.00.tar.gz && \
    rm -rf  ngsplotdb_hg38_76_3.00.tar.gz && \
    wget "https://drive.google.com/uc?export=download&id=0B5hDZ2BucCI6NXNzNjZveXdadU0" -O ngsplotdb_mm10_75_3.00.tar.gz && \
    echo y | ngsplotdb.py install ngsplotdb_mm10_75_3.00.tar.gz && \
    rm -rf  ngsplotdb_mm10_75_3.00.tar.gz 

RUN git clone https://github.com/kundajelab/phantompeakqualtools  && \
    mv phantompeakqualtools /opt/  && \
    echo 'library(caTools)' | cat - /opt/phantompeakqualtools/run_spp.R > temp && mv temp /opt/phantompeakqualtools/run_spp.R && \
    chmod 755 /opt/phantompeakqualtools/* && \
    echo 'alias run_spp.R="Rscript /opt/phantompeakqualtools/run_spp.R"' >> ~/.bashrc 
ENV PATH=${PATH}:/opt/phantompeakqualtools 

Need to think hard about how to handle the reference genomes for ngsplot.

@ewels
Member Author

ewels commented May 30, 2018

The three ngsplot references are actually not super huge:

35M ngsplotdb_hg19_75_3.00.tar.gz
34M ngsplotdb_hg38_76_3.00.tar.gz
21M ngsplotdb_mm10_75_3.00.tar.gz

That's about 90M in total, a lot less than I was expecting. This is small enough that it may be possible to add these to the bioconda recipe (in discussion with that community). Alternatively, we could consider adding these to the pipeline repo. Finally, it would be very easy to add these to the AWS-iGenomes reference.

@ewels
Member Author

ewels commented May 30, 2018

xref #10 (comment):

We just need to keep in mind that the run_spp.R has a line missing (library(caTools)) to make it work.

This is something that we can add in using a patch in the bioconda release.
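
As a rough illustration of what such a patch step would need to do, here is a minimal shell sketch that prepends the missing line only if the script does not already contain it, so it is safe to run more than once. The function name `prepend_catools` is hypothetical, and this is not the actual bioconda recipe patch.

```shell
#!/usr/bin/env bash
# Prepend 'library(caTools)' to an R script unless it already loads caTools.
prepend_catools() {
    local script="$1"
    # -F: match the string literally; -q: no output, exit status only
    if grep -qF 'library(caTools)' "$script"; then
        return 0   # already patched; nothing to do
    fi
    local tmp
    tmp="$(mktemp)"
    { echo 'library(caTools)'; cat "$script"; } > "$tmp"
    # Write back into the original file so its permissions are preserved
    cat "$tmp" > "$script"
    rm -f "$tmp"
}

# Intended usage (path taken from the Dockerfile snippet in this thread):
#   prepend_catools /opt/phantompeakqualtools/run_spp.R
```

One caveat: if the script begins with a `#!` line, prepending pushes it off the first line, so the script would then need to be invoked through Rscript rather than executed directly. The Dockerfile snippet quoted earlier has the same behaviour.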

@apeltzer - is this packaging something that you're keen / able to work on?

@apeltzer
Member

apeltzer commented May 30, 2018

Hi everyone! I just asked in the bioconda gitter whether they feel it's alright to package something like this, and will do it once they confirm it's okay!

Absolutely happy to create a bioconda package for that specific purpose - shouldn't be too difficult to do that!

@apeltzer
Member

So all conda packages are there and I added run_spp.R to the bin folder of the bioconda branch.
Will adjust calls to tools now!

@apeltzer
Member

Just checked and everything should work now (at least from what I see in main.nf). @ewels, I will send in a PR from the bioconda branch to master and we'll see what the tests say?

@ewels
Member Author

ewels commented Jun 12, 2018

Sounds good 👍


2 participants