Include gene annotation versions in multiqc report #75
Comments
Hi @ChristopherBarrington! I've moved this over to the nf-core/atacseq repository as it is pipeline-specific. Thanks for the suggestion, which I think makes sense.
@apeltzer OK, I thought that this section of the MultiQC report came from nf-core/tools, so I opened it there. Apologies, and thanks for looking into it.
No worries. It's something to consider for ATACseq, since you experienced it there, but it could also apply to other pipelines using iGenomes. Once this has been evaluated here, we might open another issue for the nf-core/tools template to allow such a thing to be reported in the MultiQC report 👍
@apeltzer I asked @ChristopherBarrington to add the issue to @ChristopherBarrington the versions we are using at the Crick could be different because we are still using an old version of Illumina iGenomes, which is why I want to update this ASAP. Is it OK to move this back to
OK with me. I already wondered whether it's specific enough to ATACseq to only discuss it there. Please move it back then 👍
Is this information in the GTF files themselves anywhere, or only in the
It's only in the
Honestly, I can't see this happening in nf-core. It sounds highly specific to AWS-iGenomes which I don't like very much, and I can imagine all kinds of nastiness with varying filesystems. I have alarm bells ringing for a potential pit-of-despair project here 🚨 I appreciate the motivation though.. 🤔
I think this might be quite easy to add in. We could just have a separate
    'GRCh37' {
        fasta     = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa"
        bwa       = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/genome.fa"
        bowtie2   = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/"
        star      = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/"
        bismark   = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/BismarkIndex/"
        gtf       = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.gtf"
        bed12     = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.bed"
        mito_name = "MT"
        macs_gsize = "2.7e9"
        blacklist = "${baseDir}/assets/blacklists/GRCh37-blacklist.bed"
        readme    = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/README.txt"
    }
And some logic to evaluate that parameter in the pipeline:
This should then only work if using
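A minimal sketch of the kind of lookup described above, written in Python for illustration rather than in the pipeline's Nextflow. The function name `resolve_readme`, the dictionary layout and the example base path are all assumptions and are not taken from nf-core/atacseq. The idea is simply that a README path is only resolved when a genome key is selected and that key defines a `readme` entry:

```python
from typing import Optional


def resolve_readme(params: dict) -> Optional[str]:
    """Return the README path for the selected genome, or None.

    Hypothetical helper: mirrors a 'genomes' config block like the one
    above, where each genome key may carry an optional 'readme' entry.
    """
    genome = params.get("genome")
    if not genome:
        # No genome key selected (e.g. custom fasta/gtf supplied),
        # so there is no README to report.
        return None
    entry = params.get("genomes", {}).get(genome, {})
    # Treat a missing or empty 'readme' entry as "not available".
    return entry.get("readme") or None


# Example parameters; the base path is a made-up placeholder.
example_params = {
    "genome": "GRCh37",
    "genomes": {
        "GRCh37": {
            "readme": "/igenomes/Homo_sapiens/Ensembl/GRCh37/Annotation/README.txt",
        },
        "custom": {},
    },
}

print(resolve_readme(example_params))
```

Returning `None` instead of raising keeps the README strictly optional, so a run with a custom reference or a genome entry without a `readme` key would just omit it from the report.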
Ah, and just copying in that one file? OK then, yes, that is doable. I had envisaged trying to parse out specific strings within a global iGenomes readme file or other horrible stuff. Nice! 👍 Maybe
Yep 👍 @ChristopherBarrington do you fancy making a pull request to
OK @drpatelh I'll give it a go!
OK. This might cause problems 😕 The So we could update the
I'm going to close this for now, but feel free to re-open if you find another path of least resistance.
Can’t you just try to include it and use a
Maybe this would work:
and then in the process:
i.e. using a
So if you would like to obtain the version specified in the You can then obtain a listing of all the files hosted on AWS iGenomes from this file: Find the
Surely Nextflow can handle that for you? It already has built-in support for staging S3 files...
And I thought that the idea was to include the readme path in the iGenomes config for each species? That would be better, as many people will have downloaded this offline and should also have the readmes there.
Yep. It would be good for Nextflow to do this for us, but a number of things would need to happen before this will work properly. As I mentioned in this comment, we would need to update the AWS syncing script to add this file and re-sync for everyone using offline iGenomes, otherwise a
Could we just not
Absolutely! But it feels a bit hacky 😓 I've added the The paths will now be rolled out to all pipelines via the automated synchronisation when we release
This was implemented in #77 whereby
After using the atacseq pipeline, I checked the MultiQC report and found that relevant information, such as the reference genome, is recorded. The paths to the bed/gtf files are included, but a useful piece of information to add would be the annotation version used. The pipeline uses iGenomes, so I checked the README in the Annotation subdirectory and found that the included files were release-81 (2015).
For downstream analysis, it would be nice to include this information in the MultiQC report if possible.
Thanks,
Chris
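
As a rough illustration of the request, and not something the pipeline does, the annotation release could be pulled out of a README with a small helper. This assumes the README contains a token of the form `release-NN`, like the one quoted above. The function name `find_release` and the sample text are hypothetical, and real iGenomes READMEs may be laid out differently:

```python
import re
from typing import Optional

# Assumed pattern: the release is written as "release-" followed by digits.
RELEASE_PATTERN = re.compile(r"release-\d+")


def find_release(readme_text: str) -> Optional[str]:
    """Return the first 'release-NN' token in the text, or None if absent."""
    match = RELEASE_PATTERN.search(readme_text)
    return match.group(0) if match else None


# Made-up sample text standing in for the contents of an Annotation README.
sample_readme = "Gene annotation files were downloaded from Ensembl release-81 (2015)."

print(find_release(sample_readme))
```

The thread prefers the simpler route of copying the README as-is, which avoids depending on any particular wording inside the file.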