The Python3 script 'download_genomes.py' downloads the relevant files. It takes as input the 'genomes_to_download.csv' file, which lists the genomes to download (as referenced in Ensembl) and the processing to run once each genome has been downloaded, e.g. creating Bowtie2 index files.
I shall add code so DeepTools processing may be performed on the downloaded genome to ascertain the mappable genome size.
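As a rough illustration of the CSV-driven approach, here is a minimal sketch. The column names (`species`, `assembly`, `processing`), the `;` separator and the step names are all invented for this example; the real layout of 'genomes_to_download.csv' is not shown in this thread.

```python
import csv
import io

# Hypothetical CSV layout: these columns and step names are assumptions
# made for illustration, not the actual format of genomes_to_download.csv.
EXAMPLE_CSV = """species,assembly,processing
homo_sapiens,GRCh38,bowtie2;effective_genome_size
mus_musculus,GRCm38,bowtie2
"""


def read_jobs(handle):
    """Parse the CSV into one job per genome, with its post-download steps."""
    jobs = []
    for row in csv.DictReader(handle):
        # Split the processing column into individual step names.
        steps = [s.strip() for s in row["processing"].split(";") if s.strip()]
        jobs.append(
            {"species": row["species"], "assembly": row["assembly"], "steps": steps}
        )
    return jobs


for job in read_jobs(io.StringIO(EXAMPLE_CSV)):
    print(job["species"], job["assembly"], job["steps"])
```

Under this layout, the genome-size estimation would simply be one more step name listed against a genome.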
We updated macs_gsize in the igenomes config, which now provides a value for several read lengths, and added logic to calculate it when the genome is not present; see #283. You can try it on the dev branch, so I will close the issue now.
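The lookup-then-fallback behaviour described here can be sketched roughly as follows. This is not the pipeline's code (which is Nextflow); the function name, the table structure and the `estimate` callback are assumptions for illustration, and the table holds only the 100-bp figures quoted in this issue, not the contents of the igenomes config.

```python
def resolve_macs_gsize(genome, read_length, table, estimate):
    """Return the tabulated size if present, otherwise call the estimator."""
    by_length = table.get(genome, {})
    if read_length in by_length:
        return by_length[read_length]
    # Genome or read length not tabulated: fall back to calculating it.
    return estimate(genome, read_length)


# Illustrative table only: the 100-bp values reported in this issue.
TABLE = {
    "human38": {100: 2.8e9},
    "mouse38": {100: 2.47e9},
}


def placeholder_estimate(genome, read_length):
    """Stand-in for a real calculation (hypothetical; returns None here)."""
    return None


print(resolve_macs_gsize("mouse38", 100, TABLE, placeholder_estimate))
print(resolve_macs_gsize("other_genome", 100, TABLE, placeholder_estimate))
```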
Following on from a discussion on Slack with Harshil Patel, I wanted to raise an issue regarding the genome size needed for the --macs_gsize option when running the nf-core/chipseq pipeline (and others).
I shall be using genomes not available in iGenomes and so needed to calculate this value myself. To check I could do this correctly, I tried to get the same values for human and mouse as reported at: https://github.com/nf-core/chipseq/blob/master/conf/igenomes.config
To perform the calculation, I ran the script unique-kmers.py as described at:
https://deeptools.readthedocs.io/en/develop/content/feature/effectiveGenomeSize.html
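For anyone unfamiliar with the idea, the quantity being estimated is the number of distinct k-mers in the genome. The sketch below is a naive exact count, not what unique-kmers.py does internally, and it is only practical for small sequences. It makes two simplifying assumptions: k-mers containing N are skipped, and only the forward strand is considered.

```python
def parse_fasta(lines):
    """Yield one concatenated sequence string per FASTA record."""
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if chunks:
                yield "".join(chunks)
                chunks = []
        elif line:
            chunks.append(line)
    if chunks:
        yield "".join(chunks)


def count_distinct_kmers(sequences, k):
    """Count distinct k-mers across all sequences, skipping any with an N."""
    seen = set()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:
                seen.add(kmer)
    return len(seen)


if __name__ == "__main__":
    # Toy input; a real genome would need a memory-efficient estimator.
    toy = [">chrToy", "ACGTACGT", "NNNNACGT"]
    print(count_distinct_kmers(parse_fasta(toy), k=4))
```

Holding every k-mer in a set does not scale to a mammalian genome at k=100, which is why an estimating tool is used for the real calculation.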
My results for human38 (assuming 100-bp k-mers) were similar to the value reported in iGenomes: 2.8e9 vs 2.7e9, respectively.
However, the calculations for mouse38 were substantially different: 2.47e9 (my calculation) vs 1.87e9 (iGenomes).
(As might be expected, my calculations agree with those displayed on https://deeptools.readthedocs.io/en/develop/content/feature/effectiveGenomeSize.html for 100-bp kmers.)
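To put the two discrepancies on the same scale, here is a small calculation of the relative difference between my values and the iGenomes values, using only the figures quoted above. The helper name `relative_difference` is just for this example.

```python
def relative_difference(calculated, reference):
    """Fractional difference of the calculated value from the reference."""
    return (calculated - reference) / reference


# (my calculation, iGenomes value) for 100-bp k-mers, as quoted above.
values = {
    "human38": (2.8e9, 2.7e9),
    "mouse38": (2.47e9, 1.87e9),
}

for genome, (calculated, reference) in values.items():
    print(f"{genome}: {relative_difference(calculated, reference):.1%}")
# human38: 3.7%
# mouse38: 32.1%
```

So the human values differ by under 4%, while the mouse value is roughly a third larger than the one in iGenomes.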
I spoke to the MACS developers and they appear to agree that these values need updating:
macs3-project/MACS#508 (comment)
I believe the relevant nf-core documentation should be updated to show these new values.
(It may be of interest to you that I have been putting together a script that automates reference genome downloading. I shall incorporate the DeepTools k-mer estimation of genome size into the automated download process, and I shall share the script with you when it is ready, in case it is of any use.)
Many thanks,
Steven