Not getting reproducible results. #13
Comments
Update: I realized that I need to sort all 358 input files. My result is now closer to the posted one, but it still does not match perfectly.
The smaller genomes use a different overlap parameter. I recommend using 5kb instead of the 20kb that was used for human. You can change this in the code here: https://github.com/Boyle-Lab/Blacklist/blob/master/blacklist.cpp#L469
I changed the line "if(miss < (binSize/binOverlap + 200)) { // bridge over adjacent bins plus 100 * 200 = 20kb" and now get closer to dm3-blacklist.v2.bed.gz, but the result is still slightly different. The code now masks 3,284,300 bp, compared to the 2,689,400 bp that it is supposed to output.
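For anyone who wants to reproduce the masked-bp comparison above, here is a minimal Python sketch that totals the interval lengths in a BED file. It is not part of the Blacklist tool. The function name `masked_bp` is made up for illustration, and the sketch assumes the intervals do not overlap (overlapping intervals would be counted twice).

```python
import gzip


def masked_bp(bed_path):
    """Sum (end - start) over all intervals in a plain or gzipped BED file.

    Assumption: intervals do not overlap, so the sum equals the masked length.
    """
    opener = gzip.open if str(bed_path).endswith(".gz") else open
    total = 0
    with opener(bed_path, "rt") as handle:
        for line in handle:
            # Skip blank lines and optional BED header lines.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            _chrom, start, end = line.split()[:3]
            total += int(end) - int(start)
    return total
```

Calling `masked_bp("dm3-blacklist.v2.bed.gz")` and `masked_bp(...)` on a locally generated BED file would give the two totals to compare.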
I did notice that one of the mapped input files used to generate the blacklist, ENCFF620WUR.bam, is empty when I try to download it. Could this be causing the discrepancy?
Have you been able to work out the discrepancy? Perhaps we can compare MD5 hashes of the BAM files to make sure that we are working with the same data?
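One way to make that comparison is sketched below in Python: build a small manifest of file sizes and MD5 hashes that both sides can generate and diff. The helper names `md5sum` and `bam_manifest` are hypothetical. The hashes are only comparable when computed on the files as downloaded, because sorting rewrites a BAM file and changes its hash.

```python
import hashlib
import os


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bam_manifest(paths):
    """Map each file name to (size in bytes, MD5 digest).

    A size of 0 flags an empty download.
    """
    return {
        os.path.basename(path): (os.path.getsize(path), md5sum(path))
        for path in paths
    }
```

Printing the manifest sorted by file name gives a listing that can be pasted into the thread and compared line by line.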
Below are the inputs I used. I sort and index them using samtools and then run the blacklist tool.
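For reference, the sort-and-index step could be scripted roughly as follows. This is a sketch only, not the exact commands used for the published blacklist. It assumes `samtools` is on the PATH, and the `.sorted.bam` naming and the helper names are illustrative choices.

```python
import subprocess
from pathlib import Path


def samtools_commands(bam_path, out_dir):
    """Build the samtools sort and index commands for one BAM file."""
    bam_path = Path(bam_path)
    # Hypothetical naming scheme: <name>.sorted.bam inside out_dir.
    sorted_bam = Path(out_dir) / (bam_path.stem + ".sorted.bam")
    return [
        ["samtools", "sort", "-o", str(sorted_bam), str(bam_path)],
        ["samtools", "index", str(sorted_bam)],
    ]


def sort_and_index(bam_path, out_dir):
    """Run the commands in order, raising CalledProcessError on failure."""
    for command in samtools_commands(bam_path, out_dir):
        subprocess.run(command, check=True)
```

Because `check=True` raises on a non-zero exit status, a truncated or empty input that samtools rejects should stop the run instead of being silently skipped.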
Good afternoon,
I downloaded the 358 input bam files for Drosophila (dm3). Next I downloaded the respective umap files. Then I ran the blacklisting tool. It did not give me the same results as found in dm3-blacklist.v2.bed.gz. Is there something I am missing?
When I run the demo, I get the expected results.
I hope to hear from you soon.
Thanks,
Sammy