Hanging in rule pamir_assemble_full_new #49
Comments
@christinafliege from just looking at this log, you were running this job with only 4 threads. Did you run with -j16, or does your machine have at most 4 cores? |
We had some problems with running out of memory, so I was running with -j4; but it has been running with -j16 for the past ten hours.
|
Good Afternoon. While running with -j16, the program has spent 76 hours on rule pamir_assemble_full_new. Do you have any advice on whether it should still be running at this step, or whether it is hung somewhere? Thanks!
|
We have run Pamir on 26 populations of 1000G, and per sample this step takes a few hours on a 64-core machine. What type of genomes are these?
Also
|
Thank you for getting back to me! It has now been running for ~101 hours on the same step. Here are the .stat and the partition.count files. Thanks again.
|
Dear Christina, below I share stats from one of the 1000G individuals. As you can see, your sample has 2x more reads than the 1000G sample, and if you look at the discordant-read number you have almost 4x more. With this increased coverage and less compute power, it may well take longer than in our case. One interesting thing is the TLEN stats: first, a 26 bp mean fragment length is very short, and the minimum was reported as 40 even though the mean is 26, so let's double-check. To do that, could you try the script below? Also, which aligner did you use to map the data? Could you share the mapping command along with the aligner version? Template Length Gathering code
NA19239
Thank you very much, T. |
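The script itself did not survive in this thread; as a stand-in, here is a minimal Python sketch of the same idea, summarising TLEN for properly paired reads only. It expects text lines as produced by `samtools view sample.bam`; the sample records below are invented for illustration.

```python
def tlen_stats(sam_lines):
    """Return (count, mean, min, max) of positive TLENs of properly paired reads."""
    tlens = []
    for line in sam_lines:
        fields = line.rstrip("\n").split("\t")
        flag, tlen = int(fields[1]), int(fields[8])
        if flag & 0x2 and tlen > 0:   # 0x2 = "read mapped in proper pair"
            tlens.append(tlen)        # keep one positive TLEN per pair
    n = len(tlens)
    return n, sum(tlens) / n, min(tlens), max(tlens)

# Invented SAM records (tab-separated fields) for illustration only.
rows = [
    "r1\t99\tchr1\t100\t60\t100M\t=\t300\t300\t*\t*",   # proper pair, TLEN 300
    "r2\t83\tchr1\t300\t60\t100M\t=\t100\t-300\t*\t*",  # mate record, negative TLEN: skipped
    "r3\t99\tchr1\t500\t60\t100M\t=\t760\t360\t*\t*",   # proper pair, TLEN 360
    "r4\t65\tchr1\t900\t60\t100M\t=\t1200\t400\t*\t*",  # flag lacks 0x2: skipped
]
print(tlen_stats(rows))  # → (2, 330.0, 300, 360)
```

A mean near 26 bp from real paired-end data would be surprising, which is why filtering to flag 0x2 first matters: anything else admits secondary, supplementary, and unpaired records.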
When running your script, I get about a million lines, which start like so. Would you be able to help interpret this?
|
4 lanes were aligned and then merged into a single BAM. Here is the command that was used. Thanks!
|
Hi Christina, the endless TLEN output was my mistake; my apologies. Our aim was to double-check the fragment-size stats, and I accidentally allowed reads other than properly paired ones, which caused the million lines of TLENs.
I don't see a problem on the bwa mem side, so we can cross that step out. |
Thank you! However, I have run the new code and get a similar but slightly different result, again more than a million lines.
|
Good Morning. Additionally, the file sizes in /projects/mgc/Project_2/HLHS_BasilAniseVC/Pamiranalysis/4/007-pamir-assembly/004-HLH-004_all_lanes_merged are not changing and have stayed the same for many hours. Thank you for your assistance.
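A generic way to confirm that outputs are truly stalled (a sketch of my own, not part of Pamir; the function name is invented) is to list files whose modification time has not changed within a given window:

```python
import os
import time

def stalled_files(paths, max_idle_hours=6.0):
    """Return the paths last modified more than `max_idle_hours` ago."""
    now = time.time()
    return [p for p in paths
            if now - os.path.getmtime(p) > max_idle_hours * 3600]
```

Running this periodically over the assembly output directory distinguishes a rule that is still writing slowly from one that is genuinely hung.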
|
@christinafliege, I would be more than happy to set up a Zoom meeting to go through this faster if you like. Just email me and I will send you a Zoom link including @f0t1h and @mortunco; you can find my email in the paper. Meanwhile, @mortunco and @f0t1h will continue to debug by sending you commands here. |
Dear @christinafliege, thank you for sharing the TLEN distribution. The command I shared extracts the fragment-length information from your BAM file for properly paired reads only. It is really interesting that we are seeing numbers like 248,936,211 in the fragment-length distribution; I think this is causing problems throughout the pipeline. Our first guess is that a tab ("\t") character in one of your chromosome names is shifting all the columns in the BAM file. Could you share the following output files with us?
Also, could you check which cluster ID was processed last?
It should return something like this.
Thank you for your input and patience, Best, |
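The whitespace-in-chromosome-name hypothesis can be checked with a small sketch like the one below (my own illustration; the helper name and sample names are invented). It takes header lines from `samtools view -H sample.bam` and the lines of the reference FASTA, then reports names containing whitespace as well as BAM names absent from the reference:

```python
def check_names(header_lines, fasta_lines):
    """Compare @SQ sequence names against reference FASTA names."""
    # The field after "@SQ" is "SN:<name>"; a literal tab inside a name
    # would already corrupt the header columns, but spaces survive here.
    bam_names = [line.split("\t")[1][len("SN:"):]
                 for line in header_lines if line.startswith("@SQ")]
    ref_names = [line[1:].split()[0]          # first token after ">"
                 for line in fasta_lines if line.startswith(">")]
    whitespace = [n for n in bam_names if any(c.isspace() for c in n)]
    missing = sorted(set(bam_names) - set(ref_names))
    return whitespace, missing

# Invented header and FASTA lines for illustration.
header = ["@HD\tVN:1.6",
          "@SQ\tSN:chr1\tLN:248956422",
          "@SQ\tSN:chr1 alt\tLN:1000"]
fasta = [">chr1 some description", ">chr2"]
print(check_names(header, fasta))  # → (['chr1 alt'], ['chr1 alt'])
```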
The bam-chr-names file is about 3000 lines long and starts like so:
The reference cat is also about 3000 lines and starts like so.
|
@christinafliege is this step completed? |
We let it run a bit more, but then, based on our needs and resources, we killed it, changed the hardcoded 7000 to 2000, provided a BED file of centromeres, and started it back up again. I will update you as this continues; thank you very much for your help! |
Good Afternoon. After restarting it on Tuesday with the hardcoded change and a provided centromere file, Pamir ran on our data all weekend. However, I used the commands you provided to check the partition count and the cluster ID it is currently working on. This showed that it has been chugging along on the same cluster ID since Wednesday, which was the same one it was working on when we had our call last week: cluster ID 2679000, as shown above. Could you help us move past this blocker? Additionally, do you think lowering the hardcoded top of the read interval further, from 2000 to 1000, would make any difference? Thank you! |
Hi Christina. If you mean this setting, then about that issue I need a little bit of help from you. In the Pamir assembly step, each previously determined partition is processed. There is obviously something wonky going on with partition 2679000, so let's see how the reads in that partition look. Check the partition log for a brief summary of the reads; the logs for the problematic partition and the previous partitions should be there. Now let's get the reads. Go to the directory where you installed Pamir, like below; there is a pamir executable there. Please run the following command there to extract the reads related to this cluster. It will give you the last 6 clusters (including the problematic one) plus the first cluster in the partition. If we can get those reads, we can replicate the problem on our side.
I don't know how your Pamir GitHub repository was set up, but the pamir executable in the Pamir directory has a function to extract reads for a specific partition; it is the following command:
Once you get them, could you share the files? Here is the link to share files. Best, |
Dear @christinafliege, were you able to fix the error or generate the Pamir partition files? Let us know if we can do anything to help fix this issue. T. |
Sorry for the slow response; many projects are going on right now. From your previous post: "If you mean this setting, cfg_default("pamir_partition_per_thread", 1000), the default is originally 1000." What we did was change line 245 from `if ( p.size() > 7000 || p.size() <= 2 ) {` to `if ( p.size() > 2000 || p.size() <= 2 ) {`. I will work to get the partition files this afternoon. |
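The effect of that one-line change can be paraphrased in a small sketch (Python rather than Pamir's actual C++; the function name is mine): partitions with more reads than the cap, or with two or fewer reads, are skipped rather than assembled.

```python
def keep_partition(partition, cap=2000):
    """Mirror of `!(p.size() > cap || p.size() <= 2)`: True means assemble it."""
    return 2 < len(partition) <= cap   # cap was 7000 before the edit

print(keep_partition(range(2500)))            # → False under the new cap of 2000
print(keep_partition(range(2500), cap=7000))  # → True under the old cap
```

Lowering the cap trades sensitivity on very deep partitions for runtime, since the largest partitions are the ones the assembler spends the most time on.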
The previous job I had been running timed out at rule pamir_assemble_full_new. I restarted it for 72 hours without deleting any intermediate files; it appeared to start back up again, but ran for 72 hours before timing out once more. Before running again with more time, I would like to verify that I did not need to delete any intermediate files, and to check whether anything else is going wrong here. In your paper it looks like Pamir is quite quick for a single chromosome; however, this is three samples against the full human reference. Would you be able to advise on optimizing cores and time? Thank you!
Here are an excerpt from the error file and the contents of the output directory.