Hanging in rule pamir_assemble_full_new: #49

Open
christinafliege opened this issue Jun 18, 2020 · 20 comments

@christinafliege

The previous job I was running timed out at rule pamir_assemble_full_new. I restarted it with a 72-hour limit without deleting any intermediate files. It appeared to start back up, but it ran for the full 72 hours before timing out again. Before running again with more time, I would like to verify that I did not need to delete any intermediate files, and check whether anything else is going wrong here. In your paper it looks like pamir is quite quick for a single chromosome; however, this is three samples and the full human reference. Would you be able to advise on optimizing cores and time? Thank you!

Here is the relevant part of the error file and the contents of the output directory:

rule pamir_assemble_full_new:
    input: /projects/mgc/Project_2/HLHS_BasilAniseVC/Pamiranalysis/population/006-pamir-partition/003-HLH-001_all_lanes_merged/partition, /projects/mgc/Project_2/HLHS_BasilAniseVC/Pamiranalysis/population/006-pamir-partition/003-HLH-001_all_lanes_merged/partition.count, /projects/mgc/Project_2/HLHS_BasilAniseVC/Pamiranalysis/population/001-pamir-remove-concordants/003-HLH-001_all_lanes_merged/003-HLH-001_all_lanes_merged.stats.json
    output: /projects/mgc/Project_2/HLHS_BasilAniseVC/Pamiranalysis/population/007-pamir-assembly/003-HLH-001_all_lanes_merged/all.vcf, /projects/mgc/Project_2/HLHS_BasilAniseVC/Pamiranalysis/population/007-pamir-assembly/003-HLH-001_all_lanes_merged/all_LOW_QUAL.vcf, /projects/mgc/Project_2/HLHS_BasilAniseVC/Pamiranalysis/population/007-pamir-assembly/003-HLH-001_all_lanes_merged/all.log
    jobid: 54
    wildcards: sample=003-HLH-001_all_lanes_merged
    threads: 4


-rw-r----- 1 cfliege2 bany  86M Jun 15 17:44 T0.log
-rw-r----- 1 cfliege2 bany    0 Jun 15 17:40 T0.vcf
-rw-r----- 1 cfliege2 bany    0 Jun 15 17:40 T0_DELS.vcf
-rw-r----- 1 cfliege2 bany    0 Jun 15 17:40 T0_LOW_QUAL.vcf
-rw-r----- 1 cfliege2 bany    0 Jun 15 17:40 T1.log
-rw-r----- 1 cfliege2 bany    0 Jun 15 17:40 T1.vcf
-rw-r----- 1 cfliege2 bany    0 Jun 15 17:40 T1_DELS.vcf
-rw-r----- 1 cfliege2 bany    0 Jun 15 17:40 T1_LOW_QUAL.vcf
-rw-r----- 1 cfliege2 bany   24 Jun 15 17:40 T2.log
-rw-r----- 1 cfliege2 bany    0 Jun 15 17:40 T2.vcf
-rw-r----- 1 cfliege2 bany    0 Jun 15 17:40 T2_DELS.vcf
-rw-r----- 1 cfliege2 bany    0 Jun 15 17:40 T2_LOW_QUAL.vcf
-rw-r----- 1 cfliege2 bany  13G Jun 18 13:39 all.log
-rw-r----- 1 cfliege2 bany 7.9M Jun 18 13:39 all.vcf
-rw-r----- 1 cfliege2 bany  74M Jun 18 13:39 all_LOW_QUAL.vcf
@fhach
Collaborator

fhach commented Jun 19, 2020

@christinafliege from just looking at this log, you were running this job with only 4 threads. Did you run with -j16, or does the machine have at most 4 cores?

@christinafliege
Author

We had some problems with running out of memory, so I was running with -j4, but it has now been running with -j16 for the past ten hours.

rule pamir_assemble_full_new:
    input: /projects/mgc/Project_2/HLHS_BasilAniseVC/Pamiranalysis/population/006-pamir-partition/003-HLH-003_all_lanes_merged/partition, /projects/mgc/Project_2/HLHS_BasilAniseVC/Pamiranalysis/population/006-pamir-partition/003-HLH-003_all_lanes_merged/partition.count, /projects/mgc/Project_2/HLHS_BasilAniseVC/Pamiranalysis/population/001-pamir-remove-concordants/003-HLH-003_all_lanes_merged/003-HLH-003_all_lanes_merged.stats.json
    output: /projects/mgc/Project_2/HLHS_BasilAniseVC/Pamiranalysis/population/007-pamir-assembly/003-HLH-003_all_lanes_merged/all.vcf, /projects/mgc/Project_2/HLHS_BasilAniseVC/Pamiranalysis/population/007-pamir-assembly/003-HLH-003_all_lanes_merged/all_LOW_QUAL.vcf, /projects/mgc/Project_2/HLHS_BasilAniseVC/Pamiranalysis/population/007-pamir-assembly/003-HLH-003_all_lanes_merged/all.log
    jobid: 55
    wildcards: sample=003-HLH-003_all_lanes_merged
    threads: 16

@christinafliege
Author

Good Afternoon.

While running with -j16, the program has been on rule pamir_assemble_full_new for 76 hours. Do you have any advice on whether it should still be running on this step, or whether it is hung somewhere? Thanks!

[Fri Jun 19 08:01:57 2020]
rule pamir_assemble_full_new:
    input: /projects/mgc/Project_2/HLHS_BasilAniseVC/Pamiranalysis/population/006-pamir-partition/003-HLH-003_all_lanes_merged/partition, /projects/mgc/Project_2/HLHS_BasilAniseVC/Pamiranalysis/population/006-pamir-partition/003-HLH-003_all_lanes_merged/partition.count, /projects/mgc/Project_2/HLHS_BasilAniseVC/Pamiranalysis/population/001-pamir-remove-concordants/003-HLH-003_all_lanes_merged/003-HLH-003_all_lanes_merged.stats.json
    output: /projects/mgc/Project_2/HLHS_BasilAniseVC/Pamiranalysis/population/007-pamir-assembly/003-HLH-003_all_lanes_merged/all.vcf, /projects/mgc/Project_2/HLHS_BasilAniseVC/Pamiranalysis/population/007-pamir-assembly/003-HLH-003_all_lanes_merged/all_LOW_QUAL.vcf, /projects/mgc/Project_2/HLHS_BasilAniseVC/Pamiranalysis/population/007-pamir-assembly/003-HLH-003_all_lanes_merged/all.log
    jobid: 55
    wildcards: sample=003-HLH-003_all_lanes_merged
    threads: 16

@fhach
Collaborator

fhach commented Jun 23, 2020

We have run pamir on 26 populations of 1000G, and per sample this step takes a few hours on a 64-core machine.
Can you provide the following information:

What type of genome are these?

cat /projects/mgc/Project_2/HLHS_BasilAniseVC/Pamiranalysis/population/001-pamir-remove-concordants/003-HLH-003_all_lanes_merged/003-HLH-003_all_lanes_merged.stats

Also

cat /projects/mgc/Project_2/HLHS_BasilAniseVC/Pamiranalysis/population/006-pamir-partition/003-HLH-003_all_lanes_merged/partition.count

@christinafliege
Author

Thank you for getting back to me!

It has currently been running for ~101 hours on the same step.

Here are the .stat and the partition.count.
These are 3 human samples, about 100G per sample, using the full human reference genome.

Thanks again

cat /projects/mgc/Project_2/HLHS_BasilAniseVC/Pamiranalysis/population/001-pamir-remove-concordants/003-HLH-003_all_lanes_merged/003-HLH-003_all_lanes_merged.stat

Total Number of Reads: 596068099
# of Primary Mappings: 596068099
  # of Supp. Mappings: 1677189

Original:
Concordant: 566771042
Discordant: 8519063
  Chimeric: 12116128
       OEA: 3024063
    Orphan: 5637803

Processed:
Concordant: 548535805
Discordant: 6585217
  Chimeric: 0
       OEA: 20629417
    Orphan: 19620997

Read Length Range: [100, 100]

TLEN:
Range: [40, 509]
 Mean: 26.93
  Std: 258.75

 cat /projects/mgc/Project_2/HLHS_BasilAniseVC/Pamiranalysis/population/006-pamir-partition/003-HLH-003_all_lanes_merged/partition.count
2484157

@mortunco
Contributor

Dear Christina,

Below I share the stats for one of the 1000G individuals. As you can see, your sample has about 2x as many reads as the 1000G sample. Also, if you look at the discordant count, you have more than 4x as many reads. So with this increased coverage and less compute power, it might take longer than in our case.
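
For reference, the ratios behind this comparison can be recomputed from the two stats blocks in this thread. This is only a small arithmetic sketch: `ratio` is a hypothetical helper name, and the numbers are copied from the "Total Number of Reads" and "Original: Discordant" lines of the 003-HLH-003 and NA19239 stats.

```shell
# Hypothetical helper: print a / b rounded to two decimals.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Total number of reads: 003-HLH-003 vs NA19239
ratio 596068099 321073661
# Original discordant reads: 003-HLH-003 vs NA19239
ratio 8519063 1889758
```

With these inputs the first ratio comes out at about 1.86 and the second at about 4.51.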

One interesting thing is the TLEN stats. First, a mean fragment length of 26 bp is very short. Also, the minimum of the range is reported as 40 but the mean is 26, so let's double check it. To verify that everything is correct, could you try the script below? Also, which aligner did you use to map the data? Could you share the mapping command along with the aligner version?

Template Length Gathering code

 samtools view -F 260 bamfile.bam  | cut -f 9  | awk 'a=sqrt($1 * $1) {print a}' | sort -nr | uniq -c

NA19239

Total Number of Reads: 321073661
# of Primary Mappings: 321073661
  # of Supp. Mappings: 47007072
Original:
Concordant: 312055985
Discordant: 1889758
  Chimeric: 6463046
       OEA: 626303
    Orphan: 38569
Processed:
Concordant: 299257722
Discordant: 891451
  Chimeric: 0
       OEA: 11766639
    Orphan: 9157849
Read Length Range: [150, 150]
TLEN:
Range: [2, 936]
 Mean: 437.22
  Std: 102.44

Thank you very much,

T.

@christinafliege
Author

When running your script I get about a million lines of output, which start like so. Would you be able to help interpret this?
Thanks!


[cfliege2@iforge scripts]$ head -30 TLEN.out
      2 248936278
      2 248936211
      2 248936202
      2 248936196
      2 248936104
      2 248936079
      2 248935976
      2 248935975
      2 248935951
      2 248935904
      2 248935892
      2 248935870
      2 248935868
      2 248935860
      2 248935820
      2 248935769
      2 248935753
      2 248935716
      2 248935707
      2 248935698
      2 248935681
      2 248935672
      2 248935666
      2 248935663
      2 248935634
      2 248935622
      2 248935541
      2 248935437
      2 248935408
      2 248935397

@christinafliege
Author

christinafliege commented Jun 24, 2020

4 lanes were aligned and then merged into a single BAM. Here is the command for how it was done. Thanks!

/projects/sciteam/baib/builds/sentieon/sentieon-genomics-201808.03/libexec/bwa mem -M -R @RG\tID:003-HLH-001.5\tPU:FCC21HHACXX.5\tSM:003-HLH-001\tPL:illumina\tLB:_HLHS -t 10 /projects/sciteam/baib/GATKbundle/Dec3_2017/Homo_sapiens_assembly38.fasta /scratch/sciteam/mkendzi2/extracted_fastqs/003-HLH-001_L5_read1.fq.gz /scratch/sciteam/mkendzi2/extracted_fastqs/003-HLH-001_L5_read2.fq.gz

@mortunco
Contributor

Hi Christina,

The endless TLEN output was my mistake. My apologies. Our aim was to double check the fragment size stats, but I accidentally included reads other than properly paired ones, which caused the million lines of TLENs.

samtools view -f2 -F260 YOURINITIAL.bam | cut -f 9 | awk '$1 > 0' | sort -nr | uniq -c 

I don't see a problem on the bwa mem side, so we can cross that step out.
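
Since the full `sort | uniq -c` listing can run to millions of lines, the same check can be condensed into a handful of summary numbers. The following is a minimal sketch, not part of pamir or samtools: `tlen_summary` is a hypothetical helper that reads SAM records on stdin, keeps only positive values of column 9 (TLEN), and reports the count, range, mean, and population standard deviation.

```shell
# Hypothetical helper: summarise positive template lengths (SAM column 9) from stdin.
tlen_summary() {
    awk -F'\t' '
        /^@/ { next }                     # skip SAM header lines if present
        {
            t = $9 + 0                    # force a numeric TLEN
            if (t <= 0) next              # keep one mate per pair (positive TLEN only)
            n++; sum += t; sumsq += t * t
            if (n == 1 || t < min) min = t
            if (n == 1 || t > max) max = t
        }
        END {
            if (n == 0) { print "no positive TLEN values"; exit 1 }
            mean = sum / n
            var = sumsq / n - mean * mean # population variance
            if (var < 0) var = 0          # guard against rounding below zero
            printf "n=%d min=%d max=%d mean=%.2f std=%.2f\n", n, min, max, mean, sqrt(var)
        }'
}
```

It would be fed from the command above, for example `samtools view -f2 -F260 YOURINITIAL.bam | tlen_summary`. A handful of values in the hundreds of millions would show up immediately as a very large `max` and `std`.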

@christinafliege
Author

christinafliege commented Jun 26, 2020

Thank you! However, I have run the new code and get a similar but slightly different result, again with more than a million lines.

      1 248936278
      1 248936211
      1 248936202
      1 248936196
      1 248936104
      1 248936079
      1 248935976
      1 248935975
      1 248935951
      1 248935904
      1 248935892
      1 248935870
      1 248935868
      1 248935860
      1 248935820
      1 248935769
      1 248935753
      1 248935716
      1 248935707
      1 248935698
      1 248935681
      1 248935672
      1 248935666
      1 248935663
      1 248935634
      1 248935622
      1 248935541
      1 248935437
      1 248935408

@christinafliege
Author

Good Morning,
I ran another set of data with different bam files over the weekend and it is hanging in the same step.

Additionally, the file sizes in /projects/mgc/Project_2/HLHS_BasilAniseVC/Pamiranalysis/4/007-pamir-assembly/004-HLH-004_all_lanes_merged are not changing and have had the same sizes for many hours.
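
One way to confirm that the outputs have stopped changing, rather than comparing `ls -lh` listings by eye, is to list only the files modified within a recent window. This is a minimal sketch, assuming a `find` that supports `-maxdepth` and `-mmin` (GNU and BSD `find` both do); `recently_modified` is a hypothetical helper name.

```shell
# Hypothetical helper: list regular files in directory $1 modified in the last $2 minutes.
recently_modified() {
    find "$1" -maxdepth 1 -type f -mmin "-$2"
}
```

For example, `recently_modified 007-pamir-assembly/004-HLH-004_all_lanes_merged 120` printing nothing would mean no file in that directory was written in the last two hours.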

Thank you for your assistance.

[Sat Jun 27 05:23:08 2020]
rule pamir_assemble_full_new:
    input: /projects/mgc/Project_2/HLHS_BasilAniseVC/Pamiranalysis/4/006-pamir-partition/004-HLH-004_all_lanes_merged/partition, /projects/mgc/Project_2/HLHS_BasilAniseVC/Pamiranalysis/4/006-pamir-partition/004-HLH-004_all_lanes_merged/partition.count, /projects/mgc/Project_2/HLHS_BasilAniseVC/Pamiranalysis/4/001-pamir-remove-concordants/004-HLH-004_all_lanes_merged/004-HLH-004_all_lanes_merged.stats.json
    output: /projects/mgc/Project_2/HLHS_BasilAniseVC/Pamiranalysis/4/007-pamir-assembly/004-HLH-004_all_lanes_merged/all.vcf, /projects/mgc/Project_2/HLHS_BasilAniseVC/Pamiranalysis/4/007-pamir-assembly/004-HLH-004_all_lanes_merged/all_LOW_QUAL.vcf, /projects/mgc/Project_2/HLHS_BasilAniseVC/Pamiranalysis/4/007-pamir-assembly/004-HLH-004_all_lanes_merged/all.log
    jobid: 56
    wildcards: sample=004-HLH-004_all_lanes_merged
    threads: 20

Job counts:
        count   jobs
        1       pamir_assemble_full_new
        1

[cfliege2@iforge 004-HLH-004_all_lanes_merged]$ ls -lh
total 21G
-rw-r----- 1 cfliege2 bany 4.0M Jun 27 07:27 T0.log
-rw-r----- 1 cfliege2 bany 1.9K Jun 27 07:27 T0.vcf
-rw-r----- 1 cfliege2 bany    0 Jun 27 07:27 T0_DELS.vcf
-rw-r----- 1 cfliege2 bany  14K Jun 27 07:27 T0_LOW_QUAL.vcf
-rw-r----- 1 cfliege2 bany 5.3M Jun 27 07:27 T1.log
-rw-r----- 1 cfliege2 bany 3.9K Jun 27 07:27 T1.vcf
-rw-r----- 1 cfliege2 bany  54M Jun 27 07:29 T10.log
-rw-r----- 1 cfliege2 bany 7.7K Jun 27 07:29 T10.vcf
-rw-r----- 1 cfliege2 bany    0 Jun 27 07:27 T10_DELS.vcf
-rw-r----- 1 cfliege2 bany 198K Jun 27 07:29 T10_LOW_QUAL.vcf
-rw-r----- 1 cfliege2 bany  50M Jun 27 07:29 T11.log
-rw-r----- 1 cfliege2 bany 6.6K Jun 27 07:29 T11.vcf
-rw-r----- 1 cfliege2 bany    0 Jun 27 07:27 T11_DELS.vcf
-rw-r----- 1 cfliege2 bany 200K Jun 27 07:29 T11_LOW_QUAL.vcf
-rw-r----- 1 cfliege2 bany  86M Jun 27 07:31 T12.log
-rw-r----- 1 cfliege2 bany    0 Jun 27 07:27 T12.vcf
-rw-r----- 1 cfliege2 bany    0 Jun 27 07:27 T12_DELS.vcf
-rw-r----- 1 cfliege2 bany    0 Jun 27 07:27 T12_LOW_QUAL.vcf
-rw-r----- 1 cfliege2 bany    0 Jun 27 07:27 T13.log
-rw-r----- 1 cfliege2 bany    0 Jun 27 07:27 T13.vcf
-rw-r----- 1 cfliege2 bany    0 Jun 27 07:27 T13_DELS.vcf
-rw-r----- 1 cfliege2 bany    0 Jun 27 07:27 T13_LOW_QUAL.vcf
-rw-r----- 1 cfliege2 bany   24 Jun 27 07:27 T14.log
-rw-r----- 1 cfliege2 bany    0 Jun 27 07:27 T14.vcf
-rw-r----- 1 cfliege2 bany    0 Jun 27 07:27 T14_DELS.vcf
-rw-r----- 1 cfliege2 bany    0 Jun 27 07:27 T14_LOW_QUAL.vcf
-rw-r----- 1 cfliege2 bany   24 Jun 27 07:27 T15.log
-rw-r----- 1 cfliege2 bany    0 Jun 27 07:27 T15.vcf
-rw-r----- 1 cfliege2 bany    0 Jun 27 07:27 T15_DELS.vcf
-rw-r----- 1 cfliege2 bany    0 Jun 27 07:27 T15_LOW_QUAL.vcf
-rw-r----- 1 cfliege2 bany   24 Jun 27 07:27 T16.log
-rw-r----- 1 cfliege2 bany    0 Jun 27 07:27 T16.vcf
-rw-r----- 1 cfliege2 bany    0 Jun 27 07:27 T16_DELS.vcf
-rw-r----- 1 cfliege2 bany    0 Jun 27 07:27 T16_LOW_QUAL.vcf
-rw-r----- 1 cfliege2 bany   24 Jun 27 07:27 T17.log
-rw-r----- 1 cfliege2 bany    0 Jun 27 07:27 T17.vcf
-rw-r----- 1 cfliege2 bany    0 Jun 27 07:27 T17_DELS.vcf
-rw-r----- 1 cfliege2 bany    0 Jun 27 07:27 T17_LOW_QUAL.vcf
-rw-r----- 1 cfliege2 bany   24 Jun 27 07:27 T18.log
-rw-r----- 1 cfliege2 bany    0 Jun 27 07:27 T18.vcf
-rw-r----- 1 cfliege2 bany    0 Jun 27 07:27 T18_DELS.vcf
-rw-r----- 1 cfliege2 bany    0 Jun 27 07:27 T18_LOW_QUAL.vcf
-rw-r----- 1 cfliege2 bany    0 Jun 27 07:27 T1_DELS.vcf
-rw-r----- 1 cfliege2 bany  37K Jun 27 07:27 T1_LOW_QUAL.vcf
-rw-r----- 1 cfliege2 bany 3.5M Jun 27 07:27 T2.log
-rw-r----- 1 cfliege2 bany  499 Jun 27 07:27 T2.vcf
-rw-r----- 1 cfliege2 bany    0 Jun 27 07:27 T2_DELS.vcf
-rw-r----- 1 cfliege2 bany  12K Jun 27 07:27 T2_LOW_QUAL.vcf
-rw-r----- 1 cfliege2 bany 4.2M Jun 27 07:27 T3.log
-rw-r----- 1 cfliege2 bany  615 Jun 27 07:27 T3.vcf
-rw-r----- 1 cfliege2 bany    0 Jun 27 07:27 T3_DELS.vcf
-rw-r----- 1 cfliege2 bany  18K Jun 27 07:27 T3_LOW_QUAL.vcf
-rw-r----- 1 cfliege2 bany 4.1M Jun 27 07:27 T4.log
-rw-r----- 1 cfliege2 bany 1.5K Jun 27 07:27 T4.vcf
-rw-r----- 1 cfliege2 bany    0 Jun 27 07:27 T4_DELS.vcf
-rw-r----- 1 cfliege2 bany  14K Jun 27 07:27 T4_LOW_QUAL.vcf
-rw-r----- 1 cfliege2 bany 4.4M Jun 27 07:27 T5.log
-rw-r----- 1 cfliege2 bany 3.2K Jun 27 07:27 T5.vcf
-rw-r----- 1 cfliege2 bany    0 Jun 27 07:27 T5_DELS.vcf
-rw-r----- 1 cfliege2 bany  20K Jun 27 07:27 T5_LOW_QUAL.vcf
-rw-r----- 1 cfliege2 bany 4.6M Jun 27 07:27 T6.log
-rw-r----- 1 cfliege2 bany 3.0K Jun 27 07:27 T6.vcf
-rw-r----- 1 cfliege2 bany    0 Jun 27 07:27 T6_DELS.vcf
-rw-r----- 1 cfliege2 bany  25K Jun 27 07:27 T6_LOW_QUAL.vcf
-rw-r----- 1 cfliege2 bany  16M Jun 27 07:28 T7.log
-rw-r----- 1 cfliege2 bany 7.0K Jun 27 07:28 T7.vcf
-rw-r----- 1 cfliege2 bany    0 Jun 27 07:27 T7_DELS.vcf
-rw-r----- 1 cfliege2 bany 118K Jun 27 07:28 T7_LOW_QUAL.vcf
-rw-r----- 1 cfliege2 bany  83M Jun 27 07:32 T8.log
-rw-r----- 1 cfliege2 bany  30K Jun 27 07:32 T8.vcf
-rw-r----- 1 cfliege2 bany    0 Jun 27 07:27 T8_DELS.vcf
-rw-r----- 1 cfliege2 bany 390K Jun 27 07:32 T8_LOW_QUAL.vcf
-rw-r----- 1 cfliege2 bany  50M Jun 27 07:29 T9.log
-rw-r----- 1 cfliege2 bany 4.0K Jun 27 07:29 T9.vcf
-rw-r----- 1 cfliege2 bany    0 Jun 27 07:27 T9_DELS.vcf
-rw-r----- 1 cfliege2 bany 168K Jun 27 07:29 T9_LOW_QUAL.vcf
-rw-r----- 1 cfliege2 bany  21G Jun 27 07:27 all.log.tmp
-rw-r----- 1 cfliege2 bany 8.6M Jun 27 07:27 all.vcf.tmp
-rw-r----- 1 cfliege2 bany 119M Jun 27 07:27 all_LOW_QUAL.vcf.tmp

@fhach
Collaborator

fhach commented Jun 29, 2020

@christinafliege, I would be more than happy to set up a Zoom meeting to go through this faster if you like. Just email me and I will send you a Zoom link with @f0t1h and @mortunco. You can find my email in the paper.

Meanwhile, @mortunco and @f0t1h will continue debugging by sending you commands here.

@mortunco
Contributor

Dear @christinafliege,

Thank you for sharing the TLEN distribution. The command that I shared gets the fragment length information from your BAM file for properly paired reads only. It's really interesting that we are seeing numbers like 248,936,211 in the fragment length distribution. I think this is causing problems throughout the pipeline. Our first guess is that a tab ("\t") character in a chromosome name is shifting all the columns in the BAM file.

Could you share the following output files with us?

 samtools view -H  YOURINITIAL.bam  | grep "@SQ" > bam-chr-names
cat /projects/sciteam/baib/GATKbundle/Dec3_2017/Homo_sapiens_assembly38.fasta | grep ">" > fasta-chr-names
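
Once both files exist, the names can also be compared mechanically instead of by eye. The following is a sketch under stated assumptions, not something pamir provides: `sq_names`, `fasta_names`, and `compare_chr_names` are hypothetical helpers, and the FASTA sequence name is taken to be the first whitespace-delimited token after `>`.

```shell
# Hypothetical helpers for comparing the two files produced above.

# Print the SN: value of every tab-separated @SQ line in file $1.
sq_names() {
    awk -F'\t' '$1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }' "$1"
}

# Print the first token of every FASTA header line in file $1, without the ">".
fasta_names() {
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1"
}

# $1: bam-chr-names, $2: fasta-chr-names. Prints names present in only one file.
compare_chr_names() {
    a=$(mktemp); b=$(mktemp)
    sq_names "$1" | sort > "$a"
    fasta_names "$2" | sort > "$b"
    comm -23 "$a" "$b" | sed 's/^/bam-only: /'
    comm -13 "$a" "$b" | sed 's/^/fasta-only: /'
    rm -f "$a" "$b"
}
```

Running `compare_chr_names bam-chr-names fasta-chr-names` and getting no output would suggest the two name sets agree, which would argue against the shifted-column guess.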

Also, could you check which cluster ID was processed last?

tail -n 1000 all.log.tmp | grep "Cluster\ ID"

It should return something like this:

(base) [tmorova@linuxsrv006 1]$ tail -n 100 all.log | grep "Cluster\ ID"
 + Cluster ID      : 290
 + Cluster ID      : 291
 + Cluster ID      : 292
 + Cluster ID      : 293
 + Cluster ID      : 294
 + Cluster ID      : 295
 + Cluster ID      : 296
 + Cluster ID      : 297

Thank you for your input and patience,

Best,
T.
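
The cluster ID check above can be wrapped so that the last ID is printed next to the partition count. This is a minimal sketch with hypothetical helper names (`last_cluster_id`, `assembly_progress`). It assumes the log lines look like the `Cluster ID : N` lines shown above and that cluster IDs count up towards the value in `partition.count`, which this thread suggests but does not state.

```shell
# Hypothetical helper: print the last cluster ID found in the tail of log file $1.
last_cluster_id() {
    tail -n 1000 "$1" | grep 'Cluster ID' | tail -n 1 |
        awk -F':' '{ gsub(/[^0-9]/, "", $2); print $2 }'
}

# Hypothetical helper: $1 is all.log or all.log.tmp, $2 is partition.count.
assembly_progress() {
    last=$(last_cluster_id "$1")
    total=$(tr -d '[:space:]' < "$2")
    if [ -z "$last" ] || [ -z "$total" ]; then
        echo "could not read cluster ID or partition count" >&2
        return 1
    fi
    echo "last cluster ID: $last of $total partitions"
}
```

Running `assembly_progress all.log.tmp ../../006-pamir-partition/YOURINDID/partition.count` twice, a few hours apart, and seeing the same ID both times would point to a single partition that is not finishing rather than a slow but progressing run. The relative path here is illustrative only.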

@christinafliege
Author

The bam-chr-names is about 3000 lines long and starts as so:

@SQ     SN:chr1 LN:248956422
@SQ     SN:chr2 LN:242193529
@SQ     SN:chr3 LN:198295559
@SQ     SN:chr4 LN:190214555
@SQ     SN:chr5 LN:181538259
@SQ     SN:chr6 LN:170805979
@SQ     SN:chr7 LN:159345973
@SQ     SN:chr8 LN:145138636
@SQ     SN:chr9 LN:138394717
@SQ     SN:chr10        LN:133797422
@SQ     SN:chr11        LN:135086622
@SQ     SN:chr12        LN:133275309
@SQ     SN:chr13        LN:114364328
@SQ     SN:chr14        LN:107043718
@SQ     SN:chr15        LN:101991189
@SQ     SN:chr16        LN:90338345
@SQ     SN:chr17        LN:83257441
@SQ     SN:chr18        LN:80373285
@SQ     SN:chr19        LN:58617616
@SQ     SN:chr20        LN:64444167
@SQ     SN:chr21        LN:46709983
@SQ     SN:chr22        LN:50818468
@SQ     SN:chrX LN:156040895
@SQ     SN:chrY LN:57227415
@SQ     SN:chrM LN:16569
@SQ     SN:chr1_KI270706v1_random       LN:175055
@SQ     SN:chr1_KI270707v1_random       LN:32032
@SQ     SN:chr1_KI270708v1_random       LN:127682
@SQ     SN:chr1_KI270709v1_random       LN:66860
@SQ     SN:chr1_KI270710v1_random       LN:40176
@SQ     SN:chr1_KI270711v1_random       LN:42210
@SQ     SN:chr1_KI270712v1_random       LN:176043
@SQ     SN:chr1_KI270713v1_random       LN:40745
@SQ     SN:chr1_KI270714v1_random       LN:41717
@SQ     SN:chr2_KI270715v1_random       LN:161471
@SQ     SN:chr2_KI270716v1_random       LN:153799
@SQ     SN:chr3_GL000221v1_random       LN:155397
@SQ     SN:chr4_GL000008v2_random       LN:209709
@SQ     SN:chr5_GL000208v1_random       LN:92689
@SQ     SN:chr9_KI270717v1_random       LN:40062
@SQ     SN:chr9_KI270718v1_random       LN:38054
@SQ     SN:chr9_KI270719v1_random       LN:176845
@SQ     SN:chr9_KI270720v1_random       LN:39050
@SQ     SN:chr11_KI270721v1_random      LN:100316
@SQ     SN:chr14_GL000009v2_random      LN:201709
@SQ     SN:chr14_GL000225v1_random      LN:211173

The reference cat is also about 3000 lines and starts as so.

>chr2  AC:CM000664.2  gi:568336022  LN:242193529  rl:Chromosome  M5:f98db672eb0993dcfdabafe2a882905c  AS:GRCh38
>chr3  AC:CM000665.2  gi:568336021  LN:198295559  rl:Chromosome  M5:76635a41ea913a405ded820447d067b0  AS:GRCh38
>chr4  AC:CM000666.2  gi:568336020  LN:190214555  rl:Chromosome  M5:3210fecf1eb92d5489da4346b3fddc6e  AS:GRCh38
>chr5  AC:CM000667.2  gi:568336019  LN:181538259  rl:Chromosome  M5:a811b3dc9fe66af729dc0dddf7fa4f13  AS:GRCh38  hm:47309185-49591369
>chr6  AC:CM000668.2  gi:568336018  LN:170805979  rl:Chromosome  M5:5691468a67c7e7a7b5f2a3a683792c29  AS:GRCh38
>chr7  AC:CM000669.2  gi:568336017  LN:159345973  rl:Chromosome  M5:cc044cc2256a1141212660fb07b6171e  AS:GRCh38
>chr8  AC:CM000670.2  gi:568336016  LN:145138636  rl:Chromosome  M5:c67955b5f7815a9a1edfaa15893d3616  AS:GRCh38
>chr9  AC:CM000671.2  gi:568336015  LN:138394717  rl:Chromosome  M5:6c198acf68b5af7b9d676dfdd531b5de  AS:GRCh38
>chr10  AC:CM000672.2  gi:568336014  LN:133797422  rl:Chromosome  M5:c0eeee7acfdaf31b770a509bdaa6e51a  AS:GRCh38
>chr11  AC:CM000673.2  gi:568336013  LN:135086622  rl:Chromosome  M5:1511375dc2dd1b633af8cf439ae90cec  AS:GRCh38
>chr12  AC:CM000674.2  gi:568336012  LN:133275309  rl:Chromosome  M5:96e414eace405d8c27a6d35ba19df56f  AS:GRCh38
>chr13  AC:CM000675.2  gi:568336011  LN:114364328  rl:Chromosome  M5:a5437debe2ef9c9ef8f3ea2874ae1d82  AS:GRCh38
>chr14  AC:CM000676.2  gi:568336010  LN:107043718  rl:Chromosome  M5:e0f0eecc3bcab6178c62b6211565c807  AS:GRCh38  hm:multiple
>chr15  AC:CM000677.2  gi:568336009  LN:101991189  rl:Chromosome  M5:f036bd11158407596ca6bf3581454706  AS:GRCh38
>chr16  AC:CM000678.2  gi:568336008  LN:90338345  rl:Chromosome  M5:db2d37c8b7d019caaf2dd64ba3a6f33a  AS:GRCh38
>chr17  AC:CM000679.2  gi:568336007  LN:83257441  rl:Chromosome  M5:f9a0fb01553adb183568e3eb9d8626db  AS:GRCh38
>chr18  AC:CM000680.2  gi:568336006  LN:80373285  rl:Chromosome  M5:11eeaa801f6b0e2e36a1138616b8ee9a  AS:GRCh38
>chr19  AC:CM000681.2  gi:568336005  LN:58617616  rl:Chromosome  M5:85f9f4fc152c58cb7913c06d6b98573a  AS:GRCh38  hm:multiple
>chr20  AC:CM000682.2  gi:568336004  LN:64444167  rl:Chromosome  M5:b18e6c531b0bd70e949a7fc20859cb01  AS:GRCh38
>chr21  AC:CM000683.2  gi:568336003  LN:46709983  rl:Chromosome  M5:974dc7aec0b755b19f031418fdedf293  AS:GRCh38  hm:multiple
>chr22  AC:CM000684.2  gi:568336002  LN:50818468  rl:Chromosome  M5:ac37ec46683600f808cdd41eac1d55cd  AS:GRCh38  hm:multiple
>chrX  AC:CM000685.2  gi:568336001  LN:156040895  rl:Chromosome  M5:2b3a55ff7f58eb308420c8a9b11cac50  AS:GRCh38

The last cluster IDs are:
  • Cluster ID : 2678949
  • Cluster ID : 2678950
  • Cluster ID : 2678951
  • Cluster ID : 2678952
  • Cluster ID : 2678953
  • Cluster ID : 2678954
  • Cluster ID : 2678955
  • Cluster ID : 2678956
  • Cluster ID : 2678957
  • Cluster ID : 2678958
  • Cluster ID : 2678959
  • Cluster ID : 2678960
  • Cluster ID : 2678961
  • Cluster ID : 2678962
  • Cluster ID : 2678963
  • Cluster ID : 2678964
  • Cluster ID : 2678965
  • Cluster ID : 2678966
  • Cluster ID : 2678967
  • Cluster ID : 2678968
  • Cluster ID : 2678969
  • Cluster ID : 2678970
  • Cluster ID : 2678971
  • Cluster ID : 2678972
  • Cluster ID : 2678973
  • Cluster ID : 2678974
  • Cluster ID : 2678975
  • Cluster ID : 2678976
  • Cluster ID : 2678977
  • Cluster ID : 2678978
  • Cluster ID : 2678979
  • Cluster ID : 2678980
  • Cluster ID : 2678981
  • Cluster ID : 2678982
  • Cluster ID : 2678983
  • Cluster ID : 2678984
  • Cluster ID : 2678985
  • Cluster ID : 2678986
  • Cluster ID : 2678987
  • Cluster ID : 2678988
  • Cluster ID : 2678989
  • Cluster ID : 2678990
  • Cluster ID : 2678991
  • Cluster ID : 2678992
  • Cluster ID : 2678993
  • Cluster ID : 2678994
  • Cluster ID : 2678995
  • Cluster ID : 2678996
  • Cluster ID : 2678997
  • Cluster ID : 2678998
  • Cluster ID : 2678999
  • Cluster ID : 2679000

@fhach
Collaborator

fhach commented Jun 30, 2020

@christinafliege is this step completed?

@christinafliege
Author

We let it run a bit more, but then, based on our needs and resources, we killed it, changed the hardcoded 7000 to 2000, provided a BED file of centromeres, and started it back up again. I will update you as this continues. Thank you very much for your help!

@christinafliege
Author

christinafliege commented Jul 6, 2020

Good Afternoon,

After we restarted it on Tuesday with the hardcoded change and a centromere file provided, Pamir ran on our data all weekend. However, I used the commands you provided to check the partition count as well as the cluster ID it is currently working on. This showed that it has been chugging along on the same cluster ID since Wednesday, which is the same one it was working on when we had our call last week: Cluster ID 2679000, as shown above.

Could you help us to move past this blocker?

Additionally, do you think that lowering the hardcoded upper bound of the read interval further, from 2000 to 1000, would make any difference?

Thank you!

@mortunco
Contributor

mortunco commented Jul 7, 2020

Hi Christina,

If you mean this setting, cfg_default("pamir_partition_per_thread",1000), the default is originally 1000.

About that issue, I need a little bit of help from you. In the pamir assembly step, each previously determined partition is processed. There is obviously something wonky going on in partition 2679000, so let's see what the reads in that partition look like.

First, the partition log, for a brief summary of the reads. The logs for the problematic partition and the partitions before it should be there.
tail -n 35 006-pamir-partition/YOURINDID/partition.log > problematic.partition.log

Now let's get the reads. The directory where you installed pamir should look like the listing below, and in it there is a pamir executable. Please run the command further below to extract the reads related to this cluster. This command will give you the last 6 clusters (including the problematic one) plus the first cluster in the partition. If we can get those reads, we can replicate the problem on our side.

(base) [tmorova@linuxsrv006 pamir]$ ll
total 5120
-rwxrwxrwx 1 tmorova hachgrp     544 Jan  6  2020 check_slurm_hpc.py
-rw-rw-r-- 1 tmorova hachgrp    3515 Mar 11 18:25 cluster.json
-rw-rw-r-- 1 tmorova hachgrp     328 Jan  6  2020 environment.yaml
drwxrwsr-x 4 tmorova hachgrp    4096 Dec 19  2019 ext
-rw-rw-r-- 1 tmorova hachgrp    1602 Dec 19  2019 LICENSE
-rwxrwxr-x 1 tmorova hachgrp    3673 Mar 12 16:22 Makefile
drwxrwsr-x 4 tmorova hachgrp    4096 May 28 09:33 pamir <----- THIS ONE 
drwxrwsr-x 2 tmorova hachgrp    4096 Mar 12 16:23 pamir-obj
-rwxrwxr-x 1 tmorova hachgrp    2951 Apr 20 09:34 pamir.sh
-rw-rw-r-- 1 tmorova hachgrp    8852 Dec 19  2019 README.md
drwxrwsr-x 2 tmorova hachgrp    4096 Mar 12 16:22 scripts
-rw-rw-r-- 1 tmorova hachgrp   56120 Mar 13 14:11 Snakefile
drwxrwsr-x 3 tmorova hachgrp    4096 Mar 12 16:23 src

I don't know how your pamir GitHub repository was set up. The pamir executable in the pamir directory has a function to extract reads for a specific partition.

Run the following commands:

cd path/to/your/analysis/POPULATION/006-pamir-partition/INDIVIDUAL
full/path/to/pamir/pamir  get_cluster partition 2678995-2679001

Once you have them, could you share the files from pamir get_cluster so that we can debug this?

Here is the link to share files.
https://www.dropbox.com/request/sap9T2Od58FJKU8rZKsS

Best,
T.

@mortunco
Contributor

Dear @christinafliege,

Were you able to fix the error or generate the pamir partition files?

Let us know if we can do anything to fix this issue.

T.

@christinafliege
Author

christinafliege commented Jul 13, 2020

@mortunco

Sorry for the slow response, many projects are going right now.

From your previous post "If you mean this setting, cfg_default("pamir_partition_per_thread",1000), the default is originally 1000."

What we did was change line 245 as follows:

if ( p.size() > 7000 || p.size() <= 2 ) {

to

if ( p.size() > 2000 || p.size() <= 2 ) {

I will work to get the partition files this afternoon.


3 participants