Cactus-pangenome fails at make_vcf step only on real data #1416

amsession · 2024-06-14T22:43:34Z

I am trying to run cactus-pangenome and was able to run it successfully on the example data set. However, when I use a single chromosome of real data with just 2 species, it fails at the "make_vcf" step. I am unsure how to interpret the log file beyond that.

The log file is attached, and the exact command used was "apptainer exec ~/LOCAL.INSTALL/cactus/cactus_v2.8.3.sif cactus-pangenome ./js ./XlaXpe.Chr1L.txt --outDir Chr1L --outName Chr1L --reference Xla --vcf --giraffe --gfa --gbz --maxCores 32 --restart". This is the latest log file, produced after restarting with a higher --maxCores value.

error_log5.txt

glennhickey · 2024-06-17T14:11:27Z

Hi. This looks very similar to the issue in #1402, in that vg deconstruct appears to be writing a line with no sample information:

[E::bcf_write] Broken VCF record, the number of columns at Chr1L:30057051 does not match the number of samples (0 vs 1)

Are you able to share the input data with me so I can try to reproduce? Failing that, if you could share the contents of /XlaXpe.Chr1L.txt that may help a bit. Thanks
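For anyone who wants to locate records like the one reported above in their own output, here is a minimal sketch that compares each record's column count against the #CHROM header line. The function name check_vcf_columns is made up for illustration and is not part of cactus, vg, or bcftools. It assumes an uncompressed, tab-delimited VCF; a .vcf.gz would need to be decompressed first.

```shell
# Hypothetical helper: list VCF records whose column count differs from the header.
# Assumes an uncompressed, tab-delimited VCF file given as the first argument.
check_vcf_columns() {
    awk -F'\t' '
        /^##/     { next }                   # skip meta-information lines
        /^#CHROM/ { expected = NF; next }    # header line sets the expected column count
        NF != expected {
            printf "%s:%s has %d columns, expected %d\n", $1, $2, NF, expected
            bad = 1
        }
        END { exit bad + 0 }                 # non-zero exit status if any record was broken
    ' "$1"
}
```

A record with no FORMAT or sample fields in a one-sample VCF would be reported as having 8 columns where 10 are expected, which corresponds to the "0 vs 1" samples in the bcf_write message.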

amsession · 2024-06-17T15:54:09Z

Unfortunately, both FASTA files are too large to share here even after compression (25MB limit). This is an attempt to align Chr1L sequences between the Xenopus laevis v10 genome here: https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_017654675.1/ , and the Xenopus petersii paternal assembly here: https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_038501925.1/ . The chromosome is named "Chr1L" in X. laevis and "1L" in X. petersii. There are massive misassemblies in the maternal assembly, so that should not be used. If there is an easier way to share the FASTAs I have directly, please let me know. The .txt file is attached.

XlaXpe.Chr1L.txt

glennhickey · 2024-06-17T19:34:19Z

Thanks!! I was able to reproduce it. Will fix asap. For the record, these are the commands I used (with v2.8.3):

wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/017/654/675/GCF_017654675.1_Xenopus_laevis_v10.1/GCF_017654675.1_Xenopus_laevis_v10.1_genomic.fna.gz
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/038/501/925/GCA_038501925.1_aXenPet1.paternal.cur/GCA_038501925.1_aXenPet1.paternal.cur_genomic.fna.gz

gzip -d GCF_017654675.1_Xenopus_laevis_v10.1_genomic.fna.gz
gzip -d GCA_038501925.1_aXenPet1.paternal.cur_genomic.fna.gz

mkdir -p ./XlaChr
mkdir -p ./XpeChr

samtools faidx GCF_017654675.1_Xenopus_laevis_v10.1_genomic.fna NC_054371.1 >  ./XlaChr/Chr1L.fa
samtools faidx GCA_038501925.1_aXenPet1.paternal.cur_genomic.fna CM076672.1 >  ./XpeChr/1L.fa

printf "Xla ./XlaChr/Chr1L.fa\n" > XlaXpe.Chr1L.txt
printf "Xpe ./XpeChr/1L.fa\n" >> XlaXpe.Chr1L.txt

TOIL_SLURM_ARGS="--partition=long --time=8000" cactus-pangenome ./js ./XlaXpe.Chr1L.txt --outDir Chr1L --outName Chr1L --reference Xla --vcf --giraffe --gfa --gbz --consCores 32 --batchSystem slurm --logFile Chr1L.log --indexCores 32 --mgCores 32
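Since the seqfile built above is just one "name path" pair per line, a quick pre-flight check can confirm that each extracted FASTA exists and is non-empty before launching a long run. This is only a sketch: check_seqfile is a hypothetical helper, not a cactus command, and it assumes every entry is a local path rather than a URL.

```shell
# Hypothetical helper: verify that every path in a two-column seqfile
# ("name path" per line) points to an existing, non-empty local file.
check_seqfile() {
    rc=0
    while read -r name path; do
        [ -z "$name" ] && continue           # skip blank lines
        if [ ! -s "$path" ]; then
            echo "missing or empty: $name -> $path"
            rc=1
        fi
    done < "$1"
    return "$rc"
}
```

Running check_seqfile XlaXpe.Chr1L.txt from the directory where the seqfile was written should print nothing and return 0 if both samtools faidx extractions produced output.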

glennhickey mentioned this issue Jun 17, 2024

use bcftools view to sanity check every vcf #1417

Merged

This was referenced Jun 18, 2024

vc deconstruct error - more sample names in header than sample fields pangenome/pggb#287

Closed

update vcfbub to v0.1.1 #1421

Merged

glennhickey closed this as completed in #1421 Jun 21, 2024
