Cactus-pangenome fails at make_vcf step only on real data #1416

Closed
amsession opened this issue Jun 14, 2024 · 3 comments · Fixed by #1421

Comments

@amsession

I am trying to run cactus-pangenome and was able to run it successfully on the example data set. However, when I use a single chromosome of real data with just 2 species, the run fails at the "make_vcf" step. I am unsure how to interpret the log file beyond that.

The log file is attached, and the exact command used was "apptainer exec ~/LOCAL.INSTALL/cactus/cactus_v2.8.3.sif cactus-pangenome ./js ./XlaXpe.Chr1L.txt --outDir Chr1L --outName Chr1L --reference Xla --vcf --giraffe --gfa --gbz --maxCores 32 --restart". This is the latest log file, produced after restarting with a higher --maxCores value.

error_log5.txt

@glennhickey
Collaborator

Hi. This looks very similar to the issue in #1402, in that vg deconstruct appears to be writing a line with no sample information:

[E::bcf_write] Broken VCF record, the number of columns at Chr1L:30057051 does not match the number of samples (0 vs 1)

Are you able to share the input data with me so I can try to reproduce? Failing that, if you could share the contents of /XlaXpe.Chr1L.txt that may help a bit. Thanks
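
To locate records like the one reported above, a minimal sketch along these lines may help. It is not part of Cactus or vg: the function name `check_vcf_columns` is hypothetical, and it assumes an uncompressed, tab-delimited VCF (for a `.vcf.gz`, decompress with `gzip -dc` first). It only compares each record's column count with the `#CHROM` header line, so it flags records that lack sample columns but does not validate anything else.

```shell
# Hypothetical helper (not part of Cactus or vg): list VCF records whose
# column count differs from the #CHROM header line.
check_vcf_columns() {
    # $1: path to an uncompressed VCF (assumption)
    awk -F'\t' '
        /^##/     { next }                   # skip meta-information lines
        /^#CHROM/ { expected = NF; next }    # header sets the expected column count
        NF != expected {                     # record with missing or extra columns
            printf "%s:%s has %d columns, expected %d\n", $1, $2, NF, expected
            bad++
        }
        END { exit (bad > 0) }               # non-zero exit status if any record is broken
    ' "$1"
}
```

Calling `check_vcf_columns some.vcf` prints one line per mismatching record and returns a non-zero status if any were found, which makes it easy to use as a quick check before handing the file to other tools.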

@amsession
Author

Unfortunately both fasta files are too large to share here even after compression (25MB limit). I am attempting to align the Chr1L sequences of the Xenopus laevis v10 genome here: https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_017654675.1/ , and the Xenopus petersii paternal assembly here: https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_038501925.1/ . The chromosome is named "Chr1L" in X. laevis and "1L" in X. petersii. There are massive misassemblies in the maternal assembly, so it should not be used. If there is an easier way to share the fastas I have directly, please let me know. The .txt file is attached.

XlaXpe.Chr1L.txt

@glennhickey
Collaborator

Thanks!! I was able to reproduce it. Will fix asap. For the record, these are the commands I used (with v2.8.3):

wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/017/654/675/GCF_017654675.1_Xenopus_laevis_v10.1/GCF_017654675.1_Xenopus_laevis_v10.1_genomic.fna.gz
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/038/501/925/GCA_038501925.1_aXenPet1.paternal.cur/GCA_038501925.1_aXenPet1.paternal.cur_genomic.fna.gz

gzip -d GCF_017654675.1_Xenopus_laevis_v10.1_genomic.fna.gz
gzip -d GCA_038501925.1_aXenPet1.paternal.cur_genomic.fna.gz

mkdir -p ./XlaChr
mkdir -p ./XpeChr

samtools faidx GCF_017654675.1_Xenopus_laevis_v10.1_genomic.fna NC_054371.1 >  ./XlaChr/Chr1L.fa
samtools faidx GCA_038501925.1_aXenPet1.paternal.cur_genomic.fna CM076672.1 >  ./XpeChr/1L.fa

printf "Xla ./XlaChr/Chr1L.fa\n" > XlaXpe.Chr1L.txt
printf "Xpe ./XpeChr/1L.fa\n" >> XlaXpe.Chr1L.txt

TOIL_SLURM_ARGS="--partition=long --time=8000" cactus-pangenome ./js ./XlaXpe.Chr1L.txt --outDir Chr1L --outName Chr1L --reference Xla --vcf --giraffe --gfa --gbz --consCores 32 --batchSystem slurm --logFile Chr1L.log --indexCores 32 --mgCores 32
