Hello there,
Thank you for developing this software. I was looking at the results in the aligned_gene_sequences/ folder and realised that some of the MSAs contain fragmented genes (I changed the extension from aln.fas to fasta):
$ cat nfrA2.fasta
>GCF_028335125.1_ASM2833512v1
atgaaacagattcctcaagattttcgtttgatagaagatttcttccgcacgcgcagatccgtacgcaagtttatcgatcgtcctgtggaggaagagaagttgatggccatcctcgaagccggacgcatagctccttcggcacataattaccagccgtggcatttcctcgtggtcagagaagaagagggccgcaaacgcttggctccctgttcccaacaaccttggttcccgggtgcccccatctatatcatcacgcttggcgatcatcaaagagcatggaagcgaggagcaggcgattccgtagacatcgatacctctatcgccatgacttatatgatgctggaagcacatagtctgggacttggatgtacgtgggtctgtgctttcgatcaagctctttgttcggagatcttcgacatcccttcgcacatgacacctgtttccatattggctctcggctatggcgatccgaccgtacctccgcgtgaggctttcaatcgcaaatccatcgaagaggtagtcagcttcgagaaattatga
>GCF_030144345.1_ASM3014434v1
atgaaacagattcctcaagattttcgtttgatagaagatttcttccgcacgcgcagatccgtacgcaagtttatcgatcgtcctgtggaggaagagaagttgatggccatcctcgaagccggacgcatagctccttcggcacataattaccagccgtggcatttcctcgtggtcagagaagaagagggccgcaaacgcttggctccctgttcccaacaaccttggttcccgggtgcccccatctatatcatcacgcttggcgatcatcaaagagcatggaagcgaggagcaggcgattccgtagacatcgatacctctatcgccatgacttatatgatgctggaagcacatagtctgggacttggatgtacgtgggtctgtgctttcgatcaagctctttgttcggagatcttcgacatcccttcgcacatgacacctgtttccatattggctctcggctatggcgatccgaccgtacctccgcgtgaggctttcaatcgcaaatccatcgaagaggtagtcagcttcgagaaattatga
>GCF_030252365.1_ASM3025236v1
atgaaacagattcctcaagattttcgtttgatagaagatttcttccgcacgcgcagatccgtacgcaagtttatcgatcgtcctgtggaggaagagaagttgatggccatcctcgaagccggacgcatagctccttcggcacataattaccagccgtggcatttcctcgtggtcagagaagaagagggccgcaaacgcttggctccctgttcccaacaaccttggttcccgggtgcccccatctatatcatcacgcttggcgatcatcaaagagcatggaagcgaggagcaggcgattccgtagacatcgatacctctatcgccatgacttatatgatgctggaagcacatagtctgggacttggatgtacgtgggtctgtgctttcgatcaagctctttgttcggagatcttcgacatcccttcgcacatgacacctgtttccatattggctctcggctatggcgatccgaccgtacctccgcgtgaggctttcaatcgcaaatccatcgaagaggtagtcagcttcgagaaattatga
>GCF_030440475.1_ASM3044047v1
atgaaacagattcctcaagattttcgtttgatagaagatttcttccgcacgcgcagatccgtacgcaagtttatcgatcgtcctgtggaggaagagaagttgatggccatcctcgaagccggacgcatagctccttcggcacataattaccagccgtga---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>GCF_030440475.1_ASM3044047v1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------gtggtcagagaagaagagggccgcaaacgcttggctccctgttcccaacaaccttggttcccgggtgcccccatctatatcatcacgcttggcgatcatcaaagagcatggaagcgaggagcgggcgattcggtagacatcgatacctctatcgccatgacttatatgatgctggaagcacatagtctgggacttggatgtacgtgggtctgtgctttcgatcaagctctttgttcggagatcttcgacatcccttcgcacatgacacctgtttccatattggctctcggctatggcgatccgaccgtacctccgcgtgaggctttcaatcgcaaatccatcgaagaggtagtcagcttcgagaaattatga
>GCF_030440495.1_ASM3044049v1
atgaaacagattcctcaagattttcgtttgatagaagatttcttccgcacgcgcagatccgtacgcaagtttatcgatcgtcctgtggaggaagagaagttgatggccatcctcgaagccggacgcatagctccttcggcacataattaccagccgtga---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>GCF_030440495.1_ASM3044049v1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------gtggtcagagaagaagagggccgcaaacgcttggctccctgttcccaacaaccttggttcccgggtgcccccatctatatcatcacgcttggcgatcatcaaagagcatggaagcgaggagcgggcgattcggtagacatcgatacctctatcgccatgacttatatgatgctggaagcacatagtctgggacttggatgtacgtgggtctgtgctttcgatcaagctctttgttcggagatcttcgacatcccttcgcacatgacacctgtttccatattggctctcggctatggcgatccgaccgtacctccgcgtgaggctttcaatcgcaaatccatcgaagaggtagtcagcttcgagaaattatga
In this example, both GCF_030440475.1_ASM3044047v1 and GCF_030440495.1_ASM3044049v1 have two sequences for the same gene, probably because there is a gap between the two fragments. Does Panaroo provide a script to merge such fragmented sequences into a single one? I want to build MSAs for a specific set of genes, and before writing my own script to handle this I wanted to check whether one already exists.
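For reference, a minimal sketch of the kind of homemade script I had in mind. It is not part of Panaroo, and all function names are hypothetical. It assumes that fragments from the same genome share an identical header line (as in the listing above) and that `-` is the only gap character. For each header it overlays the aligned rows column by column, keeping the residue wherever a fragment has one, and it refuses to merge if two fragments place different residues in the same column.

```python
import sys

GAP = "-"  # assumption: "-" is the only gap character in the alignment


def read_fasta(lines):
    """Parse FASTA lines into (header, sequence) pairs, keeping repeated headers."""
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def merge_fragments(rows):
    """Overlay aligned rows column by column; fail if two rows disagree."""
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all rows of an alignment must have the same length")
    merged = []
    for column, residues in enumerate(zip(*rows)):
        # Collect the distinct non-gap characters seen in this column.
        observed = {c for c in residues if c != GAP}
        if len(observed) > 1:
            raise ValueError(f"column {column}: conflicting residues {sorted(observed)}")
        merged.append(observed.pop() if observed else GAP)
    return "".join(merged)


def collapse_alignment(records):
    """Group rows by header (first-seen order) and merge each group into one row."""
    grouped = {}
    for header, seq in records:
        grouped.setdefault(header, []).append(seq)
    return [(header, merge_fragments(rows)) for header, rows in grouped.items()]


def to_fasta(records):
    """Serialise (header, sequence) pairs as unwrapped FASTA text."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python merge_fragments.py nfrA2.fasta > nfrA2.merged.fasta
    with open(sys.argv[1]) as handle:
        sys.stdout.write(to_fasta(collapse_alignment(read_fasta(handle))))
```

This only fills columns: it does not re-align, check the reading frame, or look for internal stop codons, and columns that no fragment covers stay as gaps in the merged row.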
Thanks!