-
Notifications
You must be signed in to change notification settings - Fork 199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SAM file formatting #845
Comments
If I understand this correctly, this is a nucleotide-vs-protein search (translated search), right? I think we have a bug for the SAM backtraces for this mode, where either the back trace should be converted to the nucleotide match count (not codon match count), or the nucleotide sequence should be translated to a protein sequence. |
Do you expect this search to return a nucleotide sequence or a protein sequence? |
@milot-mirdita, we are also interested in the SAM formatted output for translated searches. In our use case, we would like to get a nucleotide output. |
I'm searching a metagenome (nucleotides) against a protein catalogue. I need to know how many reads mapped to each protein from the catalogue for functional analysis. I guess I expect the search to return the matches for each read? |
I pushed a fix that should correct some of the weirdness with SAM. Please check if this is what you expected as we don't use much sam internally.
|
Thanks! Do I need to download and recompile mmseqs? or can I just replace with the fixed file? |
You can download precompiled binaries from: https://mmseqs.com/latest/ you don’t need to compile them yourself |
Expected Behavior
Hi,
I'm trying to convert a SAM file created from a search result but I get this error:
(mmseqs) moshea@shannon:/mnt/scratch$ samtools view -Sb SRR5234505.sam > SRR5234505.bam
[E::sam_parse1] CIGAR and query sequence are of different length
[W::sam_read1_sam] Parse error at line 363193
samtools view: error reading file "SRR5234505.sam"
This is the part of the file where the error refers to:
(mmseqs) moshea@shannon:/mnt/scratch$ sed -n '363193p' SRR5234505.sam
SRR5234505.241048 16 NZ_SMKV01000006.1_67 26 254 35M * 0 0 GGTCTTACCTGCGCTCCACAGGCTGTCACCGCCAGCCCGGCGGCCTTCGCGACGTTGACCGCCGCGTTGATGTCCCGGTCCAGGTACGTCCCGCACGACCCGCAC * AS:i:98 NM:i:12
Maybe the sam file is not formatted properly?
I used:
mmseqs convertalis SRR5234505_DB curated_protein_catalogue SRR5234505_Result SRR5234505 --format-mode 1
Thanks for your help!
The text was updated successfully, but these errors were encountered: