I stumbled upon a small issue while blindly following the instructions for obtaining viral marker genes (HIV in my case).
It seems that the "clean_fasta_cdna_cds.py" script does not sufficiently clean the sequence names: underscores ("_") left in the names caused KeyErrors at various downstream steps, for example when generating the references.
It may be that I misunderstood the instructions, but after I manually removed all underscores, the problem was resolved.
Here is an example of the error.
Example name: "02495|KC156214.1_AGF30950.1_2 [02495]"
Error at the reference-generation step
(I could actually fix this one by splitting on "OG" instead of "" in lines 326-328 of "OGSet.py", but then I got errors at the final merging step):
```
read2tree --standalone_path marker_genes/ --reference --dna_reference all_cdna_out.fa
--- Load OGs with min 0 species from oma marker_genes - mode = marker_genes ---
Loading files for pre-filter: 100%|███████████| 9/9 [00:00<00:00, 8355.19 OGs/s]
2023-04-24 10:07:05,211 - read2tree.OGSet - INFO -
--- Load ogs and find their corresponding DNA seq from all_cdna_out.fa ---
2023-04-24 10:07:05,211 - read2tree.OGSet - INFO - Loading all_cdna_out.fa into memory. This might take a while . . .
Loading OGs: 0%| | 0/9 [00:00<?, ? OGs/s]
Traceback (most recent call last):
  File "/Users/mz/opt/anaconda3/envs/r2t/bin/read2tree", line 16, in <module>
    main(sys.argv[1:], exe_name=exe_name(), desc=desc)
  File "/Users/mz/opt/anaconda3/envs/r2t/lib/python3.10/site-packages/read2tree/main.py", line 289, in main
    ogset = OGSet(args, oma_output=oma_output, progress=progress) # Generate the OGs with their DNA sequences
  File "/Users/mz/opt/anaconda3/envs/r2t/lib/python3.10/site-packages/read2tree/OGSet.py", line 79, in __init__
    self.ogs = self._load_ogs()
  File "/Users/mz/opt/anaconda3/envs/r2t/lib/python3.10/site-packages/read2tree/OGSet.py", line 186, in _load_ogs
    ogs[name].dna = self._get_dna_records(ogs[name].aa,
  File "/Users/mz/opt/anaconda3/envs/r2t/lib/python3.10/site-packages/read2tree/OGSet.py", line 365, in _get_dna_records
    og_cdna.append(self._get_dna_from_fasta(record, db))
  File "/Users/mz/opt/anaconda3/envs/r2t/lib/python3.10/site-packages/read2tree/OGSet.py", line 326, in _get_dna_from_fasta
    return self._get_dna_from_REST(record)
  File "/Users/mz/opt/anaconda3/envs/r2t/lib/python3.10/site-packages/read2tree/OGSet.py", line 282, in _get_dna_from_REST
    seq = oma_record.json()['cdna']
KeyError: 'cdna'
```
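The traceback shows `_get_dna_from_fasta` falling back to `_get_dna_from_REST`, which suggests (but does not prove) that some record from the marker-gene files was not found in `all_cdna_out.fa`. Below is a minimal diagnostic sketch for spotting such mismatches before running `--reference`. It is not part of read2tree. The helper names `fasta_ids` and `missing_ids` are hypothetical, and it assumes that a record's ID is the first whitespace-delimited token of its header and that the marker-gene files end in `.fa`; read2tree may derive its lookup keys differently.

```python
from pathlib import Path


def fasta_ids(path):
    """Return the set of record IDs in a FASTA file.

    Assumption: the ID is the first whitespace-delimited token after '>'.
    read2tree may build its lookup keys differently.
    """
    ids = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                header = line[1:].strip()
                if header:
                    ids.add(header.split()[0])
    return ids


def missing_ids(marker_dir, cdna_fasta, pattern="*.fa"):
    """Map each marker-gene file to the IDs it has that are absent from cdna_fasta.

    `pattern` is an assumed file extension for the OG files; adjust as needed.
    """
    cdna = fasta_ids(cdna_fasta)
    report = {}
    for og_file in sorted(Path(marker_dir).glob(pattern)):
        absent = fasta_ids(og_file) - cdna
        if absent:
            report[og_file.name] = sorted(absent)
    return report
```

For example, `missing_ids("marker_genes/", "all_cdna_out.fa")` returns an empty dict when, under these assumptions, every marker-gene ID has a cDNA counterpart.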
I've just updated the code, which you can download from here; it doesn't affect the read2tree installation. I tested the new version with the provided assembly and it works. Please make sure that you remove the output from the previous run, and let me know whether it works for you. Sorry for the inconvenience.
Hi,
thanks for the great tool.
Original files:
https://ftp.ncbi.nlm.nih.gov/genomes/genbank/viral/Human_immunodeficiency_virus_1/all_assembly_versions/GCA_003202495.1_ASM320249v1/GCA_003202495.1_ASM320249v1_translated_cds.faa.gz
https://ftp.ncbi.nlm.nih.gov/genomes/genbank/viral/Human_immunodeficiency_virus_1/all_assembly_versions/GCA_003202495.1_ASM320249v1/GCA_003202495.1_ASM320249v1_cds_from_genomic.fna.gz
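The manual workaround described in the issue (removing all underscores from the sequence names) can be scripted as in the minimal sketch below. The helpers `open_text` and `strip_underscores` are hypothetical and are not part of read2tree or `clean_fasta_cdna_cds.py`. The sketch assumes that the same transformation is applied to both the protein (`.faa`) and cDNA (`.fna`) files so that their IDs still correspond. It only touches header lines, reads and writes gzip transparently, and refuses to continue if two IDs collapse into one (e.g. `A_1` and `A1`). Whether this is the right fix for read2tree is not confirmed; the maintainer's updated code addresses the issue directly.

```python
import gzip


def open_text(path, mode="rt"):
    """Open a plain or gzip-compressed file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, mode)
    return open(path, mode)


def strip_underscores(in_path, out_path, replacement=""):
    """Copy a FASTA file, replacing underscores in header lines only.

    Sequence lines are copied unchanged. Raises ValueError if two headers
    end up with the same ID (first token after '>') once cleaned.
    """
    seen = set()
    with open_text(in_path) as src, open_text(out_path, "wt") as dst:
        for line in src:
            if line.startswith(">"):
                line = line.replace("_", replacement)
                header = line[1:].strip()
                record_id = header.split()[0] if header else ""
                if record_id in seen:
                    raise ValueError(f"duplicate ID after cleaning: {record_id}")
                seen.add(record_id)
            dst.write(line)
```

With the default `replacement=""`, the example header `02495|KC156214.1_AGF30950.1_2 [02495]` becomes `02495|KC156214.1AGF30950.12 [02495]`.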