record name cleaning #20

Closed
M-Zeeb opened this issue Apr 24, 2023 · 2 comments


M-Zeeb commented Apr 24, 2023

Hi,

thanks for the great tool.

I ran into a small issue while following the instructions to obtain viral marker genes (HIV in my case).
It seems that "clean_fasta_cdna_cds.py" does not clean the record names sufficiently: underscores ("_") in the names caused problems downstream, resulting in `KeyError`s at various steps, for example when generating the references.
I may have misunderstood the instructions, but after manually removing all underscores the problem was resolved.

Here is an example of the error:

Example name: "02495|KC156214.1_AGF30950.1_2 [02495]"
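For illustration, the manual workaround mentioned above (removing all underscores from the record names) could be sketched roughly as follows. This is only a minimal sketch and not the code of "clean_fasta_cdna_cds.py"; the function names `clean_record_name` and `clean_fasta_lines` are hypothetical, and it assumes the record ID is the first whitespace-delimited token of the header.

```python
def clean_record_name(header: str) -> str:
    """Remove underscores from the ID part of a FASTA header (hypothetical helper)."""
    # Split the header into the ID (first token) and the optional description.
    record_id, _, description = header.partition(" ")
    cleaned_id = record_id.replace("_", "")
    return f"{cleaned_id} {description}" if description else cleaned_id


def clean_fasta_lines(lines):
    """Yield FASTA lines with cleaned headers; sequence lines pass through unchanged."""
    for line in lines:
        if line.startswith(">"):
            yield ">" + clean_record_name(line[1:].rstrip("\n"))
        else:
            yield line.rstrip("\n")


example = "02495|KC156214.1_AGF30950.1_2 [02495]"
print(clean_record_name(example))  # 02495|KC156214.1AGF30950.12 [02495]
```

If something like this is used, the same cleaning presumably has to be applied to both the protein and the cDNA files, so that the record names still agree with each other.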

Error during reference generation
(I could actually fix this by splitting on "OG" instead of "" in lines 326-328 of "OGSet.py", but then I got errors at the final merging step):

```
read2tree --standalone_path marker_genes/ --reference --dna_reference all_cdna_out.fa

--- Load OGs with min 0 species from oma marker_genes - mode = marker_genes ---

Loading files for pre-filter: 100%|███████████| 9/9 [00:00<00:00, 8355.19 OGs/s]
2023-04-24 10:07:05,211 - read2tree.OGSet - INFO - 

--- Load ogs and find their corresponding DNA seq from all_cdna_out.fa ---

2023-04-24 10:07:05,211 - read2tree.OGSet - INFO - Loading all_cdna_out.fa into memory. This might take a while . . . 
Loading OGs:   0%|                                      | 0/9 [00:00<?, ? OGs/s]

Traceback (most recent call last):
  File "/Users/mz/opt/anaconda3/envs/r2t/bin/read2tree", line 16, in <module>
    main(sys.argv[1:], exe_name=exe_name(), desc=desc)
  File "/Users/mz/opt/anaconda3/envs/r2t/lib/python3.10/site-packages/read2tree/main.py", line 289, in main
    ogset = OGSet(args, oma_output=oma_output, progress=progress)  # Generate the OGs with their DNA sequences
  File "/Users/mz/opt/anaconda3/envs/r2t/lib/python3.10/site-packages/read2tree/OGSet.py", line 79, in __init__
    self.ogs = self._load_ogs()
  File "/Users/mz/opt/anaconda3/envs/r2t/lib/python3.10/site-packages/read2tree/OGSet.py", line 186, in _load_ogs
    ogs[name].dna = self._get_dna_records(ogs[name].aa,
  File "/Users/mz/opt/anaconda3/envs/r2t/lib/python3.10/site-packages/read2tree/OGSet.py", line 365, in _get_dna_records
    og_cdna.append(self._get_dna_from_fasta(record, db))
  File "/Users/mz/opt/anaconda3/envs/r2t/lib/python3.10/site-packages/read2tree/OGSet.py", line 326, in _get_dna_from_fasta
    return self._get_dna_from_REST(record)
  File "/Users/mz/opt/anaconda3/envs/r2t/lib/python3.10/site-packages/read2tree/OGSet.py", line 282, in _get_dna_from_REST
    seq = oma_record.json()['cdna']
KeyError: 'cdna'
```
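The traceback suggests that the record was not found in the local DNA file, so the lookup fell through to `_get_dna_from_REST`, where the response had no `cdna` field. A quick way to spot such mismatches before running read2tree might be to compare the record IDs of the two files. The sketch below is only an assumption-laden illustration: the helper names `fasta_ids` and `missing_dna_ids` are hypothetical, the sequences are placeholders, and it assumes IDs are compared as the first whitespace-delimited token of each header, which may differ from how read2tree matches records internally.

```python
def fasta_ids(lines):
    """Return the set of record IDs (first token after '>') found in FASTA lines."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and len(line.strip()) > 1
    }


def missing_dna_ids(protein_lines, dna_lines):
    """List protein record IDs that have no identically named DNA record."""
    return sorted(fasta_ids(protein_lines) - fasta_ids(dna_lines))


# Placeholder records: the names differ only by the underscores.
protein = [">02495|KC156214.1_AGF30950.1_2 [02495]", "MGARAS"]
dna = [">02495|KC156214.1AGF30950.12 [02495]", "ATGGGT"]
print(missing_dna_ids(protein, dna))  # ['02495|KC156214.1_AGF30950.1_2']
```

An empty list would indicate that, under this assumption, every protein record has a DNA record with the same name.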

Original files:
https://ftp.ncbi.nlm.nih.gov/genomes/genbank/viral/Human_immunodeficiency_virus_1/all_assembly_versions/GCA_003202495.1_ASM320249v1/GCA_003202495.1_ASM320249v1_translated_cds.faa.gz
https://ftp.ncbi.nlm.nih.gov/genomes/genbank/viral/Human_immunodeficiency_virus_1/all_assembly_versions/GCA_003202495.1_ASM320249v1/GCA_003202495.1_ASM320249v1_cds_from_genomic.fna.gz

sinamajidian added a commit that referenced this issue Apr 24, 2023
@sinamajidian
Contributor

Dear @M-Zeeb

I've just updated the code, which you can download from here, so it doesn't affect the read2tree installation. I tested the new version with the provided assembly and it works. Please make sure that you remove the output from the previous run, and let me know whether it works for you. I'm sorry for the inconvenience.

Regards,
Sina

@M-Zeeb
Author

M-Zeeb commented Apr 25, 2023

Dear Sina,

thanks for the quick response!
It works now.

Best,
Marius
