
Using virus.msh for HSV-1 assembly polishing: fewer than 5 closely-related genomes found for polishing #41

Open
rezaeir opened this issue Oct 20, 2021 · 9 comments


@rezaeir

rezaeir commented Oct 20, 2021

I am trying to use homopolish to improve the assembly consensus after using Raven for assembly and Medaka for primary polishing. However, when I use the virus.msh file and pass my consensus.fa file as input, I get the following output:
```
homopolish polish -m R10.3.pkl -a consensus.fa -s virus.msh -t 10 -o hpOut
[2021/10/20 13:51] INFO: RUN-ID: consensus
consensus
/home/rezaeir/sequencing_data/121021_FusionHSVLibrary/fastq/fuHSV7/hpOut/debug
[2021/10/20 13:51] INFO: Stage: Select closely-related genomes
TIME Select closely-related genomes: 0 MINS 0 SECS.
This contig consensus closely-related genome is less than 5, not to polish...
TIME Total: 0 MINS 0 SECS.
```

Is there any way I can fix this? I also tried the "-g" option with "humanalphaherpesvirinae_humanalphaherpesvirus1" as the genus_species input, but I am not sure this is the right way to write it (the result did not change compared with Medaka's output).

@ythuang0522
Owner

@rezaeir Can you provide us with one HSV-1 genome? First, we found that the number of HSV-1 genomes in the current virus sketch is insufficient, and we would like to update it. Second, the -g option was designed for bacteria but can be revised for viruses. Finally, we are considering pulling more genomes from NCBI Virus instead of RefSeq. It would be great if you could provide one for revising and testing.

@rezaeir
Author

rezaeir commented Oct 21, 2021

@ythuang0522 Unfortunately, the number of HSV-1 genomes in NCBI is very limited. https://www.ncbi.nlm.nih.gov/nuccore/NC_001806.2?report=fasta is the RefSeq record you can access. I have also attached a file I generated using ViPR that contains more than 80 full HSV-1 genomes.
HSV1_ViPR_DB.zip

@rezaeir
Author

rezaeir commented Oct 21, 2021

I used that ViPR database, and it seems to have marginally decreased the number of gaps (from 126 to 117).

@ythuang0522
Owner

Thanks for your response. I meant, could you provide the viral genome after Medaka polishing for development? I was not aware of ViPR. It looks to me like you could use the -l (local database) option for polishing. However, it still lacks the ANI selection step, which we are considering adding to the local DB version. As we don't have an ONT viral genome at hand, it would be better if you could provide one.
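Until the local-DB mode gains an ANI selection step, one possible workaround is to pre-filter the ViPR genomes yourself before passing them with -l. The sketch below is not homopolish code; it assumes you have already run `mash dist` between the candidate genomes and your consensus (tab-separated output: reference ID, query ID, distance, p-value, shared hashes) and uses the common approximation ANI ≈ 1 − Mash distance. The function name, the 95% cutoff, the `top_n` limit and the genome names in the example are all hypothetical.

```python
def select_by_ani(mash_dist_text, min_ani=0.95, top_n=20):
    """Return (reference_id, estimated_ani) pairs, closest genomes first.

    mash_dist_text: tab-separated `mash dist` output with columns
    reference-ID, query-ID, distance, p-value, shared-hashes.
    ANI is estimated as 1 - Mash distance (an approximation).
    min_ani and top_n are illustrative values, not homopolish defaults.
    """
    hits = []
    for line in mash_dist_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        ref_id, _query_id, dist, _pval, _shared = line.split("\t")
        ani = 1.0 - float(dist)
        if ani >= min_ani:
            hits.append((ref_id, ani))
    # Highest estimated ANI first, capped at top_n genomes.
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:top_n]


# Hypothetical mash dist output for three candidate genomes.
example = (
    "genomeA.fa\tconsensus.fa\t0.0120\t0\t850/1000\n"
    "genomeB.fa\tconsensus.fa\t0.0800\t1e-50\t300/1000\n"
    "genomeC.fa\tconsensus.fa\t0.0050\t0\t930/1000\n"
)
print(select_by_ani(example))
```

Here genomeB falls below the cutoff and is dropped; the remaining genomes could then be collected into the local database passed with -l.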

@rezaeir
Author

rezaeir commented Nov 17, 2021

Sorry for the very late response. The attached file is from my MinION R9.4 sequencing of an HSV genome with a GFP insertion in its Tk gene locus.
RR-tkHSV.raven.medaka.zip

@steinbrl

I have the same problem. I would like to polish HSV-1, HSV-2, VZV, KSHV and HCMV assemblies. I started with HSV-2, and it failed immediately with the same issue.

@SeaneryChang
Collaborator

@rezaeir
We have polished the virus with -l (local database) using your HSV DB and tested several thresholds.
Mismatches and indels were assessed with fastmer (comparing the pre-polish and post-polish files).
homopolished_1 is the default result, which matches yours; we are curious how you counted the gaps.
We would appreciate it if you could provide a reference genome of the virus so we can tune our program.

| | mismatch | insertion | deletion |
|---|---|---|---|
| homopolished_1 | 0 | 75 | 65 |
| homopolished_2 | 0 | 66 | 62 |
| homopolished_3 | 0 | 72 | 73 |

hsv.zip
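Summing the three fastmer columns gives the total number of edits each setting made relative to the pre-polish assembly. A small sketch of that arithmetic; the dictionary only restates the counts reported above, and `total_changes` is a hypothetical helper:

```python
# Counts copied from the fastmer comparison reported above.
results = {
    "homopolished_1": {"mismatch": 0, "insertion": 75, "deletion": 65},
    "homopolished_2": {"mismatch": 0, "insertion": 66, "deletion": 62},
    "homopolished_3": {"mismatch": 0, "insertion": 72, "deletion": 73},
}


def total_changes(counts):
    """Total edits (mismatches + insertions + deletions) for one run."""
    return counts["mismatch"] + counts["insertion"] + counts["deletion"]


for name, counts in results.items():
    print(name, total_changes(counts))
```

This gives 140, 128 and 145 edits respectively. Fewer edits is not automatically better: without a trusted reference it is unclear which setting is more accurate.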

@SeaneryChang
Collaborator

@steinbrl
Hi, you can use -l (local database) to polish if you have a virus database.
If the program can't find closely related viruses in our database, it skips polishing because there are too few homologous genomes.
It would be great if you could provide your assemblies and database (if you have one).
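The skip message in the original log ("closely-related genome is less than 5, not to polish") suggests a minimum-count gate. A minimal sketch of that kind of check, assuming each database genome has a Mash distance to the contig and that genomes within a hypothetical `max_dist` count as closely related; only the value 5 comes from the log, and `should_polish` and `max_dist` are illustrative, not homopolish's implementation:

```python
MIN_RELATED = 5  # threshold quoted in homopolish's log message


def should_polish(distances, max_dist=0.05):
    """Return True if enough database genomes are close enough to polish with.

    distances: Mash distances from one contig to each database genome.
    max_dist: hypothetical cutoff for "closely related" (an assumption).
    """
    related = [d for d in distances if d <= max_dist]
    return len(related) >= MIN_RELATED


print(should_polish([0.01, 0.02, 0.03]))                # only 3 related genomes
print(should_polish([0.01, 0.02, 0.03, 0.04, 0.045]))  # 5 related genomes
```

This is why a sparse sketch or database (few HSV-1 entries) can stop polishing before it starts, regardless of assembly quality.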

@rezaeir
Author

rezaeir commented Dec 4, 2022


Hi, I've attached a reference FASTA file for HSV-GFP, assembled from very high-depth short-read sequencing. Do you plan to add a built-in virus database, perhaps based on NCBI Virus?
hsv1-gfp-genome.txt
