Commit

Update 2.1.2_host_prediction_II.md
maltesie authored Sep 20, 2024
1 parent bf71b4f commit 5b87b13
Showing 1 changed file with 17 additions and 4 deletions.
21 changes: 17 additions & 4 deletions _episodes/2.1.2_host_prediction_II.md
@@ -17,6 +17,18 @@ The tool predicts proteins in each viral sequence and then assigns them to a set
groups. The annotated proteins are then used to predict a host genus with a pretrained random
forest model. This model matches the set of proteins with host proteins it was trained on.

After running RaFAH, we will compare the resulting prediction with the bacterial metagenome you
briefly analyzed in one of last week's homework assignments. The bacterial metagenome was taxonomically annotated
using GTDB. We will get into taxonomic classification of viruses tomorrow; for today, it is important
to know that there are different methods of taxonomic classification. RaFAH uses the NCBI taxonomy,
so we will have to translate the output of RaFAH into the GTDB taxonomy to be able to compare the
predictions with our bacterial metagenome.
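
As a rough illustration of this translation step, the sketch below maps RaFAH's NCBI genus names to GTDB names and then keeps only the predictions whose genus also occurs in the bacterial metagenome. The lookup table, the genus names and the contig names are all placeholders invented for this example; a real table would have to be built from GTDB's own metadata.

```python
# Hypothetical NCBI -> GTDB genus lookup (placeholder names, not real taxa).
NCBI_TO_GTDB = {
    "GenusA": "GenusA",    # same name in both taxonomies
    "GenusB": "GenusB_A",  # renamed in GTDB (illustrative only)
}


def translate_predictions(rafah_ncbi, mapping):
    """Turn contig -> NCBI genus into contig -> GTDB genus (None if unmapped)."""
    return {contig: mapping.get(genus) for contig, genus in rafah_ncbi.items()}


def hosts_in_metagenome(rafah_gtdb, metagenome_genera):
    """Keep only predictions whose GTDB genus was found in the metagenome."""
    return {c: g for c, g in rafah_gtdb.items() if g in metagenome_genera}


# Made-up example data: RaFAH predictions and genera seen in the metagenome.
rafah_ncbi = {"contig_1": "GenusA", "contig_2": "GenusB", "contig_3": "GenusC"}
metagenome_genera = {"GenusB_A", "GenusD"}

rafah_gtdb = translate_predictions(rafah_ncbi, NCBI_TO_GTDB)
supported = hosts_in_metagenome(rafah_gtdb, metagenome_genera)

print(rafah_gtdb)
print(supported)
```

Genera without an entry in the lookup table are kept as `None` instead of being dropped, so it stays visible which predictions could not be translated.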

At the end of today's practical part, we will use BLAST as an alternative way of linking viruses to
their hosts. With BLAST, we will map our viral contigs to the bacterial metagenome to find homologous
regions between virus genomes and host genomes. We will combine the results from RaFAH and BLAST to
compare the two methods and see how well they agree.
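
One possible way to combine the two results is sketched below. It assumes the BLAST search was written in tabular format (`-outfmt 6` with the default twelve columns), keeps the hit with the highest bit score for each viral contig, and checks whether the genus of that bacterial contig matches the RaFAH prediction. All contig names, genus names and hit values are made up for illustration, and the RaFAH predictions are assumed to be already translated into GTDB names.

```python
import csv
import io

# Default column order of BLAST tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(handle):
    """Return viral contig -> bacterial contig with the highest bit score."""
    best = {}
    reader = csv.DictReader(handle, fieldnames=BLAST_COLUMNS, delimiter="\t")
    for row in reader:
        score = float(row["bitscore"])
        query = row["qseqid"]
        if query not in best or score > best[query][1]:
            best[query] = (row["sseqid"], score)
    return {query: subject for query, (subject, _) in best.items()}


def compare_methods(hits, contig_genus, rafah_gtdb):
    """Return viral contig -> True if BLAST and RaFAH name the same genus."""
    agreement = {}
    for virus, subject in hits.items():
        blast_genus = contig_genus.get(subject)
        agreement[virus] = (
            blast_genus is not None and blast_genus == rafah_gtdb.get(virus)
        )
    return agreement


# Made-up BLAST table; in practice you would open the BLAST output file.
example_table = io.StringIO(
    "virus_1\tbact_7\t98.5\t1200\t18\t0\t1\t1200\t300\t1499\t0.0\t2100\n"
    "virus_1\tbact_9\t91.0\t400\t36\t0\t1\t400\t10\t409\t1e-150\t520\n"
    "virus_2\tbact_3\t95.0\t800\t40\t0\t5\t804\t50\t849\t0.0\t1250\n"
)
# Hypothetical GTDB genus of each bacterial contig and RaFAH predictions.
contig_genus = {"bact_7": "GenusA", "bact_9": "GenusB_A", "bact_3": "GenusD"}
rafah_gtdb = {"virus_1": "GenusA", "virus_2": "GenusB_A"}

hits = best_hits(example_table)
agreement = compare_methods(hits, contig_genus, rafah_gtdb)
print(hits)
print(agreement)
```

Taking only the single best hit is a simplification; depending on the data, filtering by alignment length or identity before comparing might be more appropriate.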

> ## Exercise - Use RaFAH to predict hosts for our contigs
> RaFAH requires a single file for each contig to run. You first have to write a Python
> script which separates the combined assembly into single files. You can use the package
@@ -39,14 +51,15 @@ forest model. This model matches the set of proteins with host proteins it was t
> SeqIO.write([record], fout, "fasta")
> ~~~
> {: .language-python}
>
>
> You can run this code in the same sbatch script as RaFAH. Here you can find a [description of the parameters](https://gensoft.pasteur.fr/docs/RaFAH/0.3/)
> you can pass to the tool (the page is a bit hard to read). RaFAH is written in Perl and you can find it here on draco:
>
> ~~~
> # set some variables for running RaFAH on draco
> export PATH=/home/groups/VEO/tools/perl/build/perl-5.32.1/perl-5.32.1:$PATH
> export PERL5LIB=/home/groups/VEO/tools/perl/build/perl-5.32.1/perl-5.32 &&
> # activate the conda environment with the dependencies RaFAH requires
> source /vast/groups/VEO/tools/miniconda3_2024/etc/profile.d/conda.sh && conda activate perl_v5.32.1
>
> # set a variable to call the RaFAH script
> rafah='/home/groups/VEO/tools/rafah/RaFAH.pl'
>
> # create an output folder or use the one you set for the slurm logs: