diff --git a/_episodes/2.1.2_host_prediction_II.md b/_episodes/2.1.2_host_prediction_II.md
index c47cfa5..f6a6cac 100644
--- a/_episodes/2.1.2_host_prediction_II.md
+++ b/_episodes/2.1.2_host_prediction_II.md
@@ -17,6 +17,18 @@ The tool predicts proteins in each viral sequence and then assigns them to a set
 groups. The annotated proteins are then used to predict a host genus with a pretrained random
 forest model. This model matches the set of proteins with host proteins it was trained on.
 
+After running RaFAH, we will compare the resulting predictions with the bacterial metagenome you
+briefly analyzed in one of last week's homework assignments. The bacterial metagenome was
+taxonomically annotated using GTDB. We will get into taxonomic classification of viruses tomorrow;
+for today, it is important to know that there are different methods of taxonomic classification.
+RaFAH uses the NCBI taxonomy, so we will have to translate the output of RaFAH into the GTDB
+taxonomy to be able to compare the predictions with our bacterial metagenome.
+
+At the end of today's practical part, we will use BLAST as an alternative way of linking viruses to
+their hosts. With BLAST, we will map our viral contigs to the bacterial metagenome to find homologous
+regions between virus genomes and host genomes. We will combine the results from RaFAH and BLAST to
+compare the two methods and see how well they agree.
+
 > ## Exercise - Use RaFAH to predict hosts for our contigs
 > RaFAH requires a single file for each contig to run. You first have to write a python
 > script which separates the combined assembly into single files. You can use the package
@@ -39,14 +51,15 @@ forest model. This model matches the set of proteins with host proteins it was t
 > SeqIO.write([record], fout, "fasta")
 > ~~~
 > {: .language-python}
-> 
+>
 > You can run this code in the same sbatch script as RaFAH. Here you can find a [description of the parameters](https://gensoft.pasteur.fr/docs/RaFAH/0.3/)
 > you can pass to the tool (the page is a bit hard to read). RaFAH is programmed in perl and you can find it here on draco:
 >
 > ~~~
-> # set some variables for running RaFAH on draco
-> export PATH=/home/groups/VEO/tools/perl/build/perl-5.32.1/perl-5.32.1:$PATH
-> export PERL5LIB=/home/groups/VEO/tools/perl/build/perl-5.32.1/perl-5.32 &&
+> # activate the conda environment with the dependencies RaFAH requires
+> source /vast/groups/VEO/tools/miniconda3_2024/etc/profile.d/conda.sh && conda activate perl_v5.32.1
+>
+> # set a variable to call the RaFAH script
 > rafah='/home/groups/VEO/tools/rafah/RaFAH.pl'
 >
 > # create an output folder or use the one you set for the slurm logs: