Update medaka in artic #250
nextflow run replikation/poreCov -r update-medaka-in-artic -profile slurm,singularity --krakendb GRCh38.p13_SC2_2022-03-01.tar.gz --cachedir /singularity/ --update --fastq_pass fastq_pass --samples samplesheet.csv --primerV V4.1 --medaka_model r941_min_hac_g507 -resume
[87/93c492] process > artic_ncov_wf:artic_medaka (17) [100%] 24 of 24 ✔
🤯
Works for me also w/ Unfortunately, I can not test this branch w/ Then we could compare here the results between
I injected the matching primer scheme via
and it looks really good. From a bird's-eye view, I get the same mutations as with the old ARTIC container, but we are now using medaka 1.7.2 and have access to the new R10.4.1 models etc. Comparing the R9 model (old container) with the R10 model (new container) via md5 checksums of the final genome FASTA files, I only see differences for four samples in this run. For one of them, the new model repaired a frameshift - perfect. For the other three, we will look at pairwise whole-genome alignments and then report here.
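For anyone who wants to repeat the checksum comparison described above, here is a minimal sketch. It assumes the final genome FASTA files of the two runs sit in two directories under the same file names; the function names and the `*.fasta` pattern are hypothetical and not part of poreCov. Note that an md5 digest also changes when only the FASTA header differs, so a mismatch only tells you which samples to inspect.

```python
import hashlib
from pathlib import Path


def md5_of_file(path):
    """Return the hex md5 digest of a file's raw bytes."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def samples_with_different_md5(old_dir, new_dir, pattern="*.fasta"):
    """List file names present in both directories whose md5 digests differ."""
    old_dir, new_dir = Path(old_dir), Path(new_dir)
    differing = []
    for old_file in sorted(old_dir.glob(pattern)):
        new_file = new_dir / old_file.name
        # Only compare samples that exist in both result sets.
        if new_file.exists() and md5_of_file(old_file) != md5_of_file(new_file):
            differing.append(old_file.name)
    return differing
```

Calling `samples_with_different_md5("results_old", "results_new")` (directory names are placeholders) returns the file names that need a closer look.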
Update: for the three sequences with different md5 checksums (V5 primers, R9 in the old container vs. R10 in the new container), the difference is only a single base in each case. Either the old container with R9 calls an N at a position where the new container with R10 calls the reference/alternative base, or vice versa. The sub-image below shows the mapped reads: apparently, an A should be called instead of the reference G. The old container with the R9 model does this, apparently correctly. In contrast, the new container with the R10 model seems undecided and inserts an N at the end of the ARTIC workflow. The top sub-image shows another position in the genome where it is the other way around: the old container with R9 calls an N, while the new container with R10 calls a base.
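The single-base differences described above can also be located without a viewer. A minimal sketch, assuming the two consensus sequences have already been pairwise aligned to the same length (the helper name is hypothetical):

```python
def differing_positions(seq_a, seq_b):
    """Return (1-based position, base_a, base_b) for each mismatch
    between two aligned sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, a, b)
        for position, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()), start=1)
        if a != b
    ]
```

For example, `differing_positions("ACGTA", "ACNTA")` returns `[(3, "G", "N")]`, i.e. one sequence calls a base where the other calls an N.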
These are the only differences we have discovered so far, and they are quite minor in my eyes. Because we basecalled this run with an R10.4.1* model, it also makes complete sense to me to use that model in Medaka, even though a few positions "look better" with an older model such as R9. Basecalling with an R10 model and then analysing with an R9 model is not really justifiable.
@DataSpott if everything is fine on your end we can merge
Tested it with one of our routine runs (starting from fastq-pass) and found a difference in only one sample: one more ambiguous base with the old container compared to the new container (1112 vs 1113 ambiguous bases). So on my end the run worked fine.
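A count like the one above can be reproduced with a few lines. This is only a sketch under the assumption that "ambiguous" means any character other than A, C, G or T in the sequence lines (so N, IUPAC codes and gap characters all count); the function name is hypothetical and this is not necessarily how the workflow itself counts.

```python
def count_ambiguous_bases(fasta_text):
    """Count characters other than A, C, G, T in the sequence lines of a FASTA string."""
    unambiguous = set("ACGT")
    count = 0
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            continue  # skip header lines
        count += sum(1 for base in line.strip().upper() if base not in unambiguous)
    return count
```

Running it on the consensus FASTA of the same sample from both containers gives the two numbers to compare.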
Solves #247
WIP - tests need to be run.