Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update medaka in artic #250

Merged
merged 4 commits into from
Feb 23, 2023
Merged

Update medaka in artic #250

merged 4 commits into from
Feb 23, 2023

Conversation

hoelzer
Copy link
Collaborator

@hoelzer hoelzer commented Feb 21, 2023

Solves #247

WIP - tests need to be run.

@hoelzer hoelzer marked this pull request as draft February 21, 2023 23:06
@hoelzer
Copy link
Collaborator Author

hoelzer commented Feb 22, 2023

nextflow run replikation/poreCov -r update-medaka-in-artic -profile slurm,singularity --krakendb GRCh38.p13_SC2_2022-03-01.tar.gz --cachedir /singularity/ --update --fastq_pass fastq_pass --samples samplesheet.csv --primerV V4.1 --medaka_model r941_min_hac_g507 -resume
[87/93c492] process > artic_ncov_wf:artic_medaka (17)                      [100%] 24 of 24 ✔

🤯

@hoelzer
Copy link
Collaborator Author

hoelzer commented Feb 22, 2023

Works for me also w/ --medaka_model r1041_e82_400bps_sup_g615

Unfortunately, I can not test this branch w/ --primerV V5.3.2_400 - can I somehow merge that in here @replikation @DataSpott ?

Then we could compare here the results between

  1. --primerV V5.3.2_400 w/ old ARTIC container and medaka old model
  2. --primerV V5.3.2_400 w/ new ARTIC container and medaka model r1041_e82_400bps_sup_g615

@hoelzer
Copy link
Collaborator Author

hoelzer commented Feb 22, 2023

I injected the matching primer scheme via

--primerV poreCov/data/external_primer_schemes/nCoV-2019/V5.3.2_400/nCoV-2019.primer.bed

and it looks really good.

I get, from a birds eye view, the same mutations as with the old ARTIC container but we're using now medaka 1.7.2 and have access to the new R1041 etc.. models.

So comparing R9 model and now R10 w/ new container via md5 checksums of the final genome FASTAS I only see differences for four samples in this run. For one, the new model repaired a frame shift - perfect. For the other three we will look via pairwise whole genome aln and then report here

@hoelzer
Copy link
Collaborator Author

hoelzer commented Feb 22, 2023

Update: for the three sequences w/ different md5 checksum (V5 primers, R9 in old container vs R10 in new container), we see that the difference is only a single base.

Either the old container w/ R9 calls an N at a single position instead of the reference/alternative base compared to the new container w/ R10 or vice versa:

image

The below sub-image shows the mapped reads. Apparently, an A should be called instead of the reference G. The old container with R9 model does this, apparently correctly. In contrast, the new container with R10 model seems undecided and inserts an N at the end of the ARTIC workflow.

For the top sub-image it's another position in the genome and the other way around. Old container and R9 calls an N, while new container and R10 calls a base.

@hoelzer
Copy link
Collaborator Author

hoelzer commented Feb 22, 2023

These are the only differences we discovered so far. Quite minor in my eyes. And bc/ we basecalled this run w/ R10.4.1* model, it makes also fully sense to me to use that model in Medaka. Even though there are slight differences that "look better" with an older model such as R9. However, basecalling w/ R10 model and then analyses with R9 model is also not really justifiable.

@hoelzer hoelzer marked this pull request as ready for review February 22, 2023 15:49
@hoelzer hoelzer requested a review from replikation February 22, 2023 15:49
@replikation
Copy link
Owner

@DataSpott if everything is fine on your end we can merge

@DataSpott
Copy link
Collaborator

Tested it with one of our routine runs (starting from fastq-pass) and could only find in one sample a difference. this was only one more ambiguous base with the old container compared to the new container (1112 vs 1113 ambiguous bases). So on my end the run worked fine.
Tested with commit "db05fa8".

@hoelzer hoelzer merged commit 08565ae into master Feb 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants