From fdc874ab1c21904db79a2708e07f40de4890bcd0 Mon Sep 17 00:00:00 2001
From: Katherine Eaton
Date: Wed, 22 Feb 2023 16:32:49 -0600
Subject: [PATCH] docs: add test summary package for v0.7.0

---
 docs/sphinx/source/update.md                  |    3 +-
 .../ncov-recombinant_v0.6.1_v0.7.0.html       | 3637 +++++++++++++++++
 2 files changed, 3639 insertions(+), 1 deletion(-)
 create mode 100644 docs/testing_summary_package/ncov-recombinant_v0.6.1_v0.7.0.html

diff --git a/docs/sphinx/source/update.md b/docs/sphinx/source/update.md
index 5960f49..6021bfa 100644
--- a/docs/sphinx/source/update.md
+++ b/docs/sphinx/source/update.md
@@ -20,7 +20,8 @@ python3 scripts/compare_positives.py \
   --node-order alphabetical
 ```
 
-A comparative report is provided for each major release:
+A comparative report is provided for each major or minor release:
 
+- `v0.6.1` → `v0.7.0` : [docs/testing_summary_package/ncov-recombinant_v0.6.1_v0.7.0.html](https://ktmeaton.github.io/ncov-recombinant/docs/testing_summary_package/ncov-recombinant_v0.6.1_v0.7.0.html)
 - `v0.5.1` → `v0.6.0` : [docs/testing_summary_package/ncov-recombinant_v0.5.1_v0.6.0.html](https://ktmeaton.github.io/ncov-recombinant/docs/testing_summary_package/ncov-recombinant_v0.5.1_v0.6.0.html)
 - `v0.4.2` → `v0.5.0` : [docs/testing_summary_package/ncov-recombinant_v0.4.2_v0.5.0.html](https://ktmeaton.github.io/ncov-recombinant/docs/testing_summary_package/ncov-recombinant_v0.4.2_v0.5.0.html)
diff --git a/docs/testing_summary_package/ncov-recombinant_v0.6.1_v0.7.0.html b/docs/testing_summary_package/ncov-recombinant_v0.6.1_v0.7.0.html
new file mode 100644
index 0000000..b2bfc8f
--- /dev/null
+++ b/docs/testing_summary_package/ncov-recombinant_v0.6.1_v0.7.0.html
@@ -0,0 +1,3637 @@
+

ncov-recombinant v0.6.1 - v0.7.0

+
+
+Test Summary Package +
+

+This report +was automatically generated +on February 22, 2023. +

+ +

Authors

+

+Katherine Eaton +| National Microbiology Laboratory, PHAC +|
+

+

1. Summary

+

The ncov-recombinant update from v0.6.1 to v0.7.0 has 3 major changes.

+

The first change is a Nextclade dataset upgrade from 2022-10-27 to 2023-02-01, which adds nomenclature for the newly designated recombinants XBH to XBP.

+

The second change is the detection of recursive recombinants (XBL and XBN), which arose from two separate recombination events between BA.2.75* and XBB*. Currently, recursive recombination is only configured to be detected between XBB and the VOCs circulating in late 2022 and early 2023.

+

The third major change is that all documentation has been migrated to Read The Docs. This includes a detailed Developer’s Guide for those looking to contribute to the project.

+

Between v0.6.1 and v0.7.0, 15.2% of sequences in the controls-gisaid dataset had different detection results. 5.1% of sequences were newly classified (NA → X) and represent lineages not present in the v0.6.1 model. 6.6% of sequences had lineage assignment changes and 3.5% of sequences had sublineage assignment changes as a result of the Nextclade dataset upgrade. 0% of positive controls were dropped (X → NA), indicating no observed loss in sensitivity.

+

ncov-recombinant v0.7.0 is a recommended upgrade for recombinant surveillance to accurately classify the latest recombinant lineages (up to XBP) and to detect recursive recombination (e.g. XBL, which has the recombinant XBB as a parent).

+

For a comprehensive summary of the methodological changes, please see the release notes for v0.7.0.

+

2. Purpose

+

Verify that the update of the ncov-recombinant pipeline from v0.6.1 to v0.7.0:

+
    +
  1. Maintains specificity for recombinants trained in previous versions.
  2. Increases sensitivity for newly designated recombinant sublineages.
+
+ +
+

3. Datasets

+

Controls

+

This dataset includes SARS-CoV-2 genomes from GISAID that reflect the known diversity of recombinant sequences to date. These include 572 positive controls (recombinants) representing lineages XA - XBP, and 186 negative controls (non-recombinants) selected from the Nextstrain Reference Phylogeny.

+

In total, 758 control sequences were used as input and a strain list is available here.

+

Canada VirusSeq

+

This dataset includes publicly available SARS-CoV-2 genomes from the Canadian VirusSeq Data Portal. Sequences were downloaded on 2023-01-23 and include 441,234 genomes in total.

+

4. Procedure

+

The Snakemake pipelines for v0.6.1 and v0.7.0 were run independently on the controls-gisaid and virusseq datasets. Please see the Procedure section of the Supplementary for detailed command-line instructions.

+

5. Results

+

Controls GISAID

+ +
+

Note: Lineage assignments in v0.7.0 are identical to those in pango-designation and are the expected values.

+
+
+
+Figure 1: Comparison of lineage assignments in the controls-gisaid dataset between v0.6.1 and v0.7.0. +
+
+

Canada VirusSeq

+ +
+

Note: Lineage assignments in v0.7.0 are identical to those in pango-designation and are the expected values.

+
+
+
+Figure 2: Comparison of lineage assignments in the virusseq dataset between v0.6.1 and v0.7.0. +
+
+

Changes

+

New Detections

+

New detections (NA → X*) result from the following changes in v0.7.0:

+
    +
  1. Nextclade dataset upgrades to include newly designated lineages: XBG, XBK, XBM.

    Lineage (v0.7.0) | Lineage (v0.6.1) | Parents
    XBG              | NA               | BA.2.76*, BA.5.2*
    XBK              | NA               | BA.5.2*, CJ.1*
    XBM              | NA               | BA.2.76*, BF.3*
+

Lineage Changes

+

Lineage changes result from the following updates in v0.7.0:

+
    +
  1. Nextclade dataset upgrades to include newly designated lineages: XBH, XBJ, XBL, XBN, XBP.

    Lineage (v0.7.0) | Lineage (v0.6.1) | Parents
    XBH              | BY.1             | BA.2.3*, BA.2.75*
    XBJ              | BA.2.3.20        | BA.2.3*, BA.5.2*
    XBL              | XBB.1-like       | BA.2.75*, XBB*
    XBN              | XBB-like         | BA.2.75*, XBB*
    XBP              | XBD-like         | BA.2.75*, BA.5*
+
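The parent columns in this report are enough to illustrate what "recursive" means here. The following is a minimal sketch, not the pipeline's detection method: it assumes a small hand-written `PARENTS` table copied from the rows of this report (the `XBB` entry is taken from the XBB.1 and XBB.1.5 rows), and the helper names `root`, `is_recursive`, and `recombination_depth` are hypothetical.

```python
# Parents as listed in this report's tables; "*" denotes a lineage and its descendants.
# The XBB entry is inferred from the XBB.1 / XBB.1.5 rows (an assumption of this sketch).
PARENTS = {
    "XBB": ["BA.2.10*", "BA.2.75*"],
    "XBH": ["BA.2.3*", "BA.2.75*"],
    "XBL": ["BA.2.75*", "XBB*"],
    "XBN": ["BA.2.75*", "XBB*"],
}


def root(lineage):
    """Strip the '*' wildcard and any sublineage suffix: 'XBB.1.5' -> 'XBB'."""
    return lineage.rstrip("*").split(".")[0]


def is_recursive(lineage, parents=PARENTS):
    """True if any listed parent is itself a recombinant (Pango 'X' prefix)."""
    return any(root(p).startswith("X") for p in parents.get(root(lineage), []))


def recombination_depth(lineage, parents=PARENTS):
    """Nested recombination events; lineages absent from `parents` count as 0."""
    key = root(lineage)
    if key not in parents:
        return 0
    return 1 + max(recombination_depth(p, parents) for p in parents[key])


for name in ("XBH", "XBL", "XBN"):
    print(name, is_recursive(name), recombination_depth(name))
```

Under these assumptions, XBH is an ordinary recombinant (depth 1), while XBL and XBN are recursive (depth 2) because one of their parents, XBB*, is already a recombinant.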

Sublineage changes result from the following updates in v0.7.0:

+
    +
  1. Nextclade dataset upgrades to include new sublineages for: XAY and XBB.

    Lineage (v0.7.0) | Lineage (v0.6.1) | Parents
    XAY.2            | XAY, XAY-like    | BA.2*, Delta (21J)
    XBB.1            | XBB.1.1          | BA.2.10*, BA.2.75*
    XBB.1.5          | XBB.1, XBB-like  | BA.2.10*, BA.2.75*
+
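The four change categories used in this report (new detection, dropped positive, lineage change, sublineage change) can be sketched as a small classifier. This is an illustration only: the rule that two assignments sharing the same top-level name (after removing a `-like` suffix) count as a sublineage change is an assumption chosen to agree with the tables in this section, and `classify`, `top_level`, and `summarize` are hypothetical names, not functions of the pipeline.

```python
from collections import Counter


def top_level(lineage):
    """Top-level lineage name: 'XBB.1.5' -> 'XBB', 'XAY-like' -> 'XAY'."""
    return lineage.replace("-like", "").split(".")[0]


def classify(old, new):
    """Classify one sequence given its old and new assignment (None means NA)."""
    if old == new:
        return "unchanged"
    if old is None:
        return "new detection"      # NA -> X
    if new is None:
        return "dropped positive"   # X -> NA
    if top_level(old) == top_level(new):
        return "sublineage change"
    return "lineage change"


def summarize(pairs):
    """Percentage of sequences per category, rounded to one decimal place."""
    counts = Counter(classify(old, new) for old, new in pairs)
    total = sum(counts.values())
    return {category: round(100 * n / total, 1) for category, n in counts.items()}


# Toy input built from rows of the tables above, not the real linelists.
pairs = [(None, "XBG"), ("BY.1", "XBH"), ("XAY-like", "XAY.2"), ("XBB.1", "XBB.1")]
print(summarize(pairs))
```

Applied to the real positives tables, a summary of this kind would correspond to the percentages quoted in the Summary section, provided the categories are defined the same way.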

Dropped Positives

+

Dropped positives are only observed in the virusseq dataset, and include the unpublished cluster_id hCoV-19/Canada/ON-PHL-22-53186/2022 (N=19, 2022-12-09 to 2023-01-02). In v0.6.1, this cluster was classified as a BA.5.2/BA.5.3 recombinant with breakpoints extremely close to the 5’ end of the genome (Figure 3). It is most likely dropped in v0.7.0 because the 3 mutations attributed to BA.5.2 are no longer considered diagnostic based on the latest global mutation frequencies.

+
+
+Figure 3: Genomic composition of the dropped positive (hCoV-19/Canada/ON-PHL-22-53186/2022) which is composed of 19 sequences with identical mutation profiles. +
+
+

Acknowledgements

+

The results here are in whole or in part based upon data hosted at the Canadian VirusSeq Data Portal: https://virusseq-dataportal.ca/. We wish to acknowledge the Canadian Public Health Laboratory Network (CPHLN), Genome Canada and the CanCOGeN VirusSeq Consortium for their contribution to the Portal.

+

Supplementary

+

Procedure

+

Download Data

+
    +
  1. Download the sequences and metadata for the strains in the strain list from GISAID to data/controls-gisaid/.

  2. Download the VirusSeq sequences and metadata.

    +
    wget -O virusseq.tar.gz https://singularity.virusseq-dataportal.ca/download/archive/2d9ace2c-0808-475f-bc93-6ad5808581a4
    +tar -xvf virusseq.tar.gz
    +
    +mkdir -p data/virusseq
    +
    +# Prep metadata
    +csvtk cut -t -f "fasta header name,sample collection date,geo_loc_name (country),geo_loc_name (state/province/territory)" *files-archive*.tsv \
    +    | csvtk rename -t -f "fasta header name" -n "strain" \
    +    | csvtk rename -t -f "sample collection date" -n "date" \
    +    | csvtk rename -t -f "geo_loc_name (country)" -n "country" \
    +    | csvtk rename -t -f "geo_loc_name (state/province/territory)" -n "division" \
    +    > data/virusseq/metadata.tsv
    +
    +# Prep sequences
    +mv *files-archive*.fasta data/virusseq/sequences.fasta
    +
    +# Cleanup
    +rm *files-archive*.tsv
    +rm virusseq.tar.gz
+
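The metadata preparation above selects four columns and renames them. For readers without csvtk, the same transformation can be sketched with the Python standard library. This is an assumed equivalent rather than part of the documented procedure: the column names are copied from the csvtk commands, while `prep_metadata`, the sample row, and the extra `study_id` column are hypothetical.

```python
import csv
import io

# Source column -> output column, in output order (copied from the csvtk commands).
RENAME = {
    "fasta header name": "strain",
    "sample collection date": "date",
    "geo_loc_name (country)": "country",
    "geo_loc_name (state/province/territory)": "division",
}


def prep_metadata(src, dst):
    """Copy only the RENAME columns from a TSV handle to another, with new names."""
    reader = csv.DictReader(src, delimiter="\t")
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    writer.writerow(list(RENAME.values()))
    for row in reader:
        writer.writerow([row[column] for column in RENAME])


# Toy input with one hypothetical record and one column that is dropped.
src = io.StringIO(
    "fasta header name\tstudy_id\tsample collection date\t"
    "geo_loc_name (country)\tgeo_loc_name (state/province/territory)\n"
    "sample-1\tS1\t2022-12-09\tCanada\tOntario\n"
)
dst = io.StringIO()
prep_metadata(src, dst)
print(dst.getvalue())
```

With real data, `src` and `dst` would be file handles opened on the downloaded `*files-archive*.tsv` file and on `data/virusseq/metadata.tsv`.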

Version 0.7.0 | 3f3d4438

+

3f3d4438c5af7584f760855edd620ef162fb1b1e

+
    +
  1. Download the pipeline.

    +
    git clone https://github.com/ktmeaton/ncov-recombinant.git 0.7.0
    +cd 0.7.0
    +git checkout v0.7.0
  2. Create a version-controlled conda environment.

    +
    # Local
    +mamba env create -f workflow/envs/environment.yaml -n ncov-recombinant-0.7.0
    +
    +# HPC
    +sbatch -J conda-ncov-recombinant-0.7.0 --wrap="mamba env create -f workflow/envs/environment.yaml -n ncov-recombinant-0.7.0"
  3. Symlink the controls-gisaid data.

    +
    ln -s ../../../data/controls-gisaid/metadata.tsv data/controls-gisaid/metadata.tsv
    +ln -s ../../../data/controls-gisaid/sequences.fasta data/controls-gisaid/sequences.fasta
  4. Symlink the virusseq data.

    +
    ln -s ../../data/virusseq data/virusseq
  5. Run the pipeline for controls-gisaid.

    +
    # Local
    +conda activate ncov-recombinant-0.7.0
    +snakemake --profile profiles/controls-gisaid
    +
    +# HPC
    +scripts/slurm.sh --profile profiles/controls-gisaid-hpc --conda-env ncov-recombinant-0.7.0
  6. Run the pipeline for virusseq (must be run on an HPC).

    +
    scripts/slurm.sh --profile profiles/virusseq-hpc --conda-env ncov-recombinant-0.7.0
    +
    • Note: The pipeline will likely fail on the *_historical rules (e.g. plot_historical, report_historical), because bug fixes introduced in v0.7.0 now catch errors related to plotting extremely large datasets. This is tolerable, as only the linelists are used for reporting in the test summary package.
+

Version 0.6.1 | 4d1f495a

+
    +
  1. Download the pipeline.

    +
    git clone https://github.com/ktmeaton/ncov-recombinant.git 0.6.1
    +cd 0.6.1
    +git checkout v0.6.1-hotfix.1
  2. Create a version-controlled conda environment.

    +
    # Local
    +mamba env create -f workflow/envs/environment.yaml -n ncov-recombinant-0.6.1
    +
    +# HPC
    +sbatch -J conda-ncov-recombinant-0.6.1 --wrap="mamba env create -f workflow/envs/environment.yaml -n ncov-recombinant-0.6.1"
  3. Symlink the controls-gisaid data.

    +
    ln -s ../../../data/controls-gisaid/metadata.tsv data/controls-gisaid/metadata.tsv
    +ln -s ../../../data/controls-gisaid/sequences.fasta data/controls-gisaid/sequences.fasta
  4. Symlink the virusseq data.

    +
    ln -s ../../data/virusseq data/virusseq
  5. Run the pipeline for controls-gisaid.

    +
    # Local
    +conda activate ncov-recombinant-0.6.1
    +snakemake --profile profiles/controls-gisaid
    +
    +# HPC
    +scripts/slurm.sh --profile profiles/controls-gisaid-hpc --conda-env ncov-recombinant-0.6.1
  6. Run the pipeline for virusseq (must be run on an HPC).

    +
    scripts/slurm.sh --profile profiles/virusseq-hpc --conda-env ncov-recombinant-0.6.1
+

Comparison

+

After the pipelines are complete for each version, run the following to compare lineage assignments.

+
old_ver="0.6.1"
+new_ver="0.7.0"
+

Controls GISAID

+
conda activate ncov-recombinant-0.7.0
+
+link_sizes=("1" "3" "5" "10")
+for size in ${link_sizes[@]}; do
+    python3 0.7.0/scripts/compare_positives.py \
+      --positives-1 ${old_ver}/results/controls-gisaid/linelists/positives.tsv \
+      --positives-2 ${new_ver}/results/controls-gisaid/linelists/positives.tsv \
+      --ver-1 "v${old_ver}" \
+      --ver-2 "v${new_ver}" \
+      --outdir compare/controls-gisaid-${size} \
+      --node-order alphabetical \
+      --min-link-size $size
+done
+

Canada VirusSeq

+
conda activate ncov-recombinant-0.7.0
+
+link_sizes=("1" "3" "5" "10")
+for size in ${link_sizes[@]}; do
+    python3 0.7.0/scripts/compare_positives.py \
+      --positives-1 ${old_ver}/results/virusseq/linelists/positives.tsv \
+      --positives-2 ${new_ver}/results/virusseq/linelists/positives.tsv \
+      --ver-1 "v${old_ver}" \
+      --ver-2 "v${new_ver}" \
+      --outdir compare/virusseq-${size} \
+      --node-order alphabetical \
+      --min-link-size $size
+done
+
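The loops above repeat the comparison for several `--min-link-size` values. How `compare_positives.py` applies this option is not described in this report; the sketch below assumes it keeps only old-to-new lineage transitions ("links") supported by at least that many sequences. The functions `count_links` and `filter_links` and the strain names are hypothetical and only illustrate that assumed behaviour.

```python
from collections import Counter


def count_links(old, new):
    """Count (old lineage, new lineage) transitions across all strains.

    `old` and `new` map strain -> lineage; a strain missing from one side is "NA".
    """
    links = Counter()
    for strain in set(old) | set(new):
        links[(old.get(strain, "NA"), new.get(strain, "NA"))] += 1
    return links


def filter_links(links, min_link_size):
    """Keep only the transitions observed in at least `min_link_size` strains."""
    return {pair: n for pair, n in links.items() if n >= min_link_size}


# Toy assignments; the strain identifiers are placeholders.
old = {"s1": "XBB.1", "s2": "XBB.1", "s3": "BY.1"}
new = {"s1": "XBB.1.5", "s2": "XBB.1.5", "s3": "XBH", "s4": "XBG"}
links = count_links(old, new)
print(filter_links(links, 1))
print(filter_links(links, 2))
```

Under this reading, larger thresholds hide rare transitions, which would explain why several sizes (1, 3, 5, 10) are generated for each dataset.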

New Lineages

+
old_ver="0.6.1"
+new_ver="0.7.0"
+csvtk cut -t -f "strain" ${old_ver}/results/controls-gisaid/linelists/positives.tsv \
+  | tail -n+2 \
+  | csvtk grep -t -f "strain" -P - -v ${new_ver}/results/controls-gisaid/linelists/positives.tsv \
+  | csvtk cut -t -f "strain" \
+  | tail -n+2 \
+  | csvtk grep -t -f "strain" -P - ${old_ver}/results/controls-gisaid/linelists/linelist.tsv \
+  | csvtk pretty -t \
+  | less -S
+
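The csvtk pipeline above is a set difference on the `strain` column followed by a lookup in the old linelist. A minimal standard-library sketch of the same idea follows, assuming both tables are tab-separated with a `strain` column as in the commands; the `lineage` column, the strain names, and the function names `read_tsv` and `newly_detected` are hypothetical.

```python
import csv
import io


def read_tsv(handle):
    """Parse a tab-separated table into a list of row dictionaries."""
    return list(csv.DictReader(handle, delimiter="\t"))


def newly_detected(old_positives, new_positives, old_linelist):
    """Rows of the old linelist for strains that are positive only in the new version."""
    old_strains = {row["strain"] for row in old_positives}
    new_only = {row["strain"] for row in new_positives} - old_strains
    return [row for row in old_linelist if row["strain"] in new_only]


# Toy tables standing in for positives.tsv and linelist.tsv.
old_pos = read_tsv(io.StringIO("strain\tlineage\ns1\tXBB.1\n"))
new_pos = read_tsv(io.StringIO("strain\tlineage\ns1\tXBB.1.5\ns2\tXBG\n"))
old_all = read_tsv(io.StringIO("strain\tlineage\ns1\tXBB.1\ns2\tNA\n"))
hits = newly_detected(old_pos, new_pos, old_all)
print(hits)
```

Swapping the old and new arguments, and passing the new linelist instead, gives the corresponding lookup for dropped positives.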

Dropped Lineages

+
csvtk cut -t -f "strain" ${new_ver}/results/controls-gisaid/linelists/positives.tsv \
+  | tail -n+2 \
+  | csvtk grep -t -f "strain" -P - -v ${old_ver}/results/controls-gisaid/linelists/positives.tsv \
+  | csvtk cut -t -f "strain" \
+  | tail -n+2 \
+  | csvtk grep -t -f "strain" -P - ${new_ver}/results/controls-gisaid/linelists/linelist.tsv \
+  | csvtk pretty -t \
+  | less -S