Merge pull request #46 from outbreak-info/doc_updates
new files
Showing 9 changed files with 248 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,42 @@ | ||
all_lineage_prevalences(location, startswith) | ||
--------------------------------------------- | ||
|
||
.. autofunction:: outbreak_data.all_lineage_prevalences | ||
|
||
|
||
Example usage::

    # Find the prevalence of all lineages in Argentina that begin with 'xbb.1'
    from outbreak_data import outbreak_data as od

    df = od.all_lineage_prevalences("ARG", startswith='xbb.1')
    print(df)
|
||
.. code-block:: | ||
:caption: Output | ||
date total_count lineage_count lineage prevalence \ | ||
1454 2022-10-12 3 1 xbb.1 0.333333 | ||
1455 2022-10-13 0 0 xbb.1 0.000000 | ||
1456 2022-10-14 0 0 xbb.1 0.000000 | ||
1457 2022-10-15 0 0 xbb.1 0.000000 | ||
1458 2022-10-16 0 0 xbb.1 0.000000 | ||
... ... ... ... ... ... | ||
1673 2023-03-17 0 0 xbb.1.5 0.000000 | ||
1674 2023-03-18 0 0 xbb.1.5 0.000000 | ||
1675 2023-03-19 0 0 xbb.1.5 0.000000 | ||
1676 2023-03-20 0 0 xbb.1.5 0.000000 | ||
1677 2023-03-21 1 1 xbb.1.5 1.000000 | ||
prevalence_rolling | ||
1454 0.350000 | ||
1455 0.179487 | ||
1456 0.109375 | ||
1457 0.065421 | ||
1458 0.058577 | ||
... ... | ||
1673 1.000000 | ||
1674 1.000000 | ||
1675 1.000000 | ||
1676 1.000000 | ||
1677 1.000000 | ||
[224 rows x 6 columns] |
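
In the output above, the ``prevalence`` column is consistent with ``lineage_count`` divided by ``total_count`` for each day, with days that have no sequences reported as 0. The following is a minimal sketch of that arithmetic on three rows copied from the output. The helper name ``daily_prevalence`` is hypothetical and not part of ``outbreak_data``, and the zero-count handling is inferred from the printed rows rather than from the library source.

```python
# Rows copied from the output above: (date, total_count, lineage_count).
rows = [
    ("2022-10-12", 3, 1),
    ("2022-10-13", 0, 0),
    ("2023-03-21", 1, 1),
]


def daily_prevalence(total_count, lineage_count):
    """Share of a day's sequences assigned to the lineage.

    Hypothetical helper: returns 0.0 when no sequences were collected,
    which matches the zero-count rows in the printed output.
    """
    if total_count == 0:
        return 0.0
    return lineage_count / total_count


# Round to six decimals, the precision pandas prints by default.
prevalences = {date: round(daily_prevalence(t, c), 6) for date, t, c in rows}
for date, value in prevalences.items():
    print(date, value)
```

Days with very small ``total_count`` values swing between 0 and 1, which is presumably why the output also carries the smoothed ``prevalence_rolling`` column.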
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,85 @@ | ||
Tracing Mutations Back to Lineage | ||
--------------------------------- | ||
|
||
The Python Outbreak API can be queried to determine which lineages a mutation has been found in. After collecting a sample and determining which sequences are present, we may have a list of several SARS-CoV-2 mutations that we can immediately identify as characteristic of a specific variant. In some cases, however, the sample may also contain a mutation that is uncommon in most other samples.
For example, consider a small data sample consisting of 11 mutations: (S:A67V, S:DEL69/70, S:E484A, S:N501Y, S:T572N, S:D614G, S:G142D, N:S2Y, S:Q52R, E:L21F, S:G593D). We’ll want a way to find more details about each mutation, such as whether it has been collected before and, if so, when and where.
|
||
To start, the ``mutations_by_lineage()`` function allows us to look at the clinical prevalence of a mutation and see which lineage it most likely belongs to. Let's try it for E:L21F:: | ||
|
||
# Perform authentication if you haven't already | ||
from outbreak_data import authenticate_user | ||
authenticate_user.authenticate_new_user() | ||
|
||
# Import outbreak_data package | ||
from outbreak_data import outbreak_data as od | ||
|
||
lin1 = od.mutations_by_lineage(mutation='E:L21F') | ||
print(lin1) | ||
|
||
.. code-block:: | ||
:caption: Output | ||
pangolin_lineage lineage_count mutation_count proportion \ | ||
0 ba.2 1228296 560 0.000456 | ||
1 b.1.1.7 1155169 844 0.000731 | ||
2 ba.1.1 1046121 268 0.000256 | ||
3 ay.4 861521 526 0.000611 | ||
4 ba.1 439838 49 0.000111 | ||
... ... ... ... ... | ||
400 ba.2.77 63 48 0.761905 | ||
401 ba.5.2.54 55 2 0.036364 | ||
402 b.1.616 39 3 0.076923 | ||
403 b.1.1.386 20 1 0.050000 | ||
404 b.1.1.400 20 20 1.000000 | ||
proportion_ci_lower proportion_ci_upper | ||
0 0.000419 0.000495 | ||
1 0.000683 0.000781 | ||
2 0.000227 0.000288 | ||
3 0.000560 0.000664 | ||
4 0.000083 0.000146 | ||
... ... ... | ||
400 0.646596 0.853783 | ||
401 0.007632 0.111568 | ||
402 0.022142 0.191265 | ||
403 0.005449 0.210819 | ||
404 0.883361 0.999976 | ||
[405 rows x 6 columns] | ||
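
The printed rows appear to be ordered by ``lineage_count``, so the lineages in which E:L21F is most common sit near the bottom rather than the top. One way to surface them is to sort by ``proportion``. The sketch below does this in plain Python on the ten rows visible above; the remaining 395 rows are elided in the output and are not included here. With the real result, ``lin1.sort_values('proportion', ascending=False)`` should give the equivalent ordering, assuming ``lin1`` is a pandas DataFrame as the printed output suggests.

```python
# Rows copied from the printed output above:
# (pangolin_lineage, lineage_count, mutation_count).
rows = [
    ("ba.2", 1228296, 560),
    ("b.1.1.7", 1155169, 844),
    ("ba.1.1", 1046121, 268),
    ("ay.4", 861521, 526),
    ("ba.1", 439838, 49),
    ("ba.2.77", 63, 48),
    ("ba.5.2.54", 55, 2),
    ("b.1.616", 39, 3),
    ("b.1.1.386", 20, 1),
    ("b.1.1.400", 20, 20),
]

# proportion = mutation_count / lineage_count, matching the printed column.
ranked = sorted(
    ((name, mutated / total) for name, total, mutated in rows),
    key=lambda pair: pair[1],
    reverse=True,
)

# Show the three lineages in which the mutation is most common.
for name, proportion in ranked[:3]:
    print(f"{name}: {proportion:.6f}")
```

A high proportion in a lineage with only 20 sequences is weaker evidence than the same proportion in a large lineage, so the ``proportion_ci_lower`` and ``proportion_ci_upper`` columns are worth reading alongside it.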
|
||
This mutation has clearly been seen before, in several earlier lineages. We might recognize that most of the mutations in our list have been detected in older variants as well as in Omicron. However, S:G593D is uncommon in most other samples. We can find out which lineage it occurs in and when it was last detected::
|
||
>>> lin2 = od.mutations_by_lineage(mutation='S:G593D') | ||
>>> print(lin2) | ||
|
||
pangolin_lineage lineage_count mutation_count proportion \ | ||
0 xbb.1 28205 1 0.000035 | ||
|
||
proportion_ci_lower proportion_ci_upper | ||
0 0.000004 0.000166 | ||
|
||
>>> last_seen = od.collection_date('xbb.1', 'S:G593D') | ||
>>> print(last_seen) | ||
|
||
Values | ||
date 2022-12-12 | ||
date_count 1 | ||
|
||
According to our data, S:G593D has been detected only once, in a single sequence belonging to the xbb.1 lineage. It was last collected on December 12, 2022.
|
||
Additionally, ``mutations_by_lineage()`` lets us find lineages in which several mutations occur together. Querying 7 of the mutations from our original list yields a single lineage that carries all of them::
|
||
>>> lin3 = od.mutations_by_lineage(mutation='S:A67V, S:DEL69/70, S:E484A, S:N501Y, S:T572N, S:D614G, S:G142D') | ||
>>> print(lin3) | ||
|
||
pangolin_lineage lineage_count mutation_count proportion \ | ||
0 ba.1.19 4587 1 0.000218 | ||
|
||
proportion_ci_lower proportion_ci_upper | ||
0 0.000024 0.001019 | ||
|
||
|
||
Here we see that ba.1.19 is the only lineage that contains all 7 mutations, and even there the combination appears in just 1 of 4587 sequences.
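
When the mutations of interest are already held in a Python list, the combined query string can be assembled rather than typed by hand. The sketch below only builds the string. The comma-plus-space separator is copied from the call shown above; whether other separators are accepted is not stated here, so treat that as an assumption. The API call itself is left as a comment because it needs authentication and network access.

```python
# The seven spike mutations used in the combined query above.
mutations = [
    "S:A67V",
    "S:DEL69/70",
    "S:E484A",
    "S:N501Y",
    "S:T572N",
    "S:D614G",
    "S:G142D",
]

# Join with ", " to reproduce the exact string passed to mutations_by_lineage().
query = ", ".join(mutations)
print(query)

# With an authenticated session, the lookup would then be:
# lin3 = od.mutations_by_lineage(mutation=query)
```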
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,55 @@ | ||
global_prevalence(pango_lin, mutations, cumulative) | ||
---------------------------------------------------- | ||
|
||
.. autofunction:: outbreak_data.global_prevalence | ||
|
||
Example: Get global info on lineage 'XBB':: | ||
|
||
    from outbreak_data import outbreak_data as od

    df = od.global_prevalence('xbb')
    print(df)
|
||
.. code-block:: | ||
:caption: Output: | ||
date total_count lineage_count total_count_rolling \ | ||
0 2021-06-29 15453 2 10772.428571 | ||
1 2021-06-30 13101 0 11060.571429 | ||
2 2021-07-01 13088 0 11495.000000 | ||
3 2021-07-02 11562 0 11890.571429 | ||
4 2021-07-03 8310 0 11845.571429 | ||
.. ... ... ... ... | ||
713 2023-06-12 27 0 112.428571 | ||
714 2023-06-13 8 0 61.714286 | ||
715 2023-06-14 1 0 36.000000 | ||
716 2023-06-15 1 0 25.285714 | ||
717 2023-06-17 1 0 8.000000 | ||
lineage_count_rolling proportion proportion_ci_lower \ | ||
0 0.285714 0.000027 4.558329e-08 | ||
1 0.285714 0.000026 4.439232e-08 | ||
2 0.285714 0.000025 4.271630e-08 | ||
3 0.285714 0.000024 4.129377e-08 | ||
4 0.285714 0.000024 4.145063e-08 | ||
.. ... ... ... | ||
713 0.142857 0.001271 4.374452e-06 | ||
714 0.000000 0.000000 7.888011e-06 | ||
715 0.000000 0.000000 1.354537e-05 | ||
716 0.000000 0.000000 1.944577e-05 | ||
717 0.000000 0.000000 5.949030e-05 | ||
proportion_ci_upper | ||
0 0.000233 | ||
1 0.000227 | ||
2 0.000218 | ||
3 0.000211 | ||
4 0.000212 | ||
.. ... | ||
713 0.022129 | ||
714 0.039548 | ||
715 0.066944 | ||
716 0.094683 | ||
717 0.262217 | ||
[718 rows x 8 columns] | ||
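
In the printed output, ``proportion`` is consistent with ``lineage_count_rolling`` divided by ``total_count_rolling`` rather than with the raw daily counts. This is an observation from the numbers shown, not a statement about how the library computes the column. The sketch below recomputes the value for three rows copied from the output; the helper name ``rolling_proportion`` is hypothetical.

```python
# Rows copied from the output above:
# (date, total_count_rolling, lineage_count_rolling, printed proportion).
rows = [
    ("2021-06-29", 10772.428571, 0.285714, 0.000027),
    ("2021-06-30", 11060.571429, 0.285714, 0.000026),
    ("2023-06-12", 112.428571, 0.142857, 0.001271),
]


def rolling_proportion(lineage_rolling, total_rolling):
    """Hypothetical helper: ratio of the two smoothed counts, 0.0 if no data."""
    if total_rolling == 0:
        return 0.0
    return lineage_rolling / total_rolling


# Round to six decimals so the values can be compared with the printed column.
recomputed = [round(rolling_proportion(lin, tot), 6) for _, tot, lin, _ in rows]
printed = [p for *_, p in rows]

for (date, *_), value in zip(rows, recomputed):
    print(date, value)
print("matches printed column:", recomputed == printed)
```

The value 0.285714 in ``lineage_count_rolling`` equals 2/7, which would fit a seven-day averaging window over the two sequences counted on 2021-06-29, although the window length is not stated in the output.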
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
growth_rates(lineage, location) | ||
------------------------------- | ||
|
||
.. autofunction:: outbreak_data.growth_rates |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
location_details(location) | ||
--------------------------- | ||
|
||
.. autofunction:: outbreak_data.location_details | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
mutations_by_lineage(mutation, location, pango_lin) | ||
--------------------------------------------------- | ||
|
||
.. autofunction:: outbreak_data.mutations_by_lineage | ||
|
||
|
||
Example usage:: | ||
|
||
    # Get info on mutation 'orf1b:p314l'
    from outbreak_data import outbreak_data as od

    df = od.mutations_by_lineage('orf1b:p314l')
    print(df)
|
||
.. code-block:: | ||
:caption: Output | ||
pangolin_lineage lineage_count mutation_count proportion \ | ||
0 ba.2 1227503 1222717 0.996101 | ||
1 b.1.1.7 1154337 1147331 0.993931 | ||
2 ba.1.1 1044480 1039813 0.995532 | ||
3 ay.4 858839 854935 0.995454 | ||
4 ba.1 438947 437207 0.996036 | ||
... ... ... ... ... | ||
2851 fn.1 1 1 1.000000 | ||
2852 miscba1ba2post5386 1 1 1.000000 | ||
2853 xbb.1.23 1 1 1.000000 | ||
2854 xbb.1.37 1 1 1.000000 | ||
2855 xbv 1 1 1.000000 | ||
proportion_ci_lower proportion_ci_upper | ||
0 0.995990 0.996210 | ||
1 0.993788 0.994071 | ||
2 0.995402 0.995658 | ||
3 0.995310 0.995595 | ||
4 0.995847 0.996219 | ||
... ... ... | ||
2851 0.146746 0.999614 | ||
2852 0.146746 0.999614 | ||
2853 0.146746 0.999614 | ||
2854 0.146746 0.999614 | ||
2855 0.146746 0.999614 | ||
[2856 rows x 6 columns] | ||
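
The last rows of the output show why ``proportion`` should be read together with its confidence interval: a lineage with a single sequence reports a proportion of 1.0, but its interval runs from about 0.15 to nearly 1. A minimal sketch of filtering on interval width follows, using five rows copied from the output. The threshold ``MAX_CI_WIDTH`` is an arbitrary value chosen for illustration, not a library setting.

```python
# Rows copied from the output above.
rows = [
    {"pangolin_lineage": "ba.2", "lineage_count": 1227503,
     "ci_lower": 0.995990, "ci_upper": 0.996210},
    {"pangolin_lineage": "b.1.1.7", "lineage_count": 1154337,
     "ci_lower": 0.993788, "ci_upper": 0.994071},
    {"pangolin_lineage": "ba.1", "lineage_count": 438947,
     "ci_lower": 0.995847, "ci_upper": 0.996219},
    {"pangolin_lineage": "fn.1", "lineage_count": 1,
     "ci_lower": 0.146746, "ci_upper": 0.999614},
    {"pangolin_lineage": "xbv", "lineage_count": 1,
     "ci_lower": 0.146746, "ci_upper": 0.999614},
]

# Arbitrary illustrative cut-off: keep estimates whose interval is narrow.
MAX_CI_WIDTH = 0.05

reliable = [
    row["pangolin_lineage"]
    for row in rows
    if row["ci_upper"] - row["ci_lower"] <= MAX_CI_WIDTH
]
print(reliable)
```

Filtering on ``lineage_count`` directly would work as well; interval width is used here because it already accounts for sample size.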
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
wildcard_lineage(name) | ||
----------------------- | ||
|
||
.. autofunction:: outbreak_data.wildcard_lineage | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
wildcard_location(name) | ||
------------------------ | ||
|
||
.. autofunction:: outbreak_data.wildcard_location |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
wildcard_mutations(name) | ||
------------------------ | ||
|
||
.. autofunction:: outbreak_data.wildcard_mutations |