Commit

new files
srandall02 committed Dec 17, 2023
1 parent 6b567ca commit c03d222
Showing 9 changed files with 248 additions and 0 deletions.
42 changes: 42 additions & 0 deletions docs/source/all_lineage_prevalences.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
all_lineage_prevalences(location, startswith)
---------------------------------------------

.. autofunction:: outbreak_data.all_lineage_prevalences


Example usage::

    # Find the prevalence of all lineages in Argentina that begin with 'xbb.1'
    df = od.all_lineage_prevalences("ARG", startswith='xbb.1')
    print(df)

.. code-block::
    :caption: Output

                date  total_count  lineage_count  lineage  prevalence  \
    1454  2022-10-12            3              1    xbb.1    0.333333
    1455  2022-10-13            0              0    xbb.1    0.000000
    1456  2022-10-14            0              0    xbb.1    0.000000
    1457  2022-10-15            0              0    xbb.1    0.000000
    1458  2022-10-16            0              0    xbb.1    0.000000
    ...          ...          ...            ...      ...         ...
    1673  2023-03-17            0              0  xbb.1.5    0.000000
    1674  2023-03-18            0              0  xbb.1.5    0.000000
    1675  2023-03-19            0              0  xbb.1.5    0.000000
    1676  2023-03-20            0              0  xbb.1.5    0.000000
    1677  2023-03-21            1              1  xbb.1.5    1.000000

          prevalence_rolling
    1454            0.350000
    1455            0.179487
    1456            0.109375
    1457            0.065421
    1458            0.058577
    ...                  ...
    1673            1.000000
    1674            1.000000
    1675            1.000000
    1676            1.000000
    1677            1.000000

    [224 rows x 6 columns]
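
In the rows shown above, the ``prevalence`` column is consistent with ``lineage_count / total_count`` for each day, with days that have no sequences reported as 0. The sketch below reproduces that calculation in plain Python so it runs without the package or network access. The helper name ``daily_prevalence`` is hypothetical and not part of ``outbreak_data``, and the zero-count handling is an assumption inferred from the printed rows.

```python
def daily_prevalence(lineage_count, total_count):
    """Return lineage_count / total_count, or 0.0 when no sequences were collected."""
    if total_count == 0:
        return 0.0
    return lineage_count / total_count


# (date, total_count, lineage_count) copied from the output above
rows = [
    ("2022-10-12", 3, 1),
    ("2022-10-13", 0, 0),
    ("2023-03-21", 1, 1),
]

# Round to six decimals to match the precision pandas prints
prevalences = {date: round(daily_prevalence(lc, tc), 6) for date, tc, lc in rows}
print(prevalences)
```

The ``prevalence_rolling`` column is smoothed over a window of days, so it will generally differ from this per-day ratio.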
85 changes: 85 additions & 0 deletions docs/source/cryptic_vars.rst
@@ -0,0 +1,85 @@
Tracing Mutations Back to Lineage
---------------------------------

The Python Outbreak API can be queried to determine which lineages a mutation has been found in. After collecting a sample and determining which sequences are present, we may have a list of several SARS-CoV-2 mutations that we can immediately identify as characteristic of a specific variant. In some cases, however, the list may also include a mutation that is uncommon in most other samples.
For example, consider a small data sample consisting of 11 mutations: (S:A67V, S:DEL69/70, S:E484A, S:N501Y, S:T572N, S:D614G, S:G142D, N:S2Y, S:Q52R, E:L21F, S:G593D). We’ll want a way to find out more about each mutation collected, such as whether it has been collected before, and when and where it was found.

To start, the ``mutations_by_lineage()`` function allows us to look at the clinical prevalence of a mutation and see which lineage it most likely belongs to. Let's try it for E:L21F::

    # Perform authentication if you haven't already
    from outbreak_data import authenticate_user
    authenticate_user.authenticate_new_user()

    # Import outbreak_data package
    from outbreak_data import outbreak_data as od

    lin1 = od.mutations_by_lineage(mutation='E:L21F')
    print(lin1)

.. code-block::
    :caption: Output

         pangolin_lineage  lineage_count  mutation_count  proportion  \
    0                ba.2        1228296             560    0.000456
    1             b.1.1.7        1155169             844    0.000731
    2              ba.1.1        1046121             268    0.000256
    3                ay.4         861521             526    0.000611
    4                ba.1         439838              49    0.000111
    ...               ...            ...             ...         ...
    400           ba.2.77             63              48    0.761905
    401         ba.5.2.54             55               2    0.036364
    402           b.1.616             39               3    0.076923
    403         b.1.1.386             20               1    0.050000
    404         b.1.1.400             20              20    1.000000

         proportion_ci_lower  proportion_ci_upper
    0               0.000419             0.000495
    1               0.000683             0.000781
    2               0.000227             0.000288
    3               0.000560             0.000664
    4               0.000083             0.000146
    ...                  ...                  ...
    400             0.646596             0.853783
    401             0.007632             0.111568
    402             0.022142             0.191265
    403             0.005449             0.210819
    404             0.883361             0.999976

    [405 rows x 6 columns]
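
The full table mixes very large lineages, where E:L21F is rare, with small lineages where it is common. One way to narrow the result down is to keep only lineages with a minimum number of sequences and then sort by ``proportion``. The sketch below does this in plain Python on a few rows copied from the output above. The function ``top_lineages`` and both thresholds are illustrative assumptions, not part of ``outbreak_data``.

```python
# A few rows copied from the output above:
# (pangolin_lineage, lineage_count, mutation_count, proportion)
rows = [
    ("ba.2", 1228296, 560, 0.000456),
    ("b.1.1.7", 1155169, 844, 0.000731),
    ("ba.2.77", 63, 48, 0.761905),
    ("ba.5.2.54", 55, 2, 0.036364),
    ("b.1.1.400", 20, 20, 1.000000),
]


def top_lineages(rows, min_lineage_count=50, min_proportion=0.5):
    """Keep lineages with enough sequences and a high share carrying the mutation."""
    kept = [r for r in rows if r[1] >= min_lineage_count and r[3] >= min_proportion]
    # Highest proportion first
    return sorted(kept, key=lambda r: r[3], reverse=True)


for lineage, lineage_count, mutation_count, proportion in top_lineages(rows):
    print(lineage, lineage_count, mutation_count, proportion)
```

With these thresholds only ``ba.2.77`` remains; lowering ``min_lineage_count`` also admits ``b.1.1.400``, which has just 20 sequences. On the DataFrame itself, the same filter can be expressed with boolean indexing followed by ``sort_values``.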

This mutation has clearly been seen before in a number of previous lineages. We might recognize that most of the mutations in our list have been detected in older variants as well as in Omicron. However, S:G593D is relatively uncommon in most other samples. We can easily find out where and when it was last detected::

    >>> lin2 = od.mutations_by_lineage(mutation='S:G593D')
    >>> print(lin2)

       pangolin_lineage  lineage_count  mutation_count  proportion  \
    0             xbb.1          28205               1    0.000035

       proportion_ci_lower  proportion_ci_upper
    0             0.000004             0.000166

    >>> last_seen = od.collection_date('xbb.1', 'S:G593D')
    >>> print(last_seen)

                    Values
    date        2022-12-12
    date_count           1

According to our data, S:G593D has only been detected once in a single sequence belonging to the xbb.1 lineage. The last time it was collected was back on December 12, 2022.

Additionally, ``mutations_by_lineage()`` allows us to find out whether there is a lineage in which several mutations occur together. Selecting 7 of the mutations from our original list yields one lineage with all of these mutations::

    >>> lin3 = od.mutations_by_lineage(mutation='S:A67V, S:DEL69/70, S:E484A, S:N501Y, S:T572N, S:D614G, S:G142D')
    >>> print(lin3)

       pangolin_lineage  lineage_count  mutation_count  proportion  \
    0           ba.1.19           4587               1    0.000218

       proportion_ci_lower  proportion_ci_upper
    0             0.000024             0.001019


Here we see that the only lineage that contains all 7 mutations is ba.1.19.
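
Conceptually, this query asks which lineages carry every mutation in the list. The sketch below illustrates the idea with a set intersection over made-up data. The ``lineages_by_mutation`` mapping, its contents, and the helper ``shared_lineages`` are all hypothetical, and this is not how ``mutations_by_lineage()`` is implemented: the output above suggests that the function counts individual sequences carrying all of the mutations, which is a stricter condition.

```python
# Hypothetical, made-up mapping: mutation -> lineages in which it has been observed
lineages_by_mutation = {
    "S:A67V": {"ba.1", "ba.1.1", "ba.1.19"},
    "S:T572N": {"ba.1.19", "ay.4"},
    "S:N501Y": {"b.1.1.7", "ba.1", "ba.1.19", "ba.2"},
}


def shared_lineages(mutations, lineages_by_mutation):
    """Return the lineages in which every listed mutation has been observed."""
    # A mutation missing from the mapping contributes an empty set,
    # which makes the overall intersection empty.
    sets = [lineages_by_mutation.get(m, set()) for m in mutations]
    return set.intersection(*sets) if sets else set()


print(shared_lineages(["S:A67V", "S:T572N", "S:N501Y"], lineages_by_mutation))
```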


55 changes: 55 additions & 0 deletions docs/source/global_prevalence.rst
@@ -0,0 +1,55 @@
global_prevalence(pango_lin, mutations, cumulative)
----------------------------------------------------

.. autofunction:: outbreak_data.global_prevalence

Example: Get global info on lineage 'XBB'::

    # Assumes the package was imported as: from outbreak_data import outbreak_data as od
    df = od.global_prevalence('xbb')
    print(df)

.. code-block::
    :caption: Output

               date  total_count  lineage_count  total_count_rolling  \
    0    2021-06-29        15453              2         10772.428571
    1    2021-06-30        13101              0         11060.571429
    2    2021-07-01        13088              0         11495.000000
    3    2021-07-02        11562              0         11890.571429
    4    2021-07-03         8310              0         11845.571429
    ..          ...          ...            ...                  ...
    713  2023-06-12           27              0           112.428571
    714  2023-06-13            8              0            61.714286
    715  2023-06-14            1              0            36.000000
    716  2023-06-15            1              0            25.285714
    717  2023-06-17            1              0             8.000000

         lineage_count_rolling  proportion  proportion_ci_lower  \
    0                 0.285714    0.000027         4.558329e-08
    1                 0.285714    0.000026         4.439232e-08
    2                 0.285714    0.000025         4.271630e-08
    3                 0.285714    0.000024         4.129377e-08
    4                 0.285714    0.000024         4.145063e-08
    ..                     ...         ...                  ...
    713               0.142857    0.001271         4.374452e-06
    714               0.000000    0.000000         7.888011e-06
    715               0.000000    0.000000         1.354537e-05
    716               0.000000    0.000000         1.944577e-05
    717               0.000000    0.000000         5.949030e-05

         proportion_ci_upper
    0               0.000233
    1               0.000227
    2               0.000218
    3               0.000211
    4               0.000212
    ..                   ...
    713             0.022129
    714             0.039548
    715             0.066944
    716             0.094683
    717             0.262217

    [718 rows x 8 columns]
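
In the rows shown above, ``proportion`` matches ``lineage_count_rolling / total_count_rolling``, and the rolling values look like 7-day averages (for example, 0.285714 is 2/7). The short check below recomputes the displayed proportions from the rolling columns in plain Python. This relationship is inferred from the printed values rather than taken from the function's documentation, and the helper ``rolling_proportion`` is a hypothetical name.

```python
# (date, total_count_rolling, lineage_count_rolling, proportion) copied from the output above
rows = [
    ("2021-06-29", 10772.428571, 0.285714, 0.000027),
    ("2023-06-12", 112.428571, 0.142857, 0.001271),
    ("2023-06-13", 61.714286, 0.000000, 0.000000),
]


def rolling_proportion(lineage_count_rolling, total_count_rolling):
    """Share of sequences assigned to the lineage within the rolling window."""
    if total_count_rolling == 0:
        return 0.0
    return lineage_count_rolling / total_count_rolling


for date, total_rolling, lineage_rolling, shown in rows:
    # Round to six decimals to match the precision pandas prints
    computed = round(rolling_proportion(lineage_rolling, total_rolling), 6)
    print(date, computed, shown)
```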


4 changes: 4 additions & 0 deletions docs/source/growth_rates.rst
@@ -0,0 +1,4 @@
growth_rates(lineage, location)
-------------------------------

.. autofunction:: outbreak_data.growth_rates
5 changes: 5 additions & 0 deletions docs/source/location_details.rst
@@ -0,0 +1,5 @@
location_details(location)
---------------------------

.. autofunction:: outbreak_data.location_details

44 changes: 44 additions & 0 deletions docs/source/mutations_by_lineage.rst
@@ -0,0 +1,44 @@
mutations_by_lineage(mutation, location, pango_lin)
---------------------------------------------------

.. autofunction:: outbreak_data.mutations_by_lineage


Example usage::

    # Get info on mutation 'orf1b:p314l'
    df = od.mutations_by_lineage('orf1b:p314l')
    print(df)

.. code-block::
    :caption: Output

            pangolin_lineage  lineage_count  mutation_count  proportion  \
    0                   ba.2        1227503         1222717    0.996101
    1                b.1.1.7        1154337         1147331    0.993931
    2                 ba.1.1        1044480         1039813    0.995532
    3                   ay.4         858839          854935    0.995454
    4                   ba.1         438947          437207    0.996036
    ...                  ...            ...             ...         ...
    2851                fn.1              1               1    1.000000
    2852  miscba1ba2post5386              1               1    1.000000
    2853            xbb.1.23              1               1    1.000000
    2854            xbb.1.37              1               1    1.000000
    2855                 xbv              1               1    1.000000

          proportion_ci_lower  proportion_ci_upper
    0                0.995990             0.996210
    1                0.993788             0.994071
    2                0.995402             0.995658
    3                0.995310             0.995595
    4                0.995847             0.996219
    ...                   ...                  ...
    2851             0.146746             0.999614
    2852             0.146746             0.999614
    2853             0.146746             0.999614
    2854             0.146746             0.999614
    2855             0.146746             0.999614

    [2856 rows x 6 columns]
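
Lineages with a single sequence report a proportion of 1.0 but a very wide confidence interval (0.146746 to 0.999614 above), so the proportion alone can overstate how characteristic the mutation is. One option is to filter on ``proportion_ci_lower`` instead. The sketch below applies that idea in plain Python to rows copied from the output. The helper ``well_supported`` and the 0.9 threshold are illustrative choices, not part of ``outbreak_data``.

```python
# (pangolin_lineage, lineage_count, proportion, proportion_ci_lower) copied from the output above
rows = [
    ("ba.2", 1227503, 0.996101, 0.995990),
    ("b.1.1.7", 1154337, 0.993931, 0.993788),
    ("fn.1", 1, 1.000000, 0.146746),
    ("xbv", 1, 1.000000, 0.146746),
]


def well_supported(rows, min_ci_lower=0.9):
    """Return lineage names whose lower confidence bound meets the threshold."""
    return [name for name, _count, _prop, ci_lower in rows if ci_lower >= min_ci_lower]


print(well_supported(rows))
```

Here the two single-sequence lineages are dropped even though their proportion is 1.0, while ``ba.2`` and ``b.1.1.7`` are kept.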


5 changes: 5 additions & 0 deletions docs/source/wildcard_lineage.rst
@@ -0,0 +1,5 @@
wildcard_lineage(name)
-----------------------

.. autofunction:: outbreak_data.wildcard_lineage

4 changes: 4 additions & 0 deletions docs/source/wildcard_location.rst
@@ -0,0 +1,4 @@
wildcard_location(name)
------------------------

.. autofunction:: outbreak_data.wildcard_location
4 changes: 4 additions & 0 deletions docs/source/wildcard_mutations.rst
@@ -0,0 +1,4 @@
wildcard_mutations(name)
------------------------

.. autofunction:: outbreak_data.wildcard_mutations
