Multiple H12 traces #595

jonbrenas · 2024-09-05T09:04:34Z

This PR adds new functions for performing H12 GWSS for multiple cohorts and plotting the results together in a single figure. Includes a plot_h12_gwss_multi_overlaid() function to plot multiple cohorts overlaid in a single track, and a plot_h12_gwss_multi_panel() function to plot multiple cohorts in separate tracks with linked pan and zoom.

Resolves Multiple H12 traces #594.

leehart · 2024-09-19T08:40:06Z

Thanks @jonbrenas. I'll convert this to draft because it's still "work in progress". Press the "ready for review" button when it's ready, or feel free to ping reviewers.

jonbrenas · 2024-10-04T10:00:00Z

I still think I may have forgotten something, but the one thing I remembered wanting to do (use a palette for colours by default) was already done.

leehart · 2024-10-11T09:28:51Z

Thanks @jonbrenas. Do you have some example code (or maybe screenshots) to help me understand this better, so that I can try to give it a quick review?

jonbrenas · 2024-10-11T10:23:27Z

Thanks @leehart. Here is the code that I showed previously. It should be pretty close to what ended up in the files.

alimanfoo

Hi @jonbrenas, looking good, a few suggestions...

alimanfoo · 2024-10-15T15:01:02Z

malariagen_data/anoph/h12.py

+    def plot_h12_gwss_track_multi(
+        self,
+        contig: base_params.contig,
+        sample_queries: h12_params.sample_queries,


Suggested change

-        sample_queries: h12_params.sample_queries,
+        cohorts: base_params.cohorts,

Suggest using a "cohorts" parameter instead, the same one already used elsewhere in the API when multiple cohorts are required.

This could then be handled via the _setup_cohort_queries() helper function, as is done within the diversity_stats() method.

Also suggest adding a sample_query parameter, with behaviour as per diversity_stats(). I.e., sample_query defines a base query, then cohorts defines groups of samples on top of that. This should all be handled by the _setup_cohort_queries() helper.
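To make the intended layering concrete, here is a minimal sketch of a base query being applied on top of per-cohort queries. The function name combine_cohort_queries and the example queries are hypothetical, and this is not the implementation of _setup_cohort_queries(); it only covers the case where cohorts is given as a dict mapping labels to queries.

```python
from typing import Dict, Optional


def combine_cohort_queries(
    cohorts: Dict[str, str],
    sample_query: Optional[str] = None,
) -> Dict[str, str]:
    """Apply an optional base query on top of each cohort query.

    Hypothetical helper for illustration only; it is not the library's
    _setup_cohort_queries() implementation.
    """
    if sample_query is None:
        # No base query: the cohort queries are used unchanged.
        return dict(cohorts)
    # Parenthesise both sides so operator precedence cannot change the meaning.
    return {
        label: f"({sample_query}) and ({cohort_query})"
        for label, cohort_query in cohorts.items()
    }


# Example queries below are made up for illustration.
queries = combine_cohort_queries(
    cohorts={"early": "year < 2015", "late": "year >= 2015"},
    sample_query="country == 'Ghana'",
)
for label, query in queries.items():
    print(label, "->", query)
```

Keeping the base query separate means a user states a shared filter once instead of repeating it in every cohort definition.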

alimanfoo · 2024-10-15T15:03:21Z

malariagen_data/anoph/h12.py

+                line_width=1,
+                line_color=colors[i % len(colors)],
+                fill_color=None,
+                legend_label=sample_queries[i],


Could use cohort identifiers here rather than sample queries; that would be more compact.
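As a rough illustration of this suggestion, the sketch below pairs each cohort identifier with a colour, cycling through the palette with the same colors[i % len(colors)] indexing as in the diff above. The function name assign_trace_styles, the cohort labels and the palette are all hypothetical; the real code passes these values to the plotting calls instead of building dicts.

```python
from typing import Dict, List, Sequence


def assign_trace_styles(
    cohort_labels: Sequence[str],
    palette: Sequence[str],
) -> List[Dict[str, str]]:
    """Pair each cohort label with a colour, reusing colours when needed.

    Hypothetical helper for illustration only.
    """
    styles = []
    for i, label in enumerate(cohort_labels):
        styles.append(
            {
                # The short cohort identifier serves as the legend entry.
                "legend_label": label,
                # Wrap around when there are more cohorts than colours.
                "line_color": palette[i % len(palette)],
            }
        )
    return styles


# Made-up cohort labels and a two-colour palette.
styles = assign_trace_styles(
    ["cohort_a", "cohort_b", "cohort_c"],
    ["red", "blue"],
)
for style in styles:
    print(style)
```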

alimanfoo · 2024-10-15T15:07:20Z

malariagen_data/anoph/h12.py

+    def plot_h12_gwss_multi(
+        self,
+        contig: base_params.contig,
+        sample_queries: h12_params.sample_queries,


Suggest to change to a cohorts parameter as above.

alimanfoo · 2024-10-15T15:08:42Z

malariagen_data/anoph/h12.py

+    @doc(
+        summary="Plot h12 GWSS data with multiple tracks.",
+    )
+    def plot_h12_gwss_multi(


Suggested change

-    def plot_h12_gwss_multi(
+    def plot_h12_gwss_multi_panel(

Maybe make name more descriptive?

alimanfoo · 2024-10-15T15:09:07Z

malariagen_data/anoph/h12.py

+    @doc(
+        summary="Plot h12 GWSS data with multiple traces.",
+    )
+    def plot_h12_gwss_multitraces(


Suggested change

-    def plot_h12_gwss_multitraces(
+    def plot_h12_gwss_multi_overlay(

Maybe make name more descriptive?

alimanfoo · 2024-10-15T15:09:30Z

malariagen_data/anoph/h12.py

+    @doc(
+        summary="Plot h12 GWSS data track with multiple traces.",
+    )
+    def plot_h12_gwss_track_multi(


Suggested change

-    def plot_h12_gwss_track_multi(
+    def plot_h12_gwss_multi_overlay_track(

alimanfoo · 2024-10-15T15:20:32Z

malariagen_data/anoph/h12_params.py

+class sample_query_params:
+    def __init__(
+        self,
+        sample_query: base_params.sample_query,
+        title: Optional[gplt_params.title],
+        window_size: window_size,
+        analysis: Optional[hap_params.analysis] = base_params.DEFAULT,
+        cohort_size: Optional[base_params.cohort_size] = cohort_size_default,
+        min_cohort_size: Optional[
+            base_params.min_cohort_size
+        ] = min_cohort_size_default,
+        max_cohort_size: Optional[
+            base_params.max_cohort_size
+        ] = max_cohort_size_default,
+    ) -> None:
+        self.sample_query = sample_query
+        if title:
+            self.title = title
+        else:
+            self.title = sample_query
+        self.window_size = window_size
+        self.analysis = analysis
+        self.cohort_size = cohort_size
+        self.min_cohort_size = min_cohort_size
+        self.max_cohort_size = max_cohort_size
+
+
+sample_queries: TypeAlias = Annotated[
+    Sequence[sample_query_params],
+    """
+    A set of sample queries parameters. These include actual sample queries, the title associated to each one, the window size for the analysis, the site filter analysis that needs to be used, the cohort size, the minimum and the maximum cohort sizes.
+    """,
+]


I think this could be simplified, because there is only one parameter here that might need to be varied between cohorts, which is the window_size parameter.

I would suggest removing this and exposing these parameters (window_size, analysis, cohort_size, min_cohort_size, max_cohort_size) directly on the function being called by the user. For analysis, cohort_size, min_cohort_size and max_cohort_size we would expect a single value and pass it through for all cohorts. For window_size it could be a single value, or it could be a dictionary mapping cohort keys to window size values.

Does that make sense?

So, e.g., a user would call:

ag3.plot_h12_gwss_multi_panel(
    contig="2R",
    sample_sets=[...],
    cohorts="admin1_year",
    analysis="gamb_colu_arab",
    window_size=2000,
    min_cohort_size=20,
    ...
)

...or if different window sizes are required, a user would call:

ag3.plot_h12_gwss_multi_panel(
    contig="2R",
    sample_sets=[...],
    cohorts="admin1_year",
    analysis="gamb_colu_arab",
    window_size={"...": 2000, "...": 1000, ...},
    min_cohort_size=20,
    ...
)
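One way to support both forms of window_size is to normalise the value into a per-cohort mapping before running the analysis. The sketch below assumes that approach; the function name resolve_window_sizes and the cohort labels are hypothetical, and the code merged in this PR may handle the two cases differently.

```python
from typing import Dict, Mapping, Sequence, Union


def resolve_window_sizes(
    cohort_labels: Sequence[str],
    window_size: Union[int, Mapping[str, int]],
) -> Dict[str, int]:
    """Return one window size per cohort.

    Hypothetical helper for illustration only.
    """
    if isinstance(window_size, int):
        # A single value is applied to every cohort.
        return {label: window_size for label in cohort_labels}
    # Otherwise expect a mapping with an entry for every cohort.
    missing = [label for label in cohort_labels if label not in window_size]
    if missing:
        raise ValueError(f"No window size given for cohorts: {missing}")
    return {label: window_size[label] for label in cohort_labels}


labels = ["cohort_a", "cohort_b"]  # made-up cohort labels
same_for_all = resolve_window_sizes(labels, 2000)
per_cohort = resolve_window_sizes(labels, {"cohort_a": 2000, "cohort_b": 1000})
print(same_for_all)
print(per_cohort)
```

Raising an error for a missing cohort key is a design choice in this sketch; falling back to a default size would be another option.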

jonbrenas · 2024-10-16T08:52:04Z

Thanks @alimanfoo. I think I have addressed all of your comments, and I added some tests. I made the definition of h12_params.window_size a little more complex by allowing a dict of window sizes, which is not ideal because other functions don't check that the value is an int. That check probably needs to be added.
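The missing check could look something like the sketch below, which a single-cohort function might call before using window_size. The function name require_int_window_size is hypothetical and nothing like it is confirmed to exist in the library.

```python
from collections.abc import Mapping
from typing import Any


def require_int_window_size(window_size: Any) -> int:
    """Reject dict-valued or non-integer window sizes.

    Hypothetical helper for illustration only.
    """
    if isinstance(window_size, Mapping):
        raise TypeError(
            "window_size must be a single int here; a dict of window sizes "
            "is only meaningful when plotting multiple cohorts."
        )
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(window_size, bool) or not isinstance(window_size, int):
        raise TypeError(f"window_size must be an int, got {type(window_size).__name__}")
    if window_size < 1:
        raise ValueError("window_size must be a positive integer")
    return window_size


print(require_int_window_size(1000))
```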

alimanfoo

Looking good @jonbrenas. A couple of very minor suggestions.

malariagen_data/anoph/h12.py

malariagen_data/anoph/h12_params.py

alimanfoo · 2024-10-16T17:53:11Z

It works :)

alimanfoo · 2024-10-16T17:55:43Z

Multi-panel also works... :)

Would be good to remove the X axis label and ticks from all H12 tracks.

review-notebook-app · 2024-10-16T17:57:00Z

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

alimanfoo · 2024-10-16T17:57:29Z

Hi @jonbrenas, took the liberty to push a small change to the plot_h12_h1x notebook adding a couple of examples to exercise the new h12 functions.

jonbrenas · 2024-10-16T18:09:12Z

Great! Thank you @alimanfoo.

alimanfoo · 2024-10-18T10:53:02Z

malariagen_data/anoph/gplt_params.py

@@ -83,6 +83,14 @@
    "A bokeh figure (only returned if show=False).",
 ]

+def_figure: TypeAlias = Annotated[


Wonder if we should rename this parameter to "figure" and rename figure to "optional_figure" just for clarity?

jonbrenas · 2024-10-19

I think it is a good idea. It is a small change, but it will affect a lot of files that have little to do with this PR, so I am going to create a new issue (and the associated PR if I have time before I leave for Scotland).

malariagen_data/anoph/h12.py

alimanfoo · 2024-10-18T11:11:48Z

tests/anoph/test_h12.py

+    h12_params.update({"cohorts": {"all": "year > 0"}})
+
+    fig = api.plot_h12_gwss_multi_overlay(**h12_params, show=False)
+    assert isinstance(fig, bokeh.models.GridPlot)
+    fig = api.plot_h12_gwss_multi_panel(**h12_params, show=False)
+    assert isinstance(fig, bokeh.models.GridPlot)


Maybe good to test with more than one cohort.

Also would be good to test both with a single window size (int) and with multiple window sizes (dict).

alimanfoo · 2024-10-18T11:13:13Z

Hi @jonbrenas, just pushed a commit with an addition to the example notebook to exercise the functions using a dict value for the cohorts and window_size parameters.

jonbrenas · 2024-10-19T15:32:31Z

Thanks @alimanfoo. There are now tests for the dict version of window_size, and the tests for the multi versions (now separate tests rather than being tacked on to the end of the mono version) use 2 cohorts.

jonbrenas added 7 commits September 5, 2024 10:02

First draft for multiple traces of H12. There are still things that I am not too happy about.

7c3c6e8

Forgot one of the shared params.

871380e

Solving linting issues

34d9e69

Solving more linting issues

d3fb68e

Trying to get around assertions.

f46e516

Trying to get around linting.

de8000a

Trying to get around linting.

7b727d4

leehart requested a review from alimanfoo September 16, 2024 10:31

jonbrenas added 5 commits September 17, 2024 17:43

Trying to make the params for the multiple tracks make sense

9cc4afa

Merge branch 'master' into 594-multi-traces-H12

b52d62f

Corrected a typo

8c58708

More arguments moved to sample queries

62728a1

Forgot a few selves

371fef7

leehart marked this pull request as draft September 19, 2024 08:40

jonbrenas added 3 commits September 20, 2024 17:03

Merge branch 'master' into 594-multi-traces-H12

95ae4a4

Started working on the docs

70ddba9

Merge branch '594-multi-traces-H12' of github.com:malariagen/malariagen-data-python into 594-multi-traces-H12

00b8266

jonbrenas marked this pull request as ready for review October 4, 2024 10:00

Merge branch 'master' into 594-multi-traces-H12

d2973e1

alimanfoo reviewed Oct 15, 2024

jonbrenas added 4 commits October 15, 2024 22:51

Followed Alistair's advices (I think). Still need to add tests and a better way to handle titles.

f067bbb

Messed up the type definition of window_sizes

68fd445

Added some tests and made some little changes

0a89f77

Merge branch 'master' into 594-multi-traces-H12

e26173e

Improved the def of window_size

3092b8e

Merge branch '594-multi-traces-H12' of github.com:malariagen/malariagen-data-python into 594-multi-traces-H12

b466bfe

alimanfoo reviewed Oct 16, 2024

add examples of multi plots

0dedc14

jonbrenas and others added 9 commits October 17, 2024 14:05

Update malariagen_data/anoph/h12.py

c273c1d

Co-authored-by: Alistair Miles <[email protected]>

Update malariagen_data/anoph/h12.py

4625d8a

Co-authored-by: Alistair Miles <[email protected]>

Update malariagen_data/anoph/h12.py

d92b4d0

Co-authored-by: Alistair Miles <[email protected]>

Update malariagen_data/anoph/h12.py

4eb5210

Co-authored-by: Alistair Miles <[email protected]>

Update malariagen_data/anoph/h12_params.py

f1441bd

Co-authored-by: Alistair Miles <[email protected]>

Hoping a tab might solve the issue.

01683a3

Merge branch 'master' into 594-multi-traces-H12

4ee71c6

Merge branch 'master' into 594-multi-traces-H12

1181c7a

add example using cohorts and window_size as dict

bf00824

alimanfoo reviewed Oct 18, 2024

jonbrenas and others added 8 commits October 18, 2024 12:41

Update malariagen_data/anoph/h12.py

0fe4b10

Co-authored-by: Alistair Miles <[email protected]>

Update malariagen_data/anoph/h12.py

b4111b6

Co-authored-by: Alistair Miles <[email protected]>

Update malariagen_data/anoph/h12.py

a276ada

Co-authored-by: Alistair Miles <[email protected]>

Update h12.py

44ecb47

Merge branch 'master' into 594-multi-traces-H12

88541a1

More cohorts for the H12 tests

edf74c2

Reorganized the tests

8d2b4c2

Added a test for the dict version of window_size

4fb0ba1
