Multiple H12 traces #595

Open
wants to merge 40 commits into
base: master

jonbrenas
Collaborator

@jonbrenas jonbrenas commented Sep 5, 2024

This PR adds new functions for performing H12 GWSS for multiple cohorts and plotting the results together in a single figure. Includes a plot_h12_gwss_multi_overlaid() function to plot multiple cohorts overlaid in a single track, and a plot_h12_gwss_multi_panel() function to plot multiple cohorts in separate tracks with linked pan and zoom.

@leehart
Collaborator

leehart commented Sep 19, 2024

Thanks @jonbrenas. I'll convert this to draft because it's still "work in progress". Press the "ready for review" button when it's ready, or feel free to ping reviewers.

@leehart leehart marked this pull request as draft September 19, 2024 08:40
@jonbrenas
Collaborator Author

I still think I may have forgotten something, but the one thing I remembered wanting to do (use a palette for colours by default) I had already done.

@jonbrenas jonbrenas marked this pull request as ready for review October 4, 2024 10:00
@leehart
Collaborator

leehart commented Oct 11, 2024

Thanks @jonbrenas. Do you have some example code (or maybe screenshots) to help me understand this better? Then I can try to give it a quick review.

@jonbrenas
Collaborator Author

jonbrenas commented Oct 11, 2024

Thanks @leehart. Here is the code that I showed previously. It should be pretty close to what ended up in the files.

Member

@alimanfoo alimanfoo left a comment

Hi @jonbrenas, looking good, a few suggestions...

def plot_h12_gwss_track_multi(
    self,
    contig: base_params.contig,
    sample_queries: h12_params.sample_queries,
Member

Suggested change
- sample_queries: h12_params.sample_queries,
+ cohorts: base_params.cohorts,

Suggest using a "cohorts" parameter instead, the same one already used elsewhere in the API when multiple cohorts are required.

This could then be handled via the _setup_cohort_queries() helper function, as is done within the diversity_stats() method.

Member

Also suggest adding a sample_query parameter, with behaviour as per diversity_stats(). I.e., the sample_query defines a base query, then the cohorts parameter defines groups of samples on top of that. This should all be handled via the _setup_cohort_queries() helper.
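A minimal sketch of the "base query plus cohorts" idea, for illustration only. The helper name `combine_queries` and the example queries are hypothetical; this is not the implementation of `_setup_cohort_queries()`, which may resolve cohorts quite differently.

```python
from typing import Dict, Optional


def combine_queries(
    sample_query: Optional[str], cohort_queries: Dict[str, str]
) -> Dict[str, str]:
    """Hypothetical helper: apply a base sample query on top of each cohort query."""
    if sample_query is None:
        # No base query: cohorts are used as given.
        return dict(cohort_queries)
    # Parenthesise both sides so operator precedence cannot change the meaning.
    return {
        label: f"({sample_query}) and ({query})"
        for label, query in cohort_queries.items()
    }


# Assumed example values, not taken from the PR.
queries = combine_queries(
    "taxon == 'gambiae'",
    {"BF_2012": "country == 'Burkina Faso' and year == 2012"},
)
```

Each resulting value is a single query string that selects only the samples matching both the base query and the cohort definition.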

line_width=1,
line_color=colors[i % len(colors)],
fill_color=None,
legend_label=sample_queries[i],
Member

Could use cohort identifiers here rather than sample queries, would be more compact.
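A small sketch of pairing cohort identifiers with palette colours, so that the legend can show the compact cohort label instead of the full query. The function name `assign_colors`, the labels, and the palette are hypothetical and not taken from the PR; the cycling mirrors the `colors[i % len(colors)]` indexing in the snippet above.

```python
from itertools import cycle
from typing import Dict, Sequence


def assign_colors(cohort_labels: Sequence[str], palette: Sequence[str]) -> Dict[str, str]:
    """Hypothetical helper: map each cohort label to a colour, reusing the palette if needed."""
    # cycle() repeats the palette, which is equivalent to palette[i % len(palette)].
    return dict(zip(cohort_labels, cycle(palette)))


# Assumed example values.
colors_by_cohort = assign_colors(["ML_2014", "BF_2012", "GH_2017"], ["red", "blue"])
```

The keys of the resulting dict can be passed directly as legend labels.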

def plot_h12_gwss_multi(
    self,
    contig: base_params.contig,
    sample_queries: h12_params.sample_queries,
Member

Suggest to change to a cohorts parameter as above.

@doc(
    summary="Plot h12 GWSS data with multiple tracks.",
)
def plot_h12_gwss_multi(
Member

Suggested change
- def plot_h12_gwss_multi(
+ def plot_h12_gwss_multi_panel(

Maybe make name more descriptive?

@doc(
    summary="Plot h12 GWSS data with multiple traces.",
)
def plot_h12_gwss_multitraces(
Member

Suggested change
- def plot_h12_gwss_multitraces(
+ def plot_h12_gwss_multi_overlay(

Maybe make name more descriptive?

@doc(
    summary="Plot h12 GWSS data track with multiple traces.",
)
def plot_h12_gwss_track_multi(
Member

Suggested change
- def plot_h12_gwss_track_multi(
+ def plot_h12_gwss_multi_overlay_track(

Comment on lines 32 to 64
class sample_query_params:
    def __init__(
        self,
        sample_query: base_params.sample_query,
        title: Optional[gplt_params.title],
        window_size: window_size,
        analysis: Optional[hap_params.analysis] = base_params.DEFAULT,
        cohort_size: Optional[base_params.cohort_size] = cohort_size_default,
        min_cohort_size: Optional[
            base_params.min_cohort_size
        ] = min_cohort_size_default,
        max_cohort_size: Optional[
            base_params.max_cohort_size
        ] = max_cohort_size_default,
    ) -> None:
        self.sample_query = sample_query
        if title:
            self.title = title
        else:
            self.title = sample_query
        self.window_size = window_size
        self.analysis = analysis
        self.cohort_size = cohort_size
        self.min_cohort_size = min_cohort_size
        self.max_cohort_size = max_cohort_size


sample_queries: TypeAlias = Annotated[
    Sequence[sample_query_params],
    """
    A set of sample queries parameters. These include actual sample queries, the title associated to each one, the window size for the analysis, the site filter analysis that needs to be used, the cohort size, the minimum and the maximum cohort sizes.
    """,
]
Member

I think this could be simplified, because there is only one parameter here that might need to be varied between cohorts, which is the window_size parameter.

I think I would suggest to remove this, and just expose these parameters (window_size, analysis, cohort_size, min_cohort_size, max_cohort_size) directly on the function being called by the user. For analysis, cohort_size, min_cohort_size, max_cohort_size we would expect a single value and pass this through for all cohorts. For window_size it could be a single value, or it could be a dictionary mapping cohort keys to window size values.

Does that make sense?

So, e.g., a user would call:

ag3.plot_h12_gwss_multi_panel(
    contig="2R",
    sample_sets=[...],
    cohorts="admin1_year",
    analysis="gamb_colu_arab",
    window_size=2000,
    min_cohort_size=20,
    ...
)

...or if different window sizes are required, a user would call:

ag3.plot_h12_gwss_multi_panel(
    contig="2R",
    sample_sets=[...],
    cohorts="admin1_year",
    analysis="gamb_colu_arab",
    window_size={"...": 2000, "...": 1000, ...},
    min_cohort_size=20,
    ...
)
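One way the "single value or dictionary" behaviour for window_size might be resolved internally is sketched below. The function name `resolve_window_sizes` and the cohort labels are hypothetical; the PR's actual handling may differ.

```python
from typing import Dict, Mapping, Sequence, Union


def resolve_window_sizes(
    window_size: Union[int, Mapping[str, int]],
    cohort_labels: Sequence[str],
) -> Dict[str, int]:
    """Hypothetical helper: return one window size per cohort label."""
    if isinstance(window_size, int):
        # A single value is passed through for every cohort.
        return {label: window_size for label in cohort_labels}
    # Otherwise expect a mapping from cohort label to window size.
    missing = [label for label in cohort_labels if label not in window_size]
    if missing:
        raise ValueError(f"No window size given for cohorts: {missing}")
    return {label: window_size[label] for label in cohort_labels}


# Assumed example labels.
same = resolve_window_sizes(2000, ["early", "late"])
varied = resolve_window_sizes({"early": 2000, "late": 1000}, ["early", "late"])
```

Failing loudly on a missing cohort key is a design choice in this sketch; falling back to a default window size would be an alternative.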

@jonbrenas
Collaborator Author

Thanks @alimanfoo. I think I addressed all of your comments and I added some tests. I made the definition of h12_params.window_size a little more complex by allowing a dict of window sizes, which is not ideal because the other functions that use it don't check that the value is an int. That check probably needs to be added.
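The kind of check mentioned here, for functions that only make sense with a single window size, could look roughly like the sketch below. The function name `require_int_window_size` is hypothetical, and the sketch only accepts built-in Python ints (a NumPy integer would be rejected as written).

```python
def require_int_window_size(window_size: object) -> int:
    """Hypothetical guard for functions that accept only a single window size."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(window_size, bool) or not isinstance(window_size, int):
        raise TypeError(
            f"window_size must be an int here, got {type(window_size).__name__}"
        )
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    return window_size


checked = require_int_window_size(2000)
```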

Member

@alimanfoo alimanfoo left a comment

Looking good @jonbrenas. A couple of very minor suggestions.

Resolved review threads (outdated):
malariagen_data/anoph/h12.py (4 threads)
malariagen_data/anoph/h12_params.py (1 thread)
@alimanfoo
Member

It works :)

image

@alimanfoo
Member

Multi-panel also works... :)

image

Would be good to remove the X axis label and ticks from all H12 tracks.

@alimanfoo
Member

Hi @jonbrenas, took the liberty to push a small change to the plot_h12_h1x notebook adding a couple of examples to exercise the new h12 functions.

@jonbrenas
Collaborator Author

Great! Thank you @alimanfoo.

@@ -83,6 +83,14 @@
"A bokeh figure (only returned if show=False).",
]

def_figure: TypeAlias = Annotated[
Member

Wonder if we should rename this parameter to "figure" and rename figure to "optional_figure" just for clarity?

Collaborator Author

I think it is a good idea. It is a small change, but it will affect a lot of files that have little to do with this PR, so I am going to create a new issue (and the associated PR if I have time before I leave for Scotland).

malariagen_data/anoph/h12.py: 3 resolved review threads (1 outdated)
Comment on lines 151 to 156
h12_params.update({"cohorts": {"all": "year > 0"}})

fig = api.plot_h12_gwss_multi_overlay(**h12_params, show=False)
assert isinstance(fig, bokeh.models.GridPlot)
fig = api.plot_h12_gwss_multi_panel(**h12_params, show=False)
assert isinstance(fig, bokeh.models.GridPlot)
Member

Maybe good to test with more than one cohort.

Also would be good to test both with a single window size (int) and with multiple window sizes (dict).

@alimanfoo
Member

Hi @jonbrenas, just pushed a commit with an addition to the example notebook to exercise the functions using a dict value for the cohorts and window_size parameters.

@jonbrenas
Collaborator Author

Thanks @alimanfoo. There should now be tests with the dict version of window_size. The tests for the multi versions are now their own tests, instead of being tacked on at the end of the mono version, and they use 2 cohorts.

Successfully merging this pull request may close these issues.

Multiple H12 traces
3 participants