-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multiple H12 traces #595
base: master
Are you sure you want to change the base?
Multiple H12 traces #595
Conversation
… am not too happy about.
Thanks @jonbrenas . I'll convert this to draft because it's still "work in progress". Press the "ready for review" button when it's ready, or feel free to ping reviewers. |
…en-data-python into 594-multi-traces-H12
I still think that I may have forgotten something but the one thing I remembered I wanted to do (use a palette for colours by default) I had already done. |
Thanks @jonbrenas . Do you have some example code (or maybe screenshots) to help me understand this better, then I can try to give it a quick review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jonbrenas, looking good, a few suggestions...
malariagen_data/anoph/h12.py
Outdated
def plot_h12_gwss_track_multi( | ||
self, | ||
contig: base_params.contig, | ||
sample_queries: h12_params.sample_queries, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sample_queries: h12_params.sample_queries, | |
cohorts: base_params.cohorts, |
Suggest to use a "cohorts" parameter instead, which is the same as already used elsewhere in the API when multiple cohorts are required.
This could then be handled via the _setup_cohort_queries()
helper function, like is done within the diversity_stats()
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also suggest to add a sample_query
parameter, with behaviour as per diversity_stats()
. I.e., the sample_query
defines a base query, then the cohorts
defines groups of samples on top of that. Should all be handled via the _setup_cohort_queries()
helper.
malariagen_data/anoph/h12.py
Outdated
line_width=1, | ||
line_color=colors[i % len(colors)], | ||
fill_color=None, | ||
legend_label=sample_queries[i], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could use cohort identifiers here rather than sample queries, would be more compact.
malariagen_data/anoph/h12.py
Outdated
def plot_h12_gwss_multi( | ||
self, | ||
contig: base_params.contig, | ||
sample_queries: h12_params.sample_queries, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest to change to a cohorts
parameter as above.
malariagen_data/anoph/h12.py
Outdated
@doc( | ||
summary="Plot h12 GWSS data with multiple tracks.", | ||
) | ||
def plot_h12_gwss_multi( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def plot_h12_gwss_multi( | |
def plot_h12_gwss_multi_panel( |
Maybe make name more descriptive?
malariagen_data/anoph/h12.py
Outdated
@doc( | ||
summary="Plot h12 GWSS data with multiple traces.", | ||
) | ||
def plot_h12_gwss_multitraces( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def plot_h12_gwss_multitraces( | |
def plot_h12_gwss_multi_overlay( |
Maybe make name more descriptive?
malariagen_data/anoph/h12.py
Outdated
@doc( | ||
summary="Plot h12 GWSS data track with multiple traces.", | ||
) | ||
def plot_h12_gwss_track_multi( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def plot_h12_gwss_track_multi( | |
def plot_h12_gwss_multi_overlay_track( |
malariagen_data/anoph/h12_params.py
Outdated
class sample_query_params: | ||
def __init__( | ||
self, | ||
sample_query: base_params.sample_query, | ||
title: Optional[gplt_params.title], | ||
window_size: window_size, | ||
analysis: Optional[hap_params.analysis] = base_params.DEFAULT, | ||
cohort_size: Optional[base_params.cohort_size] = cohort_size_default, | ||
min_cohort_size: Optional[ | ||
base_params.min_cohort_size | ||
] = min_cohort_size_default, | ||
max_cohort_size: Optional[ | ||
base_params.max_cohort_size | ||
] = max_cohort_size_default, | ||
) -> None: | ||
self.sample_query = sample_query | ||
if title: | ||
self.title = title | ||
else: | ||
self.title = sample_query | ||
self.window_size = window_size | ||
self.analysis = analysis | ||
self.cohort_size = cohort_size | ||
self.min_cohort_size = min_cohort_size | ||
self.max_cohort_size = max_cohort_size | ||
|
||
|
||
sample_queries: TypeAlias = Annotated[ | ||
Sequence[sample_query_params], | ||
""" | ||
A set of sample queries parameters. These include actual sample queries, the title associated to each one, the window size for the analysis, the site filter analysis that needs to be used, the cohort size, the minimum and the maximum cohort sizes. | ||
""", | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could be simplified, because there is only one parameter here that might need to be varied between cohorts, which is the window_size
parameter.
I think I would suggest to remove this, and just expose these parameters (window_size
, analysis
, cohort_size
, min_cohort_size
, max_cohort_size
) directly on the function being called by the user. For analysis
, cohort_size
, min_cohort_size
, max_cohort_size
we would expect a single value and pass this through for all cohorts. For window_size
it could be a single value, or it could be a dictionary mapping cohort keys to window size values.
Does that make sense?
So, e.g., a user would call:
ag3.plot_h12_gwss_multi_panel(
contig="2R",
sample_sets=[...],
cohorts="admin1_year",
analysis="gamb_colu_arab",
window_size=2000,
min_cohort_size=20,
...
)
...or if different window sizes are required, a user would call:
ag3.plot_h12_gwss_multi_panel(
contig="2R",
sample_sets=[...],
cohorts="admin1_year",
analysis="gamb_colu_arab",
window_size={"...": 2000, "...": 1000, ...},
min_cohort_size=20,
...
)
Thanks @alimanfoo . I think I addressed all of your comments and I added some tests. I made the definition of |
…en-data-python into 594-multi-traces-H12
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good @jonbrenas. A couple of very minor suggestions.
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
Hi @jonbrenas, took the liberty to push a small change to the plot_h12_h1x notebook adding a couple of examples to exercise the new h12 functions. |
Great! Thank you @alimanfoo. |
Co-authored-by: Alistair Miles <[email protected]>
Co-authored-by: Alistair Miles <[email protected]>
Co-authored-by: Alistair Miles <[email protected]>
Co-authored-by: Alistair Miles <[email protected]>
Co-authored-by: Alistair Miles <[email protected]>
@@ -83,6 +83,14 @@ | |||
"A bokeh figure (only returned if show=False).", | |||
] | |||
|
|||
def_figure: TypeAlias = Annotated[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wonder if we should rename this parameter to "figure" and rename figure to "optional_figure" just for clarity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is a good idea. It is a small change but that will affect a lot of files that have little to do with this PR so I am going to create a new issue (and the associated PR if I have time before I leave for Scotland).
tests/anoph/test_h12.py
Outdated
h12_params.update({"cohorts": {"all": "year > 0"}}) | ||
|
||
fig = api.plot_h12_gwss_multi_overlay(**h12_params, show=False) | ||
assert isinstance(fig, bokeh.models.GridPlot) | ||
fig = api.plot_h12_gwss_multi_panel(**h12_params, show=False) | ||
assert isinstance(fig, bokeh.models.GridPlot) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe good to test with more than one cohort.
Also would be good to test both with a single window size (int) and with multiple window sizes (dict).
Hi @jonbrenas, just pushed a commit with an addition to the example notebook to exercise the functions using a dict value for the |
Co-authored-by: Alistair Miles <[email protected]>
Co-authored-by: Alistair Miles <[email protected]>
Co-authored-by: Alistair Miles <[email protected]>
Thanks @alimanfoo. There should now be tests with the dict version of |
This PR adds new functions for performing H12 GWSS for multiple cohorts and plotting the results together in a single figure. Includes a
plot_h12_gwss_multi_overlaid()
function to plot multiple cohorts overlaid in a single track, and aplot_h12_gwss_multi_panel()
function to plot multiple cohorts in separate tracks with linked pan and zoom.