Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include smoothed count in cvd table #501

Merged
merged 6 commits into from
Mar 10, 2024
Merged

Conversation

Yaoyx
Copy link
Collaborator

@Yaoyx Yaoyx commented Feb 15, 2024

No description provided.

@gfudenberg
Copy link
Member

gfudenberg commented Feb 15, 2024

this would close #456 & is also used in new downsampling analysis for P(s) curves (so that downsampled coolers have same number of pixels) for the docs (open2c/open2c_examples#30).

This PR would make expected_cis() return "count.avg.smoothed" whenever smoothed=True and smoothed_agg=True and use the total number of pixels per diagonal (or, equivalently, no NaNs in the bintable). The new cvd_defaults seem well-named to me.

Only two things I can think to add:

  • some test in "test_expected_cis" that verifies output columns.
  • update docstring to reflect the fact the function does smoothing, and what outputs the function should return
  • update the docstring Returns to clarify what columns to expect in the output. Right now it mentions: region1, region2, diag, n_valid, count.sum count.avg, etc

One question:

  • should output cvd dataFrame return "count.sum" and "balanced.sum"? I'm leaning towards dropping these as they can be re-capitulated with "n_valid" & "balanced.avg", for example.
    Any other suggestions, especially @sergpolly @golobor ?

@gfudenberg
Copy link
Member

proposal from March 4 discussion:

update default table returned by expected_cis as follows:

  • return "dist_bp" by default so that this isn't needed for the visualization
  • prepend the "most processed" given the arguments column as a duplicated column named "contact_freq". i.e. if balanced & smoothing then this will return the balanced.avg.smooth.agg. If aggregated+smoothed, then balanced.avg.smooth.agg. If nothing then count.avg.
  • default return would then be: region1, region2, dist, dist_bp, contact_frequency, all-other-columns-that-are-generated

for the sake of de-bugging / provenance we would not longer drop columns in the internal functions (ie. modify:

.)

describe the role of the keys in the cols keyword argument into a new Notes section for agg_smooth_cvd & what columns they correspond to:

add a note next to DEFAULT_CVD_COLS to check the docstring for agg_smooth_cvd:

…for column names in cvd table to expected_cis() and agg_smooth_cvd() 3. Add dist_bp and contact_freq columns to cvd table and keep all other columns
tests/test_expected.py Outdated Show resolved Hide resolved
@Yaoyx
Copy link
Collaborator Author

Yaoyx commented Mar 8, 2024

This PR enables smoothing for raw counts (counts.avg) in contacts-versus-distance (cvd) DataFrames .
changes:

  • add dist_bp and contact_freq columns to cvd table and keep all other columns
  • change n_elem to n_total for readability
  • use n_total to calculate count.avg
  • use a new set of columns when clr_weight_name=None with dictionary of DEFAULT_RAW_CVD_COLS
  • output count.avg.smoothed and count.avg.smoothed.agg, when clr_weight_name=None, smooth=True, aggregate_smoothed=True
  • agg_smooth_cvd split into per_region_smooth_cvd and genomewide_smooth_cvd() for code readability, and migrated into expected.py and
  • default dictionaries moved to expected.py so that expected_smoothing is agnostic to column names
  • update docstrings of expected_cis(), per_region_smooth_cvd(), and genomewide_smooth_cvd()
  • add a test to check output columns of expected_cis()

@gfudenberg
Copy link
Member

this looks great @Yaoyx !
Will go ahead & merge tomorrow unless @golobor @nvictus or other folks have any last minute suggestions!

@gfudenberg gfudenberg merged commit 0428f30 into open2c:master Mar 10, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants