Function for batch ARI calculation #97

Closed
sjspielman opened this issue Aug 17, 2022 · 1 comment

Comments

@sjspielman
Member

Closes #86
⚠️ Stacked on #95

This PR adds a function calculate_batch_ari() to calculate batch ARI. I loosely followed the approach in the Genome Biology benchmarking paper.
Specifically, I downsample cells to 80% (without replacement; this 80% is currently hardcoded), then use the top num_pcs PCs (default 20) to perform k-means clustering for a range of k values (seq(5,25,5)). I repeat this procedure 20 times (also hardcoded). It's written so that, within each replicate, all k values use the same downsampled data. The function returns a tibble with rep, k, and batch_ari values for each calculation.
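For discussion, here is a minimal, stdlib-only Python sketch of the procedure described above. It is not the PR's R implementation. Assumptions: PC coordinates arrive as one row per cell with columns ordered PC1, PC2, ...; k-means is a plain Lloyd's algorithm with random initial centers; batch ARI is the adjusted Rand index between the k-means clusters and the batch labels; each replicate draws a fresh 80% downsample that is reused for every k. The hardcoded values (80%, 20 reps) are exposed as hypothetical keyword arguments, and the Python `calculate_batch_ari` only mirrors the PR's function name.

```python
import math
import random
from collections import Counter


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items."""
    n_pairs = math.comb(len(labels_a), 2)
    if n_pairs == 0:
        return 1.0
    # Pairs of items that share a group in both labelings
    index = sum(math.comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(math.comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(math.comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / n_pairs
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # both labelings trivial; treat as perfect agreement
        return 1.0
    return (index - expected) / (max_index - expected)


def squared_distance(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, rng, max_iter=100):
    """Plain Lloyd's k-means, seeded with k distinct input points as centers."""
    centers = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(max_iter):
        # Assign each point to its nearest center
        new = [min(range(k), key=lambda c: squared_distance(p, centers[c])) for p in points]
        if new == assignments:  # converged
            break
        assignments = new
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments


def calculate_batch_ari(pcs, batches, num_pcs=20, frac=0.8, n_reps=20,
                        ks=(5, 10, 15, 20, 25), seed=None):
    """Hypothetical Python analogue: returns one row per (rep, k)."""
    rng = random.Random(seed)
    n_keep = max(1, round(frac * len(pcs)))
    rows = []
    for rep in range(1, n_reps + 1):
        # Fresh downsample without replacement for each replicate (assumption)
        idx = rng.sample(range(len(pcs)), n_keep)
        points = [tuple(pcs[i][:num_pcs]) for i in idx]  # top num_pcs columns
        sub_batches = [batches[i] for i in idx]
        # Every k in this replicate reuses the same downsampled cells
        for k in ks:
            clusters = kmeans(points, k, rng)
            rows.append({"rep": rep, "k": k,
                         "batch_ari": adjusted_rand_index(sub_batches, clusters)})
    return rows
```

The returned list of dicts plays the role of the tibble: one raw row per replicate and k, with nothing summarized yet. Passing `seed` makes both the downsampling and the k-means initializations reproducible.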

Some things to talk about:

  • Do we like the k ∈ seq(5,25,5) approach?
  • Should this function return a summarized tibble instead of raw values? I figure summarizing can happen later, and it's best not to discard information quite yet.
  • Should any of the hard-coded parameters (% downsampling or number of reps) be function arguments with given defaults instead?
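On the second question, keeping raw values does not rule out a summary later. A minimal Python sketch of that later step, assuming rows shaped as the PR describes (one row per rep and k with a batch_ari value), represented here as dicts; `summarize_batch_ari` is a hypothetical name, and mean plus sample standard deviation per k is just one possible choice of summary.

```python
from statistics import mean, stdev


def summarize_batch_ari(rows):
    """Collapse raw (rep, k, batch_ari) rows into one summary row per k."""
    by_k = {}
    for row in rows:
        by_k.setdefault(row["k"], []).append(row["batch_ari"])
    summary = []
    for k in sorted(by_k):
        values = by_k[k]
        summary.append({
            "k": k,
            "n_reps": len(values),
            "mean_ari": mean(values),
            # stdev needs at least two values; report None otherwise
            "sd_ari": stdev(values) if len(values) > 1 else None,
        })
    return summary
```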
@sjspielman
Member Author

Ha, this should have been a PR 😂
