Function for batch ARI calculation #97

sjspielman · 2022-08-17T16:59:17Z

Closes #86
⚠️ Stacked on #95

This PR adds a function, `calculate_batch_ari()`, that calculates the batch adjusted Rand index (ARI). I loosely followed the approach in the Genome Biology benchmarking paper.
Specifically, I downsample cells to 80% without replacement (this 80% is currently hard-coded), then use the top `num_pcs` PCs (default 20) to perform k-means clustering for a range of k values (`seq(5, 25, 5)`). I repeat this procedure 20 times (also hard-coded). It's written so that each replicate uses the same downsampled data. The function returns a tibble with `rep`, `k`, and `batch_ari` values for each calculation.
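For illustration only, here is a minimal Python sketch of the procedure described above. The PR itself is R code returning a tibble, so this is not that implementation. Assumptions: PCs are already computed and passed in as one list of coordinates per cell; k-means is plain Lloyd's algorithm with random initialization (a stand-in, not R's `kmeans()`); each replicate draws its own 80% subset, which every k in that replicate shares (if every replicate should instead reuse one subset, move the sampling step out of the loop). The parameter names `ks`, `n_reps`, `frac_cells`, and `seed` are hypothetical and expose the currently hard-coded values as defaults.

```python
import random
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items."""
    n = len(labels_a)
    if n < 2:
        return 1.0
    # Pair counts from the contingency table and its two marginals
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both labelings are all-one-group or all-singletons
    return (sum_ij - expected) / (max_index - expected)


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, rng, max_iter=100):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: assignments stopped changing
        assignment = new_assignment
        # Move each centroid to the mean of its members (keep it if empty)
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def calculate_batch_ari(pcs, batch_labels, num_pcs=20, ks=range(5, 30, 5),
                        n_reps=20, frac_cells=0.8, seed=None):
    """Return one dict per (rep, k) holding the batch ARI (raw, unsummarized)."""
    rng = random.Random(seed)
    n_cells = len(pcs)
    n_keep = int(frac_cells * n_cells)
    top_pcs = [list(row[:num_pcs]) for row in pcs]  # keep only the top num_pcs PCs
    results = []
    for rep in range(1, n_reps + 1):
        # Downsample cells without replacement; all k in this rep share the subset
        idx = rng.sample(range(n_cells), n_keep)
        sub_pcs = [top_pcs[i] for i in idx]
        sub_batches = [batch_labels[i] for i in idx]
        for k in ks:
            clusters = kmeans(sub_pcs, k, rng)
            results.append(
                {"rep": rep, "k": k,
                 "batch_ari": adjusted_rand_index(sub_batches, clusters)}
            )
    return results
```

`range(5, 30, 5)` yields 5, 10, 15, 20, 25, matching `seq(5, 25, 5)`. Each returned dict plays the role of one tibble row (`rep`, `k`, `batch_ari`), so no information is discarded before summarizing.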

Some things to talk about:

- Do we like using `k` values from `seq(5, 25, 5)` (i.e., 5, 10, 15, 20, 25)?
- Should this function return a summarized tibble instead of raw values? I figure summarizing can happen later, and it's best not to discard information yet.
- Should any of the hard-coded parameters (the downsampling percentage or the number of replicates) become function arguments with defaults?
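Keeping raw values is cheap, because summarizing is a short downstream step. As a hedged Python sketch (in the R code this would more naturally be a `group_by()`/`summarize()` step on the tibble), a hypothetical `summarize_batch_ari()` helper could collapse rows with `rep`, `k`, and `batch_ari` fields into a per-k mean and standard deviation:

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_batch_ari(rows):
    """Collapse raw (rep, k, batch_ari) rows into per-k summaries, sorted by k."""
    by_k = defaultdict(list)
    for row in rows:
        by_k[row["k"]].append(row["batch_ari"])
    return [
        {
            "k": k,
            "n_reps": len(values),
            "mean_batch_ari": mean(values),
            # stdev needs at least two values; report NaN for a single replicate
            "sd_batch_ari": stdev(values) if len(values) > 1 else float("nan"),
        }
        for k, values in sorted(by_k.items())
    ]
```

Because the helper only reads the raw rows, the choice of whether to summarize can stay with the caller.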

sjspielman · 2022-08-17T16:59:38Z

Ha, this should have been a PR 😂

sjspielman closed this as completed Aug 17, 2022
