This PR adds a function, `calculate_batch_ari()`, to calculate the batch ARI (adjusted Rand index). I loosely followed the approach in the Genome Biology benchmarking paper.
Specifically, I downsample cells to 80% (without replacement; the 80% is currently hardcoded), and then use the top `num_pcs` PCs (default 20) to perform k-means clustering over a range of k values (`seq(5,25,5)`, i.e. 5, 10, 15, 20, 25). I repeat this procedure 20x (also hardcoded). It's written so that, within each replicate, every k value uses the same downsampled data. The function returns a tibble with `rep`, `k`, and `batch_ari` values for each calculation.
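To make the replicate/k structure concrete, here is a minimal sketch of the procedure described above. It is written in plain Python for illustration and is not the R code in this PR. The names `adjusted_rand_index`, `batch_ari_grid`, and `cluster_fn` are hypothetical; `cluster_fn` stands in for k-means on the top PCs, and the 80% fraction and 20 reps are exposed as keyword arguments only to show what that would look like.

```python
import random
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items."""
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: treat as trivially identical
    # Pairs of items that share a label in both labelings, in a only, in b only.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / total_pairs
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # both labelings are all-one-cluster or all-singletons
    return (sum_cells - expected) / (max_index - expected)


def batch_ari_grid(batches, cluster_fn, k_values=(5, 10, 15, 20, 25),
                   n_reps=20, frac=0.8, seed=0):
    """Return one row (rep, k, batch_ari) per replicate and k value.

    cluster_fn(kept_indices, k) is an assumed callback that returns one
    cluster label per kept cell, e.g. k-means on the top PCs.
    """
    rng = random.Random(seed)
    n_keep = int(frac * len(batches))
    rows = []
    for rep in range(1, n_reps + 1):
        # Downsample once per replicate, without replacement.
        keep = rng.sample(range(len(batches)), n_keep)
        kept_batches = [batches[i] for i in keep]
        # Every k within this replicate sees the same downsampled cells.
        for k in k_values:
            clusters = cluster_fn(keep, k)
            rows.append({"rep": rep, "k": k,
                         "batch_ari": adjusted_rand_index(kept_batches, clusters)})
    return rows
```

With the defaults this yields 20 x 5 = 100 unsummarized rows, matching the raw `rep`/`k`/`batch_ari` shape described above; any summarizing would happen downstream.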
Some things to talk about:
- Do we like the `k \in seq(5,25,5)` approach?
- Should this function return a summarized tibble instead of raw values? I figure summarizing can take place later, and it's best not to remove information quite yet.
- Should any of the hardcoded parameters (the downsampling percentage or the number of reps) be function arguments with defaults instead?
Closes #86
⚠️ Stacked on #95