CentroidSpace AnnData Annotations #455
Excellent! We might actually try to copy all obs columns as far as we can? I also wonder whether any of the other PerturbationSpace methods need this? I'd hope that we could keep the dimensions of the objects somewhat consistent.
I again wonder whether we should keep all of them?
Thanks for the review @Zethson!
I thought about doing that, but the reason I decided against it is that we can only keep the .obs columns with the same value for all cells of one perturbation. We could of course iterate through all .obs and test if the respective .obs variable fulfills this condition, but from what I've seen, it's quite common for perturbation datasets to have one .obs column for each perturbation (such as for the Norman dataset, >100 obs variables). Maybe I'm overestimating, but I'm concerned that it could scale up pretty quickly in terms of computational complexity?
Yes, that's true! Based on what I've tested so far, |
I think that there's probably pretty efficient pandas code for that. I'd encourage you to give it a try.
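As a rough illustration of the pandas approach suggested here, the check for "same value for all cells of one perturbation" can be vectorized with a single grouped `nunique` call instead of a Python loop over columns. This is only a sketch: the helper name `constant_obs_columns` and the toy column names are hypothetical and not part of the PR.

```python
import pandas as pd


def constant_obs_columns(obs: pd.DataFrame, target_col: str) -> list:
    """Return the obs columns that hold a single value within every perturbation.

    Hypothetical helper for illustration; not taken from the PR.
    """
    # Count distinct values per (perturbation, column); NaN counts as a value.
    n_unique = obs.groupby(target_col, observed=True).nunique(dropna=False)
    # A column qualifies only if no perturbation group has more than one value.
    is_constant = (n_unique <= 1).all(axis=0)
    return is_constant[is_constant].index.tolist()


# Toy obs table: "pathway" is constant per perturbation, "batch" is not.
toy_obs = pd.DataFrame(
    {
        "perturbation": ["A", "A", "B", "B"],
        "pathway": ["p1", "p1", "p2", "p2"],
        "batch": ["b1", "b2", "b1", "b1"],
    }
)
kept_columns = constant_obs_columns(toy_obs, "perturbation")
print(kept_columns)
```

The grouped count runs once over the whole table, so the cost grows with the number of cells times the number of columns rather than requiring a separate pass per column. Whether that is fast enough for datasets with more than 100 obs variables would still need to be measured.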
Ahh, that's fair. So let's try and stick to the |
PR Checklist
- docs is updated

Description of changes
CentroidSpace calculates the centroid for all cells of a given perturbation. The resulting AnnData object has one row per perturbation. This pull request improves the annotation of the returned AnnData object to simplify its downstream use. Specifically, the CentroidSpace.compute method has been changed as follows:
- Added target_col (the column containing perturbation annotations) as a .obs column in the output. Although perturbation information is also stored as the index of the returned adata, having it in .obs is useful, for instance, for creating a colored UMAP plot (see screenshot below).
- Added the keep_obs parameter to the compute function, allowing users to specify .obs columns from the original adata that should be kept in the returned adata. This is useful when additional annotations are required for downstream processing, such as a pathway annotation for each perturbation.

Example usage
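The behaviour described above can be sketched with plain pandas and NumPy. This is not pertpy's implementation or its call signature: the function `centroid_table`, its arguments, and the toy data are assumptions made for illustration, and the sketch assumes a dense expression matrix.

```python
import numpy as np
import pandas as pd


def centroid_table(X, obs, target_col, keep_obs=None):
    """Hypothetical sketch: per-perturbation centroids plus an annotated obs table."""
    keep_obs = list(keep_obs or [])
    groups = obs[target_col].to_numpy()

    # One centroid (mean feature vector) per perturbation; assumes dense X.
    features = pd.DataFrame(np.asarray(X), index=obs.index)
    centroids = features.groupby(groups).mean()

    # Keep the perturbation label as a regular column, not only as the index.
    new_obs = pd.DataFrame(
        {target_col: centroids.index.to_numpy()}, index=centroids.index
    )

    if keep_obs:
        per_group = obs[keep_obs].groupby(groups)
        # A kept column must be constant within each perturbation.
        if (per_group.nunique(dropna=False) > 1).to_numpy().any():
            raise ValueError("keep_obs columns must be constant per perturbation")
        new_obs = new_obs.join(per_group.first())

    return centroids, new_obs


# Toy data: three cells, two features, two perturbations.
X_toy = np.array([[1.0, 2.0], [3.0, 4.0], [10.0, 20.0]])
obs_toy = pd.DataFrame(
    {"perturbation": ["A", "A", "B"], "pathway": ["p1", "p1", "p2"]},
    index=["c1", "c2", "c3"],
)
centroids, centroid_obs = centroid_table(
    X_toy, obs_toy, "perturbation", keep_obs=["pathway"]
)
print(centroid_obs)
```

Raising an error for a non-constant column is one possible design choice for this sketch; the PR text does not say how such columns are handled.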