Load CRISPR perturbation datasets from scPerturb [Feature Request] #239

Closed
abearab opened this issue Apr 7, 2024 · 4 comments

Comments

@abearab
Contributor

abearab commented Apr 7, 2024

Describe the bug

I'm interested in using single-cell CRISPR perturbation datasets such as the NormanWeissman2019 and ReplogleWeissman2022 datasets.

Full list of scPerturb datasets

Questions

  1. I reviewed the code in Integrate AnnData and scperturb #236, but I couldn't tell whether the datasets were collected directly from scPerturb. Could you provide more information, please?
  2. How can I use TDC modules to load the scPerturb datasets in Python?

Suggestion

h5ad files for RNA and protein datasets, created using scanpy 1.9.1

For many reasons, it would be nice if the data loader function could let users load h5ad files as AnnData objects (at least as an option).
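
As a rough illustration of the option suggested above, here is a minimal sketch that reads a local h5ad file into an AnnData object with `anndata.read_h5ad`. The helper name `load_h5ad_as_anndata` is hypothetical and not part of TDC. It assumes the file has already been downloaded and that the `anndata` package is installed.

```python
from pathlib import Path


def load_h5ad_as_anndata(path, backed=None):
    """Read a local .h5ad file and return it as an AnnData object.

    Hypothetical helper for illustration only; it is not a TDC function.
    """
    path = Path(path)
    if path.suffix != ".h5ad":
        raise ValueError(f"expected an .h5ad file, got: {path.name}")
    if not path.is_file():
        raise FileNotFoundError(path)

    # Imported lazily so the path checks above run even without anndata.
    import anndata

    # backed="r" leaves the expression matrix on disk instead of loading it
    # into memory, which can help with multi-gigabyte datasets.
    return anndata.read_h5ad(path, backed=backed)
```

Passing `backed="r"` is optional; the default loads the whole file into memory.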


Originally posted in #236 (comment)

cc @amva13 @kexinhuang12345

@abearab
Contributor Author

abearab commented Apr 7, 2024

Regarding the first question: I can now see that some of the scPerturb files have been uploaded to the TDC Dataverse.
image

@amva13
Member

amva13 commented Apr 23, 2024

Closed with #252. Thanks, @kexinhuang12345!

@amva13 amva13 closed this as completed Apr 23, 2024
@abearab
Contributor Author

abearab commented Apr 23, 2024

Awesome! Thanks @kexinhuang12345

@abearab
Contributor Author

abearab commented Apr 29, 2024

Hi @kexinhuang12345, as you know, the ReplogleWeissman2022 study has three datasets.

image

As I understand it, the ReplogleWeissman2022_K562_gwps data has not been uploaded yet. However, I noticed some odd behavior when I tried to load it. I already had ReplogleWeissman2022_k562_essential downloaded in a local folder, and when I tried loading scperturb_gene_ReplogleWeissman2022_K562_gwps from that folder, it reported Found local copy...!

>>> test_load = PerturbOutcome('scperturb_gene_ReplogleWeissman2022_K562_gwps','Datasets')
Found local copy...
Loading...

The number of perturbations is wrong for the _gwps dataset: it should be 9867, but it's 2058 (the same number as in the _essential dataset).

>>> test_load.adata.obs.perturbation.unique()

Length: 2058

Looking more carefully, I tried an empty folder and noticed that, for some reason, this downloads the wrong file for _gwps.

>>> test_load = PerturbOutcome('scperturb_gene_ReplogleWeissman2022_K562_gwps','Datasets/new/')
Downloading...
█████████████████████████████████████████████| 1.55G/1.55G [01:09<00:00, 22.2MiB/s]
Loading...
~: ls Datasets/new/

scperturb_gene_ReplogleWeissman2022_k562_essential.h5ad
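
The mismatch above can also be checked programmatically. Below is a minimal sketch that compares the requested dataset name with the h5ad files actually present in the download folder. The helper `check_download` is hypothetical and not part of TDC. It assumes files are saved as `<dataset name>.h5ad`, as in the listing above, and it ignores case because the names here mix `K562` and `k562`.

```python
from pathlib import Path


def check_download(dataset_name, folder):
    """Check whether `<dataset_name>.h5ad` is present in `folder`.

    Returns (found, present): `found` is True if a file with the requested
    name exists (compared case-insensitively), and `present` is the sorted
    list of all .h5ad file names in the folder.
    """
    present = sorted(p.name for p in Path(folder).glob("*.h5ad"))
    wanted = f"{dataset_name}.h5ad".lower()
    found = any(name.lower() == wanted for name in present)
    return found, present


if __name__ == "__main__":
    found, present = check_download(
        "scperturb_gene_ReplogleWeissman2022_K562_gwps", "Datasets/new/"
    )
    print("requested file present:", found)
    print("files in folder:", present)
```

With the folder contents listed above, `found` would be `False` for the `_gwps` name, because the only file present is the `_essential` one.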

cc @amva13
