Cannot read Visium HD data using spatialdata-io (Recurrent error). Data is non-zarr format. #3342

Open
2 of 3 tasks
ankshe91 opened this issue Nov 6, 2024 · 11 comments
Labels
Bug 🐛, Triage 🩺 (This issue needs to be triaged by a maintainer)

ankshe91 commented Nov 6, 2024

Please make sure these conditions are met

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of scanpy.
  • (optional) I have confirmed this bug exists on the main branch of scanpy.

What happened?

I have Visium HD data that is not in Zarr format.
I tried reading it with sdata = visium_hd(path_read)

It keeps asking me for a dataset_id, which does not appear in the feature_slice file name or anywhere in my folder.
Nonetheless, I tried setting it to None, "", and other possible dataset ID values.

I could not find any help for this error either.

(I also tried pointing the file path at the different binned folders.)

Minimal code sample

from spatialdata_io import visium_hd

# Top-level Space Ranger output folder for the sample
path_read = '/Users/DarthRNA/Downloads/1299_1_XS_VHD_v2_outs'
sdata = visium_hd(path_read)

Error output


ValueError Traceback (most recent call last)
Cell In[54], line 1
----> 1 sdata = visium_hd(path_read)

File /Volumes/Ankitha/Conda/miniconda3/envs/myenv/lib/python3.12/site-packages/spatialdata_io/readers/visium_hd.py:95, in visium_hd(path, dataset_id, filtered_counts_file, bin_size, bins_as_squares, fullres_image_file, load_all_images, imread_kwargs, image_models_kwargs, anndata_kwargs)
92 images: dict[str, Any] = {}
94 if dataset_id is None:
---> 95 dataset_id = _infer_dataset_id(path)
96 filename_prefix = f"{dataset_id}_"
98 def load_image(path: Path, suffix: str, scale_factors: list[int] | None = None) -> None:

File /Volumes/Ankitha/Conda/miniconda3/envs/myenv/lib/python3.12/site-packages/spatialdata_io/readers/visium_hd.py:361, in _infer_dataset_id(path)
359 files = [f for f in os.listdir(path) if os.path.isfile(os.path.join(path, f)) and f.endswith(suffix)]
360 if len(files) == 0 or len(files) > 1:
--> 361 raise ValueError(
362 f"Cannot infer dataset_id from the feature slice file in {path}, please pass dataset_id as an argument."
363 )
364 return files[0].replace(suffix, "")

ValueError: Cannot infer dataset_id from the feature slice file in /Users/DarthRNA/Downloads/1299_1_XS_VHD_v2_outs, please pass dataset_id as an argument.

Versions


ankshe91 added Bug 🐛 and Triage 🩺 labels Nov 6, 2024
@Nina-Song

Same issue here. My Visium HD output folder is structured like the default 10x Mouse Small Intestine output and contains ['feature_slice.h5', 'metrics_summary.csv', 'probe_set.csv', 'possorted_genome_bam.bam', 'spatial', 'binned_outputs', 'molecule_info.h5', 'possorted_genome_bam.bam.bai', 'web_summary.html', 'cloupe_008um.cloupe']

A tutorial on how to read it and then convert it to Zarr would be great :> Thank you again for developing this amazing package!

Nina-Song commented Nov 8, 2024

Hi @ankshe91, I tried downloading the 10x Mouse Small Intestine data directly from their website and using it as input (remember to unpack the .tar.gz files).
Now the visium_hd function works. I guess the file naming in our earlier folders was causing the error.

nsong@gemini-data1:/home/Visium_HD_Mouse_Small_Intestine
$ ls
binned_outputs
spatial
Visium_HD_Mouse_Small_Intestine_cloupe_008um.cloupe
Visium_HD_Mouse_Small_Intestine_feature_slice.h5
Visium_HD_Mouse_Small_Intestine_metrics_summary.csv
Visium_HD_Mouse_Small_Intestine_molecule_info.h5
Visium_HD_Mouse_Small_Intestine_spatial.tar.gz
Visium_HD_Mouse_Small_Intestine_web_summary.html

sdata = spatialdata_io.visium_hd(path_read)
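
For anyone hitting the same ValueError, here is a rough sketch of how to check whether a folder is named the way the reader seems to expect. It mirrors the check visible in the traceback above (exactly one top-level file ending in a feature-slice suffix). The suffix "_feature_slice.h5" is an assumption based on the file names in the listing above, and both helper names are made up for illustration; they are not part of spatialdata-io.

```python
import os

# Assumption: the reader looks for a top-level file named
# "<dataset_id>_feature_slice.h5", as in the 10x demo listing above.
SUFFIX = "_feature_slice.h5"


def find_feature_slice_candidates(path, suffix=SUFFIX):
    """Return the top-level regular files in `path` whose names end with `suffix`."""
    return sorted(
        f
        for f in os.listdir(path)
        if os.path.isfile(os.path.join(path, f)) and f.endswith(suffix)
    )


def guess_dataset_id(path, suffix=SUFFIX):
    """Return the inferred prefix, or None when zero or several files match."""
    files = find_feature_slice_candidates(path, suffix)
    if len(files) != 1:
        return None
    # Strip the suffix from the end of the name to recover the prefix.
    return files[0][: -len(suffix)]
```

A folder that only contains an unprefixed feature_slice.h5 gives None here, which matches the situation where the reader cannot infer a dataset_id.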

ankshe91 commented Nov 8, 2024 via email

@Nina-Song

Hi @ankshe91, does the Visium_HD_Mouse_Small_Intestine demo work in your script?

ankshe commented Nov 8, 2024

I haven't tried the sample dataset yet.

But I see that your own dataset doesn't have dataset IDs in its file names either.
Does that work with the reader?

Nina-Song commented Nov 8, 2024

Starting with https://www.10xgenomics.com/datasets/visium-hd-cytassist-gene-expression-libraries-of-mouse-intestine (batch download) could be a good idea. I mimicked their folder structure, and the reader now works on my own data as well. (The screenshot attached previously was from this demo data, not my own data, but both work now.)
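
One possible way to mimic that folder structure without renaming the original files is sketched below. It builds a second folder of symlinks in which every top-level file gets a "<dataset_id>_" prefix while sub-folders such as binned_outputs and spatial keep their names. This is only a sketch: the function name mirror_with_prefix is hypothetical, prefixing every top-level file is an assumption taken from the demo listing, and symlinks may need extra permissions on Windows.

```python
from pathlib import Path


def mirror_with_prefix(src, dst, dataset_id):
    """Create `dst` as a symlinked copy of `src` with top-level files prefixed.

    Directories keep their names; regular files get "<dataset_id>_" prepended
    unless they already start with it. Returns the names created in `dst`.
    """
    src, dst = Path(src).resolve(), Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    prefix = f"{dataset_id}_"
    created = []
    for entry in sorted(src.iterdir()):
        name = entry.name
        if entry.is_file() and not name.startswith(prefix):
            name = prefix + name
        link = dst / name
        if not link.exists():
            # Link back to the original so no data is copied or modified.
            link.symlink_to(entry, target_is_directory=entry.is_dir())
        created.append(name)
    return created
```

The mirrored folder can then be passed to visium_hd in place of the original path, optionally together with the same dataset_id argument that appears in the function signature in the traceback.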

ankshe commented Nov 8, 2024

Thank you!
I'll try doing that!

ankshe91 commented Nov 12, 2024 via email

Nina-Song commented Nov 12, 2024

ankshe91 commented Nov 12, 2024 via email

ankshe91 commented Nov 12, 2024 via email
