/ in column names makes AnnData Zarr object unreadable on windows
#1447
Comments
@melonora, can you write a column with a
Basically this just happens to work with how we do columns. The specific group creation behavior could very well be considered a bug.
I would argue against this. It hurts accessibility for non-English users. Examples I've seen include Chinese characters used in columns for medical data. I also think it's not terribly uncommon for English speakers to want to use Greek letters in names for things (αβ/γδ T cells, for instance). I recall topics like this being discussed at length in some zarr community calls/channels. I'd recommend checking over there for any recommendations/solutions. |
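One possible middle ground between this accessibility concern and the worry about names being interpreted as paths is sketched below. It rejects only characters and names that carry path meaning and accepts everything else, including Chinese characters and Greek letters. The helper `is_path_safe` and the constant `PATH_SEPARATORS` are hypothetical names used for illustration; they are not part of anndata, zarr, or spatialdata, and nobody in the thread has proposed exactly this rule.

```python
# Hypothetical helper, for illustration only; not an anndata/zarr/spatialdata API.
# Assumption: the only problematic parts of a name are path separators
# ("/" and "\") and the special path components ".", ".." and "".
PATH_SEPARATORS = ("/", "\\")


def is_path_safe(name: str) -> bool:
    """Return True if `name` cannot be mistaken for a nested or relative path."""
    if name in ("", ".", ".."):
        return False
    return not any(sep in name for sep in PATH_SEPARATORS)


if __name__ == "__main__":
    for candidate in ["γδ T cells", "细胞类型", "αβ/γδ T cells", ".."]:
        print(repr(candidate), is_path_safe(candidate))
```

Under this rule a name such as "γδ T cells" passes, while "αβ/γδ T cells" is flagged because of the slash rather than because of the Greek letters.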
I did, the zarr store with current version of |
@LucaMarconato is the uploaded Steinbock data perhaps outdated? |
I agree with you, I would be up for allowing all the characters for this reason. The only reason why I wanted to restrict the name is that I would like to minimize the risk of weird behaviors if the name of the element is interpreted as a path. Things like |
@melonora, I have just verified. The Steinbock example is up-to-date and it's using the latest Did I understand correctly: you can reproduce the bug only when the |
Thanks for clarifying. But really not sure what's going on here then. |
Yeah the behaviour is really different. The encoding_type of var in this case is a dataframe. This is specified in .zattrs in var. |
I'm a little confused here. I thought the issue was with column names, not row names? @melonora, if you do:
from anndata.experimental import read_elem, write_elem
import pandas as pd
import zarr
g = zarr.open("test_df.zarr", "w+")
df = pd.DataFrame({"col / with/ slashes": [1, 2, 3]})
write_elem(g, "df", df)
from_disk = read_elem(g["df"])
from_disk
what do you get? |
This gives an error because of the |
This issue has been automatically marked as stale because it has not had recent activity. |
Please make sure these conditions are met
Report
Reported by a spatialdata Windows user and reproduced on a Windows machine: scverse/spatialdata-io#129. I can't reproduce on my macOS machine or on a Linux machine.
When / is part of the name of a var column, the data is written to disk in a subfolder (also on macOS, see screenshot) and can still be read correctly. On Windows the column can't be read, probably because of the difference between / and \ for paths.
In spatialdata, we are considering checking all the element names and their respective element columns (e.g. GeoDataFrame column names, AnnData obs/var/... column names, etc.) and allowing only strings with alphanumeric characters or the '-', '_' and '.' symbols. The check would be performed when instantiating an object and in particular before writing, prompting the user for a name change.
What is your opinion on this, in particular on restricting the names?
Please see the code and traceback in the attached SpatialData issue, as I can't reproduce on my machine: scverse/spatialdata-io#129.
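To make the proposed restriction concrete, here is a minimal sketch of what such a check might look like. It assumes "alphanumeric" means ASCII letters and digits, which the report does not specify. The names `find_invalid_names` and `check_names` are hypothetical and are not existing spatialdata or anndata functions.

```python
import re

# Assumption: "alphanumeric" is read as ASCII letters and digits only;
# the additional allowed symbols are "-", "_" and ".".
_ALLOWED_NAME = re.compile(r"[A-Za-z0-9._-]+")


def find_invalid_names(names):
    """Return the names that are not strings made only of allowed characters."""
    return [
        name
        for name in names
        if not isinstance(name, str) or _ALLOWED_NAME.fullmatch(name) is None
    ]


def check_names(names):
    """Raise ValueError listing every offending name (hypothetical helper)."""
    invalid = find_invalid_names(names)
    if invalid:
        raise ValueError(f"Please rename these columns or elements: {invalid}")


if __name__ == "__main__":
    print(find_invalid_names(["gene_1", "CD3/CD28", "a.b-c"]))
```

A check like this could run on the obs and var column names before writing. Note that the ASCII-only reading would also reject names containing non-ASCII letters, so the exact character set is a design decision rather than a given.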
Versions