Paths playground #15
# path is shorthand: "obs/Foo" and "o/Foo"
if not isinstance(path, str):
    return path
path = path.split("/")
Watch out for this, since column names in dataframes can currently have a "/" in them.
Also, we don't stop dataframe indices from having slashes. We're only expecting two "/" unless the value is in uns, right?
So if we use slashes we need actual parsing then? I mean things are pretty deterministic, so we could set the limit for splitting depending on the attribute. That would be super headachy when combined with fallbacks and shorthand though.
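A minimal sketch of the "limit for splitting depending on the attribute" idea, only to make the trade-off concrete. The table `N_COMPONENTS` and the function `split_path` are hypothetical names, and the per-attribute counts are assumptions rather than anything settled in this thread. Shorthand such as "o/Foo", fallbacks, and uns are deliberately left out.

```python
# Hypothetical table: number of path components expected after the attribute.
# The last component is kept verbatim, so it may itself contain "/".
N_COMPONENTS = {
    "obs": 1,   # "obs/<column>"
    "var": 1,   # "var/<column>"
    "X": 1,     # "X/<var_name>"
    "obsm": 2,  # "obsm/<key>/<column or index>"
    "varm": 2,  # "varm/<key>/<column or index>"
}


def split_path(path):
    """Split a path string, limiting the number of splits by attribute."""
    if not isinstance(path, str):
        return path  # already given in the long (tuple) form
    attr, sep, rest = path.partition("/")
    if attr not in N_COMPONENTS:
        raise ValueError(f"unknown attribute {attr!r} in path {path!r}")
    if not sep or not rest:
        raise ValueError(f"path {path!r} has no key after the attribute")
    # maxsplit is one less than the number of components we want back
    return (attr, *rest.split("/", N_COMPONENTS[attr] - 1))


print(split_path("obs/A/B"))       # the column name "A/B" survives intact
print(split_path("obsm/X_pca/0"))  # the index comes back as the string "0"
```

This keeps a "/" inside the last component, but a "/" inside an obsm key is still ambiguous, and integer indices come back as strings, so it does not replace real parsing.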
If I'm thinking about the same thing, I think we always needed actual parsing. Dataframe indices and columns can be arbitrary strings.
I don't think we should allow access strings like:
That’s why I didn’t want data frames in .obsm. It’s unambiguous when X_pca is an array, and named stuff should go into dedicated AnnData modes (which don’t exist yet). We can fix that in AnnData by not allowing data frames with int colnames or colnames that look like ints…
They're so useful though. I think it'd be fine to just have something like
That’s just inconsistent and more verbose than a tuple. Why separate the first descent by a slash in a string and the second by a comma? This seems simpler:
Or this if we have an array:
I was thinking more separating the container and the index. This could also allow:
ad.Ref("obsm/X_pca", (0, 1, 2))
ad.Ref("obs", r"leiden*")
Of course, the init could have a signature like
What’s the difference between a key and an index? is We used to call that index “component”, but that falls flat when using a data frame and specifying a column name …
Given that this includes the citeseq tutorials I don't think that this is relevant anymore
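For readers following the container/index discussion, here is a rough sketch of what separating the two could look like. The `Ref` class below is purely hypothetical: it is not the signature proposed in the thread and not an existing AnnData API, and the field names `container` and `index` are assumptions. Patterns such as r"leiden*" are stored but not resolved.

```python
from dataclasses import dataclass
from typing import Tuple, Union

Single = Union[int, str]
Index = Union[Single, Tuple[Single, ...]]


@dataclass(frozen=True)
class Ref:
    """Hypothetical reference: a container path plus an index into it."""

    container: str  # e.g. "obsm/X_pca" or "obs"
    index: Index    # e.g. (0, 1, 2), 0, or a column name

    def expand(self):
        """Return one (container, index) pair per single index."""
        if isinstance(self.index, tuple):
            indices = self.index
        else:
            indices = (self.index,)
        return [(self.container, i) for i in indices]


print(Ref("obsm/X_pca", (0, 1, 2)).expand())
print(Ref("obs", "A/B").expand())
```

Because only the container is a slash-separated string, an index such as "A/B" never goes through splitting at all, which sidesteps the column-name problem raised earlier.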
Not a PR, just a place to comment and play around before this gets merged into AnnData or scanpy.
Done:
- obs_df and get_obs_vector support paths as shorthand "X/Actb" or long ("obs", "A/B")
- obs_df returns a data frame with unique column names. Smartly resolves collisions: "obsm/X_pca/0" becomes "X_pca1" while "obsm/protein/CD11b_TotalSeqB" becomes "CD11b_TotalSeqB"
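The naming behaviour described above can be approximated as follows. This is a guess reconstructed from the two examples only, not the code in the playground: `column_name` and `unique_column_names` are hypothetical helpers, and falling back to the full joined path on a collision is an assumption about what "smartly" might mean.

```python
from collections import Counter


def column_name(path):
    """Derive a short label from a long-form path such as ("obsm", "X_pca", 0)."""
    path = tuple(path)
    last = path[-1]
    if isinstance(last, int):
        # Integer index into an array: parent key plus 1-based component number.
        return f"{path[-2]}{last + 1}"
    # Named column: the column name alone.
    return str(last)


def unique_column_names(paths):
    """Return one label per path, disambiguating labels that collide."""
    names = [column_name(p) for p in paths]
    counts = Counter(names)
    # ASSUMPTION: colliding labels fall back to the full path joined by "/".
    return [
        name if counts[name] == 1 else "/".join(str(part) for part in path)
        for path, name in zip(paths, names)
    ]


print(unique_column_names([
    ("obsm", "X_pca", 0),
    ("obsm", "protein", "CD11b_TotalSeqB"),
]))
```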
TODO:
- "X_pca/0" should work
- [("obsm", "X_pca", [0, 1])] would expand to [("obsm", "X_pca", 0), ("obsm", "X_pca", 1)]
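The expansion in the last TODO item could be sketched like this. `expand_paths` is a hypothetical helper name, and treating only a list in the final position as "several indices" is an assumption.

```python
def expand_paths(paths):
    """Expand paths whose last component is a list into one path per item."""
    expanded = []
    for path in paths:
        *head, last = path
        if isinstance(last, list):
            # One output path per index in the list.
            expanded.extend((*head, item) for item in last)
        else:
            expanded.append(tuple(path))
    return expanded


print(expand_paths([("obsm", "X_pca", [0, 1]), ("obs", "leiden")]))
```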