Ref paths #342
This looks really awesome! A few comments and questions
Could we use either Also, what does Couple questions about the current scope: What about getting things from |
anndata/_core/index.py
    k: str,
    coldim: Literal["obs", "var"],
    idxdim: Literal["obs", "var"],
    layer: Optional[str] = None,
How about adding an axis=None param that defaults to 0 for obsp and to ? for obsm?
To resolve the ambiguity of indexing into obsp (defaulting to rows), and potentially to allow indexing into rows of obsm (where it defaults to columns).
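To make the ambiguity concrete, here is a minimal NumPy sketch. The arrays stand in for an obsp entry and an obsm entry, and the helper names `vector_from_pairwise` and `vector_from_obsm` are hypothetical, not part of this PR. The default of `axis=1` for obsm is an assumption, since the thread leaves that value open.

```python
import numpy as np

# Hypothetical stand-ins for adata.obs_names, an adata.obsp entry and adata.obsm["X_pca"].
obs_names = ["Cell1", "Cell2", "Cell3"]
distances = np.array(
    [[0.0, 1.0, 2.0],
     [3.0, 0.0, 4.0],
     [5.0, 6.0, 0.0]]
)  # obs x obs, deliberately not symmetric
x_pca = np.arange(6.0).reshape(3, 2)  # obs x components


def vector_from_pairwise(mat, names, key, axis=0):
    """Pick one vector from a square obs x obs matrix.

    axis=0 selects the row belonging to `key`, axis=1 the column.
    Both have length n_obs, so only `axis` can disambiguate them.
    """
    i = names.index(key)
    return mat[i, :] if axis == 0 else mat[:, i]


def vector_from_obsm(mat, names, key, axis=1):
    """Pick one vector from an obs x k matrix.

    axis=1 (assumed default): `key` is a column index, result has length n_obs.
    axis=0: `key` is an obs name, result has length k.
    """
    if axis == 1:
        return mat[:, key]
    return mat[names.index(key), :]


row = vector_from_pairwise(distances, obs_names, "Cell2")          # row of Cell2
col = vector_from_pairwise(distances, obs_names, "Cell2", axis=1)  # column of Cell2
pc0 = vector_from_obsm(x_pca, obs_names, 0)                        # first component
cell2 = vector_from_obsm(x_pca, obs_names, "Cell2", axis=0)        # all components of Cell2
```

For a non-symmetric pairwise matrix the row and the column of the same cell differ, which is why a silent default can surprise users.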
To what for obsm?
Looks great, Phil! 😄
from anndata import AnnData
from anndata._core.ref_path import RefPath, split_paths
Could there also be tests for ambiguities throwing errors here?
You mean ambiguities in AnnData.get_vector?
Yes
It’s because they’re unambiguous, so the axis has to be specified:
That doesn’t fit with the |
Can you help me come up with shortcut paths that Do you agree with the following? Would you add more?
@pytest.mark.parametrize("short_path,resolved", [
    # single strings should be found in {obs,var}{.columns,_names}
    # but not in {obs,var}m DataFrame columns (too deep and obscure)
    ("Cell1", ("layers", "X", "obs", "Cell1")),
    ("group", ("obs", "group")),
    # keys with subpaths should be found in layers, {obs,var}{p,m}
    ("X_pca/1", ("obsm", "X_pca", 1)),
    ("unspliced/GeneY", ("layers", "unspliced", "var", "GeneY")),
    # {obs,var}p should default to axis 0
    (("neighbors_distances", "Cell2"), ("obsp", "neighbors_distances", "Cell2", 0)),
    ...?
])
def test_resolve(adata, short_path, resolved):
    assert adata.resolve_path(short_path) == RefPath(*resolved)
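For the ambiguity tests requested earlier in this thread, a rough sketch of the behaviour could look like the following. It does not use AnnData at all: `resolve_name` and the `namespaces` dict are hypothetical stand-ins for whatever `resolve_path` ends up doing, and the choice of `KeyError` versus `ValueError` is an assumption.

```python
def resolve_name(name, namespaces):
    """Return the single namespace containing `name`.

    `namespaces` maps a location label (e.g. "obs.columns") to the keys
    found there. Zero hits and multiple hits are both errors.
    """
    hits = [ns for ns, keys in namespaces.items() if name in keys]
    if not hits:
        raise KeyError(f"{name!r} not found in any of {list(namespaces)}")
    if len(hits) > 1:
        raise ValueError(f"{name!r} is ambiguous, found in {hits}")
    return hits[0]


# Toy data: "group" is deliberately both an obs column and a gene name.
namespaces = {
    "obs.columns": {"group"},
    "obs_names": {"Cell1", "Cell2"},
    "var_names": {"GeneY", "group"},
}

unique_hit = resolve_name("Cell1", namespaces)

try:
    resolve_name("group", namespaces)
    ambiguous_raised = False
except ValueError:
    ambiguous_raised = True
```

In a pytest suite the last part would more naturally be written with `pytest.raises(ValueError)`.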
I don't think I like the axis being specified the same way a nested element would be. I think it's mixing metaphors, and namespaces. I would prefer an It's also something that won't need to be specified by the user in the contexts we've been discussing, since it's implied. For example, in
Alright. I don't think this is too important for now.
The |
I don't think it should be the second argument like this. There could maybe be a different delimiter for it, but otherwise I think it's an argument to the RefDim constructor.
Recap of our call:
I added an “Internal API” page because all those docs don’t write themselves.
docs TODO:
Continuation of scverse/scanpy-tutorials#15 in the proper repo.
This is already pretty usable, I’ll extend and refine it.
API:
RefPath(attr: str, *path: str|int) creates and validates a path
RefPath.parse(str|(str|int)s|RefPath) does too, allowing shortcuts like "o/foo" → ("obs", "foo")
RefPath.get_vector(adata, alias_col) fetches a 1D np.ndarray
AnnData.get_vector(*path: str|RefPathLike, dim: "obs"|"var", layer: str, …) allows fallback. E.g. ad.get_vector('foo', dim='obs') finds foo in X or obs while ad.get_vector('X_pca', 0) finds it in obsm. Or so.
AnnData.get_df(paths: (str|RefPathLike)s, dim: "obs"|"var", *, layer: str, …) multiplexed AnnData.get_vector. Returns a data frame with unique column names. Colname builder resolves collisions: "obsm/X_pca/0" becomes "X_pca1" while "obsm/protein/CD11b_TotalSeqB" becomes "CD11b_TotalSeqB"
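As an illustration of the shortcut parsing and column naming described above, here is a small standalone sketch. It is not the RefPath implementation: `parse_path`, `column_name` and `ATTR_ALIASES` are hypothetical names, only the "o" alias appears in this description (the "v" alias is assumed by analogy), and turning digit-only components into ints is also an assumption. Collision handling between column names is left out.

```python
from typing import Sequence, Tuple, Union

PathPart = Union[str, int]

# Hypothetical alias table; only "o" -> "obs" is mentioned in the PR text.
ATTR_ALIASES = {"o": "obs", "v": "var"}


def parse_path(path: Union[str, Sequence[PathPart]]) -> Tuple[PathPart, ...]:
    """Normalize "o/foo" or ("obs", "foo") into a tuple path."""
    if isinstance(path, str):
        # String form: split on "/" and treat digit-only parts as indices.
        parts = [int(p) if p.isdigit() else p for p in path.split("/")]
    else:
        # Tuple form: taken as-is, so keys may themselves contain "/".
        parts = list(path)
    if not parts or not isinstance(parts[0], str):
        raise ValueError(f"path must start with an attribute name: {path!r}")
    attr = ATTR_ALIASES.get(parts[0], parts[0])
    return (attr, *parts[1:])


def column_name(path: Sequence[PathPart]) -> str:
    """Short column label: last key, or parent name plus 1-based index."""
    *_, parent, last = path
    if isinstance(last, int):
        return f"{parent}{last + 1}"
    return str(last)


short = parse_path("o/foo")
pca = parse_path("obsm/X_pca/0")
nested = parse_path(("obs", "A/B"))
```

The tuple form is what makes a key such as "A/B" expressible, since the string form would split it.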
Docs:
RefPath
AnnData.get_vector
AnnData.get_df
AnnData.resolve_path
Checklist:
RefPath.get_vector
supports paths as shorthand "X/Actb" or long ("obs", "A/B")
add RefPath.dim which is e.g. useful in plotting to check if all RefPaths will return matching stuff
handle numeric key situation. "obsm/X_pca/0" is useful if X_pca is an array.
get_vector and get_df methods on AnnData that allow leaving things out there when not ambiguous (("X_pca", 0) → ("obsm", "X_pca", 0)), use the usual layer, … params as fallback
get_df
[("obsm", "X_pca", [0, 1])] would expand to [("obsm", "X_pca", 0), ("obsm", "X_pca", 1)]
"X/var/*" or so? How to specify in tuples? Mapping[str, RefPathLike]?)
get_vector
add more tests:
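The list expansion from the get_df item of the checklist could be sketched roughly like this. `expand_paths` is a hypothetical helper, not something in this PR, and it only handles a list or tuple in the last position of a tuple path.

```python
def expand_paths(paths):
    """Expand paths whose last component is a list into one path per element."""
    expanded = []
    for path in paths:
        if isinstance(path, str):
            # String shorthands such as "obs/group" pass through untouched.
            expanded.append(path)
            continue
        *head, last = path
        if isinstance(last, (list, tuple)):
            expanded.extend((*head, item) for item in last)
        else:
            expanded.append(tuple(path))
    return expanded


result = expand_paths([("obsm", "X_pca", [0, 1]), ("obs", "group"), "X/Actb"])
```

A wildcard form like "X/var/*" would need access to the AnnData object to know the available keys, so it is not covered here.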