Ref paths #342

flying-sheep · 2020-03-19T12:14:44Z

Continuation of scverse/scanpy-tutorials#15 in the proper repo.

This is already pretty usable, I’ll extend and refine it. API:

RefPath(attr: str, *path: str|int) creates and validates a path
RefPath.parse(str|(str|int)s|RefPath) does too, allowing shortcuts like "o/foo" → ("obs", "foo")
RefPath.get_vector(adata, alias_col) fetches a 1D np.ndarray
AnnData.get_vector(*path: str|RefPathLike, dim: "obs"|"var", layer: str, …) allows fallback. E.g. ad.get_vector('foo', dim='obs') finds foo in X or obs while ad.get_vector('X_pca', 0) finds it in obsm. Or so.
AnnData.get_df(paths: (str|RefPathLike)s, dim: "obs"|"var", *, layer: str, …) is a multiplexed AnnData.get_vector. It returns a data frame with unique column names. The column name builder resolves collisions: "obsm/X_pca/0" becomes "X_pca1" while "obsm/protein/CD11b_TotalSeqB" becomes "CD11b_TotalSeqB"
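
The shortcut parsing and column naming listed above can be illustrated with a small, self-contained sketch. This is not the PR's implementation: `parse_path` and `column_name` are hypothetical helper names, the alias table contains only the `o` → `obs` shortcut mentioned above, and both the int conversion of numeric segments and the 1-based suffix are assumptions read off the examples. Deduplicating colliding names is not covered.

```python
from typing import Tuple, Union

PathPart = Union[str, int]

# Only "o" -> "obs" appears in the description; any further alias would be an assumption.
ALIASES = {"o": "obs"}


def parse_path(path: str) -> Tuple[PathPart, ...]:
    """Split "attr/key/..." into a tuple, expanding aliases and int-like parts."""
    attr, *rest = path.split("/")
    attr = ALIASES.get(attr, attr)
    # ASSUMPTION: purely numeric segments are meant as integer indices.
    parts = tuple(int(p) if p.isdecimal() else p for p in rest)
    return (attr, *parts)


def column_name(path: Tuple[PathPart, ...]) -> str:
    """Build a short column name from the last two parts of a parsed path."""
    *_, key, last = path
    if isinstance(last, int):
        # ASSUMPTION: integer columns get a 1-based suffix ("X_pca", 0 -> "X_pca1").
        return f"{key}{last + 1}"
    return str(last)
```

With these helpers, `parse_path("o/foo")` yields `("obs", "foo")`, and the two column-name examples from the description come out as `X_pca1` and `CD11b_TotalSeqB`.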

Docs:

Checklist:

ivirshup · 2020-03-20T08:08:49Z

This looks really awesome!

A few comments and questions

allow globs: How? just allow "X/var/*" or so? How to specify in tuples?

Could we use either fspath or regex matching?

Also, what does "X/var" mean?

Couple questions about the current scope:

What about getting things from var aligned arrays? How about getting rows from obsm arrays?

falexwolf · 2020-03-20T08:25:41Z

anndata/_core/index.py

+    k: str,
+    coldim: Literal["obs", "var"],
+    idxdim: Literal["obs", "var"],
+    layer: Optional[str] = None,


How about adding an axis=None param that defaults to 0 for obsp and to ? for obsm?

This would resolve the ambiguity of indexing into obsp (defaulting to rows), and would potentially allow indexing into rows of obsm (where it defaults to columns).

To what for obsm?
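
A rough sketch of what such an `axis=None` default could look like. It is only an illustration: `default_axis` and `take` are hypothetical names, nested lists stand in for arrays, and the value used for obsm (axis 1, i.e. columns) is an assumption based on the remark that obsm defaults to columns. The thread itself leaves that value open.

```python
from typing import List, Optional

Matrix = List[List[float]]


def default_axis(attr: str, axis: Optional[int] = None) -> int:
    """Return the axis to index along, filling in a per-attribute default."""
    if axis is not None:
        return axis
    if attr in ("obsp", "varp"):
        return 0  # pairwise arrays: default to rows, as suggested in the review
    if attr in ("obsm", "varm"):
        return 1  # ASSUMPTION: "defaults to columns" read as axis 1
    raise ValueError(f"no default axis for {attr!r}")


def take(matrix: Matrix, index: int, axis: int) -> List[float]:
    """Pick one row (axis 0) or one column (axis 1) of a nested-list matrix."""
    if axis == 0:
        return list(matrix[index])
    return [row[index] for row in matrix]


m = [[1.0, 2.0], [3.0, 4.0]]
row = take(m, 0, default_axis("obsp"))  # first row
col = take(m, 0, default_axis("obsm"))  # first column
```

Passing `axis` explicitly overrides the default, which is what would allow indexing into rows of obsm.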

falexwolf · 2020-03-20T08:26:51Z

Looks great, Phil! 😄

ivirshup · 2020-03-20T08:27:17Z

anndata/tests/test_ref_path.py

+
+from anndata import AnnData
+from anndata._core.ref_path import RefPath, split_paths
+


Could there also be tests for ambiguities throwing errors here?

you mean ambiguities in AnnData.get_vector?

flying-sheep · 2020-03-20T12:01:23Z

Also, what does "X/var" mean?

It’s because they’re unambiguous, so the axis has to be specified:

("X", "var", "Foo") means “get the n_obs length vector at np.where(var_names == 'Foo')[0]”.
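
As a toy illustration of that lookup, with plain Python lists standing in for `X` and `var_names` (`x_column` is a hypothetical helper, not part of the PR):

```python
from typing import List, Sequence


def x_column(
    X: Sequence[Sequence[float]], var_names: Sequence[str], name: str
) -> List[float]:
    """Return the n_obs-length vector of X for the variable called `name`."""
    j = list(var_names).index(name)  # plays the role of the np.where(...) lookup
    return [row[j] for row in X]


# Two observations (rows) and three variables (columns).
X = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
var_names = ["Bar", "Foo", "Baz"]
vec = x_column(X, var_names, "Foo")
```

The result has one entry per observation, which is why the axis has to be spelled out in the path.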

What about getting things from var aligned arrays? How about getting rows from obsm arrays?

That doesn’t fit with the dim attribute, but if you can come up with a good syntax and a solution for that we can do it!

flying-sheep · 2020-03-20T12:13:32Z

Can you help me come up with shortcut paths that AnnData.resolve_path should support?

Do you agree with the following? Would you add more?

@pytest.mark.parametrize("short_path,resolved", [
    # single strings should be found in {obs,var}{.columns,_names}
    # but not in {obs,var}m DataFrame columns (too deep and obscure)
    ("Cell1", ("layers", "X", "obs", "Cell1")),
    ("group", ("obs", "group")),
    # keys with subpaths should be found in layers, {obs,var}{p,m}
    ("X_pca/1", ("obsm", "X_pca", 1)),
    ("unspliced/GeneY", ("layers", "unspliced", "var", "GeneY")),
    # {obs,var}p should default to axis 0
    (("neighbors_distances", "Cell2"), ("obsp", "neighbors_distances", "Cell2", 0)),
    ...?
])
def test_resolve(adata, short_path, resolved):
    assert adata.resolve_path(short_path) == RefPath(*resolved)

ivirshup · 2020-03-20T12:15:31Z

I don't think I like the axis being specified the same way a nested element would be. I think it's mixing metaphors, and namespaces. I would prefer an axis or dim key word argument for this.

It's also something that won't need to be specified by the user in the contexts we've been discussing, since it's implied. For example, in obs_df, all elements should be aligned to obs_names. For sc.pl.embedding, similar.

That doesn’t fit with the dim attribute, but if you can come up with a good syntax and a solution for that we can do it!

Alright. I don't think this is too important for now.

flying-sheep · 2020-03-20T12:19:48Z

The dim keyword exists. This is for cases where it’s None

ivirshup · 2020-03-20T12:23:12Z

The dim keyword exists. This is for cases where it’s None

I don't think it should be the second argument like this. There could maybe be a different delimiter for it, but otherwise I think it's an argument to the RefPath constructor.

ivirshup · 2020-03-20T13:13:01Z

Recap of our call:

I don't like specifying the dimension as a sub-element. I think it would be good to be able to specify the dimension in a string representation, but I don't think it's critical right now. Can we make it so cases like "X/var/elem" don't work?
The internals you built look interesting, and I'd like a chance to play around with them to find edge cases and see how they can be extended. To give us time for that and still have a release soon, could we make them private for now? I'd like to keep the surface of this minimal until we've had time to play with it more. Also, the really important thing this does is allow stuff like sc.pl.embedding(adata, colors="obsm/X_pca/0"), which doesn't require adata.get_vector to be public right now.

flying-sheep · 2020-03-20T13:28:12Z

I added an “Internal API” page because all those docs don’t write themselves.

anndata/_core/ref_path.py

flying-sheep · 2020-03-20T17:15:29Z

resolve_path progress:

finds single strings in {obs,var}{.columns,_names}
but not in {obs,var}m DataFrame columns (too deep and obscure)
- "Cell1" → RefPath("layers", "X", "obs", "Cell1")
- "group" → RefPath("obs", "group")
finds keys with subpaths in layers, {obs,var}m
- "X_pca/1" → RefPath("obsm", "X_pca", 1)
- "unspliced/GeneY" → RefPath("layers", "unspliced", "var", "GeneY")
finds keys in {obs,var}p, defaulting to axis 0
- ("neighbors_distances", "Cell2") → RefPath("obsp", "neighbors_distances", "Cell2", 0)
Error on ambiguity
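
The resolution rules in this progress list can be sketched on a toy container. Everything here is hypothetical: `ToyAnnData` only records which names exist where, `resolve` takes already-split parts instead of "a/b" strings, and neither is the PR's `resolve_path`.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple, Union

Part = Union[str, int]


@dataclass
class ToyAnnData:
    """Hypothetical stand-in that only records which names exist where."""
    obs_names: Set[str] = field(default_factory=set)
    var_names: Set[str] = field(default_factory=set)
    obs_columns: Set[str] = field(default_factory=set)
    var_columns: Set[str] = field(default_factory=set)
    layers: Set[str] = field(default_factory=set)
    obsm: Set[str] = field(default_factory=set)
    varm: Set[str] = field(default_factory=set)
    obsp: Set[str] = field(default_factory=set)
    varp: Set[str] = field(default_factory=set)


def resolve(ad: ToyAnnData, *short: Part) -> Tuple[Part, ...]:
    """Expand a short path into a full one; raise if it is missing or ambiguous."""
    found: List[Tuple[Part, ...]] = []
    if len(short) == 1:
        # Single names: look in {obs,var} columns and {obs,var}_names.
        (name,) = short
        if name in ad.obs_columns:
            found.append(("obs", name))
        if name in ad.var_columns:
            found.append(("var", name))
        if name in ad.obs_names:
            found.append(("layers", "X", "obs", name))
        if name in ad.var_names:
            found.append(("layers", "X", "var", name))
    elif len(short) == 2:
        # Key plus subpath: look in layers, {obs,var}m and {obs,var}p.
        key, sub = short
        if key in ad.layers:
            if sub in ad.obs_names:
                found.append(("layers", key, "obs", sub))
            if sub in ad.var_names:
                found.append(("layers", key, "var", sub))
        for attr in ("obsm", "varm"):
            if key in getattr(ad, attr):
                found.append((attr, key, sub))
        for attr in ("obsp", "varp"):
            if key in getattr(ad, attr):
                found.append((attr, key, sub, 0))  # default to axis 0
    if not found:
        raise KeyError(f"cannot resolve {short!r}")
    if len(found) > 1:
        raise ValueError(f"ambiguous path {short!r}: {found}")
    return found[0]


ad = ToyAnnData(
    obs_names={"Cell1", "Cell2"},
    var_names={"GeneY"},
    obs_columns={"group"},
    layers={"unspliced"},
    obsm={"X_pca"},
    obsp={"neighbors_distances"},
)
```

Collecting all candidates before returning is what makes the ambiguity check cheap: a name that exists in both obs and var columns produces two candidates and raises instead of silently picking one.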

flying-sheep · 2020-07-02T16:39:35Z

docs TODO:

remove adata from docstring
examples

flying-sheep added 5 commits March 19, 2020 12:37

almost done!

faae00e

Getting vectors works

328f5ee

Implement RefPath.dim

d35154b

Parse tests

384e5b6

Better errors

0292a30

flying-sheep added 7 commits March 19, 2020 16:07

rough structure for resolving

aa57fb1

Implement splitting multipaths

c9e2a47

docs

e016cfc

Fix docs

28ccbbf

really fix docs

6a3effc

maybe this helps

8c83803

link

f0a65f7

falexwolf reviewed Mar 20, 2020

ivirshup reviewed Mar 20, 2020

RefPath docs

d7cd871

Unexport RefPath

c867a41

ivirshup reviewed Mar 20, 2020

flying-sheep added 2 commits March 20, 2020 16:00

More parse error tests

b9cee39

Progress for resolve_path

0bca090

flying-sheep mentioned this pull request Apr 1, 2020

Plotting overhaul scverse/scanpy#1116

Closed

This was referenced Apr 29, 2020

Ability to index by label along one axis into obsp, varp arrays #358

Open

UMAP colouring beyond adata.X or adata.obs scverse/scanpy#1189

Closed

Make resolve_path work for layers and obsm/varm

f567a6e

ivirshup mentioned this pull request Jun 29, 2020

Default join option in concatenate #390

Closed

ivirshup mentioned this pull request Nov 19, 2020

Support obsm key to color UMAP scverse/scanpy#1500

Open

5 tasks

ivirshup mentioned this pull request Mar 10, 2021

Integration of dorothea and progeny scverse/scanpy#1724

Closed

flying-sheep mentioned this pull request Aug 23, 2024

Add parameter for more resilient concat_on_disk #1602

Open

3 tasks

flying-sheep closed this Oct 25, 2024
