Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Paths playground #15

Closed
wants to merge 4 commits into from
Closed

Paths playground #15

wants to merge 4 commits into from

Conversation

flying-sheep
Copy link
Member

@flying-sheep flying-sheep commented Mar 18, 2020

not a PR, just a place to comment and play around before this gets merged into AnnData or scanpy.

Done:

  • both obs_df and get_obs_vector support paths as shorthand "X/Actb" or long ("obs", "A/B")
  • obs_df returns a Data Frame with unique column names. Smartly resolves collisions: "obsm/X_pca/0" becomes "X_pca1" while "obsm/protein/CD11b_TotalSeqB" becomes "CD11b_TotalSeqB"

TODO:

  • allow leaving things out when not ambiguius ("X_pca/0" should work)
  • allow sequences: [("obsm", "X_pca", [0, 1])] would expand to [("obsm", "X_pca", 0), ("obsm", "X_pca", 1)]
  • obsp support

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

You'll be able to see Jupyter notebook diff and discuss changes. Powered by ReviewNB.

paths_get.py Show resolved Hide resolved
paths_get.py Show resolved Hide resolved
# path is shorthand: "obs/Foo" and "o/Foo"
if not isinstance(path, str):
return path
path = path.split("/")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Watch out for this, since column names in dataframes can currently have a "/" in them.

Also, we don't stop dataframe indices from having slashes. We're only expecting two "/" unless the value is in uns, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if we use slashes we need actual parsing then? I mean things are pretty deterministic, so we could set the limit for splitting depending on the attribute. That would be super headachy when combined with fallbacks and shorthand though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm thinking about the same thing, I think we always needed actual parsing. Dataframe indices and columns can be arbitrary strings.

@ivirshup
Copy link
Member

I don't think we should allow access strings like: "obsm/X_pca/0", since the integer is ambiguous for dataframes. Is it meant to be the position, or the value?

@flying-sheep
Copy link
Member Author

flying-sheep commented Mar 18, 2020

that’s why I didn’t want data frames in .obsm.

It’s unambiguous when X_pca is an array, and named stuff should go into dedicated anndata modes (which don’t exist yet)

we can fix that in AnnData by not allowing data frames with int colnames or colnames that look like ints…

@ivirshup
Copy link
Member

that’s why I didn’t want data frames in .obsm

They're so useful though. I think it'd be fine to just have something like ad.Ref('obsm/X_pca", 0)

@flying-sheep
Copy link
Member Author

flying-sheep commented Mar 18, 2020

That’s just inconsistent and more verbose than a tuple. Why separate the first descent by a slash in a string and the second by a comma?

This seems simpler: ("obsm", "X_pca", 0)

Or this if we have an array: "obsm/X_pca/0"

paths_get.py Show resolved Hide resolved
@ivirshup
Copy link
Member

I was thinking more separating the container and the index. This could also allow:

ad.Ref("obsm/X_pca", (0, 1, 2))
ad.Ref("obs", r"leiden*")

Of course, the init could have a signature like ___init__(*args) and pass off to a recursive parsing function.

@flying-sheep
Copy link
Member Author

flying-sheep commented Mar 18, 2020

What’s the difference between a key and an index? is thing in obs[thing] a key or an index? I’d say it’s both.

We used to call that index “component”, but that falls flat when using a data frame and specifying a column name …

@flying-sheep flying-sheep mentioned this pull request Mar 19, 2020
24 tasks
@Zethson
Copy link
Member

Zethson commented Apr 27, 2023

Given that this includes the citeseq tutorials I don't think that this is relevant anymore

@Zethson Zethson closed this Apr 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants