SOMA-collection-level slice/batch queries [needs revision] #157

Closed
wants to merge 234 commits into from

@johnkerl johnkerl commented Jun 9, 2022

Overview

Walkthrough

  • soco-slice-query.md shows you how to actually execute a slice query. This demonstrates the power of TileDB SOMAs to deliver targeted data for analysis at low latency. Notice that this example writes the sliced output in both SOMA and .h5ad format, so you can go on to do all manner of analysis using, for example, Scanpy.
  • soco-batch-query.md shows you how to do a longer-running, cross-cutting batch query -- in this example, computing the mean of X/data, grouping by obs['cell_type_ontology_term_id'].
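
As a rough illustration of the batch-query idea (not the code in soco-batch-query.md), the grouped mean can be accumulated as running sums and counts across SOMAs, then divided once at the end. This is a minimal sketch in plain Python: `grouped_mean`, the `chunks` layout, and the sample values are all assumptions made for illustration, with each chunk standing in for one SOMA's X/data values paired with the matching obs['cell_type_ontology_term_id'] labels.

```python
from collections import defaultdict


def grouped_mean(chunks):
    """Mean of values per label, accumulated across all chunks.

    `chunks` is a hypothetical stand-in for a SOMA collection: an iterable
    of (values, labels) pairs, one pair per SOMA.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for values, labels in chunks:
        for value, label in zip(values, labels):
            sums[label] += value
            counts[label] += 1
    # Divide once at the end so SOMAs with more cells carry more weight.
    return {label: sums[label] / counts[label] for label in sums}


# Illustrative data only: two "SOMAs" with made-up values.
chunks = [
    ([1.0, 3.0, 10.0], ["CL:0000236", "CL:0000236", "CL:0000084"]),
    ([20.0, 30.0], ["CL:0000084", "CL:0000084"]),
]
means = grouped_mean(chunks)
print(means)
```

Keeping sums and counts separate, rather than averaging per-SOMA means, avoids skewing the result when the SOMAs hold different numbers of cells per group.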

Notes

  • This replaces SOMA-collection sketching [WIP] #80.
  • The SOMASlice object currently has a (well-isolated) runtime dependency on AnnData for its concat -- namely, it converts a list of SOMASlice to a list of AnnData, leverages AnnData's concat, then converts the concatenated AnnData object back to SOMASlice.

Update

#173 is the re-do

@johnkerl johnkerl marked this pull request as ready for review June 9, 2022 19:47
@johnkerl johnkerl force-pushed the kerl/soma-slice branch 2 times, most recently from fb77162 to fc07e3c June 9, 2022 19:51
@johnkerl johnkerl changed the title from "SOMA-level dimension slicing [RFC]" to "SOMA-collection-level dimension slicing [RFC]" Jun 9, 2022
@johnkerl johnkerl mentioned this pull request Jun 9, 2022
61 tasks
@johnkerl johnkerl force-pushed the kerl/soma-slice branch 2 times, most recently from 91a78c9 to 9ebc9a4 June 9, 2022 22:00
apis/python/examples/soma-slice-query.md Outdated
]:
soma_path = os.path.join(soco_path, name)
soma = tiledbsc.SOMA(soma_path)
tiledbsc.io.from_h5ad(soma, h5ad)
Member

I would argue that this is a non-pythonic way to call a function with such a name. Generally, I would expect that a from_* function returns the created object, instead of mutating an argument. I understand that the object is disk-based so it makes sense to pre-create it and pass it as an argument, but then I'd probably rename the function to something that makes the side effect more explicit (like write_from_h5ad). @bkmartinjr your opinion?

Member Author

@ebezzi originally it was soma.from_h5ad(h5ad) but there was a desire to have a separate tiledbsc.io namespace (#83) in order to more clearly separate I/O (peripheral) from SOMA-API processing (central).

Member

I'm not sure if it is non-pythonic (I lack expertise to make that determination). But I do find it awkward and likely error-prone as the two step process (create the soma, write the soma) allows for cases that will fail (eg, the soma is modified before from_h5ad is called).

I personally prefer a single function that both creates the soma and ingests the H5AD, eg, soma.io.from_h5ad(soma_path: str, h5ad_path: str, ...) -> soma. Ie, this function would call the SOMA constructor.
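
To make the two call shapes under discussion concrete, here is a minimal sketch. It uses a hypothetical stand-in class (`FakeSOMA`) and hypothetical function bodies, not the real tiledbsc API, and nothing is read from or written to disk. The name `write_from_h5ad` is the one suggested earlier in this thread; the one-step signature follows the `(soma_path, h5ad_path) -> soma` shape proposed above.

```python
class FakeSOMA:
    """Hypothetical stand-in for a disk-backed SOMA; not the tiledbsc class."""

    def __init__(self, path):
        self.path = path
        self.source = None  # set once data has been "ingested"


def write_from_h5ad(soma, h5ad_path):
    """Two-step style: fills in a pre-created object and returns nothing."""
    soma.source = h5ad_path


def from_h5ad(soma_path, h5ad_path):
    """One-step style: constructs the object, ingests, and returns it."""
    soma = FakeSOMA(soma_path)
    soma.source = h5ad_path
    return soma


# Two-step: the object exists, un-ingested, between these two lines,
# so other code could modify it before ingestion happens.
two_step = FakeSOMA("soma-a")
write_from_h5ad(two_step, "a.h5ad")

# One-step: the caller never sees a partially built object.
one_step = from_h5ad("soma-b", "b.h5ad")
```

In the one-step form the function owns construction, which removes the window in which the soma can be modified before ingestion.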

Member Author

good to know @bkmartinjr -- thanks for the feedback! I'm fine with this; only wish I'd known sooner :)

Member

Well...I am unclear who is establishing the API and the review process for it. So I'm just giving you my opinions :-)

Member Author

@bkmartinjr @ebezzi I'll go ahead & make this change but on a separate PR -- this soma.io business is old & predates this PR, and this change can be neatly factored out

Member

Maybe it's worth posting this as a separate issue for discussion/feedback? The R API currently works this way too (ie, ingestion and SOMA instantiation are separate steps) and altering that behavior would require a lot of small updates to docs/tests/etc.

Member

@aaronwolen aaronwolen left a comment

Still working through it but I posted a couple of comments for now.

apis/python/src/tiledbsc/annotation_dataframe.py Outdated
apis/python/src/tiledbsc/annotation_dataframe.py Outdated
apis/python/src/tiledbsc/soma.py Outdated
@aaronwolen aaronwolen self-assigned this Jun 10, 2022
@johnkerl johnkerl force-pushed the kerl/soma-slice branch 14 times, most recently from 771d3f5 to 5470ff2 June 11, 2022 03:46
@johnkerl
Member Author

After rebases this PR is completely out of control (> 200 commits including mostly merge commits).

Closing, and will resubmit.

@johnkerl johnkerl closed this Jun 20, 2022
@johnkerl
Member Author

re-opening long enough to squash

@johnkerl johnkerl reopened this Jun 20, 2022
@johnkerl johnkerl closed this Jun 20, 2022
@johnkerl
Member Author

failed squash :(

johnkerl added a commit that referenced this pull request Jun 20, 2022
@johnkerl
Member Author

#173 is the re-do

johnkerl added a commit that referenced this pull request Jun 20, 2022
johnkerl added a commit that referenced this pull request Jun 20, 2022
johnkerl added a commit that referenced this pull request Jun 21, 2022
johnkerl added a commit that referenced this pull request Jun 21, 2022
johnkerl added a commit that referenced this pull request Jun 21, 2022
johnkerl added a commit that referenced this pull request Jun 22, 2022
* Re-do of #157

* remove now-duplicate file

* bring an unaffected example file up to date with main

* add var.feature_name to examples/collection_counts.py

* mkmd.sh doc-gen

* Allow obs_ids/var_ids and obs_query_string/var_query_string in soco/soma attribute_filter

* mkmd.sh doc-gen

* attribute_filter -> query

* manual testing
@johnkerl johnkerl deleted the kerl/soma-slice branch June 23, 2022 13:57
nguyenv pushed a commit to nguyenv/TileDB-SOMA that referenced this pull request Jan 10, 2024
…build-python311-0-1_h50996c

Rebuild for python311
4 participants