split out #172 to make this PR smaller
johnkerl committed Jun 20, 2022
1 parent 99f1612 commit f08a044
Showing 10 changed files with 763 additions and 482 deletions.
2 changes: 1 addition & 1 deletion _quarto.yml
@@ -69,7 +69,7 @@ website:
text: "SOMA slice query"
- href: "apis/python/examples/normalizing.md"
text: "Normalizing a collection"
-- href: "apis/python/examples/soco-reconnaissance.md"
+- href: "apis/python/examples/soma-collection-reconnaissance.md"
text: "SOMA-collection reconnaissance"
- href: "apis/python/examples/soco-slice-query.md"
text: "SOMA-collection slice query"
47 changes: 0 additions & 47 deletions apis/python/examples/collection-counts.py

This file was deleted.

20 changes: 3 additions & 17 deletions apis/python/examples/ingesting-data-files.md
@@ -8,15 +8,9 @@ tools/ingestor -o /mini-corpus/tiledb-data -n /mini-corpus/anndata/10x_pbmc68k_r
...
```

-Note this can take several hours total. The benefit of using an optimized storage solution (with
-admittedly non-negligible ingest time) is that all subsequent queries benefit from that optimized
-storage. In particular, various cross-corpus data queries shown in these examples take just seconds
-or minutes.
-
-A key point is **write once, read from multiple tools** -- in particular, using `tiledbsc-py` (this
-package) or [`tiledbsc-r`](https://github.com/TileDB-Inc/tiledbsc) you can read SOMAs in either
-language, regardless of which language was used to store them. This lets you use
-best-in-class/state-of-the-art analysis algorithms, whichever language they're implemented in.
+Note this takes many hours. The benefit of using an optimized storage solution (with admittedly
+non-negligible ingest time) is that all subsequent queries benefit from that optimized storage. In
+particular, various cross-corpus data queries shown in these examples take just seconds or minutes.

## Populate a SOMA collection

@@ -36,14 +30,6 @@ populate-soco -o /mini-corpus/soco -a /mini-corpus/tiledb-data/*

Note this is quite quick.

-As a keystroke-saver, use the `tools/ingestor` script's `--soco` option which will populate the SOMA
-collection at ingest time, so you don't even have to run `populate-soco` as an afterstep.
-
-```
-tools/ingestor -o /mini-corpus/tiledb-data --soco -n /mini-corpus/anndata/0cfab2d4-1b79-444e-8cbe-2ca9671ca85e.h5ad
-tools/ingestor -o /mini-corpus/tiledb-data --soco -n /mini-corpus/anndata/10x_pbmc68k_reduced.h5ad
-```
-
## Names and URIs

Next let's start taking a look across the collection.
12 changes: 6 additions & 6 deletions apis/python/examples/inspecting-schema.md
@@ -73,20 +73,20 @@ dtype('uint8'), 'feature_name': dtype('S'), 'feature_reference': dtype('<U')}
```
 print("OBS NAMES")
 for soma in soco:
-    print(soma.name)
+    print(soma.uri)
     for attr_name in soma.obs.keys():
         print(" obs", attr_name)
 print("VAR NAMES")
 for soma in soco:
-    print(soma.name)
+    print(soma.uri)
     for attr_name in soma.var.keys():
         print(" var", attr_name)
```
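
The loop above prints each SOMA's URI followed by its attribute names. A minimal sketch of the same traversal, collecting the names into a dictionary keyed by URI, might look like the following. `StandInSOMA` and `attr_names_by_uri` are hypothetical names introduced only for illustration; they are not part of `tiledbsc`, and a real SOMA collection would be opened through the package rather than built by hand.

```python
from dataclasses import dataclass, field


@dataclass
class StandInSOMA:
    """Hypothetical stand-in for a SOMA; NOT the tiledbsc class."""

    name: str
    uri: str
    obs: dict = field(default_factory=dict)
    var: dict = field(default_factory=dict)


def attr_names_by_uri(soco):
    """Map each SOMA's URI to its obs and var attribute names."""
    return {
        soma.uri: {
            "obs": list(soma.obs.keys()),
            "var": list(soma.var.keys()),
        }
        for soma in soco
    }


# A hand-built collection; the attribute values are irrelevant here,
# only the keys are inspected.
soco = [
    StandInSOMA(
        name="tabula-sapiens-immune",
        uri="file:///mini-corpus/tiledb-data/tabula-sapiens-immune",
        obs={"tissue": None, "donor": None},
        var={"feature_name": None},
    ),
]

summary = attr_names_by_uri(soco)
for uri, names in summary.items():
    print(uri)
    for kind in ("obs", "var"):
        for attr_name in names[kind]:
            print(" ", kind, attr_name)
```

Keying on the URI rather than the name keeps entries distinct even if two SOMAs in different locations happen to share a name.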

```
OBS NAMES
-tabula-sapiens-immune
+file:///mini-corpus/tiledb-data/tabula-sapiens-immune
obs tissue_in_publication
obs assay_ontology_term_id
obs donor
@@ -113,7 +113,7 @@ tabula-sapiens-immune
obs tissue
obs ethnicity
obs development_stage
-integrated-human-lung-cell-atlas
+file:///mini-corpus/tiledb-data/integrated-human-lung-cell-atlas
obs is_primary_data
obs assay_ontology_term_id
obs cell_type_ontology_term_id
@@ -186,7 +186,7 @@ integrated-human-lung-cell-atlas
...
VAR NAMES
-tabula-sapiens-immune
+file:///mini-corpus/tiledb-data/tabula-sapiens-immune
var feature_type
var ensemblid
var highly_variable
@@ -199,7 +199,7 @@ tabula-sapiens-immune
var feature_is_filtered
var feature_name
var feature_reference
-integrated-human-lung-cell-atlas
+file:///mini-corpus/tiledb-data/integrated-human-lung-cell-atlas
var n_cells
var highly_variable
var means