[python] Partition sparse matrix reads in `tiledbsoma.io.to_anndata` #3328

bkmartinjr · 2024-11-15T19:10:49Z

Partition sparse matrix reads in the `tiledbsoma.io.to_anndata` code path. This change is purely a performance optimization and has no functional impact.

#3345 [sc-59595]
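
For readers unfamiliar with the idea, here is a minimal sketch of partitioned sparse reads in general. It is not the code in this PR. It assumes the matrix can be split into contiguous row ranges that are read concurrently and then concatenated in partition order into CSR arrays. `COO`, `row_partitions`, `read_partition`, and `to_csr` are hypothetical names invented for illustration, and the in-memory list stands in for a real storage backend.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a stored sparse matrix: (row, col, value) triples.
COO = [(0, 1, 1.0), (2, 0, 3.0), (1, 2, 2.0), (3, 3, 4.0), (0, 0, 5.0)]
N_ROWS = 4


def row_partitions(n_rows, n_parts):
    """Split [0, n_rows) into contiguous, non-empty (start, stop) row ranges."""
    step = max(1, -(-n_rows // n_parts))  # ceiling division, at least 1
    return [(lo, min(lo + step, n_rows)) for lo in range(0, n_rows, step)]


def read_partition(bounds):
    """Hypothetical reader: return the triples whose row is in [start, stop)."""
    start, stop = bounds
    return [t for t in COO if start <= t[0] < stop]


def to_csr(n_rows, n_parts=2):
    """Read row partitions concurrently, then assemble CSR arrays."""
    parts = row_partitions(n_rows, n_parts)
    with ThreadPoolExecutor(max_workers=max(1, len(parts))) as pool:
        # pool.map yields results in partition order, so rows stay ascending.
        chunks = list(pool.map(read_partition, parts))

    indptr, indices, data = [0] * (n_rows + 1), [], []
    for chunk in chunks:
        for r, c, v in sorted(chunk):  # sort each chunk by (row, col)
            indptr[r + 1] += 1         # count entries per row
            indices.append(c)
            data.append(v)
    for r in range(n_rows):            # prefix sum turns counts into offsets
        indptr[r + 1] += indptr[r]
    return indptr, indices, data


print(to_csr(N_ROWS, n_parts=2))
```

The result does not depend on the number of partitions, which matches the claim that partitioning is a performance change only.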

apis/python/src/tiledbsoma/io/outgest.py

johnkerl

Thanks @bkmartinjr -- this is a tour de force.

Looks good to me 👀-wise. I also ran to_anndata against local disk, S3, and cloud-hosted storage, comparing main to this branch, on an EC2 Ubuntu m5.4xlarge as well as on my M1 Mac. (Several variables there, all test-driven.) (Re Mac: I have a general feeling, admittedly more from the R side than the Python side, that 'threads are a little different' on macOS, so this was worth some interactive, hands-on, eyes-on experimenting.) I saw a solid 2x speedup at 100K cells on all storage backends. I'll try 300K runs later; I won't report back here unless I see something surprising (which I don't expect).

cc @nguyenv for another pair of eyes.

bkmartinjr added 30 commits November 6, 2024 14:11

first cut at fast CSX conversion

121d2a2

fix compile time warning

0e402b9

build python bindings -O3

c68f736

typos

d3ab2e0

Merge branch 'main' into bkmartinjr/fastercsx

7a642a3

fix clang compile issue

48e3dbc

reformat to repo C++ standards

d6f6a20

Merge branch 'main' into bkmartinjr/fastercsx

b67d297

fix clang warning about unused captures

3733f4e

more clang fixes

27810a3

Merge branch 'main' into bkmartinjr/fastercsx

7d05331

fix build error found during CI

9d0b7ba

Merge branch 'main' into bkmartinjr/fastercsx

70bf7ca

lint

9b23b02

more lint

fa44160

lint chase

ff1ed9e

Merge branch 'main' into bkmartinjr/fastercsx

1f273e6

Merge branch 'main' into bkmartinjr/fastercsx

9e2826a

cleanup include statements

91f63d9

Merge branch 'main' into bkmartinjr/fastercsx

3ddd252

debugging R build

0abb9f3

lint

48030ab

cleanup GHA for interop testing

12c7565

Merge branch 'main' into bkmartinjr/fastercsx

bae3a03

add -mavx2 for x86 build

8f3eb7d

more tests

ae6b533

comment

2d9050c

add bounds check for second dimension coordinate

866c4f5

lint

98529f8

test / bug fix argument handling

d1b8e32

bkmartinjr requested review from johnkerl and ryan-williams November 15, 2024 19:59

johnkerl changed the title ~~partition sparse matrix reads in soma.io.to_anndata~~ [python] Partition sparse matrix reads in tiledbsoma.io.to_anndata Nov 15, 2024

bkmartinjr added 13 commits November 15, 2024 12:47

clean up C++ namespace

6a7e26e

Merge branch 'bkmartinjr/fastercsx' into bkmartinjr/partition_to_anndata

67aceba

revert cleanup on request

3e8672d

Merge branch 'bkmartinjr/fastercsx' into bkmartinjr/partition_to_anndata

fe69600

PR feedback (thanks John!)

1986ad5

Merge branch 'bkmartinjr/fastercsx' into bkmartinjr/partition_to_anndata

63b867b

incorporate more PR fb

bc866f8

Merge branch 'main' into bkmartinjr/fastercsx

6242085

Merge branch 'bkmartinjr/fastercsx' into bkmartinjr/partition_to_anndata

f6f9f12

lint

de34a42

Merge branch 'bkmartinjr/fastercsx' into bkmartinjr/partition_to_anndata

687ef4f

comments

e5ef34d

Merge branch 'bkmartinjr/fastercsx' into bkmartinjr/partition_to_anndata

468ee73

johnkerl reviewed Nov 17, 2024

apis/python/src/tiledbsoma/io/outgest.py (outdated, resolved)

apis/python/src/tiledbsoma/io/outgest.py (outdated, resolved)

PR f/b

46cf57d

johnkerl approved these changes Nov 17, 2024

johnkerl mentioned this pull request Nov 17, 2024

[python/c++] COO to CSX conversion optimization #3304

Merged

bkmartinjr added 4 commits November 17, 2024 21:01

fix compile warnings

52933f2

Merge branch 'bkmartinjr/fastercsx' into bkmartinjr/partition_to_anndata

4ba7bee

PR f/b

ba3f69e

Merge branch 'bkmartinjr/fastercsx' into bkmartinjr/partition_to_anndata

646e021

Base automatically changed from bkmartinjr/fastercsx to main November 19, 2024 00:54

merge with main

080b47c

bkmartinjr merged commit 3808ed9 into main Nov 19, 2024
11 checks passed

bkmartinjr deleted the bkmartinjr/partition_to_anndata branch November 19, 2024 01:39

johnkerl mentioned this pull request Nov 19, 2024

[python] Ingest somacore classes #3307

Merged

bkmartinjr mentioned this pull request Nov 21, 2024

[python] Optimization of ExperimentAxisQuery to_anndata #3359

Merged
