Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[python] Ingestion performance #2434

Merged
merged 16 commits into from
Apr 12, 2024
Merged

Conversation

johnkerl
Copy link
Member

@johnkerl johnkerl commented Apr 11, 2024

Issue and/or context: #2433

Changes: As narrated on #2433. The only additional details is that to balance the DoesNotExistError for open, we need a completely analogous AlreadyExistsError for create.

Notes for Reviewer: There is a bit more performance improvement to be made, but it needs to be done in core; I will track that work separately. Also, as this is for Python; I'll track the R work -- both a needs-analysis, and any possible (maybe no) dev work -- separately.

Copy link

codecov bot commented Apr 11, 2024

Codecov Report

Merging #2434 (746a2cd) into main (0fc3cc0) will decrease coverage by 10.69%.
Report is 1 commits behind head on main.
The diff coverage is 86.04%.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #2434       +/-   ##
===========================================
- Coverage   90.61%   79.93%   -10.69%     
===========================================
  Files          37       92       +55     
  Lines        3900     8626     +4726     
===========================================
+ Hits         3534     6895     +3361     
- Misses        366     1731     +1365     
Flag Coverage Δ
python 90.53% <86.04%> (-0.09%) ⬇️
r 71.10% <ø> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
python_api 90.53% <86.04%> (-0.09%) ⬇️
libtiledbsoma ∅ <ø> (∅)

@johnkerl johnkerl changed the title [python] Ingestion performance [WIP] [python] Ingestion performance Apr 11, 2024
@johnkerl johnkerl marked this pull request as ready for review April 11, 2024 20:57
apis/python/src/tiledbsoma/_exception.py Outdated Show resolved Hide resolved
soma_arr = DenseNDArray.create(
arr_uri,
type=pa_dtype,
shape=value.shape,
platform_config=platform_config,
context=context,
)
except AlreadyExistsError:
soma_arr = _factory.open(arr_uri, "w", soma_type=DenseNDArray, context=context)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can actually just directly call DenseNDArray.open here (and same with DataFrame.open above).

Copy link
Member

@ryan-williams ryan-williams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, just one additional q re: a XXX comment

apis/python/src/tiledbsoma/_exception.py Outdated Show resolved Hide resolved
@johnkerl johnkerl marked this pull request as draft April 11, 2024 21:53
@johnkerl johnkerl marked this pull request as ready for review April 12, 2024 14:17
@johnkerl
Copy link
Member Author

@nguyenv @ryan-williams ready for re-review! :)

Copy link
Member

@nguyenv nguyenv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks goods to me. Just have one minor suggestion.

apis/python/src/tiledbsoma/_exception.py Outdated Show resolved Hide resolved
Co-authored-by: nguyenv <[email protected]>
@johnkerl johnkerl merged commit 8261e6e into main Apr 12, 2024
23 checks passed
@johnkerl johnkerl deleted the kerl/ingest-performance-improvement branch April 12, 2024 16:24
github-actions bot pushed a commit that referenced this pull request Apr 12, 2024
* AlreadyExistsError; use for DataFrame in tiledbsoma.io

* apply in `_create_or_open_collection`

* apply in _create_from_matrix

* apply in _ingest_uns_ndarray

* lint

* Run SO copying workflow on macos-13 to avoid SIP (#2435)

* AlreadyExistsError; use for DataFrame in tiledbsoma.io

* apply in `_create_or_open_collection`

* apply in _create_from_matrix

* apply in _ingest_uns_ndarray

* lint

* neaten

* neaten

* Update raises-notes

* code-review feedback

Co-authored-by: nguyenv <[email protected]>

---------

Co-authored-by: John Blischak <[email protected]>
Co-authored-by: nguyenv <[email protected]>
johnkerl added a commit that referenced this pull request Apr 12, 2024
* AlreadyExistsError; use for DataFrame in tiledbsoma.io

* apply in `_create_or_open_collection`

* apply in _create_from_matrix

* apply in _ingest_uns_ndarray

* lint

* Run SO copying workflow on macos-13 to avoid SIP (#2435)

* AlreadyExistsError; use for DataFrame in tiledbsoma.io

* apply in `_create_or_open_collection`

* apply in _create_from_matrix

* apply in _ingest_uns_ndarray

* lint

* neaten

* neaten

* Update raises-notes

* code-review feedback



---------

Co-authored-by: John Kerl <[email protected]>
Co-authored-by: John Blischak <[email protected]>
Co-authored-by: nguyenv <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants