Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[python] Support Enumeration in Python API #1511

Merged

Conversation

nguyenv
Copy link
Member

@nguyenv nguyenv commented Jun 28, 2023

Issue and/or context: #866

Changes:

Reads are done in C++. Writes are done through TileDB-Py (TileDB-Inc/TileDB-Py#1790).

Notes for Reviewer:

Depends on #1519.

@shortcut-integration
Copy link

This pull request has been linked to Shortcut Story #30316: Enumerated data types AKA categoricals AKA factors.

@nguyenv nguyenv changed the title [c++/python][wip] Support Enumerations in Python (writes), C++ (reads) [c++/python][wip] Support Enumerations in Python API Jun 28, 2023
@nguyenv nguyenv changed the title [c++/python][wip] Support Enumerations in Python API [c++/python][wip] Support Enumeration in Python API Jun 28, 2023
@johnkerl
Copy link
Member

[sc-30316]

@nguyenv
Copy link
Member Author

nguyenv commented Jul 7, 2023

This PR is now at a point where it can be evaluated for feedback.

@nguyenv nguyenv changed the title [c++/python][wip] Support Enumeration in Python API [python] Support Enumeration in Python API Jul 19, 2023
@nguyenv nguyenv force-pushed the viviannguyen/sc-30316/enumerated-data-types-aka-categoricals-aka branch from 1ad3e2c to 2fbc777 Compare August 17, 2023 19:14
@nguyenv nguyenv force-pushed the viviannguyen/sc-30316/enumerated-data-types-aka-categoricals-aka branch from c2c81de to a84e76c Compare August 24, 2023 02:44
@johnkerl
Copy link
Member

johnkerl commented Sep 6, 2023

@nguyenv can you please resolve conflicts on this PR to bring it up to date? I'm happy to help if you need.

@nguyenv
Copy link
Member Author

nguyenv commented Sep 6, 2023

@nguyenv can you please resolve conflicts on this PR to bring it up to date? I'm happy to help if you need.

Yes I've been looking at this since yesterday . I'm seeing failures in the most recently added unit tests that I haven't resolved yet.

===================================================================================================== short test summary info ======================================================================================================
FAILED apis/python/tests/test_basic_anndata_io.py::test_obs_with_categorical_int_nan_enumeration - tiledb.cc.TileDBError: URI /home/vivian/TileDB-SOMA/apis/python/testdata/categorical_int_nan.h5ad is not a valid file
FAILED apis/python/tests/test_registration_mappings.py::test_multiples_without_experiment[True-permutation0] - IndexError: Chunk index out of range.
FAILED apis/python/tests/test_registration_mappings.py::test_multiples_without_experiment[True-permutation1] - IndexError: Chunk index out of range.
FAILED apis/python/tests/test_registration_mappings.py::test_multiples_without_experiment[True-permutation2] - IndexError: Chunk index out of range.
=============================================================================== 14 failed, 1358 passed, 3 xfailed, 194 warnings in 653.45s (0:10:53) ===============================================================================
make: *** [Makefile:35: test] Error 1

@johnkerl
Copy link
Member

johnkerl commented Sep 6, 2023

@nguyenv those are tests I created -- I'll help take a look

@nguyenv nguyenv force-pushed the viviannguyen/sc-30316/enumerated-data-types-aka-categoricals-aka branch from 6e22be8 to 1058f9d Compare September 7, 2023 19:52
@nguyenv nguyenv force-pushed the viviannguyen/sc-30316/enumerated-data-types-aka-categoricals-aka branch from ba4559c to 0687b16 Compare September 11, 2023 18:00
@codecov-commenter
Copy link

codecov-commenter commented Sep 14, 2023

Codecov Report

Patch has no changes to coverable lines.

❗ Current head eeffaea differs from pull request most recent head ae1474d. Consider uploading reports for the commit ae1474d to get more accurate results

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

📢 Thoughts on this report? Let us know!.

@nguyenv nguyenv marked this pull request as ready for review September 14, 2023 20:59
* Addition of `SOMAArray::get_enum` and `SOMAArray::get_enum_label_on_attr`
* Attach an enumeration/dictionary to the `ColumnBuffer` if applicable;
  this is used when converting from `ArrayBuffers` to Arrow Tables in
  the Python and R APIs
* Prior to TileDB-Inc/TileDB#4272, the SOMA unit
tests were erroneously writing a byte vector for string dimensions which
maps to `TILEDB_BLOB` rather than `TILEDB_STRING_ASCII`
nguyenv and others added 23 commits September 14, 2023 17:06
* Prior to TileDB-Inc/TileDB#4272, the SOMA unit
tests were erroneously writing a byte vector for string dimensions which
maps to `TILEDB_BLOB` rather than `TILEDB_STRING_ASCII`
…ss (#1650)

* temp

* robustness

* extract method for obsm/varm outgest

* complete rebase to main

* more unit-test cases

* remove R debugs

* robustness

* complete rebase to main

* [python] Leverage bounding-box feature for obsm/varm outgest robustness

* test data for holey obsm

* unit-test cases

* on-line help improvements
* [python] Improve schema-printer [RFC]

* neaten

* code-neaten inspired by dirk's 1675
@nguyenv nguyenv force-pushed the viviannguyen/sc-30316/enumerated-data-types-aka-categoricals-aka branch from 14f4387 to f697900 Compare September 14, 2023 22:36
@johnkerl
Copy link
Member

johnkerl commented Sep 15, 2023

This has the same interop fail as #1519 which I am debugging separately: #1679. Otherwise looks good.

@johnkerl johnkerl merged commit bd13a68 into main Sep 15, 2023
@johnkerl johnkerl deleted the viviannguyen/sc-30316/enumerated-data-types-aka-categoricals-aka branch September 15, 2023 13:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants