[tracking] Add support for enumerated types aka categoricals aka factors #866
Labels
enhancement
New feature or request
Comments
johnkerl
changed the title
Add support for categoricals
Add support for categoricals (TileDB-Core tracker)
Feb 3, 2023
johnkerl
changed the title
Add support for categoricals (TileDB-Core tracker)
Add support for categoricals (TileDB-Core feature tracker)
Feb 3, 2023
This was referenced Feb 3, 2023
johnkerl
changed the title
Add support for categoricals (TileDB-Core feature tracker)
Add support for enumerated types AKA categoricals AKA factors
Jul 5, 2023
johnkerl
changed the title
Add support for enumerated types AKA categoricals AKA factors
[tracking] Add support for enumerated types AKA categoricals AKA factors
Jul 5, 2023
This was referenced Aug 14, 2023
This was referenced Sep 6, 2023
This was referenced Sep 13, 2023
Closed
Merged
ihnorton
pushed a commit
that referenced
this issue
Sep 15, 2023
As described in #1558 and #866, adding enumeration support is desirable once we have TileDB Embedded 2.17 available. **Changes:** This PR supports reading of columns with enumerations (aka dictionaries aka factor variables) directly via Arrow. Preliminary write support is also available (but still goes through the `tiledb` R package for writes). **Notes for Reviewer:** ~This PR is now work-in-progress and not ready for a merge while we await TileDB 2.17.~ The branch and PR are ready but should only be merged once the prerequisites have been merged. It likely needs #1519 (C++ side) and #1663 (CI support). CI is turned off as the TileDB default build is still without support for enumerations.
johnkerl
added a commit
that referenced
this issue
Sep 15, 2023
* **Issue and/or context:** As described in #1558 and #866, adding enumeration support is desirable once we have TileDB Embedded 2.17 available.

  **Changes:** This PR supports reading of columns with enumerations (aka dictionaries aka factor variables) directly via Arrow. Preliminary write support is also available (but still goes through the `tiledb` R package for writes).

  **Notes for Reviewer:** ~This PR is now work-in-progress and not ready for a merge while we await TileDB 2.17.~ The branch and PR are ready but should only be merged once the prerequisites have been merged. It likely needs #1519 (C++ side) and #1663 (CI support). CI is turned off as the TileDB default build is still without support for enumerations.

* **Issue and/or context:** This PR adds support for returning Arrow tables with dictionaries that can include `ordered` enumerations.

  **Changes:** Given #1559, which it depends upon, a very small change to just three files in `libtiledbsoma`. This should become clearer once the dependent PR is merged and this one can be rebased.

  **Notes for Reviewer:** [SC 34073](https://app.shortcut.com/tiledb-inc/story/34073/c-add-ordered-support-to-arrow-export)

* **Issue and/or context:** This PR extends the `schema()` function to return an Arrow schema with enumerations including `ordered`.

  **Changes:** Given #1559, which it depends upon, a very small change to just one file. This should become clearer once the dependent PR is merged and this one can be rebased.

  **Notes for Reviewer:** [SC 34074](https://app.shortcut.com/tiledb-inc/story/34074/c-add-ordered-support-to-arrow-export)

* [c++] Test fixes for #1559 (#1684)
* ihn/bugfix
* unit-test update
* lint

---------

Co-authored-by: John Kerl <[email protected]>
This was referenced Sep 17, 2023
Merged
Only remaining issue is #1710 which has its own tracking; closing this parent/tracker issue. |
This was referenced Sep 25, 2023
eddelbuettel
changed the title
[tracking] Add support for enumerated types AKA categoricals AKA factors
[tracking] Add support for enumerated types aka categoricals aka factors
Sep 25, 2023
This was referenced Sep 26, 2023
Many systems such as AnnData, Pandas, Arrow, and the R language itself support categoricals.
red, yellow, green
green, red, yellow
The status quo in TileDB-SOMA has been that these are "decategoricalized" or "flattened" to strings (or ints, etc.).
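For readers unfamiliar with the terminology, here is a minimal pure-Python sketch of what an enumerated (dictionary-encoded) column looks like next to its flattened form. The `EnumeratedColumn` class and its methods are hypothetical and exist only for illustration; they are not the TileDB-SOMA, Arrow, or Pandas API. The sketch also assumes the two orderings listed above stand for a declared category order versus plain alphabetical order.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class EnumeratedColumn:
    """Hypothetical dictionary-encoded column: small integer codes plus a category list."""

    categories: List[str]
    codes: List[int]
    ordered: bool = False

    @classmethod
    def from_values(
        cls, values: Sequence[str], categories: Sequence[str], ordered: bool = False
    ) -> "EnumeratedColumn":
        # Map each category to its position, then store only the positions.
        index = {cat: i for i, cat in enumerate(categories)}
        return cls(list(categories), [index[v] for v in values], ordered)

    def flatten(self) -> List[str]:
        # "Decategoricalize": replace every code with its string value.
        # The category list and the ordered flag are lost in the result.
        return [self.categories[c] for c in self.codes]

    def sorted_values(self) -> List[str]:
        # Sort by code, i.e. by declared category order rather than alphabetically.
        return [self.categories[c] for c in sorted(self.codes)]


values = ["green", "red", "yellow", "green"]
col = EnumeratedColumn.from_values(values, ["red", "yellow", "green"], ordered=True)

flat = col.flatten()               # plain strings, same values as the input
by_category = col.sorted_values()  # follows the declared order red, yellow, green
alphabetical = sorted(flat)        # what a flattened string column sorts to
```

Once the column is flattened, only the alphabetical ordering can be recovered from the data alone, which is the information that native enumeration support preserves.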
Evaluation plan:
`Enumeration` in Python API #1511 -- as of 2023-07-05 not ready for eval