Support for enumerated types #562
This pull request has been linked to Shortcut Story #30201: [R] Support for enumerated types.
[sc-30316]
This pull request has been linked to Shortcut Story #30316: Enumerated data types AKA categoricals AKA factors.
I have taken the 'draft' status off as this is now fairly featureful, but we still have to wait for the TileDB Embedded 2.17.0 release to have enumeration support in the core library so that it can be used here. For now the tests are all skipped in CI, as we are only up to 2.16.1, which does not include enumeration support. A quick demo with enumeration support, including query conditions on enum and non-enum columns:
> library(tiledb)
TileDB R 0.20.1.4 with TileDB Embedded 2.17.0 on Ubuntu 23.04.
See https://tiledb.com for more information about TileDB.
> uri <- "mem://penguins"
> fromDataFrame(palmerpenguins::penguins, uri)
> arr <- tiledb_array(uri, extended=FALSE, return_as="data.table")
> query_condition(arr) <- parse_query_condition(year == 2009 && sex == male && species == Gentoo, ta = arr)
> arr[]
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex year
<fctr> <fctr> <num> <num> <int> <int> <fctr> <int>
1: Gentoo Biscoe 52.5 15.6 221 5450 male 2009
2: Gentoo Biscoe 50.0 15.9 224 5350 male 2009
3: Gentoo Biscoe 50.8 17.3 228 5600 male 2009
4: Gentoo Biscoe 51.3 14.2 218 5300 male 2009
5: Gentoo Biscoe 52.1 17.0 230 5550 male 2009
6: Gentoo Biscoe 52.2 17.1 228 5400 male 2009
7: Gentoo Biscoe 49.5 16.1 224 5650 male 2009
8: Gentoo Biscoe 50.8 15.7 226 5200 male 2009
9: Gentoo Biscoe 49.4 15.8 216 4925 male 2009
10: Gentoo Biscoe 51.1 16.5 225 5250 male 2009
11: Gentoo Biscoe 55.9 17.0 228 5600 male 2009
12: Gentoo Biscoe 49.1 15.0 228 5500 male 2009
13: Gentoo Biscoe 46.8 16.1 215 5500 male 2009
14: Gentoo Biscoe 53.4 15.8 219 5500 male 2009
15: Gentoo Biscoe 48.1 15.1 209 5500 male 2009
16: Gentoo Biscoe 49.8 15.9 229 5950 male 2009
17: Gentoo Biscoe 51.5 16.3 230 5500 male 2009
18: Gentoo Biscoe 55.1 16.0 230 5850 male 2009
19: Gentoo Biscoe 48.8 16.2 222 6000 male 2009
20: Gentoo Biscoe 50.4 15.7 222 5750 male 2009
21: Gentoo Biscoe 49.9 16.1 213 5400 male 2009
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex year
>
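The demo above returns `species`, `island`, and `sex` as factors. As a rough base R illustration of why factors are the natural counterpart of enumerated types (no TileDB calls are used, the values are made up, and nothing here is a claim about how the package stores data internally), a factor is a vector of integer codes plus a table of labels:

```r
# Base R only; illustrative values, not taken from the array above.
sex <- factor(c("male", "female", "male", NA), levels = c("female", "male"))

codes  <- as.integer(sex)   # 1-based positions into the level table; NA stays NA
labels <- levels(sex)       # the distinct enumeration values

# Rebuild the factor from codes plus labels, as a reader of stored data might
rebuilt <- factor(labels[codes], levels = labels)
stopifnot(identical(rebuilt, sex))

# A filter on a label can be evaluated against the integer codes
wanted <- match("male", labels)
which(codes == wanted)
```

The last two lines hint at how a condition such as `sex == male` could be resolved against codes rather than strings; whether the core library does it this way is an assumption, not something stated in this thread.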
(plus minor cleanup following rebase)
@eddelbuettel do we need to bump to 0.20.3.(n+1) in DESCRIPTION?
Six of one ... but may as well. It is marked as 'bigger than 0.20.3.1' which was this morning's status quo. So .2 works for me (signifying 2.17.0-rc0). Can make it .3 if that makes you happier but a >= .2 should already do. Will update NEWS.md as well.
[WIP] This (work-in-progress) branch supports enumerated types (as provided by the merged-into-dev PR 4051). It is now round-trip complete for the standard case of a data.frame in and out; support for Arrow return is next. CI is now on but 'ineffective' for the new code as there is no pre-made artifact to utilise; it is tested locally in development, see below for full run.
#866
This PR has now been rebased on the central branch with its dependency on the first RC release of TileDB Core 2.17.0.