Additional nanoarrow refactoring #682

Merged: 6 commits, merged Mar 29, 2024
35 changes: 16 additions & 19 deletions DESCRIPTION
@@ -1,30 +1,27 @@
Package: tiledb
Type: Package
Version: 0.25.0.3
Title: Modern Database Engine for Multi-Modal Data via Sparse and Dense Multidimensional Arrays
Version: 0.25.0.4
Title: Modern Database Engine for Complex Data Based on Multi-Dimensional Arrays
Authors@R: c(person("TileDB, Inc.", role = c("aut", "cph")),
person("Dirk", "Eddelbuettel", email = "[email protected]", role = "cre"))
person("Dirk", "Eddelbuettel", email = "[email protected]", role = "cre"))
Description: The modern database 'TileDB' introduces a powerful on-disk
format for multi-modal data based on dimensional arrays. It supports
dense and sparse arrays, dataframes and key-value stores, cloud
storage ('S3', 'GCS', 'Azure'), chunked arrays, multiple compression,
encryption and checksum filters, uses a fully multi-threaded
implementation, supports parallel I/O, data versioning ('time
travel'), metadata and groups. It is implemented as an embeddable
cross-platform C++ library with APIs from several languages, and
integrations.
format for storing and accessing any complex data based on multi-dimensional
arrays. It supports dense and sparse arrays, dataframes and key-value stores,
cloud storage ('S3', 'GCS', 'Azure'), chunked arrays, multiple compression,
encryption and checksum filters, uses a fully multi-threaded implementation,
supports parallel I/O, data versioning ('time travel'), metadata and groups.
It is implemented as an embeddable cross-platform C++ library with APIs from
several languages, and integrations. This package provides the R support.
Copyright: TileDB, Inc.
License: MIT + file LICENSE
URL: https://github.com/TileDB-Inc/TileDB-R, https://tiledb-inc.github.io/TileDB-R/
BugReports: https://github.com/TileDB-Inc/TileDB-R/issues
SystemRequirements: A C++17 compiler is required, and for macOS
compilation version 11.0 or later is required. Optionally cmake (only
when TileDB source build selected), curl (only when TileDB source
build selected), and git (only when TileDB source build selected);
on x86_64 and M1 platforms pre-built TileDB Embedded libraries are
available at GitHub and are used if no TileDB installation is
detected, and no other option to build or download was specified by
the user.
SystemRequirements: A C++17 compiler is required; on macOS compilation version 11.0
or later is required. Optionally cmake (only when TileDB source build selected),
curl (only when TileDB source build selected), and git (only when TileDB source
build selected); on x86_64 and M1 platforms pre-built TileDB Embedded libraries
are available at GitHub and are used if no TileDB installation is detected, and
no other option to build or download was specified by the user.
Imports: methods, Rcpp (>= 1.0.8), nanotime, spdl, nanoarrow
LinkingTo: Rcpp, RcppInt64, nanoarrow
Suggests: tinytest, simplermarkdown, curl, bit64, Matrix, palmerpenguins, nycflights13, data.table, tibble, arrow
2 changes: 2 additions & 0 deletions NEWS.md
@@ -6,6 +6,8 @@

* The display of a `filter_list` now labels it correctly as a filter list (@cgiachalis in #681)

* The Arrow integration has been simplified by using [nanoarrow](https://github.com/apache/arrow-nanoarrow) and now returns a single `nanoarrow` object; an unexported helper function `nanoarrow2list()` is provided to match the previous interface (#682)

## Build and Test Systems

* The `configure` and `Makevars.in` received a minor update correcting small issues (#680)
23 changes: 12 additions & 11 deletions R/ArrowIO.R
@@ -27,12 +27,12 @@
##' @param query A TileDB Query object
##' @param name A character variable identifying the buffer
##' @param ctx tiledb_ctx object (optional)
##' @return A two-element vector where the two elements are
##' external pointers to the Arrow array and schema
##' @return A \code{nanoarrow} object (which is an external pointer to an Arrow Array
##' with the Arrow Schema stored as the external pointer tag) classed as an S3 object
##' @export
tiledb_query_export_buffer <- function(query, name, ctx = tiledb_get_context()) {
stopifnot(`The 'query' argument must be a tiledb query` = is(query, "tiledb_query"),
`The 'name' argument must be character` = is.character(name))
stopifnot("The 'query' argument must be a tiledb query" = is(query, "tiledb_query"),
"The 'name' argument must be character" = is.character(name))
res <- libtiledb_query_export_buffer(ctx@ptr, query@ptr, name)
res
}
@@ -43,16 +43,17 @@ tiledb_query_export_buffer <- function(query, name, ctx = tiledb_get_context())
##' from two Arrow external pointers.
##' @param query A TileDB Query object
##' @param name A character variable identifying the buffer
##' @param arrowpointers A two-element list vector with two external pointers
##' to an Arrow Array and Schema, respectively
##' @param nanoarrowptr A \code{nanoarrow} object (which is an external pointer to an Arrow Array
##' with the Arrow Schema stored as the external pointer tag) classed as an S3 object
##' @param ctx tiledb_ctx object (optional)
##' @return The updated Query object is returned
##' @export
tiledb_query_import_buffer <- function(query, name, arrowpointers, ctx = tiledb_get_context()) {
stopifnot(`The 'query' argument must be a tiledb query` = is(query, "tiledb_query"),
`The 'name' argument must be character` = is.character(name),
`The 'arrowpointers' argument must be list of length two` = is.list(arrowpointers) && length(arrowpointers)==2)
query@ptr <- libtiledb_query_import_buffer(ctx@ptr, query@ptr, name, arrowpointers)
tiledb_query_import_buffer <- function(query, name, nanoarrowptr, ctx = tiledb_get_context()) {
stopifnot("The 'query' argument must be a tiledb query" = is(query, "tiledb_query"),
"The 'name' argument must be character" = is.character(name),
"The 'nanoarrowptr' argument must be a 'nanoarrow' array object" =
inherits(nanoarrowptr, "nanoarrow_array"))
query@ptr <- libtiledb_query_import_buffer(ctx@ptr, query@ptr, name, nanoarrowptr)
query
}

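The updated roxygen text describes the value returned by `tiledb_query_export_buffer()` as an external pointer to an Arrow Array with the Arrow Schema stored as the pointer tag. Both structs come from the Arrow C Data Interface. As a minimal C++ sketch of that contract (not TileDB or nanoarrow code), the block below copies the two struct layouts from the specification and pairs a hypothetical producer, `export_int32()`, with a hypothetical consumer, `sum_and_release()`. The producer fills an array/schema pair for a null-free int32 column (format string `"i"`, two buffers: validity and data); the consumer reads the values and then calls each `release` callback exactly once. The names `export_int32`, `sum_and_release` and `Int32Payload` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Struct layouts as defined by the Arrow C Data Interface specification.
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

// Hypothetical private data owned by the exported array: a copy of the
// values plus the buffer-pointer table (validity bitmap, data).
struct Int32Payload {
  std::vector<int32_t> values;
  const void* buffers[2];
};

static void release_schema(ArrowSchema* schema) {
  // Nothing was heap-allocated for this schema; just mark it released.
  schema->release = nullptr;
}

static void release_array(ArrowArray* array) {
  delete static_cast<Int32Payload*>(array->private_data);
  array->private_data = nullptr;
  array->release = nullptr;  // a released struct is marked by a null release
}

// Hypothetical producer: export an int32 column that contains no nulls.
void export_int32(const std::vector<int32_t>& column, ArrowArray* out, ArrowSchema* schema) {
  auto* payload = new Int32Payload{column, {nullptr, nullptr}};
  payload->buffers[1] = payload->values.data();  // buffers[0] stays null: no validity bitmap

  *schema = ArrowSchema{"i", "rows", nullptr, 0, 0, nullptr, nullptr, &release_schema, nullptr};
  *out = ArrowArray{static_cast<int64_t>(column.size()), 0, 0, 2, 0, payload->buffers,
                    nullptr, nullptr, &release_array, payload};
}

// Hypothetical consumer: sum the values, then release both structs once.
int64_t sum_and_release(ArrowArray* array, ArrowSchema* schema) {
  // A struct whose release is already null must not be read again.
  assert(array->release != nullptr && schema->release != nullptr);
  int64_t sum = 0;
  if (std::strcmp(schema->format, "i") == 0) {  // "i" is the int32 format string
    const auto* data = static_cast<const int32_t*>(array->buffers[1]);
    for (int64_t i = 0; i < array->length; ++i) sum += data[array->offset + i];
  }
  array->release(array);
  schema->release(schema);
  return sum;
}
```

Exporting `{4, 5, 6, 7}` (the values the tests expect from the `rows` buffer) and passing the pair to `sum_and_release()` returns 22 and leaves both `release` pointers set to `nullptr`.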
34 changes: 6 additions & 28 deletions R/RcppExports.R
@@ -1,44 +1,18 @@
# Generated by using Rcpp::compileAttributes() -> do not edit by hand
# Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393

.allocate_arrow_array_as_xptr <- function() {
.Call(`_tiledb_allocate_arrow_array_as_xptr`)
}

.allocate_arrow_schema_as_xptr <- function() {
.Call(`_tiledb_allocate_arrow_schema_as_xptr`)
}

.delete_arrow_array_from_xptr <- function(sxp) {
invisible(.Call(`_tiledb_delete_arrow_array_from_xptr`, sxp))
}

.delete_arrow_schema_from_xptr <- function(sxp) {
invisible(.Call(`_tiledb_delete_arrow_schema_from_xptr`, sxp))
}

libtiledb_query_export_buffer <- function(ctx, query, name) {
.Call(`_tiledb_libtiledb_query_export_buffer`, ctx, query, name)
}

libtiledb_query_import_buffer <- function(ctx, query, name, arrowpointers) {
.Call(`_tiledb_libtiledb_query_import_buffer`, ctx, query, name, arrowpointers)
libtiledb_query_import_buffer <- function(ctx, query, name, naptr) {
.Call(`_tiledb_libtiledb_query_import_buffer`, ctx, query, name, naptr)
}

libtiledb_query_export_arrow_table <- function(ctx, query, names) {
.Call(`_tiledb_libtiledb_query_export_arrow_table`, ctx, query, names)
}

#' @noRd
check_arrow_schema_tag <- function(xp) {
.Call(`_tiledb_check_arrow_schema_tag`, xp)
}

#' @noRd
check_arrow_array_tag <- function(xp) {
.Call(`_tiledb_check_arrow_array_tag`, xp)
}

libtiledb_to_arrow <- function(ab, qry, dicts) {
.Call(`_tiledb_libtiledb_to_arrow`, ab, qry, dicts)
}
@@ -47,6 +21,10 @@ libtiledb_allocate_column_buffers <- function(ctx, qry, uri, names, memory_budge
.Call(`_tiledb_libtiledb_allocate_column_buffers`, ctx, qry, uri, names, memory_budget)
}

nanoarrow2list <- function(naarrptr) {
.Call(`_tiledb_nanoarrow2list`, naarrptr)
}

makeQueryWrapper <- function(qp) {
.Call(`_tiledb_makeQueryWrapper`, qp)
}
4 changes: 2 additions & 2 deletions inst/include/tiledb.h
@@ -69,8 +69,8 @@ typedef struct query_buffer query_buf_t;
// map from buffer names to shared_ptr to column_buffer
typedef std::unordered_map<std::string, std::shared_ptr<tiledb::ColumnBuffer>> map_to_col_buf_t;

// some lipstick on the pig that is a SEXP -- allow the nanoarrow ArrowArray XPtr be typedef'ed
typedef SEXP nanoarrowXPtr;
// some lipstick on the pig that is a SEXP -- but we stick with the S3 SEXP nanoarrow creates
typedef SEXP nanoarrowS3;

// C++ compiler complains about missing delete functionality when we use tiledb_vfs_fh_t directly
struct vfs_fh {
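With the typedef now naming the S3 object that nanoarrow creates, an array and its schema travel together as one object, and the updated tests comment out the manual `tiledb_arrow_array_del()` / `tiledb_arrow_schema_del()` calls, which suggests that releasing is now left to the nanoarrow object itself. A rough C++ analogue of that ownership model, assuming the Arrow C Data Interface struct layouts, is an RAII holder that owns one array/schema pair and releases both exactly once when it goes away, much as a finalizer does for an R external pointer. `OwnedArrowArray` is a hypothetical name; this is a sketch, not the nanoarrow or TileDB implementation.

```cpp
#include <cstdint>
#include <utility>

// Struct layouts as defined by the Arrow C Data Interface specification.
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

// Hypothetical owner pairing an array with its schema (the schema plays the
// role of the external-pointer tag). The destructor acts like a finalizer.
class OwnedArrowArray {
 public:
  OwnedArrowArray() = default;
  ~OwnedArrowArray() { reset(); }

  // Copying would lead to a double release, so it is disabled.
  OwnedArrowArray(const OwnedArrowArray&) = delete;
  OwnedArrowArray& operator=(const OwnedArrowArray&) = delete;

  // Moving copies the structs bitwise and marks the source as released,
  // which is how the C Data Interface describes moving a struct.
  OwnedArrowArray(OwnedArrowArray&& other) noexcept
      : array_(other.array_), schema_(other.schema_) {
    other.array_.release = nullptr;
    other.schema_.release = nullptr;
  }

  // Producers fill these in place, e.g. an exporter writing into the pair.
  ArrowArray* array() { return &array_; }
  ArrowSchema* schema() { return &schema_; }

  // Release whatever is still owned; safe to call more than once.
  void reset() {
    if (array_.release != nullptr) array_.release(&array_);
    if (schema_.release != nullptr) schema_.release(&schema_);
    array_.release = nullptr;
    schema_.release = nullptr;
  }

 private:
  ArrowArray array_{};    // value-initialized: release is null, nothing owned
  ArrowSchema schema_{};
};
```

Deleting the copy operations and nulling the source's `release` pointers on move are what guarantee each struct is released exactly once, whichever holder ends up owning it.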
54 changes: 30 additions & 24 deletions inst/tinytest/test_arrowio.R
@@ -19,20 +19,16 @@ batch <- record_batch(df)
expect_true(is(batch, "RecordBatch"))
expect_true(is(as.data.frame(batch), "data.frame"))


## allocate two structures (and release at end)
aa <- tiledb_arrow_array_ptr()
as <- tiledb_arrow_schema_ptr()
aa <- nanoarrow::nanoarrow_allocate_array()
as <- nanoarrow::nanoarrow_allocate_schema()
arrow:::ExportRecordBatch(batch, aa, as)

newrb <- arrow:::ImportRecordBatch(aa, as)
expect_true(is(newrb, "RecordBatch"))
expect_true(is(as.data.frame(newrb), "data.frame"))
expect_equal(batch, newrb)

tiledb_arrow_schema_del(as)
tiledb_arrow_array_del(aa)


## round-trip test 1: write tiledb first, create arrow object via zero-copy
suppressMessages(library(bit64))
@@ -74,7 +70,6 @@ tiledb_query_finalize(qry)
#arr <- tiledb_array(tmp, return_as="data.frame")
#print(arr[])


arr <- tiledb_array(tmp)
qry <- tiledb_query(arr, "READ")
dimptr <- tiledb_query_buffer_alloc_ptr(qry, "INT32", n)
@@ -90,18 +85,20 @@ tiledb_query_submit(qry)
tiledb_query_finalize(qry)

res <- tiledb_query_export_buffer(qry, "rows")
v <- Array$create(arrow:::ImportArray(res[[1]], res[[2]]))
tiledb_arrow_array_del(res[[1]])
tiledb_arrow_schema_del(res[[2]])
#v <- Array$create(arrow:::ImportArray(res[[1]], res[[2]]))
v <- Array$create(res)
#tiledb_arrow_array_del(res[[1]])
#tiledb_arrow_schema_del(res[[2]])

expect_equal(v$as_vector(), 4:7)

for (col in c("int8", "uint8", "int16", "uint16", "int32", "uint32", "int64", "uint64", "float64")) {
qry <- tiledb_query_set_buffer_ptr(qry, col, attrlst[[col]])
res <- tiledb_query_export_buffer(qry, col)
v <- Array$create(arrow:::ImportArray(res[[1]], res[[2]]))
tiledb_arrow_array_del(res[[1]])
tiledb_arrow_schema_del(res[[2]])
v <- Array$create(res)
#v <- Array$create(arrow:::ImportArray(res[[1]], res[[2]]))
#tiledb_arrow_array_del(res[[1]])
#tiledb_arrow_schema_del(res[[2]])

expect_equal(v$as_vector(), 4:7)
}
@@ -112,6 +109,8 @@ dir.create(tmp <- tempfile())
n <- 10L

## create a schema but don't fill it yet
#spdl::log("debug")

dim <- tiledb_dim("rows", domain=c(1L,n), type="INT32", tile=1L)
dom <- tiledb_domain(dim)
sch <- tiledb_array_schema(dom,
@@ -127,6 +126,7 @@ sch <- tiledb_array_schema(dom,
sparse = TRUE)
tiledb_array_create(tmp, sch)

#exit_file("aa")
## create an arrow 'record batch' with a number of (corresponding) columns
rb <- record_batch("rows" = Array$create(1:n, int32()),
"int8" = Array$create(1:n, int8()),
@@ -144,30 +144,36 @@ rb <- record_batch("rows" = Array$create(1:n, int32()),
arr <- tiledb_array(tmp)
qry <- tiledb_query(arr, "WRITE")

#spdl::log("debug")
nms <- rb$names()
lst <- list()
for (nam in nms) {
vec <- rb[[nam]] # can access by name
aa <- tiledb_arrow_array_ptr()
as <- tiledb_arrow_schema_ptr()
arrow:::ExportArray(vec, aa, as)
na <- nanoarrow::as_nanoarrow_array(vec)
#print(na)
#print(class(na))
#aa <- tiledb_arrow_array_ptr()
#as <- tiledb_arrow_schema_ptr()
#arrow:::ExportArray(vec, aa, as)

qry <- tiledb_query_import_buffer(qry, nam, list(aa, as))
qry <- tiledb_query_import_buffer(qry, nam, na)

lst[[nam]] <- list(aa=aa, as=as)
#lst[[nam]] <- list(aa=aa, as=as)
}
tiledb_query_set_layout(qry, "UNORDERED")
tiledb_query_submit(qry)
tiledb_query_finalize(qry)

arr <- tiledb_array(tmp, return_as="data.frame")
df <- arr[]

for (i in 1:10) {
l <- lst[[i]]
tiledb_arrow_array_del(l[[1]])
tiledb_arrow_schema_del(l[[2]])
}
#print(df)
#q()

#for (i in 1:10) {
# l <- lst[[i]]
# tiledb_arrow_array_del(l[[1]])
# tiledb_arrow_schema_del(l[[2]])
#}

## n=15
expect_true(is(df, "data.frame"))