[c++, r] Update and refactor nanoarrow (#2188)
* Update nanoarrow vendored files to nanoarrow 0.4.0

* Low-level wiring of nanoarrow at sr_* level

* More lower-level wiring of nanoarrow

* WIP snapshot with nanoarrow wired into libtiledbsoma

* Ensure nullable is set correctly in either case

* Context wrapped in a special purpose struct should not finalize

* Simpler and faster r-ci.yaml

* Use nanoarrow 0.4.0 consistently

* Refined arrow_adapter

* Set increased timeout for download.file to survive GH flakiness

* Turn trace back off, do not include carrow in cli

* Do not include carrow.h in reindexer.cc

* WIP changes expanding type map, suppressing schema release

* [c++] Fix segfault issues

* Add additional necessary strdup

* No longer need to protect one statement

* Support TILEDB_DATETIME_DAY aka Date as well

* Meh

* Meh with version 14.0.0 and not 14.0.6 because ... sure

* Remove initialization setters covered by nanoarrow use

* Ensure DATETIME columns get Arrow coltype reset

* Add more date and datetime support

* Additional conversion

* Post-rebase change

* Heeding time to the lord of linting is time well spent some say

* Heeding time to the lord of linting is time well spent some say

* Correct another delete to free

* Additional non-nullptr protection

* make format

* Additional test conditioner

* Correcting one buffer size selection

* make format

* Remove carrow.h and reference to it

* Cleanups

* Use nanoarrow.{c,hpp} via tiledbsoma/utils/

* Re-activate -Werror

* Chore

* High-productivity afternoon

* Correct a format string error message

---------

Co-authored-by: Vivian Nguyen <[email protected]>
eddelbuettel and nguyenv authored Apr 3, 2024
1 parent 46f8699 commit aa0adcb
Showing 27 changed files with 5,613 additions and 3,647 deletions.
8 changes: 0 additions & 8 deletions .github/workflows/r-ci.yml
@@ -79,14 +79,6 @@ jobs:
 # if: ${{ matrix.os != 'macOS-latest' }}
 # run: cd apis/r && Rscript -e "options(bspm.version.check=TRUE); install.packages('tiledb', repos = c('https://eddelbuettel.r-universe.dev/bin/linux/jammy/4.3/', 'https://cloud.r-project.org'))"

-- name: Install r-universe build of SeuratObject (macOS)
-if: ${{ matrix.os == 'macOS-latest' }}
-run: cd apis/r && Rscript -e "install.packages('SeuratObject', repos = c('https://mojaveazure.r-universe.dev', 'https://cloud.r-project.org'))"
-
-- name: Install r-universe build of SeuratObject (linux)
-if: ${{ matrix.os == 'ubuntu-latest' }}
-run: cd apis/r && Rscript -e "options(bspm.version.check=TRUE); install.packages('SeuratObject', repos = c('https://mojaveazure.r-universe.dev/bin/linux/jammy/4.3/', 'https://cloud.r-project.org'))"
-
 - name: Dependencies
 run: cd apis/r && tools/r-ci.sh install_all

2 changes: 1 addition & 1 deletion apis/python/src/tiledbsoma/reindexer.cc
@@ -31,7 +31,7 @@
 */

 #include <tiledbsoma/reindexer/reindexer.h>
-#include <tiledbsoma/utils/carrow.h>
+// #include <tiledbsoma/utils/carrow.h>
 #include "common.h"

 #define DENUM(x) .value(#x, TILEDB_##x)
6 changes: 4 additions & 2 deletions apis/r/DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -45,11 +45,13 @@ Imports:
spdl,
rlang,
tools,
tibble
tibble,
nanoarrow
LinkingTo:
Rcpp,
RcppSpdlog,
RcppInt64
RcppInt64,
nanoarrow
Additional_repositories: https://ghrr.github.io/drat
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.1
Expand Down
1 change: 1 addition & 0 deletions apis/r/NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -76,6 +76,7 @@ export(tiledbsoma_stats_show)
export(write_soma)
import(R6)
import(methods)
import(nanoarrow)
import(utils)
importFrom(Matrix,as.matrix)
importFrom(Matrix,sparseMatrix)
Expand Down
6 changes: 6 additions & 0 deletions apis/r/R/RcppExports.R
Original file line number Diff line number Diff line change
Expand Up @@ -92,6 +92,12 @@ sr_complete <- function(sr) {
.Call(`_tiledbsoma_sr_complete`, sr)
}

#' @noRd
#' @import nanoarrow
create_empty_arrow_table <- function() {
.Call(`_tiledbsoma_create_empty_arrow_table`)
}

sr_next <- function(sr) {
.Call(`_tiledbsoma_sr_next`, sr)
}
Expand Down
11 changes: 6 additions & 5 deletions apis/r/R/utils-arrow.R
Original file line number Diff line number Diff line change
Expand Up @@ -66,12 +66,14 @@ tiledb_type_from_arrow_type <- function(x, is_dim) {
utf8 = "UTF8",
string = "UTF8",
large_utf8 = "UTF8",
# date32 = "date32",
# based on what TileDB supports
date32 = "DATETIME_DAY",
# date64 = "date64",
# time32 = "time32",
# time64 = "time64",
# null = "null",
# timestamp = "timestamp",
# based on what TileDB supports with a default msec res.
timestamp = "DATETIME_MS",
# decimal128 = "decimal128",
# decimal256 = "decimal256",
# struct = "struct",
Expand Down Expand Up @@ -240,11 +242,10 @@ arrow_schema_from_tiledb_schema <- function(x) {
arrow::schema(c(dimfields, attfields))
}

#' Validate external pointer to ArrowArray
#' Validate external pointer to ArrowArray which is embedded in a nanoarrow S3 type
#' @noRd
check_arrow_pointers <- function(arrlst) {
stopifnot("First argument must be an external pointer to ArrowArray" = check_arrow_array_tag(arrlst[[1]]),
"Second argument must be an external pointer to ArrowSchema" = check_arrow_schema_tag(arrlst[[2]]))
stopifnot(inherits(arrlst, "nanoarrow_array"))
}

#' Validate compatibility of Arrow data types
Expand Down
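The type-map change in apis/r/R/utils-arrow.R enables two entries that were previously commented out: `date32` now maps to `DATETIME_DAY` and `timestamp` to `DATETIME_MS`. As a rough illustration only, the same lookup can be sketched in C++. The function names `tiledb_type_from_arrow_name` and `days_to_ms` are hypothetical and are not part of this commit; only the map entries visible in the hunk are included, and the day-to-millisecond helper merely reflects the general Arrow convention that `date32` counts days since the Unix epoch.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical helper (not from the commit): look up the TileDB type name
// for an Arrow type name, using only the entries visible in the R diff.
// Returns an empty string for types the map does not cover.
std::string tiledb_type_from_arrow_name(const std::string& arrow_name) {
    static const std::map<std::string, std::string> type_map = {
        {"utf8", "UTF8"},
        {"string", "UTF8"},
        {"large_utf8", "UTF8"},
        {"date32", "DATETIME_DAY"},    // newly enabled in this commit
        {"timestamp", "DATETIME_MS"},  // newly enabled, millisecond default
    };
    auto it = type_map.find(arrow_name);
    return it == type_map.end() ? std::string() : it->second;
}

// Hypothetical helper: convert a day count since the Unix epoch into
// milliseconds since the same epoch. Widening to int64_t first avoids
// overflow in the multiplication.
int64_t days_to_ms(int32_t days) {
    return static_cast<int64_t>(days) * 86400000LL;
}
```

Types that are still commented out in the R map, such as `date64`, yield an empty string in this sketch; a caller would treat that as "unsupported".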
7 changes: 3 additions & 4 deletions apis/r/R/utils-readerTransformers.R
Original file line number Diff line number Diff line change
Expand Up @@ -2,14 +2,13 @@
#'
#' @description Converts the results of a \link{soma_array_reader} or
#' \link{sr_next} to an arrow::\link[arrow]{Table}
#' @param x A List object with two pointers to Arrow array data and schema
#' @param x A nanoarrow_array object which is itself a wrapper around the external pointer
#' to the Arrow array data; the schema external pointer is added to it as well
#' @return arrow::\link[arrow]{Table}
#' @noRd
soma_array_to_arrow_table <- function(x) {
check_arrow_pointers(x)
arrow::as_arrow_table(
arrow::RecordBatch$import_from_c(x$array_data, x$schema)
)
arrow::as_arrow_table(x)
}

#' Transformer function: Arrow table to Matrix::sparseMatrix
Expand Down
2 changes: 1 addition & 1 deletion apis/r/inst/include/tiledbsoma_types.h
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@
#define TILEDB_NO_API_DEPRECATION_WARNINGS
#endif

#include <nanoarrow.h> // for C interface to Arrow
#include <tiledbsoma/utils/nanoarrow.h> // for C interface to Arrow
#include <tiledb/tiledb> // for QueryCondition etc
#define ARROW_SCHEMA_AND_ARRAY_DEFINED 1
#include <tiledbsoma/tiledbsoma>
Expand Down
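The header swapped in above provides the C interface to Arrow, and several commit notes ("Add additional necessary strdup", "Correct another delete to free", "Ensure nullable is set correctly") concern ownership rules under that interface. The sketch below is only an illustration of the general pattern, not the commit's code: `SketchSchema` is a stand-in that mirrors the field layout of `ArrowSchema` from the Arrow C data interface, and `dup_cstr`, `init_sketch_schema` and `release_sketch_schema` are hypothetical names. The point it shows is that strings copied with `malloc` must be released with `free` (never `delete`), and that a release callback marks the struct as released by setting `release` to null.

```cpp
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Stand-in mirroring the ArrowSchema layout of the Arrow C data interface.
// Assumption: defined locally so the sketch does not need nanoarrow.
struct SketchSchema {
    const char* format;
    const char* name;
    const char* metadata;
    int64_t flags;
    int64_t n_children;
    SketchSchema** children;
    SketchSchema* dictionary;
    void (*release)(SketchSchema*);
    void* private_data;
};

// Same numeric value as ARROW_FLAG_NULLABLE in the C data interface.
const int64_t kFlagNullable = 2;

// Copy a C string with malloc, so that free() is the matching deallocator.
static char* dup_cstr(const char* s) {
    const size_t n = std::strlen(s) + 1;
    char* out = static_cast<char*>(std::malloc(n));
    if (out != nullptr) std::memcpy(out, s, n);
    return out;
}

// Release callback: frees what init allocated, then marks the struct as
// released by nulling the callback. Calling it on a released struct is a no-op.
static void release_sketch_schema(SketchSchema* schema) {
    if (schema == nullptr || schema->release == nullptr) return;
    std::free(const_cast<char*>(schema->format));  // malloc'ed, so free
    std::free(const_cast<char*>(schema->name));
    schema->format = nullptr;
    schema->name = nullptr;
    schema->release = nullptr;
}

// Fill a childless schema that owns copies of its format and name strings.
// Returns false, leaving the struct released, if an allocation fails.
bool init_sketch_schema(SketchSchema* schema, const char* format,
                        const char* name, bool nullable) {
    *schema = SketchSchema{};
    schema->format = dup_cstr(format);
    schema->name = dup_cstr(name);
    if (schema->format == nullptr || schema->name == nullptr) {
        std::free(const_cast<char*>(schema->format));
        std::free(const_cast<char*>(schema->name));
        *schema = SketchSchema{};
        return false;
    }
    schema->flags = nullable ? kFlagNullable : 0;
    schema->release = &release_sketch_schema;
    return true;
}
```

A caller would initialize a schema with, for example, the Arrow format string `"tdD"` (date32 in days), hand it to a consumer, and the consumer would call `schema.release(&schema)` exactly once when done.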
15 changes: 13 additions & 2 deletions apis/r/src/RcppExports.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@ Rcpp::Rostream<false>& Rcpp::Rcerr = Rcpp::Rcpp_cerr_get();
#endif

// soma_array_reader
Rcpp::List soma_array_reader(const std::string& uri, Rcpp::Nullable<Rcpp::CharacterVector> colnames, Rcpp::Nullable<Rcpp::XPtr<tiledb::QueryCondition>> qc, Rcpp::Nullable<Rcpp::List> dim_points, Rcpp::Nullable<Rcpp::List> dim_ranges, std::string batch_size, std::string result_order, const std::string& loglevel, Rcpp::Nullable<Rcpp::CharacterVector> config);
SEXP soma_array_reader(const std::string& uri, Rcpp::Nullable<Rcpp::CharacterVector> colnames, Rcpp::Nullable<Rcpp::XPtr<tiledb::QueryCondition>> qc, Rcpp::Nullable<Rcpp::List> dim_points, Rcpp::Nullable<Rcpp::List> dim_ranges, std::string batch_size, std::string result_order, const std::string& loglevel, Rcpp::Nullable<Rcpp::CharacterVector> config);
RcppExport SEXP _tiledbsoma_soma_array_reader(SEXP uriSEXP, SEXP colnamesSEXP, SEXP qcSEXP, SEXP dim_pointsSEXP, SEXP dim_rangesSEXP, SEXP batch_sizeSEXP, SEXP result_orderSEXP, SEXP loglevelSEXP, SEXP configSEXP) {
BEGIN_RCPP
Rcpp::RObject rcpp_result_gen;
Expand Down Expand Up @@ -129,8 +129,18 @@ BEGIN_RCPP
return rcpp_result_gen;
END_RCPP
}
// create_empty_arrow_table
SEXP create_empty_arrow_table();
RcppExport SEXP _tiledbsoma_create_empty_arrow_table() {
BEGIN_RCPP
Rcpp::RObject rcpp_result_gen;
Rcpp::RNGScope rcpp_rngScope_gen;
rcpp_result_gen = Rcpp::wrap(create_empty_arrow_table());
return rcpp_result_gen;
END_RCPP
}
// sr_next
Rcpp::List sr_next(Rcpp::XPtr<tdbs::SOMAArray> sr);
SEXP sr_next(Rcpp::XPtr<tdbs::SOMAArray> sr);
RcppExport SEXP _tiledbsoma_sr_next(SEXP srSEXP) {
BEGIN_RCPP
Rcpp::RObject rcpp_result_gen;
Expand Down Expand Up @@ -220,6 +230,7 @@ static const R_CallMethodDef CallEntries[] = {
{"_tiledbsoma_shape", (DL_FUNC) &_tiledbsoma_shape, 2},
{"_tiledbsoma_sr_setup", (DL_FUNC) &_tiledbsoma_sr_setup, 10},
{"_tiledbsoma_sr_complete", (DL_FUNC) &_tiledbsoma_sr_complete, 1},
{"_tiledbsoma_create_empty_arrow_table", (DL_FUNC) &_tiledbsoma_create_empty_arrow_table, 0},
{"_tiledbsoma_sr_next", (DL_FUNC) &_tiledbsoma_sr_next, 1},
{"_tiledbsoma_tiledbsoma_stats_enable", (DL_FUNC) &_tiledbsoma_tiledbsoma_stats_enable, 0},
{"_tiledbsoma_tiledbsoma_stats_disable", (DL_FUNC) &_tiledbsoma_tiledbsoma_stats_disable, 0},
Expand Down
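The last hunk in RcppExports.cpp registers the new wrapper in the `CallEntries` table together with its argument count (0 for `create_empty_arrow_table`, 1 for `sr_next`). The following is a minimal stand-alone sketch of that registration pattern, assuming nothing from R's headers: `CallEntry`, `GenericFn`, `find_entry` and the two `sketch_*` functions are hypothetical stand-ins. It shows a sentinel-terminated table of name, function pointer and arity, and a lookup that rejects a call whose argument count does not match the registered one.

```cpp
#include <cstring>

// Generic function-pointer type used to store wrappers of any signature.
// Assumption: plays the role DL_FUNC plays in the real table.
using GenericFn = void (*)();

// Stand-in for one row of a native-routine registration table.
struct CallEntry {
    const char* name;
    GenericFn fn;
    int num_args;
};

// Two hypothetical wrappers with different arities.
static int sketch_create_empty() { return 0; }
static int sketch_next(int handle) { return handle + 1; }

// Registration table, terminated by an all-null sentinel row.
static const CallEntry kEntries[] = {
    {"sketch_create_empty", reinterpret_cast<GenericFn>(&sketch_create_empty), 0},
    {"sketch_next", reinterpret_cast<GenericFn>(&sketch_next), 1},
    {nullptr, nullptr, 0},
};

// Find an entry by name, requiring the caller's argument count to match
// the registered arity. Returns nullptr if either check fails.
const CallEntry* find_entry(const char* name, int num_args) {
    for (const CallEntry* e = kEntries; e->name != nullptr; ++e) {
        if (std::strcmp(e->name, name) == 0 && e->num_args == num_args) {
            return e;
        }
    }
    return nullptr;
}
```

In this sketch a mismatch simply yields `nullptr`; the practical consequence for the real table is that the registered count has to be kept in step with the wrapper's parameter list, which is why the generated entry for the new zero-argument function carries `0`.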