-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[c++] Apply ArraySchemaEvolution
to specified timestamp
#2895
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #2895 +/- ##
==========================================
+ Coverage 89.83% 89.98% +0.15%
==========================================
Files 37 37
Lines 3926 3926
==========================================
+ Hits 3527 3533 +6
+ Misses 399 393 -6
Flags with carried forward coverage won't be shown. Click here to find out more.
|
Good. That should also help for R where we documented this first, |
There is something going on, since
should have been
since we wrote
|
I've verified that ☝️ is a separate bug. Running this on a built checkout without this PR:
I get
|
@eddelbuettel great! If you have a ilnk for that "where" handy, please share it here -- that will be helpful. |
ArraySchemaEvolution
to specified timestamp [WIP]ArraySchemaEvolution
to specified timestamp
Never mind -- I found #2879 |
@johnkerl Here is my actual test script from last week (which is not yet committed to the 'scraps' repo I use for these). With the SOMA version I have here (a few days old) it just comes up empty (no 'strings' for the enumeration). Worth testing on your patch. !#!/usr/bin/env Rscript
uri <- "/tmp/tiledb/somadf-evolve-ts"
if (dir.exists(uri)) unlink(uri, recursive=TRUE)
schema <- arrow::schema(arrow::field("soma_joinid", arrow::int64(), nullable = FALSE),
arrow::field("int", arrow::int32(), nullable = FALSE),
arrow::field("str", arrow::dictionary(index_type = arrow::int8(), value_type = arrow::utf8(), ordered = FALSE), nullable=FALSE))
print(schema)
ts1 <- rep(as.POSIXct(1, tz="UTC"), 2)
sdf <- tiledbsoma::SOMADataFrameCreate(uri, schema, tiledb_timestamp=ts1[1])
data <- arrow::arrow_table(soma_joinid = bit64::as.integer64(1L:5L),
int = 101:105L,
str = factor(c('a','b','b','a','b')))
cat("\n")
print(tibble::as_tibble(data))
ts2 <- rep(as.POSIXct(2, tz="UTC"), 2)
sdf$write(data, ts2)
data <- arrow::arrow_table(soma_joinid = bit64::as.integer64(6L:10L),
int = 106:110L,
str = factor(c('c','b','c','c','b')))
cat("\n")
print(tibble::as_tibble(data))
ts3 <- rep(as.POSIXct(3, tz="UTC"), 2)
sdf$write(data, ts3)
sdf$close()
cat("\n")
tiledb::tiledb_array(uri, return_as="data.table")[]
sdf <- tiledbsoma::SOMADataFrameOpen(uri, tiledb_timestamp=ts3)
res <- sdf$read()$concat()
print(res)
print(res$str)
#tibble::as_tibble(res)
|
@eddelbuettel before & after this PR I get the following when I run that:
This PR should be merged to remove a known defect from the C++ implementation and we can continue from there. |
* [c++] Apply `ArraySchemaEvolution` to specified timestamp * Add Python unit-test coverage * run `make format`
) * [c++] Apply `ArraySchemaEvolution` to specified timestamp * Add Python unit-test coverage * run `make format` Co-authored-by: John Kerl <[email protected]>
@johnkerl when you said above
where you on
But that is on the de/sc-52516/timestamprange branch. |
Good to know -- awesome @eddelbuettel ! :) |
Branch should hopefully be ready "Real Soon Now (TM)" too. |
)" This reverts commit 7241fef. It seems to break cellxgene_census_builder/tests/test_builder.py.
Issue and/or context: #2879
Changes: If we've got the array open at a timestamp, and we need to evolve it, evolve from that same timestamp.
Notes for Reviewer:
Material from @nguyenv to be incorporated into unit-test material:
Script:
Before:
After:
Difference: