-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[c++] Arrow utils with current-domain option #2911
Conversation
8ffc6f8
to
ad3b5d1
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #2911 +/- ##
==========================================
+ Coverage 89.87% 90.02% +0.15%
==========================================
Files 38 38
Lines 3999 3999
==========================================
+ Hits 3594 3600 +6
+ Misses 405 399 -6
Flags with carried forward coverage won't be shown. Click here to find out more.
|
A more general comment orthogonal to this PR: we really should think about making more use of nanoarrow (as I have) to remove most if not all of this arrow hand-holding boilerplate code (ie the bits that have to set each element of the struct explicitly). |
Thanks @eddelbuettel . That's a good idea for another PR. At the moment I want to separate refactors / dependency changes from the careful sequencing of steps necesary for confident rollout of the current-domain-backed new-shape feature. Introducing nanoarrow here would be too many variables: a dependent-package change, along with the current-domain support. As follow-on -- separately -- yes, I am behind this. I'll file an issue to track this great idea. |
655210f
to
fb3e634
Compare
def9082
to
5f55777
Compare
86d4f5b
to
db0348b
Compare
db0348b
to
3441d9e
Compare
a51f6bd
to
daecd1b
Compare
libtiledbsoma/src/soma/soma_array.cc
Outdated
// objects for which soma_joinid is not a dim at all | ||
// * These cases are all actively unit-tested within apis/python/tests | ||
if (dim.type() != TILEDB_INT64) { | ||
throw TileDBSOMAError("Found unexpected non-int64 dimension type."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know where you are coming from here, and I slept over it one night, but I am not sure I am fully on-board.
We have the factory methods. We have the SOMA definitions. It is clear what a well-formed SOMA object is.
But I still cannot shake the feeling that this seems "wrong" in the sense of working, testing, testable, useful, ... general code being removed.
I'll sleep over it another night. Not objecting to the change, but scratching my chin. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The removed code is dead ... unit tests would be throwing if it were not. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was not really my point / I understand that. But such a converter is also useful outside of SOMA, the test is fine, and I have tested and developed too with non-soma dataframes. That can be useful.
Again, not vetoing anything here, just saying that I simultaneously understand why you propose what you and that I likely would not do it. No more, no less, and surely no veto.
(And as an aside, "we can remove because doing so does not break tests" is not really a viable rule because we all know that coverage metrics are never perfect and there may well be some code that potentially calls this (this is open source after all). Anyway all good by me either way ...)
41c6d96
to
1db05d7
Compare
Co-authored-by: nguyenv <[email protected]>
b58dba1
to
ee61b8e
Compare
c6f7f73
to
3517a1e
Compare
Issue and/or context: As tracked on issue #2407 / [sc-51048].
The intended Python and R API changes are all agreed on and finalized as described in #2407.
Changes:
create_arrow_schema
andcreate_index_column_info
take flags for status quo (use_current_domain=false
) as well as new-shape (use_current_domain=true
-- [python/r/c++] Revisitshape
for component arrays #2407) . These flags will allow for a reasoned, well-tested transition to new-shape functionality at the R and Python levels, implemented on upcoming PRs.libtiledbsoma
level will validate correctness of the libtiledbsoma-to-core contracts, thereby increasing confidence of the correctness of the libtiledbsoma-level implementation.resize
logic tolibtiledbsoma
, and unit-test its correctness. However, this PR is already more than big enough as-is.Notes for Reviewer:
As of 2024-08-18, this is stacked on top of #2910. #2910 should be approved and merged first, then, this should be rebased on the new
main
. (What should not happen is for this PR to be merged into #2910.)As of 2024-08-18 this incorporates #2915 without being stacked on top of it. Here, too, #2915 should be approved and merged first, then, this should be rebased on the new
main
.