Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[c++] Arrow utils with current-domain option #2911

Merged
merged 3 commits into from
Aug 27, 2024

Conversation

johnkerl
Copy link
Member

@johnkerl johnkerl commented Aug 17, 2024

Issue and/or context: As tracked on issue #2407 / [sc-51048].
The intended Python and R API changes are all agreed on and finalized as described in #2407.

Changes:

  • create_arrow_schema and create_index_column_info take flags for status quo (use_current_domain=false) as well as new-shape (use_current_domain=true -- [python/r/c++] Revisit shape for component arrays #2407) . These flags will allow for a reasoned, well-tested transition to new-shape functionality at the R and Python levels, implemented on upcoming PRs.
  • For knowledge management, I'll introduce some comment blocks about how these two methods work. They're an ingenious technique for sharing schema information in a language-neutral way between R/C++ and Python/C++.
  • Unit-test support on this PR is mainly confined to demonstrating that no backward compatibility is broken: the Python and R APIs continue to work as they have before.
  • Unit-test cases at the libtiledbsoma level will validate correctness of the libtiledbsoma-to-core contracts, thereby increasing confidence of the correctness of the libtiledbsoma-level implementation.
  • Further unit-test expansion on subsequent PRs will introduce the resize logic to libtiledbsoma, and unit-test its correctness. However, this PR is already more than big enough as-is.

Notes for Reviewer:

As of 2024-08-18, this is stacked on top of #2910. #2910 should be approved and merged first, then, this should be rebased on the new main. (What should not happen is for this PR to be merged into #2910.)

As of 2024-08-18 this incorporates #2915 without being stacked on top of it. Here, too, #2915 should be approved and merged first, then, this should be rebased on the new main.

@johnkerl johnkerl force-pushed the kerl/arrow-util-current-domain-optional branch from 8ffc6f8 to ad3b5d1 Compare August 17, 2024 16:42
Copy link

codecov bot commented Aug 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.02%. Comparing base (08193f2) to head (3517a1e).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2911      +/-   ##
==========================================
+ Coverage   89.87%   90.02%   +0.15%     
==========================================
  Files          38       38              
  Lines        3999     3999              
==========================================
+ Hits         3594     3600       +6     
+ Misses        405      399       -6     
Flag Coverage Δ
python 90.02% <ø> (+0.15%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
python_api 90.02% <ø> (+0.15%) ⬆️
libtiledbsoma ∅ <ø> (∅)

@eddelbuettel
Copy link
Contributor

eddelbuettel commented Aug 17, 2024

A more general comment orthogonal to this PR: we really should think about making more use of nanoarrow (as I have) to remove most if not all of this arrow hand-holding boilerplate code (ie the bits that have to set each element of the struct explicitly).

@johnkerl
Copy link
Member Author

Thanks @eddelbuettel . That's a good idea for another PR. At the moment I want to separate refactors / dependency changes from the careful sequencing of steps necesary for confident rollout of the current-domain-backed new-shape feature. Introducing nanoarrow here would be too many variables: a dependent-package change, along with the current-domain support. As follow-on -- separately -- yes, I am behind this. I'll file an issue to track this great idea.

@johnkerl johnkerl force-pushed the kerl/arrow-util-current-domain-optional branch 12 times, most recently from 655210f to fb3e634 Compare August 17, 2024 22:49
@johnkerl johnkerl changed the title [c++] Arrow utils with current-domain option [WIP] [c++] Arrow utils with current-domain option Aug 17, 2024
@johnkerl johnkerl requested a review from nguyenv August 17, 2024 22:51
@johnkerl johnkerl marked this pull request as ready for review August 17, 2024 22:51
@johnkerl johnkerl force-pushed the kerl/arrow-util-current-domain-optional branch 3 times, most recently from def9082 to 5f55777 Compare August 18, 2024 14:12
@johnkerl johnkerl force-pushed the kerl/arrow-util-current-domain-optional branch 3 times, most recently from 86d4f5b to db0348b Compare August 18, 2024 14:30
@johnkerl johnkerl changed the base branch from main to kerl/test-common-parameterize August 18, 2024 14:31
@johnkerl johnkerl force-pushed the kerl/arrow-util-current-domain-optional branch from db0348b to 3441d9e Compare August 18, 2024 14:41
@johnkerl johnkerl requested a review from eddelbuettel August 18, 2024 17:04
@johnkerl johnkerl changed the title [c++] Arrow utils with current-domain option [c++] Arrow utils with current-domain option [WIP] Aug 19, 2024
@johnkerl johnkerl marked this pull request as draft August 19, 2024 01:57
@johnkerl johnkerl force-pushed the kerl/arrow-util-current-domain-optional branch 4 times, most recently from a51f6bd to daecd1b Compare August 19, 2024 02:41
@johnkerl johnkerl changed the title [c++] Arrow utils with current-domain option [WIP] [c++] Arrow utils with current-domain option Aug 19, 2024
@johnkerl johnkerl marked this pull request as ready for review August 19, 2024 02:41
// objects for which soma_joinid is not a dim at all
// * These cases are all actively unit-tested within apis/python/tests
if (dim.type() != TILEDB_INT64) {
throw TileDBSOMAError("Found unexpected non-int64 dimension type.");
Copy link
Contributor

@eddelbuettel eddelbuettel Aug 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know where you are coming from here, and I slept over it one night, but I am not sure I am fully on-board.

We have the factory methods. We have the SOMA definitions. It is clear what a well-formed SOMA object is.

But I still cannot shake the feeling that this seems "wrong" in the sense of working, testing, testable, useful, ... general code being removed.

I'll sleep over it another night. Not objecting to the change, but scratching my chin. 🤔

Copy link
Member Author

@johnkerl johnkerl Aug 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The removed code is dead ... unit tests would be throwing if it were not. :)

Copy link
Contributor

@eddelbuettel eddelbuettel Aug 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was not really my point / I understand that. But such a converter is also useful outside of SOMA, the test is fine, and I have tested and developed too with non-soma dataframes. That can be useful.

Again, not vetoing anything here, just saying that I simultaneously understand why you propose what you and that I likely would not do it. No more, no less, and surely no veto.

(And as an aside, "we can remove because doing so does not break tests" is not really a viable rule because we all know that coverage metrics are never perfect and there may well be some code that potentially calls this (this is open source after all). Anyway all good by me either way ...)

Base automatically changed from kerl/test-common-parameterize to main August 19, 2024 15:12
@johnkerl johnkerl force-pushed the kerl/arrow-util-current-domain-optional branch 3 times, most recently from 41c6d96 to 1db05d7 Compare August 20, 2024 14:16
@johnkerl johnkerl requested a review from eddelbuettel August 20, 2024 20:08
libtiledbsoma/src/soma/soma_array.cc Show resolved Hide resolved
libtiledbsoma/src/soma/soma_array.cc Outdated Show resolved Hide resolved
libtiledbsoma/src/utils/arrow_adapter.cc Outdated Show resolved Hide resolved
libtiledbsoma/test/unit_soma_collection.cc Outdated Show resolved Hide resolved
@johnkerl johnkerl force-pushed the kerl/arrow-util-current-domain-optional branch from b58dba1 to ee61b8e Compare August 27, 2024 21:55
@johnkerl johnkerl force-pushed the kerl/arrow-util-current-domain-optional branch from c6f7f73 to 3517a1e Compare August 27, 2024 22:41
@johnkerl johnkerl merged commit 9431cf3 into main Aug 27, 2024
15 checks passed
@johnkerl johnkerl deleted the kerl/arrow-util-current-domain-optional branch August 27, 2024 23:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants