-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[c++] Fix display of Arrow schema for enum of bytes
datatype
#2305
Conversation
One problem: when I re-run @bkmartinjr 's data-generator script (the one in the description of this PR which creates
|
https://github.com/single-cell-data/TileDB-SOMA/tree/viviannguyen/array-write-path I actually already fixed this in the write path refactor now that enumerations are getting handled in the C++ side instead of in Python. Overall though, I still have 21 failing unit tests for this PR, so I don't think I can get my fix out immediately...
|
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## main #2305 +/- ##
==========================================
- Coverage 78.74% 77.82% -0.92%
==========================================
Files 140 140
Lines 10750 10755 +5
Branches 215 216 +1
==========================================
- Hits 8465 8370 -95
- Misses 2186 2298 +112
+ Partials 99 87 -12
Flags with carried forward coverage won't be shown. Click here to find out more.
|
45cbcb4
to
0b06695
Compare
@@ -377,6 +383,8 @@ std::string_view ArrowAdapter::to_arrow_format( | |||
tiledb_datatype_t datatype, bool use_large) { | |||
switch (datatype) { | |||
case TILEDB_STRING_ASCII: | |||
return use_large ? "Z" : "z"; // large because TileDB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- @nguyenv noted she has a fix ([c++] Fix display of Arrow schema for enum of
bytes
datatype #2305 (comment)) but that is large and has CI fails - This PR is my attempt to extract the essentials to unblock 1.9 for issue [c++] Fix display of Arrow schema for enum of bytes #2306
- When I have
"Z"
here then the repro script in the description field of this PR passes. But then other existing unit-test cases fail, since what should read back as["a","b"]
reads back as[b"a", b"a"]
- Yet when I have
"U"
here, then the repro script in the description field of this PR fails -- the reported problem is simply not fixed -- but all existing unit-test cases pass.
I seem to recall that STRING_ASCII
along with some other flag, metadata something-something, is used to differentiate between string-as-bytes and string-as-string ... @nguyenv / @ihnorton perhaps you recall how TileDB-Py does this (as it successfully does, in the code snippet included in the description field of this PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
main...viviannguyen/array-write-path#diff-c66edd736f9a9cb059c6f6e0f4cc6d1cdfa0fd91a2b9220dd2e83c12c2546089R451-R457
main...viviannguyen/array-write-path#diff-c66edd736f9a9cb059c6f6e0f4cc6d1cdfa0fd91a2b9220dd2e83c12c2546089R407
I believe these are the snippets that should fix the issue you're seeing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I specially handle the string types for enums like that rather relying on to_tiledb_format
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nguyenv both of the links you sent me open up with "57 files changed" -- non-navigable
Nonetheless I'll take a look at what you wrote in Slack 🤞
bytes
datatype
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding the script from #2306 to a python repr
test? Otherwise LGTM.
Thanks @ihnorton I was delaying the merge for a unit-test case It'll be more 'somatic' to test it like this:
with data as ☝️ and I'll get that in before merging |
@nguyenv this is still not right :( When I said this above:
I was misreading my own output :( The original repro script I was using, I had confused myself by putting attrs in the order
A less error-prone ordering is here: https://gist.github.com/johnkerl/6c4a8ee3ceacaaf2e1929873095b2933 And we still have:
outputting
even though TileDB-Py reports correctly, with
outputting
|
OK I will take a look tomorrow. |
At present with R and this PR we have:
so I'll add an R unit-test case here as well |
Re my
we need first to fix R CI failures |
All current CI is green. There are still a few things wrong I'm seeing interactively for which we had inadequate unit-test coverage. More commits coming soon. |
Keep in mind that these arrow pieces will like benefit from / change under / get simpler the nanoarrow refactoring. However I am currently blocked on that one as I see a lot of Python CI shrapnel (though only from IIRC three test files, the majority passes) that I cannot make real progress on ... as I am a complete neophyte when it comes to Python / C(++) interop. |
Good commits on this PR; all green; further work tracked on #2324. |
* [c++] Fix display of Arrow schema for enum of bytes * bugfix per @nguyenv * more of same * try 3 with help of @nguyenv * python unit-test case * extend the unit-test case a bit * implement @nguyenv advice * Partial reversal of schema format assignment * test --------- Co-authored-by: Dirk Eddelbuettel <[email protected]>
#2325) * [c++] Fix display of Arrow schema for enum of bytes * bugfix per @nguyenv * more of same * try 3 with help of @nguyenv * python unit-test case * extend the unit-test case a bit * implement @nguyenv advice * Partial reversal of schema format assignment * test --------- Co-authored-by: John Kerl <[email protected]> Co-authored-by: Dirk Eddelbuettel <[email protected]>
This pull request has been linked to Shortcut Story #43673: tiledbsoma 1.9.0. |
Co-authored-by: John Kerl <[email protected]>
Co-authored-by: John Kerl <[email protected]>
Issue and/or context: Details on #2306
See also: #2311
[sc-43673]
Checker from #2306:
Output after this PR:
Changes: Report binary appropriately
Notes for Reviewer: