[python] Enable Unicode non-indexed columns #777

Merged: 10 commits, Jan 22, 2023

bkmartinjr

Issue and/or context:

Fixes #415

With the release of TileDB 2.14, UTF-8 is fully supported in Array attributes. Array dimensions continue to be ASCII only at this time.

Changes:

  • Add separate Arrow/TileDB type conversions for Array attributes and Array dimensions
  • Allow UTF-8/Unicode non-indexed SOMA DataFrame columns
  • Enhance unit tests to verify that value_filter works correctly on Unicode columns
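
For readers who want the attribute/dimension distinction spelled out, here is a minimal sketch of the rule described above: string values in indexed columns (TileDB dimensions) must be ASCII, while non-indexed columns (TileDB attributes) may hold any UTF-8 text. The helper `check_string_columns` and the sample column names are hypothetical and exist only for illustration; this is not the code in `util_arrow.py`, and it uses only the Python standard library.

```python
from typing import Dict, List, Sequence


def check_string_columns(
    columns: Dict[str, Sequence[str]],
    index_column_names: Sequence[str],
) -> List[str]:
    """Return the names of indexed columns that contain non-ASCII values.

    Hypothetical helper (not part of tiledbsoma). Indexed columns map to
    dimensions, which are assumed to be ASCII-only; all other columns map
    to attributes and are not checked, since they may hold any UTF-8 text.
    """
    offending = []
    for name in index_column_names:
        # str.isascii() is True only if every character is in the ASCII range.
        if any(not value.isascii() for value in columns[name]):
            offending.append(name)
    return offending


# Sample data, invented for this sketch.
columns = {
    "cell_id": ["AAAC-1", "AAAG-1"],     # ASCII only, safe to index
    "tissue": ["脳", "Lunge (größer)"],  # Unicode, fine as a non-indexed column
}

# Indexing only the ASCII column raises no complaints.
print(check_string_columns(columns, index_column_names=["cell_id"]))  # []

# Indexing the Unicode column as well would be flagged.
print(check_string_columns(columns, index_column_names=["cell_id", "tissue"]))  # ['tissue']
```

A check like this would run before schema creation, so that a Unicode column requested as an index fails early with a clear message instead of at write time.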

Notes for Reviewer:

None

@bkmartinjr bkmartinjr marked this pull request as ready for review January 20, 2023 18:58

@johnkerl johnkerl left a comment

🚀

apis/python/src/tiledbsoma/util_arrow.py: 2 review threads (outdated, resolved)
@codecov-commenter

Codecov Report

Base: 70.07% // Head: 78.56% // Increases project coverage by +8.48% 🎉

Coverage data is based on head (793d97d) compared to base (98b588e).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #777      +/-   ##
==========================================
+ Coverage   70.07%   78.56%   +8.48%     
==========================================
  Files          70       30      -40     
  Lines        3586     2011    -1575     
==========================================
- Hits         2513     1580     -933     
+ Misses       1073      431     -642     
| Flag | Coverage Δ |
|---|---|
| python | 78.56% <100.00%> (-0.19%) ⬇️ |
| r | ? |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| apis/python/src/tiledbsoma/dataframe.py | 97.29% <100.00%> (ø) |
| apis/python/src/tiledbsoma/util_arrow.py | 92.98% <100.00%> (+0.12%) ⬆️ |
| apis/python/src/tiledbsoma/collection.py | 84.32% <0.00%> (-1.63%) ⬇️ |
| apis/python/src/tiledbsoma/sparse_nd_array.py | 94.06% <0.00%> (-0.85%) ⬇️ |
| R/SOMADataFrame.R | |
| inst/tiledb/include/tiledb/vfs.h | |
| inst/tiledb/include/tiledb/core_interface.h | |
| inst/tiledb/include/tiledb/dimension.h | |
| R/SOMASparseNdArray.R | |
| ... and 35 more | |

@johnkerl johnkerl changed the title from "Enable unicode non-indexed columns" to "[python] Enable Unicode non-indexed columns" Jan 20, 2023
@bkmartinjr bkmartinjr merged commit 096f716 into main Jan 22, 2023
@bkmartinjr bkmartinjr deleted the bkmartinjr/415-unicode-attrs branch January 22, 2023 21:24
Development

Successfully merging this pull request may close these issues.

[python] Unicode data generates error upon write to dataframe
3 participants