[python] Support return_arrow for various queries #256
4597a0d to 35936f8
        else:
            return (array_number, False, None)

    def _ascii_to_unicode_arrow_readback(self, df: pa.Table) -> pa.Table:
What is the purpose of this function? This makes a full copy of fields and then creates a new pyarrow table from python arrays. Why do we need to perform a copy?
@Shelnutt2 it's a separate function so that it can be called from the ThreadPoolExecutor. I'm open to feedback if you have a better way.
Or if you're asking why copy at all: please notice that it does a copy+cast only when there is a need to do an ASCII-to-Unicode conversion. (It's careful to return None if nothing needs conversion.) And this conversion is crucial, not optional: without it, user-level queries like 'cell_type == "myeloid cell"' would return nothing, since columns would have values like b"myeloid cell", which is != to "myeloid cell".
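To make the mismatch concrete, here is a minimal pure-Python sketch. It is not the PR's pyarrow implementation: the helper name `ascii_to_unicode_readback` and the dict-of-lists table are hypothetical stand-ins, used only to illustrate "decode only what needs decoding, and return None otherwise".

```python
from typing import Dict, List, Optional

Column = List[object]


def ascii_to_unicode_readback(columns: Dict[str, Column]) -> Optional[Dict[str, Column]]:
    """Decode bytes values to str; return None when no column needs conversion.

    Hypothetical stand-in for the readback step discussed above.
    """
    converted: Dict[str, Column] = {}
    changed = False
    for name, values in columns.items():
        if any(isinstance(v, bytes) for v in values):
            # Copy + cast only for columns that actually hold bytes.
            converted[name] = [
                v.decode("ascii") if isinstance(v, bytes) else v for v in values
            ]
            changed = True
        else:
            # Reuse the original column object; no copy is made.
            converted[name] = values
    return converted if changed else None


# The comparison a user-level filter would hit without the conversion:
assert b"myeloid cell" != "myeloid cell"

raw = {"cell_type": [b"myeloid cell", b"B cell"], "n_genes": [10, 20]}
fixed = ascii_to_unicode_readback(raw)
matches = [v for v in fixed["cell_type"] if v == "myeloid cell"]
```

In this sketch the numeric column is passed through untouched, which mirrors the point that the copy happens only where a conversion is required.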
Please see
- Queryability of dataframe attribute columns #99
- Store to-be-queried obs/var columns as ASCII (workaround) #101
- Extend ASCII-queryability workaround #138
- Correct ASCII-to-Unicode readback for attribute_filter #141
We had extensive discussions on this a couple of months ago. We must (can't not) do Unicode-to-ASCII conversions on writes and ASCII-to-Unicode conversions on reads until such time as we have core mods in place which:
- Permit string attrs to be Unicode and be queryable using query conditions
- Permit string dims to be Unicode at all
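As a rough illustration of that write/read pairing, the sketch below round-trips a single value. The function names are hypothetical, and how the real code treats non-ASCII input is not covered in this thread; here it simply raises.

```python
def to_ascii_for_write(value: str) -> bytes:
    """Hypothetical write-side step: store the string as ASCII bytes.

    Raises UnicodeEncodeError for non-ASCII input; the actual handling
    in the project is an open assumption, not something shown here.
    """
    return value.encode("ascii")


def to_unicode_on_read(value: bytes) -> str:
    """Hypothetical read-side step: restore a Python str for user queries."""
    return value.decode("ascii")


stored = to_ascii_for_write("myeloid cell")
restored = to_unicode_on_read(stored)
```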
cc @ihnorton
This will be followed up separately.
35936f8 to a2b8cc8
Note that ingest-from-arrow is a separate issue and is not covered by this PR.