[python] Dataframe read path #1793
Codecov Report
@@ Coverage Diff @@
## main #1793 +/- ##
==========================================
- Coverage 77.59% 76.05% -1.55%
==========================================
Files 87 136 +49
Lines 7784 10552 +2768
Branches 0 206 +206
==========================================
+ Hits 6040 8025 +1985
- Misses 1744 2433 +689
- Partials 0 94 +94
7ee0b85 to 8992661
c0d02b0 to 4e63bc4
7db68e1 to 2c8fa29
4477714 to 4306121
(the usual disclaimer that I’m mostly looking at the Python side for style etc. and didn’t look at the C++ code in detail)
I should add that I like the way the open-and-select-the-right-wrapper thing works now. It’s very well structured and understandable.
d5f4dc5 to 8f7951e
Nothing huge to suggest -- some spelling changes etc.
* Move `PyQueryCondition` into `common.h`
* Use Pyarrow Schema instead of TileDB ArraySchema
* Remove TileDB-Py dependency
* No longer requires attr-to-enum mapping passed for dictionaries as this can be checked in Pyarrow Schema now
* Eventually the `arrow_schema` calls should replace `schema` but quite a few things still depend on the TileDB ArraySchema so this is going to be temporarily punted for now
* The R API should be refactored soon to not rely on tiledb::ArraySchema
7c4b511 to 03e7a61
03e7a61 to 53cca7e
@beroy I have now merged your changes into my PR. The only change I did was move your reindexer unit tests to reside within the |
Issue and/or context:
Running Changes:
* For `DataFrame` in read-mode, use `DataFrameWrapper` which wraps around `clib.SOMADataFrame`. Otherwise, `DataFrame` should use the already existing write-path with `ArrayWrapper` which wraps around a TileDB-Py Array
* `SOMADataFrame` needs to use sparse array
* `SOMADataFrame::count` needs to use `nnz`
* `create` needs to set `soma_object_type` and `exists` needs to check `soma_object_type`
* `SOMADataFrame`
* `DataFrameWrapper` and `ArrayWrapper` (i.e. nonempty domain, attr names, dim names, etc.)
* `SOMADataFrame`
* `_query_condition.py` no longer has dependency on TileDB-Py and uses Pyarrow schema instead of TileDB ArraySchema and `SOMAError` instead of `TileDBError`
* Move `PyQueryCondition` (note: should refactor `QueryCondition` to completely remove `PyQueryCondition` usage) into `common.h`
* `SOMAArray` now resides in its own `soma_array.cc` file
* `pytiledbsoma.cc` is now the top-level file that contains stats bindings and loads modules from `query_condition.cc`, `soma_array.cc`, `soma_dataframe.cc`
* `SOMADataFrame`
* `ArrowAdapter::to_arrow`
* `read` Pybind11 function now uses correct Arrow schema to convert dictionary arrays when importing from C to Python
* `SOMAObject::schema` returns `ArrowSchema` which calls from `SOMAArray::arrow_schema`
* `unique_ptr<SOMAObject> SOMAObject::open` and bind in Pybind11 as `clib.SOMAObject.open` which returns the correct Python SOMA class
* `RuntimeError` should raise `SOMAError`
#783
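
The read/write wrapper selection and the `soma_object_type` checks described above can be sketched roughly as follows. This is a toy, in-memory illustration and not the PR's implementation: `STORE`, `ToyReadWrapper`, `ToyWriteWrapper`, `open_dataframe`, the local `SOMAError` class, and the metadata value `"SOMADataFrame"` are all assumptions made for the example.

```python
from dataclasses import dataclass


class SOMAError(Exception):
    """Toy stand-in for the error type named in the description."""


# Hypothetical in-memory "storage": URI -> metadata dict.
STORE: dict = {}


@dataclass
class ToyReadWrapper:
    """Stand-in for a read-path wrapper (assumed name)."""
    uri: str


@dataclass
class ToyWriteWrapper:
    """Stand-in for the pre-existing write-path wrapper (assumed name)."""
    uri: str


def create(uri: str) -> None:
    # On creation, record the object type in the metadata.
    STORE[uri] = {"soma_object_type": "SOMADataFrame"}


def exists(uri: str) -> bool:
    # An object only "exists" as a dataframe if the recorded type matches.
    return STORE.get(uri, {}).get("soma_object_type") == "SOMADataFrame"


def open_dataframe(uri: str, mode: str = "r"):
    """Pick the wrapper by mode after validating the stored object type."""
    if mode not in ("r", "w"):
        raise ValueError(f"unsupported mode: {mode!r}")
    if not exists(uri):
        raise SOMAError(f"{uri!r} is not a SOMADataFrame")
    return ToyReadWrapper(uri) if mode == "r" else ToyWriteWrapper(uri)


create("mem://obs")
reader = open_dataframe("mem://obs", "r")
writer = open_dataframe("mem://obs", "w")
```

Keeping the type check in one place (`exists`) means both paths reject an object of the wrong type in the same way, and raise the library-level error rather than a generic one.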