In Progress:
Short term:
Teach `PyArray.__getitem__`, which just delegates to `scalar_at` (sketched after this list).
Teach PyVortex to use parallelism during decompression.
Make reading a Vortex file into an Arrow Array as fast as Parquet (this at least partly needs to address the read/write disagreement on chunk size, see below; we side-stepped it by removing buffer sizes when implementing filter pushdown).
`read` should have high throughput on files written by `write` (currently, `write` does not enforce chunking whereas `read` does, which can degrade throughput on arrays for which `slice` is not free; again, side-stepped by removing buffer sizing from filter pushdown).
(docs) Write a comparison section describing similarities to and differences from other file formats.
Expose Vortex compute methods in the Python API by way of new classes, e.g. `Array.as_struct()` and a `StructArray` class which permits column selection (sketched after this list).
Teach Vortex `IsNull` & `IsNotNull` and plumb Substrait into them.
Do not expose modules with confusing names such as `vortex.encoding`.
Expose more functions on scalar values, such as `__eq__`, array indexing, or getting a `memoryview` (sketched after this list).
Teach PyVortex (really: Layout readers and writers) to read/write non-struct arrays.
Teach RecordBatchReader to read from multiple files.
Teach Polars to write Vortex files.
Teach DuckDB to write Vortex files.
Support multiple files and/or directories in the Vortex Dataset API.
Consider using the Rufo theme for the docs.
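
A minimal sketch of the `__getitem__` item above. The real `PyArray` is implemented in Rust via PyO3, so this pure-Python class is only a stand-in to show the intended delegation:

```python
class PyArray:
    """Illustrative stand-in; the real PyArray is implemented in Rust via PyO3."""

    def __init__(self, values):
        self._values = list(values)  # stand-in for the compressed Vortex array

    def __len__(self) -> int:
        return len(self._values)

    def scalar_at(self, index: int):
        # Stand-in for the native accessor, which fetches one element.
        return self._values[index]

    def __getitem__(self, index: int):
        # Normalize negative indices like other Python sequences,
        # then delegate straight to scalar_at.
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError(index)
        return self.scalar_at(index)


assert PyArray([1, 2, 3])[-1] == 3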
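
A usage sketch for the proposed `Array.as_struct()` / `StructArray` split. This assumes `vortex.array` accepts a list of dicts (inferring a struct dtype, as pyarrow does), and `field` is an assumed name for the column-selection method:

```python
import vortex

arr = vortex.array([{"name": "a", "score": 1.0}, {"name": "b", "score": 2.0}])

struct = arr.as_struct()        # proposed: narrow to StructArray, or raise if not a struct
scores = struct.field("score")  # assumed column-selection method on StructArray
```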
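
Finally, what the richer scalar surface from the list above might look like; all three behaviors are proposals, not current API:

```python
import vortex

lists = vortex.array([[1, 2], [3]])
s = lists.scalar_at(0)       # a list-valued scalar

assert s == [1, 2]           # proposed: __eq__ against plain Python values
assert s[1] == 2             # proposed: indexing into list/array scalars

binary = vortex.array([b"hello", b"world"])
view = memoryview(binary.scalar_at(0))  # proposed: zero-copy view of a binary scalar
assert bytes(view) == b"hello"
```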
Long term:
For Torch, expose a method to read from a Vortex file directly into a mutable NumPy array; Torch does not support immutable NumPy arrays (see the sketch after this list).
Reduce Vortex array metadata size. This primarily benefits very small datasets (e.g. PBI AirlineSentiment).
Implement a RecordBatchReader for Vortex arrays and Vortex files (sketched after this list).
Implement a Pandas ExtensionArray which permits compute on the compressed array representation, thus avoiding the cost of decompression (sketched after this list).
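
A sketch of the copy the Torch item above would eliminate. The read-only flag below mimics what a zero-copy, immutable Vortex-to-NumPy view looks like; nothing here is Vortex API:

```python
import numpy as np
import torch

# Stand-in for the read-only view a zero-copy Vortex -> NumPy conversion yields.
immutable = np.arange(10, dtype=np.float32)
immutable.setflags(write=False)

# torch.from_numpy warns on read-only input because the tensor aliases the
# buffer and PyTorch assumes it may write through it, so today we must copy:
tensor = torch.from_numpy(immutable.copy())

# The proposed method would decode from the file directly into a writable
# buffer, skipping this extra copy.
```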
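
For the RecordBatchReader item, a usage sketch against pyarrow. The constructor shape and the `to_record_batch_reader` method on the `vortex.dataset.Dataset` class (referenced under Complete below) are assumptions; `pa.RecordBatchReader` itself is real:

```python
import pyarrow as pa
import vortex.dataset

# Hypothetical: stream Arrow record batches out of a Vortex file without
# materializing the whole array in memory first.
ds = vortex.dataset.Dataset("data.vortex")                  # constructor shape assumed
reader: pa.RecordBatchReader = ds.to_record_batch_reader()  # assumed method name
for batch in reader:
    print(batch.num_rows)
```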
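
And a skeleton for the Pandas ExtensionArray idea. A complete implementation must also supply `_from_sequence`, `dtype`, `isna`, `take`, `copy`, and friends; this only shows the compute-on-compressed shape:

```python
from pandas.api.extensions import ExtensionArray, ExtensionDtype


class VortexDtype(ExtensionDtype):
    name = "vortex"
    type = object

    @classmethod
    def construct_array_type(cls):
        return VortexExtensionArray


class VortexExtensionArray(ExtensionArray):
    def __init__(self, vortex_array):
        # The Vortex array stays compressed in memory; nothing is decoded here.
        self._arr = vortex_array

    def __len__(self) -> int:
        return len(self._arr)

    def __getitem__(self, i):
        # Point lookups delegate to scalar_at-style random access, so reading
        # a single element never forces decompression of the whole column.
        return self._arr.scalar_at(i)
```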
Complete:
`write` should either warn or compress by default so that naive users understand why `write(vortex.array(...), ...)` gives no benefit. feat: expanded Python docs + Rust docs #1137
Teach `PyArray` `scalar_at` (requested by Rob). feat: teach PyArray scalar_at #1095.
or fix `RoaringIntArray` and `RoaringBoolArray` to be no-copy. Vortex should support zero-copy roaring int and roaring bool array construction #1075, fix: disable roaring compressors #1076
`compress`. feat: vortex.dataset.Dataset: deep integration with Polars & DuckDB #1089