[FEAT] [CSV Reader] Bulk CSV reader + general CSV reader refactor #1614
Codecov Report
Additional details and impacted files

@@            Coverage Diff            @@
##             main    #1614     +/-   ##
=========================================
+ Coverage   85.00%   85.01%   +0.01%
=========================================
  Files          55       55
  Lines        5287     5291       +4
=========================================
+ Hits         4494     4498       +4
  Misses        793      793
    )
}

#[getter]
You can use the pyo3 `get` attribute instead, I believe: https://pyo3.rs/v0.20.0/class?search=get
Unfortunately, you cannot use the field-wise `get` attribute within the context of a `cfg_attr` on the struct, and we can't use the `get_all` workaround since some of the getters aren't trivial, so we need to explicitly define getters for each field.
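To make the constraint above concrete, here is a minimal sketch of the pattern described in this reply. `ExampleOptions`, its fields, and the `python` Cargo feature name are hypothetical and not taken from this PR; the sketch also assumes pyo3 is an optional dependency enabled by that feature. With the feature off, the pyo3 attributes and the `#[pymethods]` block compile away, so the struct stays a plain Rust type.

```rust
/// Hypothetical options struct; field names are illustrative only.
/// `pyclass` is applied conditionally, so a field-level `#[pyo3(get)]`
/// is not available here (per the reply above).
#[cfg_attr(feature = "python", pyo3::pyclass)]
#[derive(Clone, Debug)]
pub struct ExampleOptions {
    pub limit: Option<usize>,
    /// Stored as a byte, but exposed to Python as a string.
    pub delimiter: u8,
}

impl ExampleOptions {
    /// Plain Rust helper: converts the stored byte into a one-character string.
    /// This conversion is what makes the getter "non-trivial".
    pub fn delimiter_string(&self) -> String {
        (self.delimiter as char).to_string()
    }
}

// Explicit getters, only compiled when the (assumed) `python` feature is on.
#[cfg(feature = "python")]
#[pyo3::pymethods]
impl ExampleOptions {
    /// Trivial getter: exposed to Python as the `limit` property.
    #[getter]
    fn get_limit(&self) -> Option<usize> {
        self.limit
    }

    /// Non-trivial getter: exposed to Python as the `delimiter` property,
    /// returning a string rather than the raw byte.
    #[getter]
    fn get_delimiter(&self) -> String {
        self.delimiter_string()
    }
}
```

The non-trivial `delimiter` getter is the kind of case that rules out `get_all`, since the Python-facing value differs from the stored field.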
 where
-    R: AsyncRead + Unpin + Send,
+    R: AsyncRead + Unpin + Send + 'static,
You can try `R: AsyncReader` instead to drop the `'static` requirement.
`AsyncReader` is a struct, not a trait, and we're bounding the async-readable type that the `AsyncReader` is instantiated with. Although I might be misunderstanding your suggestion here. https://docs.rs/arrow2/latest/arrow2/io/csv/read_async/struct.AsyncReader.html
We need the `R: AsyncRead` bound on the inner async-readable for constructing the `AsyncReader` (link) and for reading from the reader with `read_rows` (link).
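The distinction between bounding a wrapper struct and bounding the type it wraps can be sketched with standard-library stand-ins. This is an analogy only: `LineReader` and `read_in_background` are hypothetical, `std::io::Read` stands in for `AsyncRead`, and `std::thread::spawn` stands in for whatever requires `'static` in the PR (the thread does not say; handing the reader to a spawned task is an assumption).

```rust
use std::io::{BufRead, BufReader, Read};
use std::thread;

/// Hypothetical stand-in for a reader struct such as `AsyncReader<R>`:
/// a generic *struct*, so it cannot appear as a trait bound itself.
pub struct LineReader<R> {
    inner: BufReader<R>,
}

// The bound goes on the inner readable type `R`, which is needed both to
// construct the wrapper and to read rows from it.
impl<R: Read> LineReader<R> {
    pub fn new(reader: R) -> Self {
        Self { inner: BufReader::new(reader) }
    }

    /// Reads up to `max_rows` lines, stripping trailing line terminators.
    pub fn read_rows(&mut self, max_rows: usize) -> std::io::Result<Vec<String>> {
        let mut rows = Vec::new();
        let mut line = String::new();
        while rows.len() < max_rows {
            line.clear();
            if self.inner.read_line(&mut line)? == 0 {
                break; // end of input
            }
            let trimmed = line.trim_end_matches(|c| c == '\n' || c == '\r');
            rows.push(trimmed.to_string());
        }
        Ok(rows)
    }
}

/// Moving the reader onto another thread is what forces `Send + 'static`
/// on `R` in this sketch: the spawned closure may outlive the caller's frame.
pub fn read_in_background<R>(reader: R, max_rows: usize) -> std::io::Result<Vec<String>>
where
    R: Read + Send + 'static,
{
    let handle = thread::spawn(move || LineReader::new(reader).read_rows(max_rows));
    handle.join().expect("reader thread panicked")
}
```

An owned reader such as `std::io::Cursor<Vec<u8>>` satisfies these bounds, while a reader that borrows a local buffer would be rejected by the `'static` requirement.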
Force-pushed from 4838c9c to 89765af.
This PR adds support for bulk CSV reading to the native CSV reader, and integrates bulk CSV reading with `MicroPartition` as the default reading path.

Driveby Refactors

- `CsvConvertOptions`, `CsvParseOptions`, and `CsvReadOptions`. This reduces the bloat of our execution-side code (and tests) by a good bit, and providing config objects that are transparently passed through the execution layer should make it easier to add more CSV configuration options in the future (less code to change). Note that these are currently not exposed to the query plan (logical or physical), although these might be moved into the `FileFormatConfig` enum once the old `Table`-based I/O path is removed.
- `CsvReadStats` struct.
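As a rough illustration of why pass-through config objects mean "less code to change", consider the sketch below. Only the three struct names come from this PR; every field, the `execute_scan` and `read_csv` functions, and the naive in-memory parsing are hypothetical and do not reflect the actual implementation.

```rust
/// Hypothetical fields; the real struct's contents are not shown in this PR page.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct CsvConvertOptions {
    pub limit: Option<usize>,
}

/// Hypothetical fields.
#[derive(Clone, Debug, PartialEq)]
pub struct CsvParseOptions {
    pub has_header: bool,
    pub delimiter: u8,
}

impl Default for CsvParseOptions {
    fn default() -> Self {
        Self { has_header: true, delimiter: b',' }
    }
}

/// Hypothetical fields.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct CsvReadOptions {
    pub buffer_size: Option<usize>,
}

/// Stand-in for the execution layer: it forwards the option objects without
/// inspecting them, so adding a new option field does not change this signature.
pub fn execute_scan(
    data: &str,
    convert: &CsvConvertOptions,
    parse: &CsvParseOptions,
    read: &CsvReadOptions,
) -> Vec<Vec<String>> {
    read_csv(data, convert, parse, read)
}

/// Stand-in for the reader: the only place that interprets the options.
/// Naive splitting with no quoting support; for illustration only.
pub fn read_csv(
    data: &str,
    convert: &CsvConvertOptions,
    parse: &CsvParseOptions,
    _read: &CsvReadOptions, // unused: buffering is irrelevant for in-memory input
) -> Vec<Vec<String>> {
    let delimiter = parse.delimiter as char;
    data.lines()
        .skip(if parse.has_header { 1 } else { 0 })
        .take(convert.limit.unwrap_or(usize::MAX))
        .map(|line| line.split(delimiter).map(String::from).collect())
        .collect()
}
```

With this shape, a new option only touches the struct definition and the reader that consumes it; every intermediate layer keeps the same signature.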