Implement function slice for RecordBatch #490
arrow/src/record_batch.rs
Outdated
pub fn slice(&self, offset: usize, length: usize) -> Result<RecordBatch> {
    let schema = self.schema();
    let num_columns = self.num_columns();
    let new_columns = (0..num_columns)
self.columns().iter()
^_^
My local version is
/// Slice all the arrays in the record batch.
pub fn slice(&self, offset: usize, len: usize) -> Self {
    let columns = self
        .columns()
        .iter()
        .map(|array| array.slice(offset, len))
        .collect();
    Self {
        schema: self.schema.clone(),
        columns,
    }
}
Does it make sense to go through `RecordBatch::try_new()`? It incurs some overhead checking that the schema and arrays match, when they already should match.
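The trade-off raised here can be sketched without the arrow crate. The following is a minimal, std-only illustration, not the arrow-rs implementation: `Column`, `Batch`, and `demo` are hypothetical stand-ins for Arrow arrays and `RecordBatch`. It assumes that slicing applies the same `offset` and `len` to every column, so the invariants checked by a validating constructor (column count and equal lengths) still hold afterwards and `slice` can build the result directly, without a `Result`.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an Arrow array: a shared buffer plus a window.
#[derive(Clone, Debug)]
pub struct Column {
    values: Arc<Vec<i64>>, // shared between slices, never copied by `slice`
    offset: usize,
    len: usize,
}

impl Column {
    pub fn new(values: Vec<i64>) -> Self {
        let len = values.len();
        Column { values: Arc::new(values), offset: 0, len }
    }

    /// Zero-copy slice. Assumption: an out-of-bounds window panics.
    pub fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Column {
            values: Arc::clone(&self.values),
            offset: self.offset + offset,
            len,
        }
    }

    pub fn len(&self) -> usize {
        self.len
    }

    pub fn as_slice(&self) -> &[i64] {
        &self.values[self.offset..self.offset + self.len]
    }
}

/// Hypothetical stand-in for a record batch: column names plus columns.
#[derive(Clone, Debug)]
pub struct Batch {
    schema: Arc<Vec<String>>,
    columns: Vec<Column>,
}

impl Batch {
    /// Validating constructor: checks column count and equal lengths.
    pub fn try_new(schema: Arc<Vec<String>>, columns: Vec<Column>) -> Result<Self, String> {
        if schema.len() != columns.len() {
            return Err(format!("expected {} columns, got {}", schema.len(), columns.len()));
        }
        if let Some(first) = columns.first() {
            if columns.iter().any(|c| c.len() != first.len()) {
                return Err("columns have different lengths".to_string());
            }
        }
        Ok(Batch { schema, columns })
    }

    /// Slices every column with the same window. The schema is unchanged and
    /// all columns keep equal lengths, so validation is skipped and no
    /// `Result` is needed.
    pub fn slice(&self, offset: usize, len: usize) -> Self {
        let columns = self.columns.iter().map(|c| c.slice(offset, len)).collect();
        Batch { schema: Arc::clone(&self.schema), columns }
    }

    pub fn num_rows(&self) -> usize {
        self.columns.first().map_or(0, Column::len)
    }

    pub fn column(&self, i: usize) -> &Column {
        &self.columns[i]
    }
}

/// Usage: build a one-column batch and take rows 1..4.
pub fn demo() -> Vec<i64> {
    let schema = Arc::new(vec!["id".to_string()]);
    let batch = Batch::try_new(schema, vec![Column::new(vec![1, 2, 3, 4, 5])])
        .expect("schema and columns match");
    let sliced = batch.slice(1, 3);
    sliced.column(0).as_slice().to_vec()
}
```

In this sketch the only cost of `slice` is one reference-count increment per column plus one for the schema, which is the kind of overhead the comment above suggests avoiding re-validation for.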
@jorgecarleitao thanks, I have modified, PTAL
@nevi-me Thanks for your suggestion, I made a modification. PTAL
arrow/src/record_batch.rs
Outdated
@@ -244,6 +244,21 @@ impl RecordBatch {
        &self.columns[..]
    }

    /// Return a new RecordBatch where each column is sliced
    /// according to `offset` and `length`
    pub fn slice(&self, offset: usize, length: usize) -> Result<RecordBatch> {
can you remove the `Result` wrapping if nothing can fail?
@@ -426,6 +441,29 @@ mod tests {
        assert_eq!(5, record_batch.column(1).data().len());
    }

    #[test]
    fn create_record_batch_slice() {
can you test for edge cases, e.g. empty length, over boundary offset, empty columns, empty batch, etc.
@jimexist I have a question, if an empty RecordBatch calls slice(), the offset and length are zero, should we panic or return an empty RecordBatch slice?
It should return an empty slice.
> It should return an empty slice.

ok
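The edge cases requested in this thread can be written down as assertions. The sketch below is std-only and does not use the arrow-rs API: `slice_columns` and `check_edge_cases` are hypothetical names, plain `Vec<i32>` columns stand in for Arrow arrays, and it assumes that an out-of-range slice panics (as Rust slice indexing does) while a zero-length slice of an empty batch simply returns empty columns, as agreed above.

```rust
/// Hypothetical helper (not the arrow-rs API): slice every column of a
/// column-major table, borrowing rather than copying the rows.
/// Assumption: like slice indexing, it panics when `offset + length`
/// runs past the end of a column.
pub fn slice_columns(columns: &[Vec<i32>], offset: usize, length: usize) -> Vec<&[i32]> {
    columns
        .iter()
        .map(|col| &col[offset..offset + length])
        .collect()
}

/// Walks through the edge cases named in the review.
pub fn check_edge_cases() {
    let batch = vec![vec![1, 2, 3], vec![4, 5, 6]];

    // Empty length: every column is kept, but with zero rows.
    let sliced = slice_columns(&batch, 1, 0);
    assert_eq!(sliced.len(), 2);
    assert!(sliced.iter().all(|col| col.is_empty()));

    // Offset exactly at the boundary is fine as long as length is zero.
    assert!(slice_columns(&batch, 3, 0).iter().all(|col| col.is_empty()));

    // Empty batch (columns with no rows): slicing (0, 0) yields empty columns.
    let empty_rows: Vec<Vec<i32>> = vec![vec![], vec![]];
    assert_eq!(slice_columns(&empty_rows, 0, 0).len(), 2);

    // No columns at all: nothing to slice, so the result is empty.
    let no_columns: Vec<Vec<i32>> = Vec::new();
    assert!(slice_columns(&no_columns, 0, 0).is_empty());

    // Over-boundary window: expected to panic, so catch it to observe that.
    let result = std::panic::catch_unwind(|| slice_columns(&batch, 2, 5).len());
    assert!(result.is_err());
}
```

Whether the real `RecordBatch::slice` should panic or return an error on an over-boundary window is a design choice for the PR; this sketch only mirrors the panic-on-misuse behaviour assumed above.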
I have modified, PTAL
@Dandandan @jimexist
Thank you @b41sh!
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #490      +/-   ##
==========================================
- Coverage   82.65%   82.64%   -0.02%
==========================================
  Files         165      165
  Lines       45524    45703     +179
==========================================
+ Hits        37628    37769     +141
- Misses       7896     7934      +38
Thank you @b41sh
Hi @b41sh. Thank you for the contribution. It appears there are some issues that the CI tests found (clippy): https://github.com/apache/arrow-rs/pull/490/checks?check_run_id=2905850637 Can you please resolve them?
@alamb thanks for your review, I have fixed the Clippy, PTAL
I think it looks great. Thank you so much @b41sh !
Which issue does this PR close?
Closes #460
Rationale for this change
`slice` can be used to handle part of a `RecordBatch`.
What changes are included in this PR?
Implement function `slice` for `RecordBatch`.
Are there any user-facing changes?