Read/write nested dictionary in ipc stream reader/writer #1566
Codecov Report
@@            Coverage Diff             @@
##           master    #1566      +/-   ##
==========================================
- Coverage   82.84%   82.83%   -0.02%
==========================================
  Files         190      190
  Lines       54985    55031      +46
==========================================
+ Hits        45552    45584      +32
- Misses       9433     9447      +14
Looks good to me, thank you @viirya!
DataType::Struct(fields) | DataType::Union(fields, _) => {
    collected_fields.extend(fields.iter().flat_map(|f| f.fields()))
}
DataType::List(field)
| DataType::LargeList(field)
| DataType::FixedSizeList(field, _)
| DataType::Map(field, _) => collected_fields.push(field),
DataType::Dictionary(_, value_field) => {
    collected_fields.append(&mut self._fields(value_field.as_ref()))
}
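To see why the Dictionary arm matters, here is a minimal sketch under stated assumptions: `DataType`, `Field`, and the free function `nested_fields` below are hypothetical, heavily simplified stand-ins for arrow's types and its private `_fields` helper, not the crate's real API. As in the diff, struct children are expanded recursively, a list contributes its child field, and a dictionary is looked through to its value type, so a field nested inside a dictionary's values is collected.

```rust
// Hypothetical, heavily simplified stand-ins for arrow's `DataType` and `Field`;
// the real types have many more variants and attributes.
#[derive(Debug)]
enum DataType {
    Int8,
    Utf8,
    Struct(Vec<Field>),
    List(Box<Field>),
    // (key type, value type), mirroring the shape of the real variant.
    Dictionary(Box<DataType>, Box<DataType>),
}

#[derive(Debug)]
struct Field {
    name: String,
    data_type: DataType,
}

impl Field {
    fn new(name: &str, data_type: DataType) -> Self {
        Field { name: name.to_string(), data_type }
    }

    /// Returns this field followed by the fields nested below it.
    fn fields(&self) -> Vec<&Field> {
        let mut collected = vec![self];
        collected.extend(nested_fields(&self.data_type));
        collected
    }
}

/// Hypothetical counterpart of the `_fields` helper shown in the diff.
fn nested_fields(dt: &DataType) -> Vec<&Field> {
    match dt {
        // Struct children are expanded recursively.
        DataType::Struct(children) => children.iter().flat_map(|f| f.fields()).collect(),
        // As in the diff, a list-like type contributes its child field.
        DataType::List(child) => vec![&**child],
        // The arm discussed in the review: look through the dictionary's
        // value type instead of stopping at the dictionary itself.
        DataType::Dictionary(_, value_type) => nested_fields(value_type),
        DataType::Int8 | DataType::Utf8 => Vec::new(),
    }
}

fn main() {
    // A dictionary whose values are lists of dictionary-encoded strings.
    let dict_dict = Field::new(
        "dict_dict",
        DataType::Dictionary(
            Box::new(DataType::Int8),
            Box::new(DataType::List(Box::new(Field::new(
                "item",
                DataType::Dictionary(Box::new(DataType::Int8), Box::new(DataType::Utf8)),
            )))),
        ),
    );
    let names: Vec<&str> = dict_dict.fields().into_iter().map(|f| f.name.as_str()).collect();
    println!("{:?}", names); // prints ["dict_dict", "item"]
}
```

Without the Dictionary arm, `nested_fields` would return nothing for the outer dictionary, and the inner `item` field would never be visited.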
I wonder if it is worth adding an error for types that aren't explicitly supported (I realize this will likely need a bunch of plumbing to return a Result), like:
_ => return Err("Type not supported")
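A rough sketch of what that plumbing could look like, assuming a hypothetical simplified model (the `Other` variant stands in for any type the traversal does not handle, and errors are plain `String`s rather than arrow's `ArrowError`). One consequence of turning the catch-all arm into an error is that leaf types must now be listed explicitly.

```rust
// Hypothetical, simplified model; `Other` stands in for any type the
// traversal does not explicitly handle.
#[derive(Debug)]
enum DataType {
    Int8,
    Utf8,
    List(Box<Field>),
    Dictionary(Box<DataType>, Box<DataType>),
    Other,
}

#[derive(Debug)]
struct Field {
    name: String,
    data_type: DataType,
}

impl Field {
    fn new(name: &str, data_type: DataType) -> Self {
        Field { name: name.to_string(), data_type }
    }

    /// Fallible variant: unhandled types surface as an error via `?`.
    fn fields(&self) -> Result<Vec<&Field>, String> {
        let mut collected = vec![self];
        collected.extend(nested_fields(&self.data_type)?);
        Ok(collected)
    }
}

fn nested_fields(dt: &DataType) -> Result<Vec<&Field>, String> {
    match dt {
        // Leaf types have no children, so they are listed explicitly
        // now that the catch-all arm is an error.
        DataType::Int8 | DataType::Utf8 => Ok(Vec::new()),
        DataType::List(child) => Ok(vec![&**child]),
        DataType::Dictionary(_, value_type) => nested_fields(value_type),
        other => Err(format!("Type not supported: {:?}", other)),
    }
}

fn main() {
    let ok = Field::new("list", DataType::List(Box::new(Field::new("item", DataType::Utf8))));
    let bad = Field::new(
        "dict",
        DataType::Dictionary(Box::new(DataType::Int8), Box::new(DataType::Other)),
    );
    let names = |f: &Field| -> Result<Vec<String>, String> {
        Ok(f.fields()?.into_iter().map(|x| x.name.clone()).collect())
    };
    println!("{:?}", names(&ok)); // prints Ok(["list", "item"])
    println!("{:?}", names(&bad)); // prints Err("Type not supported: Other")
}
```

Every caller of `fields()` now has to handle the `Result`, which is the plumbing cost mentioned above.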
DataType::Dictionary(_, value_field) => {
    collected_fields.append(&mut self._fields(value_field.as_ref()))
The key improvement of this PR is that this now recurses into children, right?
Yes, this is the key part here: it collects the fields inside the dictionary type.
@@ -1439,4 +1440,42 @@ mod tests {
        let output_batch = roundtrip_ipc_stream(&input_batch);
        assert_eq!(input_batch, output_batch);
    }

    #[test]
    fn test_roundtrip_stream_nested_dict_dict() {
I tried running this test without the code changes in this PR, and it failed. 👍 When I removed the field code, it failed like this:
---- ipc::reader::tests::test_roundtrip_stream_nested_dict_dict stdout ----
thread 'ipc::reader::tests::test_roundtrip_stream_nested_dict_dict' panicked at 'called `Result::unwrap()` on an `Err` value: InvalidArgumentError("dictionary id not found in schema")', arrow/src/ipc/reader.rs:1331:32
And like this when I removed the writer support:
---- ipc::reader::tests::test_roundtrip_stream_nested_dict_dict stdout ----
thread 'ipc::reader::tests::test_roundtrip_stream_nested_dict_dict' panicked at 'called `Option::unwrap()` on a `None` value', arrow/src/ipc/reader.rs:176:64
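The first failure can be reproduced in miniature. The sketch below is a hypothetical model, not arrow's reader: `find_field_by_dict_id` and the `through_dicts` flag are illustrative names, and only the error text is borrowed from the log above. With `through_dicts = false` (no Dictionary arm), a dictionary id that lives inside another dictionary's value type is never reached, so the lookup fails.

```rust
// Hypothetical, simplified schema model in which dictionary-encoded
// fields carry a dictionary id.
#[derive(Debug)]
enum DataType {
    Int8,
    Utf8,
    List(Box<Field>),
    Dictionary(Box<DataType>, Box<DataType>),
}

#[derive(Debug)]
struct Field {
    name: String,
    data_type: DataType,
    dict_id: Option<i64>,
}

impl Field {
    fn new(name: &str, data_type: DataType, dict_id: Option<i64>) -> Self {
        Field { name: name.to_string(), data_type, dict_id }
    }

    fn fields(&self, through_dicts: bool) -> Vec<&Field> {
        let mut collected = vec![self];
        collected.extend(nested_fields(&self.data_type, through_dicts));
        collected
    }
}

/// `through_dicts = false` models a traversal without a `Dictionary` arm;
/// `true` models one that recurses into the dictionary's value type.
fn nested_fields(dt: &DataType, through_dicts: bool) -> Vec<&Field> {
    match dt {
        DataType::List(child) => child.fields(through_dicts),
        DataType::Dictionary(_, value_type) if through_dicts => nested_fields(value_type, through_dicts),
        _ => Vec::new(),
    }
}

/// Hypothetical lookup of the field that owns a given dictionary id.
fn find_field_by_dict_id(schema: &[Field], id: i64, through_dicts: bool) -> Result<&Field, String> {
    schema
        .iter()
        .flat_map(|f| f.fields(through_dicts))
        .find(|f| f.dict_id == Some(id))
        .ok_or_else(|| "dictionary id not found in schema".to_string())
}

/// A dictionary (id 0) whose values are lists of dictionary-encoded strings (id 1).
fn example_schema() -> Vec<Field> {
    vec![Field::new(
        "dict_dict",
        DataType::Dictionary(
            Box::new(DataType::Int8),
            Box::new(DataType::List(Box::new(Field::new(
                "item",
                DataType::Dictionary(Box::new(DataType::Int8), Box::new(DataType::Utf8)),
                Some(1),
            )))),
        ),
        Some(0),
    )]
}

fn main() {
    let schema = example_schema();
    for through_dicts in [false, true] {
        match find_field_by_dict_id(&schema, 1, through_dicts) {
            Ok(f) => println!("through_dicts={}: found {}", through_dicts, f.name),
            Err(e) => println!("through_dicts={}: {}", through_dicts, e),
        }
    }
}
```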
@@ -175,13 +186,37 @@ impl IpcDataGenerator {
                )?;
            }
        }
        _ => (),
Maybe this should return an error for unsupported nested types.
Hmm, I think the reason it doesn't just return an error is that, if no dictionaries exist inside the fields of these nested types, the IPC writer still works. To return an error only when dictionaries exist inside unsupported nested types, we would need to look into those types just as we do for the supported ones, which is almost the same as implementing support for them.
I may spend some time continuing this work to add support for more types (map, etc.) here.
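The trade-off described above can be sketched with a hypothetical type model, where `Unsupported` is a placeholder for a nested type the writer does not yet traverse and `check_writable` / `contains_dictionary` are illustrative names, not arrow functions. Erroring unconditionally in the `Unsupported` arm would reject columns the writer already handles, while erroring only when a dictionary is present forces `contains_dictionary` to walk the unsupported type anyway.

```rust
// Hypothetical, simplified type model for the writer-side check.
#[derive(Debug)]
enum DataType {
    Int8,
    Utf8,
    List(Box<DataType>),
    Dictionary(Box<DataType>, Box<DataType>),
    // Placeholder for a nested type whose children the writer does not visit yet.
    Unsupported(Vec<DataType>),
}

/// Reports whether a dictionary appears anywhere inside `dt`. Note that it
/// has to traverse `Unsupported` children too.
fn contains_dictionary(dt: &DataType) -> bool {
    match dt {
        DataType::Dictionary(_, _) => true,
        DataType::List(child) => contains_dictionary(child),
        DataType::Unsupported(children) => children.iter().any(contains_dictionary),
        DataType::Int8 | DataType::Utf8 => false,
    }
}

/// Errors only when a dictionary is hidden inside an unsupported nested type.
fn check_writable(dt: &DataType) -> Result<(), String> {
    match dt {
        DataType::Int8 | DataType::Utf8 => Ok(()),
        DataType::List(child) => check_writable(child),
        DataType::Dictionary(_, value_type) => check_writable(value_type),
        DataType::Unsupported(children) => {
            if children.iter().any(contains_dictionary) {
                Err("dictionary inside unsupported nested type".to_string())
            } else {
                // No dictionaries inside, so the writer can still handle it.
                Ok(())
            }
        }
    }
}

fn main() {
    let dict = || DataType::Dictionary(Box::new(DataType::Int8), Box::new(DataType::Utf8));
    let cases = [
        ("unsupported, no dictionary", DataType::Unsupported(vec![DataType::Int8])),
        ("unsupported, nested dictionary", DataType::Unsupported(vec![DataType::List(Box::new(dict()))])),
        ("list of dictionary", DataType::List(Box::new(dict()))),
    ];
    for (label, dt) in &cases {
        println!("{}: {:?}", label, check_writable(dt));
    }
}
```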
Which issue does this PR close?
Closes #1565.