GH-44555: [C++][Compute] Allow casting struct to bigger nullable struct #44587

Contributor

@Tom-Newton Tom-Newton commented Oct 30, 2024

Rationale for this change

Sometimes it's useful to add a column full of nulls in a cast. Currently this is supported for top-level columns but not for fields inside structs. An example where this matters: delta-io/delta-rs#1610

What changes are included in this PR?

Add support for filling nullable struct fields that are missing from the input with nulls.
I've gone for a fairly minimal change that achieves what I needed, but I wonder whether there should be a more significant change so that this cast matches fields by name and ignores field order.
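
To illustrate the idea, here is a minimal standalone sketch of how output fields might be matched against input fields, with nulls filled in where a nullable field is missing. It is a simplified model, not the code in this PR: `FieldSpec` and `PlanStructCast` are hypothetical names, it uses no Arrow APIs, and the greedy, order-preserving matching is an assumption.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a struct field: only a name and a nullability flag.
struct FieldSpec {
  std::string name;
  bool nullable;
};

// For each output field, returns the index of the input field to read from,
// or std::nullopt when the output field should be filled with nulls.
// Throws std::invalid_argument if a missing output field is not nullable.
std::vector<std::optional<std::size_t>> PlanStructCast(
    const std::vector<FieldSpec>& in_fields,
    const std::vector<FieldSpec>& out_fields) {
  std::vector<std::optional<std::size_t>> plan;
  plan.reserve(out_fields.size());
  // Input fields before this index have already been matched or skipped
  // (assumption: matching preserves field order).
  std::size_t next_in = 0;
  for (const FieldSpec& out : out_fields) {
    std::optional<std::size_t> match;
    for (std::size_t i = next_in; i < in_fields.size(); ++i) {
      if (in_fields[i].name == out.name) {
        match = i;
        break;
      }
    }
    if (match) {
      next_in = *match + 1;
      plan.push_back(match);
    } else if (out.nullable) {
      plan.push_back(std::nullopt);  // fill this output field with nulls
    } else {
      throw std::invalid_argument("cannot fill non-nullable field '" +
                                  out.name + "' with nulls");
    }
  }
  return plan;
}
```

In this sketch, casting `{a, b}` to `{a, c, b}` with a nullable `c` yields a plan that reads `a` and `b` from the input and fills `c` with nulls, while a non-nullable `c` is rejected. Because the matching is order-preserving, casting `{a, b}` to `{b, a}` would read `b` and fill `a` with nulls rather than reorder, which is the kind of case that matching by field name would handle differently.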

Are these changes tested?

Yes. The expected behaviour in several existing tests has been altered and a couple of new tests have been added.

I also rolled out a custom build with this change internally because it suddenly became a critical problem.

Are there any user-facing changes?

Yes. There are scenarios that previously failed with "struct fields don't match or are in the wrong order" but now succeed after filling in nulls.


⚠️ GitHub issue #44555 has been automatically assigned in GitHub to PR creator.

@Tom-Newton
Contributor Author

I'm not confident that I've taken the right approach here, but I think it's ready for review.

@Tom-Newton Tom-Newton marked this pull request as ready for review October 30, 2024 18:50
@Tom-Newton
Contributor Author

Ok, it looks like there is at least one more test I need to update. I can probably sort that out tomorrow.

@@ -2328,7 +2328,7 @@ DatasetAndBatches MakeNestedDataset() {
       field("b", boolean()),
       field("c", struct_({
                      field("d", int64()),
-                     field("e", float64()),
+                     field("e", int64()),
Contributor Author

I suspect this was not supposed to differ from the physical schema. This data is used to test whether a virtual column will be materialised, so I guess it shouldn't be testing int64 -> float64 casting at the same time.

@github-actions github-actions bot added awaiting committer review and removed awaiting review labels Nov 1, 2024
@Tom-Newton
Contributor Author

Sorry for the direct ping but @lidavidm please could you review when you get a chance. I assume you are the right person to review this?

Member

@lidavidm lidavidm left a comment

Looks reasonable to me, thanks. CC @bkietz for some more eyes?

2 resolved review comments on cpp/src/arrow/compute/kernels/scalar_cast_nested.cc (outdated)
@github-actions github-actions bot added awaiting changes and removed awaiting committer review labels Nov 14, 2024
@github-actions github-actions bot added awaiting change review and removed awaiting changes labels Nov 14, 2024
@github-actions github-actions bot added awaiting changes and removed awaiting change review labels Nov 14, 2024
@github-actions github-actions bot added awaiting change review and removed awaiting changes labels Nov 14, 2024
@github-actions github-actions bot added awaiting merge and removed awaiting change review labels Nov 21, 2024
@lidavidm
Member

Ok, I don't think that one CI failure is related here. Merging now. Thanks @Tom-Newton, sorry for the delays!

@lidavidm lidavidm merged commit c14d55d into apache:main Nov 25, 2024
39 of 40 checks passed
@lidavidm lidavidm removed the awaiting merge label Nov 25, 2024

After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit c14d55d.

There were 132 benchmark results with an error:

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 2 possible false positives for unstable benchmarks that are known to sometimes produce them.

@Tom-Newton
Contributor Author

Tom-Newton commented Nov 25, 2024

> Ok, I don't think that one CI failure is related here. Merging now. Thanks @Tom-Newton, sorry for the delays!

Thanks for reviewing 🙂
