Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++][Compute] Support casting a smaller struct to a bigger one if extra fields are nullable #44555

Closed
Tom-Newton opened this issue Oct 28, 2024 · 4 comments

Comments

@Tom-Newton
Copy link
Contributor

Tom-Newton commented Oct 28, 2024

Describe the enhancement requested

Arrow can already cast to a schema with an extra column if that column is nullable but it can't do the same on structs.

More precisely I want the follow unittests to pass

TEST(Cast, StructToBiggerStruct) {
  std::vector<std::string> field_names = {"a", "b"};
  std::shared_ptr<Array> a, b;
  a = ArrayFromJSON(int8(), "[1, 2]");
  b = ArrayFromJSON(int8(), "[3, 4]");
  ASSERT_OK_AND_ASSIGN(auto src, StructArray::Make({a, b}, field_names));

  const auto dest =
      arrow::struct_({std::make_shared<Field>("a", int8()),
                      std::make_shared<Field>("b", int8()),
                      std::make_shared<Field>("c", int8(), /*nullable=*/false)});
  const auto options = CastOptions::Safe(dest);

  EXPECT_RAISES_WITH_MESSAGE_THAT(
      TypeError,
      ::testing::HasSubstr("struct fields don't match or are in the wrong order"),
      Cast(src, options));
}

TEST(Cast, StructToBiggerNullableStruct) {
  std::vector<std::string> field_names = {"a", "b"};
  std::shared_ptr<Array> a, b, c;
  a = ArrayFromJSON(int8(), "[1, 2]");
  b = ArrayFromJSON(int8(), "[3, 4]");
  ASSERT_OK_AND_ASSIGN(auto src, StructArray::Make({a, b}, field_names));

  const auto type_dest = arrow::struct_(
      {std::make_shared<Field>("a", int8()), std::make_shared<Field>("b", int8()),
       std::make_shared<Field>("c", int8(), /*nullable=*/true)});
  const auto options = CastOptions::Safe(type_dest);

  c = ArrayFromJSON(int8(), "[null, null]");
  ASSERT_OK_AND_ASSIGN(auto dest, StructArray::Make({a, b, c}, {"a", "b", "c"}));
  CheckCast(src, dest);
}

Does this sound like a reasonable thing to support? The motivation comes from delta-io/delta-rs#1610 but I think the fix would require a change to arrow.

Component(s)

C++

@Tom-Newton
Copy link
Contributor Author

This suddenly became a critical problem for me so I ended up implementing something that works, but I'm not really sure if its a good implementation. If the idea seems reasonable I can open a PR.

@kou
Copy link
Member

kou commented Oct 30, 2024

@lidavidm What do you think about this? It seems that this is related to #12724 / #31101.

@lidavidm
Copy link
Member

Yeah, I think in general if it works on tables but not structs (or vice versa) we should make them both work.

lidavidm added a commit that referenced this issue Nov 25, 2024
…ct (#44587)

### Rationale for this change
Sometimes its useful to add a column full of nulls in a cast. Currently this is supported in top level columns but not inside structs. Example where this is important: delta-io/delta-rs#1610

### What changes are included in this PR?
Add support for filling in columns full of null for nullable struct fields. 
I've gone for a fairly minimal change that achieves what I needed but I wonder if there should be a more significant change so that this casting is done by field name and ignore the field order. 

### Are these changes tested?
Yes. The expected behaviour in several existing tests has been altered and a couple of new tests have been added.

I also rolled out a custom build with this change internally because it suddenly became a critical problem. 

### Are there any user-facing changes?
Yes. There are scenarios that previously failed with `struct fields don't match or are in the wrong order` but now succeed after filling in `null`s.

* GitHub Issue: #44555

Lead-authored-by: Thomas Newton <[email protected]>
Co-authored-by: David Li <[email protected]>
Signed-off-by: David Li <[email protected]>
@lidavidm lidavidm added this to the 19.0.0 milestone Nov 25, 2024
@lidavidm
Copy link
Member

Issue resolved by pull request 44587
#44587

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants