[Ruby] RecordBatchBuilder doesn't work with list of structs #44918

Closed
fpacanowski opened this issue Dec 3, 2024 · 2 comments

Comments

@fpacanowski

Describe the bug, including details regarding any error messages, version, and platform.

RecordBatchBuilder doesn't seem to correctly handle a schema that contains a list of structs. Here's a minimal test case:

schema = Arrow::Schema.new(
  [
    Arrow::Field.new("structs", Arrow::ListDataType.new(
      Arrow::StructDataType.new([
        Arrow::Field.new("foo", :int64),
        Arrow::Field.new("bar", :int64)
      ])
    ))
  ]
)

table = Arrow::RecordBatchBuilder.build(schema, [
  { structs: [] },
  { structs: [] },
]).to_table

assert_equal(2, table.n_rows)

The table should have 2 rows, but it's empty (tested on HEAD).

I've also checked that equivalent code in PyArrow works correctly (the table has two rows):

import pyarrow as pa
import pyarrow.parquet as pq

schema = pa.schema(
    [
        pa.field(
            "structs",
            pa.list_(
                pa.struct([
                    pa.field("foo", pa.int64()),
                    pa.field("bar", pa.int64())
                ])
            )
        )
    ]
)

data = [
    {"structs": []},
    {"structs": []}
]

table = pa.Table.from_pylist(data, schema=schema)
print(table.shape)

pq.write_table(table, "file.parquet")

Related bug report: #44742.

Component(s)

Ruby

@fpacanowski fpacanowski changed the title RecordBatchBuilder doesn't work with list of structs. [Ruby] RecordBatchBuilder doesn't work with list of structs. Dec 3, 2024
@kou kou changed the title [Ruby] RecordBatchBuilder doesn't work with list of structs. [Ruby] RecordBatchBuilder doesn't work with list of structs Dec 4, 2024
@kou
Member

kou commented Dec 4, 2024

Oh, sorry.

kou added a commit that referenced this issue Dec 9, 2024

### Rationale for this change

`Arrow::ListArrayBuilder#append_value` with `[]` must append an empty list as an element. In general this works, but it doesn't work when the list item is a struct or a list, because `Arrow::{List,Struct}ArrayBuilder#append` without arguments appends an element. That behavior exists for backward compatibility, but it causes a problem in this case.

For example, the following case has this problem:

```ruby
item_type = Arrow::StructDataType.new([{name: "visible", type: :boolean}])
data_type = Arrow::ListDataType.new(name: "struct", data_type: item_type)
builder = Arrow::ListArrayBuilder.new(data_type)
builder.append_value([])
array = builder.finish
array.to_a # => must be [[]] but [] without this change
```

### What changes are included in this PR?

This should have been fixed by GH-44763, but that fix was wrong. This change fixes the wrong location of the `return if`.
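
To illustrate why the position of the guard matters, here is a minimal pure-Ruby sketch. It is a toy model, not the red-arrow implementation: `ToyStructBuilder`, `ToyListBuilder`, `build_lists`, and the `guard:` option are hypothetical names made up for this example. The only behavior borrowed from the description above is that `append` without arguments appends an element.

```ruby
# Toy stand-in for a struct builder (hypothetical, not red-arrow code).
# Calling append with no arguments still appends one element, mirroring
# the backward-compatible behavior described above.
class ToyStructBuilder
  attr_reader :items

  def initialize
    @items = []
  end

  def append(*values)
    if values.empty?
      @items << {} # a zero-argument call appends one element
    else
      @items.concat(values)
    end
  end
end

# Toy stand-in for a list builder. Each list is recorded as a start
# offset into the child builder's items.
class ToyListBuilder
  # guard: :none, :before_slot, or :after_slot (assumption for this sketch)
  def initialize(guard:)
    @guard = guard
    @child = ToyStructBuilder.new
    @offsets = []
  end

  def append_value(value)
    # Misplaced guard: returns before the list slot is recorded,
    # so the empty list never becomes an element.
    return if @guard == :before_slot && value.empty?

    @offsets << @child.items.size # record the new list slot

    # Well-placed guard: the slot is kept, only the child append is skipped.
    return if @guard == :after_slot && value.empty?

    @child.append(*value) # with [] this becomes a zero-argument append
  end

  def to_a
    ends = @offsets.drop(1) + [@child.items.size]
    @offsets.zip(ends).map { |from, to| @child.items[from...to] }
  end
end

# Helper that builds a list array from plain Ruby arrays.
def build_lists(guard, values)
  builder = ToyListBuilder.new(guard: guard)
  values.each { |value| builder.append_value(value) }
  builder.to_a
end
```

In this toy model, `build_lists(:none, [[]])` yields a list that wrongly contains one struct, `build_lists(:before_slot, [[]])` yields no lists at all, and `build_lists(:after_slot, [[], []])` yields two empty lists, which is the result the issue expects.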

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.
* GitHub Issue: #44918

Authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
@kou kou added this to the 19.0.0 milestone Dec 9, 2024
@kou
Member

kou commented Dec 9, 2024

Issue resolved by pull request 44933
#44933

@kou kou closed this as completed Dec 9, 2024