Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++] Attach arrow::ArrayStatistics to arrow::ArrayData #43797

Closed
kou opened this issue Aug 23, 2024 · 1 comment
Closed

[C++] Attach arrow::ArrayStatistics to arrow::ArrayData #43797

kou opened this issue Aug 23, 2024 · 1 comment

Comments

@kou
Copy link
Member

kou commented Aug 23, 2024

Describe the enhancement requested

This is another approach of GH-43666.

GH-41909 introduced arrow::ArrayStatitstics but it's not associated with arrow::Array.

This issue attaches arrow::ArrayStatistics to arrow::ArrayData.

There was a concern that associated arrow::ArrayStatistics may be outdated if arrow::ArrayData is mutated after attaching arrow::ArrayStatistics. But arrow::ArrayData isn't mutable after the first population. So arrow::ArrayStatistics will not be outdated. We can require mutators to take responsibility for statistics.

See also: Discussion at #43705 (comment)

Component(s)

C++

kou added a commit to kou/arrow that referenced this issue Aug 23, 2024
…yData`

If we can attach associated statistics to an array via `ArrayData`, we
can use it in later processes such as query planning.
kou added a commit to kou/arrow that referenced this issue Aug 23, 2024
…yData`

If we can attach associated statistics to an array via `ArrayData`, we
can use it in later processes such as query planning.
kou added a commit to kou/arrow that referenced this issue Aug 28, 2024
…yData`

If we can attach associated statistics to an array via `ArrayData`, we
can use it in later processes such as query planning.
kou added a commit to kou/arrow that referenced this issue Aug 30, 2024
…yData`

If we can attach associated statistics to an array via `ArrayData`, we
can use it in later processes such as query planning.
kou added a commit that referenced this issue Sep 3, 2024
…#43801)

### Rationale for this change

If we can attach associated statistics to an array via `ArrayData`, we can use it in later processes such as query planning.

If `ArrayData` not `Array` has statistics, we can use statistics in computing kernels.

There was a concern that associated `arrow::ArrayStatistics` may be outdated if `arrow::ArrayData` is mutated after attaching `arrow::ArrayStatistics`. But `arrow::ArrayData` isn't mutable after the first population. So `arrow::ArrayStatistics` will not be outdated. We can require mutators to take responsibility for statistics.

### What changes are included in this PR?

* Add `arrow::ArrayData::statistics`
* Add `arrow::Array::statistics()` to get statistics attached in `arrow::ArrayData`

This doesn't provide a new `arrow::ArrayData` constructor (`arrow::ArrayData::Make()`) that accepts `arrow::ArrayStatistics`. We can change `arrow::ArrayData::statistics` after we create `arrow::ArrayData`.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.

`arrow::Array::statistics()` is a new public API.
* GitHub Issue: #43797

Authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
@kou kou added this to the 18.0.0 milestone Sep 3, 2024
@kou
Copy link
Member Author

kou commented Sep 3, 2024

Issue resolved by pull request 43801
#43801

@kou kou closed this as completed Sep 3, 2024
mapleFU pushed a commit to mapleFU/arrow that referenced this issue Sep 3, 2024
…yData` (apache#43801)

### Rationale for this change

If we can attach associated statistics to an array via `ArrayData`, we can use it in later processes such as query planning.

If `ArrayData` not `Array` has statistics, we can use statistics in computing kernels.

There was a concern that associated `arrow::ArrayStatistics` may be outdated if `arrow::ArrayData` is mutated after attaching `arrow::ArrayStatistics`. But `arrow::ArrayData` isn't mutable after the first population. So `arrow::ArrayStatistics` will not be outdated. We can require mutators to take responsibility for statistics.

### What changes are included in this PR?

* Add `arrow::ArrayData::statistics`
* Add `arrow::Array::statistics()` to get statistics attached in `arrow::ArrayData`

This doesn't provide a new `arrow::ArrayData` constructor (`arrow::ArrayData::Make()`) that accepts `arrow::ArrayStatistics`. We can change `arrow::ArrayData::statistics` after we create `arrow::ArrayData`.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.

`arrow::Array::statistics()` is a new public API.
* GitHub Issue: apache#43797

Authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
zanmato1984 pushed a commit to zanmato1984/arrow that referenced this issue Sep 6, 2024
…yData` (apache#43801)

### Rationale for this change

If we can attach associated statistics to an array via `ArrayData`, we can use it in later processes such as query planning.

If `ArrayData` not `Array` has statistics, we can use statistics in computing kernels.

There was a concern that associated `arrow::ArrayStatistics` may be outdated if `arrow::ArrayData` is mutated after attaching `arrow::ArrayStatistics`. But `arrow::ArrayData` isn't mutable after the first population. So `arrow::ArrayStatistics` will not be outdated. We can require mutators to take responsibility for statistics.

### What changes are included in this PR?

* Add `arrow::ArrayData::statistics`
* Add `arrow::Array::statistics()` to get statistics attached in `arrow::ArrayData`

This doesn't provide a new `arrow::ArrayData` constructor (`arrow::ArrayData::Make()`) that accepts `arrow::ArrayStatistics`. We can change `arrow::ArrayData::statistics` after we create `arrow::ArrayData`.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.

`arrow::Array::statistics()` is a new public API.
* GitHub Issue: apache#43797

Authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
khwilson pushed a commit to khwilson/arrow that referenced this issue Sep 14, 2024
…yData` (apache#43801)

### Rationale for this change

If we can attach associated statistics to an array via `ArrayData`, we can use it in later processes such as query planning.

If `ArrayData` not `Array` has statistics, we can use statistics in computing kernels.

There was a concern that associated `arrow::ArrayStatistics` may be outdated if `arrow::ArrayData` is mutated after attaching `arrow::ArrayStatistics`. But `arrow::ArrayData` isn't mutable after the first population. So `arrow::ArrayStatistics` will not be outdated. We can require mutators to take responsibility for statistics.

### What changes are included in this PR?

* Add `arrow::ArrayData::statistics`
* Add `arrow::Array::statistics()` to get statistics attached in `arrow::ArrayData`

This doesn't provide a new `arrow::ArrayData` constructor (`arrow::ArrayData::Make()`) that accepts `arrow::ArrayStatistics`. We can change `arrow::ArrayData::statistics` after we create `arrow::ArrayData`.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.

`arrow::Array::statistics()` is a new public API.
* GitHub Issue: apache#43797

Authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant