[C++] Attach arrow::ArrayStatistics to arrow::ArrayData #43797
kou added commits to kou/arrow that referenced this issue (Aug 23, 2024; Aug 23, 2024; Aug 28, 2024; Aug 30, 2024):

…yData` If we can attach associated statistics to an array via `ArrayData`, we can use it in later processes such as query planning.
kou added a commit that referenced this issue (Sep 3, 2024):

…#43801)

### Rationale for this change

If we can attach associated statistics to an array via `ArrayData`, we can use them in later processing such as query planning. If `ArrayData`, not `Array`, holds the statistics, compute kernels can use them as well.

There was a concern that the associated `arrow::ArrayStatistics` may become outdated if `arrow::ArrayData` is mutated after `arrow::ArrayStatistics` is attached. But `arrow::ArrayData` isn't mutable after it is first populated, so `arrow::ArrayStatistics` will not become outdated. Mutators can be required to take responsibility for the statistics.

### What changes are included in this PR?

* Add `arrow::ArrayData::statistics`
* Add `arrow::Array::statistics()` to get the statistics attached to `arrow::ArrayData`

This doesn't provide a new `arrow::ArrayData` constructor (`arrow::ArrayData::Make()`) that accepts `arrow::ArrayStatistics`. `arrow::ArrayData::statistics` can be set after `arrow::ArrayData` is created.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes. `arrow::Array::statistics()` is a new public API.

* GitHub Issue: #43797

Authored-by: Sutou Kouhei
Signed-off-by: Sutou Kouhei
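To make the relationship between the two additions concrete, here is a minimal, self-contained C++ model. It does not use the Arrow library: `ToyArrayStatistics`, `ToyArrayData`, `ToyArray`, `CountNulls`, and `MakeExampleArray` are hypothetical names that only mimic the shape described above (a `statistics` pointer on the data object, an array-level accessor that forwards to it, and statistics assigned after construction rather than through a `Make()` overload). The statistic fields are an assumption loosely modeled on `arrow::ArrayStatistics`; the Arrow headers are the authority on the real definitions.

```cpp
#include <cstdint>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical, heavily simplified stand-ins for arrow::ArrayStatistics,
// arrow::ArrayData and arrow::Array. They only model how the pieces relate;
// they are not the real Arrow C++ classes.
struct ToyArrayStatistics {
  using ValueType = std::variant<bool, int64_t, uint64_t, double, std::string>;
  std::optional<int64_t> null_count;  // unset means "unknown"
  std::optional<int64_t> distinct_count;
  std::optional<ValueType> min;
  std::optional<ValueType> max;
};

struct ToyArrayData {
  // Stand-in for the type, length and buffers of a real ArrayData.
  std::vector<std::optional<int64_t>> values;
  // Models the new ArrayData::statistics member; nullptr = nothing attached.
  std::shared_ptr<ToyArrayStatistics> statistics;
};

class ToyArray {
 public:
  explicit ToyArray(std::shared_ptr<ToyArrayData> data) : data_(std::move(data)) {}

  // Models Array::statistics(): the array wrapper keeps no statistics of its
  // own and simply forwards to whatever is attached to its data.
  const std::shared_ptr<ToyArrayStatistics>& statistics() const {
    return data_->statistics;
  }
  const std::shared_ptr<ToyArrayData>& data() const { return data_; }

 private:
  std::shared_ptr<ToyArrayData> data_;
};

// A "kernel" that, like Arrow compute kernels, sees only the data object.
// Because the statistics live there, it can use them as a shortcut.
int64_t CountNulls(const ToyArrayData& data) {
  if (data.statistics && data.statistics->null_count.has_value()) {
    return *data.statistics->null_count;  // answered from statistics
  }
  int64_t nulls = 0;  // otherwise fall back to a full scan
  for (const auto& v : data.values) {
    if (!v.has_value()) ++nulls;
  }
  return nulls;
}

// No constructor overload takes statistics: create and populate the data
// first, then assign the statistics member before the data is shared.
ToyArray MakeExampleArray() {
  auto data = std::make_shared<ToyArrayData>();
  data->values = {1, std::nullopt, 3};
  auto stats = std::make_shared<ToyArrayStatistics>();
  stats->null_count = 1;
  stats->min = ToyArrayStatistics::ValueType{int64_t{1}};
  stats->max = ToyArrayStatistics::ValueType{int64_t{3}};
  data->statistics = std::move(stats);
  return ToyArray(std::move(data));
}
```

Keeping the pointer on the data object rather than on the array wrapper is what lets `CountNulls`, which receives only `ToyArrayData`, use an attached `null_count` and fall back to a scan when nothing is attached.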
Issue resolved by pull request 43801.
Commits with the same message (referencing apache#43801 and apache#43797) were later pushed to forks that referenced this issue: mapleFU/arrow by mapleFU (Sep 3, 2024), zanmato1984/arrow by zanmato1984 (Sep 6, 2024), and khwilson/arrow by khwilson (Sep 14, 2024).
Describe the enhancement requested
This is an alternative approach to GH-43666.

GH-41909 introduced `arrow::ArrayStatistics`, but it's not associated with `arrow::Array`. This issue attaches `arrow::ArrayStatistics` to `arrow::ArrayData`.

There was a concern that the associated `arrow::ArrayStatistics` may become outdated if `arrow::ArrayData` is mutated after `arrow::ArrayStatistics` is attached. But `arrow::ArrayData` isn't mutable after it is first populated, so `arrow::ArrayStatistics` will not become outdated. Mutators can be required to take responsibility for the statistics.

See also: Discussion at #43705 (comment)
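The argument that statistics cannot go stale rests on data being immutable once populated: any change produces new data, and whoever produces it decides which statistics it carries. The sketch below illustrates that responsibility with hypothetical types (`ToyArrayData`, `ToyArrayStatistics`) and hypothetical functions (`ComputeStatistics`, `WithValueReplaced`); none of these are Arrow APIs, and modeling only `null_count` is a simplifying assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <optional>
#include <vector>

// Hypothetical, minimal stand-ins for arrow::ArrayStatistics and
// arrow::ArrayData; only null_count is modeled here.
struct ToyArrayStatistics {
  std::optional<int64_t> null_count;
};

struct ToyArrayData {
  std::vector<std::optional<int64_t>> values;
  std::shared_ptr<ToyArrayStatistics> statistics;  // nullptr = none attached
};

// Computes statistics from scratch for fully populated data.
std::shared_ptr<ToyArrayStatistics> ComputeStatistics(const ToyArrayData& data) {
  auto stats = std::make_shared<ToyArrayStatistics>();
  int64_t nulls = 0;
  for (const auto& v : data.values) {
    if (!v.has_value()) ++nulls;
  }
  stats->null_count = nulls;
  return stats;
}

// Hypothetical mutator. It leaves the already populated input untouched and
// builds new data instead. Copying input.statistics would attach a possibly
// stale null_count, so the mutator takes responsibility: if the input had
// statistics, it recomputes them for the output; otherwise it attaches none.
std::shared_ptr<ToyArrayData> WithValueReplaced(const ToyArrayData& input,
                                                std::size_t index,
                                                std::optional<int64_t> value) {
  auto out = std::make_shared<ToyArrayData>();
  out->values = input.values;
  out->values.at(index) = value;
  if (input.statistics) {
    out->statistics = ComputeStatistics(*out);
  }
  return out;
}
```

Dropping the statistics instead of recomputing them would be an equally valid choice for such a mutator; what matters is that it never forwards the input's statistics unchanged.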
Component(s)
C++