[C++] Implement cumulative product, max, and min compute functions #32190
Hey, is this coming?
@frosk1 As far as I know, there is currently no work underway on cumulative vector functions (beyond the cumulative sum vector function, which was implemented in #12460). If you're interested in taking a stab at implementing one of these in a PR, following the example of the cumulative sum implementation, we would be happy to review.
Hi @frosk1, are you working on this? I plan to implement cumprod, cummax and cummin if no one else is working on it.
Hey @js8544, no, unfortunately I don't have time at the moment. It would be great if you could work on that one.
take |
I noticed that the functions cumprod, cummin and cummax would require the same options as `cumulative_sum`. As I want to avoid having four different option types with identical structure, I intend to refactor `CumulativeSumOptions`. The current definition is:

```cpp
/// \brief Options for cumulative sum function
class ARROW_EXPORT CumulativeSumOptions : public FunctionOptions {
 public:
  explicit CumulativeSumOptions(double start = 0, bool skip_nulls = false,
                                bool check_overflow = false);
  explicit CumulativeSumOptions(std::shared_ptr<Scalar> start, bool skip_nulls = false,
                                bool check_overflow = false);
  static constexpr char const kTypeName[] = "CumulativeSumOptions";
  static CumulativeSumOptions Defaults() { return CumulativeSumOptions(); }

  /// Optional starting value for cumulative operation computation
  std::shared_ptr<Scalar> start;

  /// If true, nulls in the input are ignored and produce a corresponding null output.
  /// When false, the first null encountered is propagated through the remaining output.
  bool skip_nulls = false;

  /// When true, returns an Invalid Status when overflow is detected
  bool check_overflow = false;
};
```

I plan to do this:

```cpp
/// \brief Options for cumulative functions
class ARROW_EXPORT CumulativeOptions : public FunctionOptions {
  // ... (same as above)
};

using CumulativeSumOptions = CumulativeOptions;  // for backward compatibility
```

I checked that all compute tests still pass with this change, in both C++ and Python. But I want to make sure there isn't a better way to solve this. Do you consider this an acceptable approach? cc @pitrou @westonpace
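The alias approach can be sketched in isolation. The classes below are hypothetical, simplified stand-ins (no `FunctionOptions` base, no `ARROW_EXPORT`, and `start` is a plain `double` rather than a `std::shared_ptr<Scalar>`), and `LegacyCallSiteStillWorks` is an illustrative helper. The point is only that an alias-declaration introduces a second name for the same type, so pre-refactor code that spells `CumulativeSumOptions` keeps compiling.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical, simplified stand-in for the proposed shared options class.
// In the definition quoted above, the class derives from FunctionOptions and
// `start` is a std::shared_ptr<Scalar>.
class CumulativeOptions {
 public:
  explicit CumulativeOptions(double start = 0, bool skip_nulls = false,
                             bool check_overflow = false)
      : start(start), skip_nulls(skip_nulls), check_overflow(check_overflow) {}

  double start;
  bool skip_nulls;
  bool check_overflow;
};

// The old name survives as an alias, so existing call sites compile unchanged.
using CumulativeSumOptions = CumulativeOptions;

// An alias introduces no new type: both names denote the same class.
static_assert(std::is_same_v<CumulativeSumOptions, CumulativeOptions>);

// Pre-refactor code keeps working, and no conversion is involved.
void LegacyCallSiteStillWorks() {
  CumulativeSumOptions opts(/*start=*/1.0, /*skip_nulls=*/true);
  const CumulativeOptions& same_object = opts;  // binds directly: same type
  assert(same_object.start == 1.0 && same_object.skip_nulls &&
         !same_object.check_overflow);
}
```

Because both names denote one type, anything keyed on the type (such as its `kTypeName`) cannot tell them apart; that is the trade-off a later PR in this thread revisits.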
@js8544 This seems like a good approach to me. |
I agree with this approach. We have done something very similar with aggregate functions (see `arrow/cpp/src/arrow/compute/api_aggregate.h`, lines 45 to 59 at commit 8b2ab4d).
CC @icexelloss / @rtpsw / @ildipo, who may be interested in these functions once they are available. They are classic window functions and so could be useful for the ongoing work to add window function support to Acero.
Regarding window functions: I also implemented a … I also plan to implement a series of …
…ions (#36020)

### Rationale for this change

Implement cumulative prod, max and min compute functions.

### What changes are included in this PR?

1. Add implementations, docs and tests for the three functions.
2. Refactor `CumulativeSumOptions` to `CumulativeOptions` for reusability.
3. Fix a bug where `GenericFromScalar(GenericToScalar(std::nullopt)) != std::nullopt`.
4. Remove an unnecessary Cast with the default start value.
5. Add tests to check behavior with `NaN`.

I'll explain some of the changes in comments.

### Are these changes tested?

Yes, in vector_accumulative_ops_test.cc and test_compute.py.

### Are there any user-facing changes?

No. The data members of `CumulativeSumOptions` are changed, but the member functions behave as before. Also, `std::optional<T>` can be constructed directly from `T`, so users should not notice any difference.

* Closes: #32190

Lead-authored-by: Jin Shang
Co-authored-by: Benjamin Kietzman
Signed-off-by: Benjamin Kietzman
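The semantics discussed in this thread (a running product, maximum or minimum, with `skip_nulls` deciding whether a null is skipped or propagated, as documented on the options class) can be modeled in a few lines. This is a minimal sketch over `std::vector<std::optional<double>>`, not Arrow's kernel code: `Column`, `CumulativeScan`, `CumProd`, `CumMax`, `CumMin` and `CheckExamples` are hypothetical names, and seeding each scan with its identity element (1, negative infinity, positive infinity) is an assumption standing in for Arrow's default start value. NaN handling and overflow checks are not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <limits>
#include <optional>
#include <vector>

// Hypothetical model, not Arrow's API: a nullable double column, nulls as std::nullopt.
using Column = std::vector<std::optional<double>>;

// Running scan of `op` over `input`, seeded with `start`.
// skip_nulls == true:  a null input yields a null output; the running value is kept.
// skip_nulls == false: the first null makes every remaining output null.
template <typename Op>
Column CumulativeScan(const Column& input, Op op, double start, bool skip_nulls) {
  Column out;
  out.reserve(input.size());
  double acc = start;
  bool propagating_null = false;  // set once a null is seen with skip_nulls == false
  for (const std::optional<double>& v : input) {
    if (propagating_null || !v.has_value()) {
      propagating_null = propagating_null || !skip_nulls;
      out.push_back(std::nullopt);
      continue;
    }
    acc = op(acc, *v);
    out.push_back(acc);
  }
  return out;
}

// Each scan starts from its operation's identity element (an assumption of this sketch).
Column CumProd(const Column& in, bool skip_nulls = false) {
  return CumulativeScan(in, std::multiplies<double>{}, 1.0, skip_nulls);
}

Column CumMax(const Column& in, bool skip_nulls = false) {
  return CumulativeScan(in, [](double a, double b) { return std::max(a, b); },
                        -std::numeric_limits<double>::infinity(), skip_nulls);
}

Column CumMin(const Column& in, bool skip_nulls = false) {
  return CumulativeScan(in, [](double a, double b) { return std::min(a, b); },
                        std::numeric_limits<double>::infinity(), skip_nulls);
}

// Worked examples on [2, null, 3, 1].
void CheckExamples() {
  const Column in = {2.0, std::nullopt, 3.0, 1.0};
  assert((CumProd(in, /*skip_nulls=*/true) == Column{2.0, std::nullopt, 6.0, 6.0}));
  assert((CumMax(in, /*skip_nulls=*/true) == Column{2.0, std::nullopt, 3.0, 3.0}));
  assert((CumMin(in, /*skip_nulls=*/true) == Column{2.0, std::nullopt, 2.0, 1.0}));
  // Without skip_nulls, the first null propagates to the end.
  assert((CumMax(in) == Column{2.0, std::nullopt, std::nullopt, std::nullopt}));
}
```

Factoring the three functions through one scan mirrors the reasoning behind sharing a single options type: they differ only in the binary operation and the starting value.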
…for independent deprecation (#36977)

**Rationale for this change**

As #36240 says, we refactor CumulativeSumOptions into a separate class.

**What changes are included in this PR?**

- independent CumulativeSumOptions
- the original simple test before #32190
- fix a typo in CumulativeOptions

**Are these changes tested?**

No. Actually, the PR can't pass `test_option_class_equality` in test_compute.py ([Error example](https://github.com/apache/arrow/actions/runs/5728571658/job/15523443371?pr=36977)), because the C++ side of CumulativeSumOptions is also CumulativeOptions.

![image](https://github.com/apache/arrow/assets/18380073/0a173684-47f8-4eb9-b8f4-ba72aa5aab97)

**Are there any user-facing changes?**

No.

* Closes: #36240

Lead-authored-by: Junming Chen
Co-authored-by: Dane Pitkin
Co-authored-by: Alenka Frim
Signed-off-by: AlenkaF
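The difference between an alias and a separate class can be sketched with hypothetical, simplified types (not Arrow's actual definitions: `kTypeName` is a `std::string_view` here rather than a `char` array, and `SkipsNulls` and `CheckSeparateClass` are illustrative helpers). Deriving from `CumulativeOptions` and inheriting its constructors keeps construction unchanged while giving `CumulativeSumOptions` its own type identity and type name. Whether the follow-up PR derives from the base class in exactly this way is an assumption of the sketch.

```cpp
#include <cassert>
#include <string_view>
#include <type_traits>

// Hypothetical, simplified stand-ins; not Arrow's actual definitions.
struct CumulativeOptions {
  static constexpr std::string_view kTypeName = "CumulativeOptions";

  explicit CumulativeOptions(double start = 0, bool skip_nulls = false)
      : start(start), skip_nulls(skip_nulls) {}

  double start;
  bool skip_nulls;
};

// A separate class instead of an alias: it has its own type identity and its
// own kTypeName, while inheriting the base constructors unchanged.
struct CumulativeSumOptions : CumulativeOptions {
  static constexpr std::string_view kTypeName = "CumulativeSumOptions";
  using CumulativeOptions::CumulativeOptions;
};

static_assert(!std::is_same_v<CumulativeSumOptions, CumulativeOptions>);

// Illustrative helper: code written against the base still accepts the derived type.
inline bool SkipsNulls(const CumulativeOptions& options) { return options.skip_nulls; }

void CheckSeparateClass() {
  CumulativeSumOptions sum_opts(/*start=*/2.0, /*skip_nulls=*/true);
  assert(SkipsNulls(sum_opts));
  assert(sum_opts.start == 2.0);
  assert(CumulativeSumOptions::kTypeName == "CumulativeSumOptions");
  assert(CumulativeOptions::kTypeName == "CumulativeOptions");
}
```

With a distinct type, code keyed on the type (for example option serialization by type name, or a language binding's class mapping) can tell the two options apart, which an alias cannot offer.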
Other libraries and languages (pandas, R, NumPy) have cumprod, cummax and cummin functions that would be useful to add to Arrow, similar to the now-existing cumulative sum function.
Reporter: Jabari Booker / @JabariBooker
Note: This issue was originally created as ARROW-16865. Please see the migration documentation for further details.