[C++] Potential regression on FieldRef/FieldPath non-flattening Get methods #36892
Comments
@benibus Would you have time to take a look at this?
@benibus @westonpace I was planning on waiting a couple of days to create the next RC (Tuesday?) to see whether this could make it into 13.0.0. If this sounds totally unreasonable, let me know and I can create the new RC sooner. As discussed on Zulip, we might want to ship 13.0.0 with this known issue and apply the fix targeting 14.0.0.
@benibus Any news here? I plan to create the new RC tomorrow unless there's a possibility of this being solved soon.
Should have a fix for this by the end of the week, provided my diagnosis is actually complete. At the very least, the PR introduced substantial overhead for calling […]. That being said, the two aren't exactly interchangeable in the implementation, so some restructuring is in order.
Pull request #37032:

### Rationale for this change
#35197 appears to have introduced significant performance regressions in `FieldPath::Get`, as indicated [here](https://conbench.ursa.dev/compare/runs/9cf73ac83f0a44179e6538b2c1c7babd...3d76cb5ffb8849bf8c3ea9b32d08b3b7/), in a benchmark that uses a wide (10K-column) dataframe.

### What changes are included in this PR?
- Adds basic benchmarks for `FieldPath::Get` across various input types, as they didn't previously exist
- Addresses several performance issues. These came in the form of extremely high upfront costs for the `RecordBatch` and `ArrayData` overloads specifically
- Some minor refactoring of `NestedSelector`

### Are these changes tested?
Yes (covered by existing tests)

### Are there any user-facing changes?
No

* Closes: #36892

Lead-authored-by: benibus <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
### Describe the bug, including details regarding any error messages, version, and platform.
I did not pay attention initially, but it seems #35197 introduced a large regression in the wide-dataframe benchmark.
See benchmark results here:
https://conbench.ursa.dev/compare/runs/9cf73ac83f0a44179e6538b2c1c7babd...3d76cb5ffb8849bf8c3ea9b32d08b3b7/
Note that the benchmark creates a dataframe with 10,000 columns.
### Component(s)
C++