ARROW-10402: [Rust] Refactor array equality
This is a major refactor of the `equal.rs` module. The rationale for this change is manifold:

* Currently, array comparison requires downcasting the array ref to its concrete type. This is painful and not very ergonomic, as the user must "guess" what to downcast to for comparison. We can see this in the hacks around the `sort`, `take` and `concatenate` kernels' tests, and in some of the builders' tests.
* The array comparison code is difficult to follow, given the number of offset-related calls it performs.
* The implementation currently uses many of our `unsafe` APIs indirectly (via pointer arithmetic), which makes it risky to work with and modify.
* Some code is repeated.

This PR:

1. Adds `impl PartialEq for dyn Array`, to allow `Array` comparison based on `Array::data` (main change).
2. Makes array equality depend only on `ArrayData`, i.e. it no longer depends on concrete array types (such as `PrimitiveArray` and related APIs) to perform comparisons.
3. Significantly reduces the risk of panics and UB when composite arrays are of different types, by checking the types on `range` comparison.
4. Makes array equality statically dispatched, via `match datatype`.
5. DRYs the code around array equality.
6. Fixes an error in the equality of dictionaries with equal values.
7. Adds tests for equalities that were not tested (fixed binary, some edge cases of dictionaries).
8. Splits `equal.rs` into smaller, more manageable files.
9. Removes `ArrayListOps`, since it is no longer needed.
10. Moves JSON equality to its own module, for clarity.
11. Removes the need to have two functions per type to compare arrays.
12. Adds, for each datatype, the number of buffers and their respective widths, as defined in the specification. This was backported from apache#8401.
13. Adds a benchmark for array equality.

Note that this does not implement `PartialEq` for `ArrayData`, only for `dyn Array`, as different data does not necessarily imply a different array (due to nullability).
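The core idea behind items 1 to 4, and behind the note on nullability, can be sketched in a self-contained way. This is not the arrow crate's code: `DataType`, `ArrayData`, `GenericArray`, `equal_range`, `value_equal`, `read_i32`, `utf8_value` and the `int32`/`utf8` helpers are simplified, hypothetical stand-ins (buffers are plain `Vec<u8>`s and validity is a `Vec<bool>` rather than a bitmap). The sketch defines equality once on `dyn Array` in terms of `ArrayData`, dispatches with a single `match` on the data type, checks the types before reading any buffer, and ignores the values behind null slots.

```rust
use std::sync::Arc;

// Hypothetical, heavily simplified stand-ins for arrow's `DataType` and `ArrayData`.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType { Int32, Utf8 }

#[derive(Debug, Clone)]
pub struct ArrayData {
    pub data_type: DataType,
    pub len: usize,
    pub offset: usize,               // logical slot `i` is physical slot `offset + i`
    pub validity: Option<Vec<bool>>, // one flag per physical slot; `None` means no nulls
    pub buffers: Vec<Vec<u8>>,       // Int32: [values]; Utf8: [i32 offsets, bytes]
}

impl ArrayData {
    fn is_valid(&self, i: usize) -> bool {
        self.validity.as_ref().map_or(true, |v| v[self.offset + i])
    }
}

// Reads the little-endian i32 at physical slot `i` of `buf`.
fn read_i32(buf: &[u8], i: usize) -> i32 {
    i32::from_le_bytes([buf[4 * i], buf[4 * i + 1], buf[4 * i + 2], buf[4 * i + 3]])
}

// Bytes of the string at physical slot `i` of a Utf8 `ArrayData`.
fn utf8_value(d: &ArrayData, i: usize) -> &[u8] {
    let (start, end) = (read_i32(&d.buffers[0], i), read_i32(&d.buffers[0], i + 1));
    &d.buffers[1][start as usize..end as usize]
}

// Static dispatch: one `match` on the data type instead of downcasting.
fn value_equal(lhs: &ArrayData, rhs: &ArrayData, l: usize, r: usize) -> bool {
    let (l, r) = (lhs.offset + l, rhs.offset + r);
    match lhs.data_type {
        DataType::Int32 => read_i32(&lhs.buffers[0], l) == read_i32(&rhs.buffers[0], r),
        DataType::Utf8 => utf8_value(lhs, l) == utf8_value(rhs, r),
    }
}

// Compares `len` logical slots of `lhs` and `rhs`, starting at the given positions.
pub fn equal_range(lhs: &ArrayData, rhs: &ArrayData, lhs_start: usize, rhs_start: usize, len: usize) -> bool {
    // The type check comes first, so mismatched arrays never reach `value_equal`.
    lhs.data_type == rhs.data_type
        && (0..len).all(|i| {
            let (l, r) = (lhs_start + i, rhs_start + i);
            match (lhs.is_valid(l), rhs.is_valid(r)) {
                (true, true) => value_equal(lhs, rhs, l, r),
                (false, false) => true, // both null: the values behind the slots are ignored
                _ => false,
            }
        })
}

pub trait Array {
    fn data(&self) -> &ArrayData;
}

// Equality is defined once, on the trait object, purely in terms of `ArrayData`.
impl PartialEq for dyn Array {
    fn eq(&self, other: &Self) -> bool {
        let (lhs, rhs) = (self.data(), other.data());
        lhs.len == rhs.len && equal_range(lhs, rhs, 0, 0, lhs.len)
    }
}

// One wrapper type suffices here; the comparison never needs to know it.
pub struct GenericArray(pub ArrayData);
impl Array for GenericArray {
    fn data(&self) -> &ArrayData { &self.0 }
}

// Builds an Int32 array whose null slots hold `fill`.
pub fn int32(values: &[Option<i32>], fill: i32) -> Arc<dyn Array> {
    let bytes: Vec<u8> = values.iter().flat_map(|v| v.unwrap_or(fill).to_le_bytes()).collect();
    let validity: Option<Vec<bool>> = Some(values.iter().map(|v| v.is_some()).collect());
    let data = ArrayData { data_type: DataType::Int32, len: values.len(), offset: 0, validity, buffers: vec![bytes] };
    Arc::new(GenericArray(data))
}

// Builds a Utf8 array without nulls.
pub fn utf8(values: &[&str]) -> Arc<dyn Array> {
    let (mut offsets, mut bytes) = (0i32.to_le_bytes().to_vec(), Vec::<u8>::new());
    for v in values {
        bytes.extend_from_slice(v.as_bytes());
        offsets.extend_from_slice(&(bytes.len() as i32).to_le_bytes());
    }
    let data = ArrayData { data_type: DataType::Utf8, len: values.len(), offset: 0, validity: None, buffers: vec![offsets, bytes] };
    Arc::new(GenericArray(data))
}

fn main() {
    // Different values (0 vs 99) sit behind the null slot, yet the arrays are equal.
    assert!(int32(&[Some(1), None, Some(3)], 0) == int32(&[Some(1), None, Some(3)], 99));
    // Arrays of different types compare unequal instead of panicking.
    assert!(int32(&[Some(1)], 0) != utf8(&["a"]));
}
```

Because the standard library implements `PartialEq` for `Arc<T>` whenever `T: PartialEq`, two `Arc<dyn Array>` values can be compared with `==` directly. This matches the `array1 == array2` style this PR recommends. The first assertion also shows why `PartialEq` is not derived for `ArrayData` itself: the two buffers differ, but the arrays they describe do not.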
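Item 6 does not spell out the dictionary bug, so the following is only an illustration, under that assumption, of logical (value-based) dictionary equality. Two dictionary arrays are treated as equal when the values their keys resolve to are equal slot by slot, even if the keys or the dictionaries themselves differ. `DictionaryArray` and `dictionary_equal` are hypothetical names, and the layout is simplified to `Option<usize>` keys over a `Vec` of string values.

```rust
// Hypothetical, simplified dictionary array: each non-null key indexes into
// `values`, and `None` marks a null slot.
pub struct DictionaryArray {
    pub keys: Vec<Option<usize>>,
    pub values: Vec<&'static str>,
}

// Compares two dictionary arrays by the values their keys resolve to, so
// arrays with different keys or dictionaries can still be equal.
pub fn dictionary_equal(lhs: &DictionaryArray, rhs: &DictionaryArray) -> bool {
    lhs.keys.len() == rhs.keys.len()
        && lhs.keys.iter().zip(&rhs.keys).all(|(l, r)| match (l, r) {
            (Some(l), Some(r)) => lhs.values[*l] == rhs.values[*r],
            (None, None) => true, // both null
            _ => false,           // exactly one side is null
        })
}

fn main() {
    // Both encode ["a", null, "b", "a"], with different keys and dictionaries.
    let lhs = DictionaryArray { keys: vec![Some(0), None, Some(1), Some(0)], values: vec!["a", "b"] };
    let rhs = DictionaryArray { keys: vec![Some(1), None, Some(0), Some(1)], values: vec!["b", "a"] };
    assert!(dictionary_equal(&lhs, &rhs));

    // Same keys as `lhs`, but the dictionary maps key 1 to "c" instead of "b".
    let other = DictionaryArray { keys: vec![Some(0), None, Some(1), Some(0)], values: vec!["a", "c"] };
    assert!(!dictionary_equal(&lhs, &other));
}
```

Comparing the keys alone would wrongly report `lhs` and `rhs` as different and `lhs` and `other` as equal, which is why a logical comparison resolves every key first.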
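Item 12 refers to the buffer layouts defined in the Arrow columnar format. One hypothetical way to encode them is a function from a data type to the list of buffers it carries, excluding the validity bitmap, together with their widths. `BufferSpec`, `buffer_layout` and this reduced `DataType` enum are illustrative names, not the crate's API. The layouts follow the specification: for example, a `Utf8` array has 32-bit offsets plus a variable-width data buffer, while `LargeUtf8` uses 64-bit offsets.

```rust
// Hypothetical description of one buffer (other than the validity bitmap)
// that an array of a given type carries, per the Arrow columnar format.
#[derive(Debug, PartialEq)]
pub enum BufferSpec {
    FixedWidth { byte_width: usize }, // fixed-size values, or i32/i64 offsets
    VariableWidth,                    // e.g. the UTF-8 bytes of a string array
    Bitmap,                           // bit-packed values, as for booleans
}

// A reduced, hypothetical subset of the Arrow data types.
pub enum DataType {
    Null,
    Boolean,
    Int32,
    Float64,
    Utf8,
    LargeUtf8,
    FixedSizeBinary(usize),
    List,
    Struct,
}

pub fn buffer_layout(data_type: &DataType) -> Vec<BufferSpec> {
    use BufferSpec::*;
    match data_type {
        // Null has no buffers at all; Struct only has validity (children hold the data).
        DataType::Null | DataType::Struct => vec![],
        DataType::Boolean => vec![Bitmap],
        DataType::Int32 => vec![FixedWidth { byte_width: 4 }],
        DataType::Float64 => vec![FixedWidth { byte_width: 8 }],
        DataType::Utf8 => vec![FixedWidth { byte_width: 4 }, VariableWidth],
        DataType::LargeUtf8 => vec![FixedWidth { byte_width: 8 }, VariableWidth],
        DataType::FixedSizeBinary(n) => vec![FixedWidth { byte_width: *n }],
        // i32 offsets into the child array, which holds the values.
        DataType::List => vec![FixedWidth { byte_width: 4 }],
    }
}

fn main() {
    // A string array: i32 offsets plus the UTF-8 bytes (validity not listed).
    assert_eq!(
        buffer_layout(&DataType::Utf8),
        vec![BufferSpec::FixedWidth { byte_width: 4 }, BufferSpec::VariableWidth]
    );
    assert_eq!(buffer_layout(&DataType::Int32).len(), 1);
}
```

With such a table, generic code like an equality routine can know how many buffers an `ArrayData` of a given type should have and how wide each slot is, without consulting a concrete array type.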
An implementation of `PartialEq` for `ArrayData` is being worked on in apache#8200.

IMO this PR significantly simplifies the code around array comparison, to the point where many implementations are 5 lines long. It also improves performance: across the four benchmarks below, comparison time drops by roughly 7% to 43%.

<details>
<summary>Benchmark results</summary>

```
Previous HEAD position was 3dd3c69 Added bench for equality.
Switched to branch 'equal'
Your branch is up to date with 'origin/equal'.
   Compiling arrow v3.0.0-SNAPSHOT (/Users/jorgecarleitao/projects/arrow/rust/arrow)
    Finished bench [optimized] target(s) in 51.28s
     Running /Users/jorgecarleitao/projects/arrow/rust/target/release/deps/equal-176c3cb11360bd12
Gnuplot not found, using plotters backend
equal_512               time:   [36.861 ns 36.894 ns 36.934 ns]
                        change: [-43.752% -43.400% -43.005%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  7 (7.00%) high mild
  5 (5.00%) high severe

equal_nulls_512         time:   [2.3271 us 2.3299 us 2.3331 us]
                        change: [-10.846% -9.0877% -7.7336%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  4 (4.00%) high mild
  7 (7.00%) high severe

equal_string_512        time:   [49.219 ns 49.347 ns 49.517 ns]
                        change: [-30.789% -30.538% -30.235%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) high mild
  6 (6.00%) high severe

equal_string_nulls_512  time:   [3.7873 us 3.7939 us 3.8013 us]
                        change: [-8.2944% -7.0636% -5.4266%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) high mild
  8 (8.00%) high severe
```

</details>

All existing tests are kept, plus new tests for some edge cases and previously untested arrays.

This change is backward incompatible: `array1.equals(&array2)` no longer works. Use `array1 == array2` instead, which is the idiomatic way of comparing structs and trait objects in Rust.

Closes apache#8541 from jorgecarleitao/equal

Authored-by: Jorge C. Leitao <[email protected]>
Signed-off-by: Neville Dipale <[email protected]>