perf(rust, python): Faster is_sorted when no flag set #9777

magarick · 2023-07-08T00:17:16Z

This checks sortedness without sorting the entire series, even when no sorted flag is set. It appears to be faster.
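As a rough illustration of the idea (not the Rust code in this PR), sortedness can be checked by comparing neighbouring values and stopping at the first violation, instead of sorting a copy and comparing. The function name and the `descending` parameter below are assumptions made for the sketch, and null handling is ignored.

```python
def is_sorted(values, descending=False):
    """Return True if `values` is ordered, comparing neighbours only.

    Hypothetical helper for illustration; it is not the polars API.
    """
    it = iter(values)
    try:
        prev = next(it)
    except StopIteration:
        return True  # an empty sequence is trivially sorted
    for cur in it:
        # A single out-of-order pair is enough to answer "not sorted".
        out_of_order = cur > prev if descending else cur < prev
        if out_of_order:
            return False
        prev = cur
    return True
```

This is a single pass with early exit, so unsorted input is often rejected after only a few comparisons.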

ritchie46

Nice TODO hunt. :)

s-banach · 2023-07-25T17:58:03Z

Does this break is_sorted on struct columns?

magarick · 2023-07-25T18:09:04Z

> Does this break is_sorted on struct columns?

Not that I know of. It seems fine. Did you see something?

s-banach · 2023-07-25T18:14:34Z

In polars 0.18.8, I get the following, which may or may not be related to this PR:

import polars as pl
df = pl.DataFrame({"x": [1, 2], "y": [3, 4]})
df.select(pl.struct("x", "y")).to_series().is_sorted()

PanicException: not implemented

magarick · 2023-07-25T18:26:46Z

You're right. I tried with 0.18.7
I think I know what the issue is.

magarick · 2023-07-25T19:00:03Z

Inequality operations aren't implemented for structs. It makes sense to have them. After all, if you can sort structs, you can compare them. But some care is required.
It seems like I could do one of the following:

1. Implement it at the ChunkedArray level. This is a bad idea because it requires checking the fields and possibly returning an error if the structs aren't the same type internally. That would require changing return types and introduces higher-level behavior than feels appropriate there.
2. Implement it at the Series level. Those functions already return a PolarsResult, so a little extra checking for structs followed by building up an output seems fine.
3. Special-case it for is_sorted and check each field individually if the Series is a Struct.

I'm inclined toward option 2 but I'll defer to @ritchie46 since he knows the internals much better.

ritchie46 · 2023-07-26T09:36:57Z

We should do this in two steps. First fix the regression by pattern matching on structs (and maybe lists) as well. For those we can restore old behavior.

Next we can see if we can implement comparison on structs. This needs to be implemented on the StructChunked struct. A quick way to implement this is doing field wise comparison and then combine the boolean outputs with an AND operation. Later we can check if we need to optimize it further.
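A minimal sketch of the field-wise approach for equality, assuming a struct column is modelled as a plain dict mapping field names to equal-length lists. This is a hypothetical stand-in for illustration, not StructChunked, and `struct_eq` is an invented name.

```python
def struct_eq(left, right):
    """Row-wise equality of two struct columns given as {field: list}.

    Hypothetical model of a struct column; not the polars implementation.
    """
    if list(left) != list(right):
        raise ValueError("struct fields differ")
    n = len(next(iter(left.values()), []))
    result = [True] * n
    for name in left:
        # Compare one field at a time ...
        field_eq = [a == b for a, b in zip(left[name], right[name])]
        # ... and AND it into the running row mask.
        result = [r and e for r, e in zip(result, field_eq)]
    return result
```

A row counts as equal only if every one of its fields is equal, which is why AND is the right combinator here.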

s-banach · 2023-07-26T12:29:37Z

I thought structs were sorted in dictionary order, e.g. (0, 1) < (1, 0), so I don't understand what you mean by combining the fieldwise comparisons with AND.

ritchie46 · 2023-07-26T12:35:28Z

> I thought structs were sorted in dictionary order, e.g. (0, 1) < (1, 0), so I don't understand what you mean by combining the fieldwise comparisons with AND.

I am talking about equality only in this case. The other one needs row encoding. Probably that's the proper way for all struct comparisons.
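The distinction can be illustrated with plain tuples standing in for struct rows. This is only a sketch with invented function names: dictionary-order `<` is decided by the first differing field, whereas AND-ing the field-wise `<` results gives a different answer for the (0, 1) and (1, 0) example above.

```python
def lex_lt(a, b):
    """Dictionary-order `a < b` for two rows given as equal-length tuples."""
    for x, y in zip(a, b):
        if x != y:
            return x < y  # the first differing field decides
    return False  # all fields equal, so not strictly less


def fieldwise_and_lt(a, b):
    """Naive alternative: require `<` to hold in every field."""
    return all(x < y for x, y in zip(a, b))
```

For the rows (0, 1) and (1, 0), `lex_lt` agrees with Python's built-in tuple comparison, while `fieldwise_and_lt` does not, because the second field compares as 1 < 0.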

magarick · 2023-07-26T17:22:19Z

Equality is already implemented for Structs; the issue is inequality checks, which aren't implemented in ChunkCompare. The only open question is how to define the output when schemata don't match. You might have to change the associated type from BooleanChunked to PolarsResult<BooleanChunked>, because comparing non-comparable structs could give one of:
a. All false (probably not what you want)
b. All null (maybe what you want, but then you're conflating this case with comparing against a null value)
c. Raise an error

Cases:

Names and types are equal, but ordering is different. Probably don't want to mess with reordering.
Names are different but field types are all the same and in the same order. Maybe this is ok to compare, maybe not.
Names are the same, types are the same. Do the comparison!
Anything else: return all nulls or fail
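The cases above could be told apart with a small schema check. The sketch below assumes a schema is a list of (name, dtype) pairs in field order, with dtypes written as plain strings; the function name and the returned labels are made up for illustration, and what to do in each case is exactly the open question.

```python
def classify_schemas(left, right):
    """Classify two struct schemas, each a list of (name, dtype) pairs.

    Hypothetical helper; the labels are illustrative only.
    """
    if left == right:
        return "comparable"  # same names, same types, same order
    if sorted(left) == sorted(right):
        return "same fields, different order"
    if [dtype for _, dtype in left] == [dtype for _, dtype in right]:
        return "same types, different names"
    return "incompatible"  # return all nulls or fail
```

Only the first outcome is unambiguously safe to compare; the other three are where a choice between all false, all null, or an error would have to be made.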

Honestly, I think I could do the proper change in not much longer than special-casing a fix, but if you'd like time to think about and discuss how to handle the operation properly we can do it in two phases.

Faster is_sorted

f40d277

magarick requested review from ritchie46, stinodego and alexander-beedie as code owners July 8, 2023 00:17

github-actions bot added the performance, python, and rust labels Jul 8, 2023

fmt

75817ff

ritchie46 approved these changes Jul 10, 2023

ritchie46 merged commit 9962605 into pola-rs:main Jul 10, 2023

magarick deleted the is-sorted branch July 12, 2023 21:12

c-peters pushed a commit to c-peters/polars that referenced this pull request Jul 14, 2023

perf(rust, python): Faster is_sorted when no flag set (pola-rs#9777)

c69cf4a
