feat(rust, python): building blocks for expression expansion sets #9231

Merged
merged 4 commits into from
Jun 5, 2023

ritchie46
Member

This is not intended for external usage, but this provides the building blocks that allow us to implement selection unions.

With this @alexander-beedie's new selector API can implement:

# SET A + SET B
# select all columns whose name starts with "bar" AND all string (Utf8) columns
(s.matches("^bar.*$") + s.by_dtype(pl.Utf8)).other_expression(..)

# SET A - SET B
# select all datetime columns except the ones with "foo" in the name
(s.datetime(..) - s.contains("foo")).other_expression(..)

This exposes a meta._as_selector() expression, which creates a state:

Expr::Selector{
   add: [Expr],
   subtract: [Expr]
}

where the add and subtract can hold expanding expressions.
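
To make the state concrete, here is a rough pure-Python model of how such an add/subtract state might be resolved against a list of column names. It is not Polars code: `SelectorState`, `resolve`, and the use of plain predicates in place of expanding expressions are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-in for an expanding expression: a predicate on a column name.
Matcher = Callable[[str], bool]


@dataclass
class SelectorState:
    """Toy model of Expr::Selector { add, subtract } (illustration only)."""

    add: List[Matcher] = field(default_factory=list)
    subtract: List[Matcher] = field(default_factory=list)

    def resolve(self, columns: List[str]) -> List[str]:
        # Keep columns matched by any `add` entry and by no `subtract` entry,
        # preserving the original column order.
        return [
            c
            for c in columns
            if any(m(c) for m in self.add)
            and not any(m(c) for m in self.subtract)
        ]


columns = list("abcde")

# Roughly mirrors "all columns minus a and b".
state = SelectorState(
    add=[lambda c: True],
    subtract=[lambda c: c in ("a", "b")],
)
selected = state.resolve(columns)
print(selected)
```

In this simplified model the order in which entries are added or subtracted does not matter; whether the real implementation behaves the same way is not stated here.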

An Expr::Selector supports set union via _selector_add and set difference via _selector_sub.

example:

    df = pl.DataFrame({name: [] for name in "abcde"})

    s1 = pl.all().meta._as_selector()
    s2 = pl.col(["a", "b"])
    s = s1.meta._selector_sub(s2)
    assert df.select(s).columns == ["c", "d", "e"]

    s1 = pl.col("^a|b$").meta._as_selector()
    s = s1.meta._selector_add(pl.col(["d", "e"]))
    assert df.select(s).columns == ["a", "b", "d", "e"]

    s = s.meta._selector_sub(pl.col("d"))
    assert df.select(s).columns == ["a", "b", "e"]

@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars labels Jun 5, 2023
@alexander-beedie
Collaborator

alexander-beedie commented Jun 5, 2023

Awesome; will integrate into the python-side selector proxy. (Should also allow for first/last inversion... :)

@alexander-beedie
Collaborator

alexander-beedie commented Jun 5, 2023

This covers and and not; can we get or next? :)

df.select( (s.ends_with("_usd") | s.string()) )

Also... maybe rename as _selector_union and _selector_exclude? (as "add" and "sub" ops can still be broadcast to the selected fields, eg: s.numeric() - 100).

Correction:

Oops, no, or IS _selector_add ... In which case, I mean, can we get and (and the rename, haha... think that confused me for a second) :)

df.select( (s.ends_with("_usd") & s.string()) )

@ritchie46
Member Author

Will take a look. That's a bit more complicated as we don't have the schema yet.

@ritchie46
Member Author

ritchie46 commented Jun 5, 2023

> (as "add" and "sub" ops can still be broadcast to the selected fields, eg: s.numeric() - 100).

What do you mean?

EDIT: The proxy should dispatch the proper python overload to this one.

@alexander-beedie
Collaborator

alexander-beedie commented Jun 5, 2023

> EDIT: The proxy should dispatch the proper python overload to this one.

Yup, and it will - I just mean that purely from a naming perspective add/sub refer to the operators "+" and "-" on the python side (dunder methods), whereas the appropriate operators for logical unions of the selectors would be "&" (when/if available), "|", and "~". So not a functional issue, just naming-alignment; if it makes more sense as add/sub on the Rust side then it's not a big deal ;)
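
A minimal sketch of the dispatch being discussed, assuming a Python-side proxy that tells selector operands apart from scalars. `SelectorProxy` here is a hypothetical toy class over plain sets of column names, not the actual Polars proxy, and the broadcast case is modelled as a list of strings rather than real expressions.

```python
from typing import Iterable, List, Union


class SelectorProxy:
    """Hypothetical Python-side proxy (illustration only)."""

    def __init__(self, names: Iterable[str]) -> None:
        self.names = frozenset(names)

    def __or__(self, other: "SelectorProxy") -> "SelectorProxy":
        # `|` is a set union; it would map onto the Rust-side "add".
        return SelectorProxy(self.names | other.names)

    def __sub__(
        self, other: Union["SelectorProxy", int, float]
    ) -> Union["SelectorProxy", List[str]]:
        if isinstance(other, SelectorProxy):
            # selector - selector: set difference (the Rust-side "sub").
            return SelectorProxy(self.names - other.names)
        # selector - scalar: arithmetic broadcast over the selected columns,
        # modelled here as one description string per column.
        return [f"{name} - {other}" for name in sorted(self.names)]


numeric = SelectorProxy({"price_usd", "qty"})
usd = SelectorProxy({"price_usd", "label_usd"})

difference = numeric - usd  # set operation between two selectors
broadcast = numeric - 100   # arithmetic applied to each selected column
union = numeric | usd       # set union
```

The point of the sketch is only that the same Python operator can mean two things depending on the right-hand operand, which is why the naming on the Rust side does not have to match the Python operators.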

@ritchie46
Member Author

ritchie46 commented Jun 5, 2023

I see that | == +, but how is ~ equal -? It misses an operand.

@alexander-beedie
Collaborator

alexander-beedie commented Jun 5, 2023

> I see that | == +, but how is ~ equal -? It misses an operand.

By itself it isn't, but by construction ~something should be equal to <everything> - <something>, and I could set that up inside __invert__ where necessary (eg: inverting first/last selectors).

@ritchie46
Member Author

Right, but that assumes that <everything> holds "everything". It can also hold a subset, e.g. "something".

It gets philosophical ^^
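
The ambiguity can be shown with a small sketch in which inversion is defined as "universe minus selection". The helper `invert` and the column names are hypothetical; the only point is that the result of `~` depends on what the universe is taken to be.

```python
from typing import List, Sequence


def invert(universe: Sequence[str], selected: Sequence[str]) -> List[str]:
    """Complement of `selected` within `universe`, keeping universe order."""
    excluded = set(selected)
    return [name for name in universe if name not in excluded]


all_columns = ["a", "b", "c", "d", "e"]
first = ["a"]

# Universe is every column of the frame.
not_first_all = invert(all_columns, first)

# Universe is itself a subset, e.g. the result of an earlier selection.
subset = ["a", "b", "c"]
not_first_subset = invert(subset, first)

print(not_first_all)
print(not_first_subset)
```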

@alexander-beedie
Collaborator

alexander-beedie commented Jun 5, 2023

Well, I can experiment accordingly ;)
But "|" (or) definitely looks good to go, heh...

@ritchie46
Member Author

Yes, the and is more complicated. Could do it a single time. But it is not easy to do A & B | C & D. We should nest the selectors in a tree, but not really happy with that complexity.
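
For illustration, a tree of selectors along those lines might look like the sketch below. This is an assumption about what "nesting the selectors in a tree" could mean, not the design that was implemented; `Leaf`, `And`, `Or`, and `evaluate` are hypothetical names, and leaves are plain sets of column names rather than expressions. Evaluation takes the column list as an argument because the schema is only known later.

```python
from dataclasses import dataclass
from typing import List, Set, Union


@dataclass(frozen=True)
class Leaf:
    names: frozenset  # column names this leaf would match


@dataclass(frozen=True)
class And:
    left: "Node"
    right: "Node"


@dataclass(frozen=True)
class Or:
    left: "Node"
    right: "Node"


Node = Union[Leaf, And, Or]


def evaluate(node: Node, columns: List[str]) -> Set[str]:
    """Resolve a selector tree against the columns once they are known."""
    if isinstance(node, Leaf):
        return set(columns) & node.names
    left = evaluate(node.left, columns)
    right = evaluate(node.right, columns)
    # And -> intersection, Or -> union.
    return left & right if isinstance(node, And) else left | right


A = Leaf(frozenset("abc"))
B = Leaf(frozenset("bcd"))
C = Leaf(frozenset("de"))
D = Leaf(frozenset("e"))

# (A & B) | (C & D): needs two intersections joined by a union, which a
# single flat add/subtract pair does not express directly.
tree = Or(And(A, B), And(C, D))
result = evaluate(tree, list("abcde"))
```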

@ritchie46
Member Author

Will try to follow up with an & operator.

@ritchie46 ritchie46 merged commit df53d8a into main Jun 5, 2023
@ritchie46 ritchie46 deleted the selector branch June 5, 2023 09:57

2 participants