Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(api): add ibis.dense_rank() top-level function #5408

Closed
1 task done
NickCrews opened this issue Feb 1, 2023 · 2 comments · Fixed by #7327
Closed
1 task done

feat(api): add ibis.dense_rank() top-level function #5408

NickCrews opened this issue Feb 1, 2023 · 2 comments · Fixed by #7327
Labels
feature Features or general enhancements window functions Issues or PRs related to window functions
Milestone

Comments

@NickCrews
Copy link
Contributor

Is your feature request related to a problem?

This isn't a huge problem, I have something working, but more of a ux question. I have the following util function:

from typing import Iterable

import ibis
from ibis.expr.types import (
    Column,
    IntegerColumn,
    Table,
)
import pandas as pd


def group_id(t: Table, keys: str | Column | Iterable[str | Column]) -> IntegerColumn:
    """Number each group from 0 to the "number of groups - 1" 
    
    This is equivalent to pandas.DataFrame.groupby(keys).ngroup().
    """
    # We need an arbitrary column to use for dense_rank
    col = t[t.columns[0]]
    return col.dense_rank().over(ibis.window(order_by=keys))

Example usage:

df = pd.DataFrame(
    {
        "a": [5, 6, 6, 5],
        "b": [1, 2, 3, 1],
    }
)
t = ibis.memtable(df)

t.mutate(group_id=group_id(t, ["a", "b"]))
┏━━━━━━━┳━━━━━━━┳━━━━━━━━━━┓
┃ a     ┃ b     ┃ group_id ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━━━━┩
│ int64 │ int64 │ int64    │
├───────┼───────┼──────────┤
│     5 │     1 │        0 │
│     5 │     1 │        0 │
│     6 │     2 │        1 │
│     6 │     3 │        2 │
└───────┴───────┴──────────┘

I find it strange that I need to call dense_rank() on an arbitrary column of table. It feels dirty to just choose an arbitrary column. In raw SQL, DENSE_RANK() can stand all by itself, it doesn't have to associated with another expression. Similar to how ibis.row_number(). I would think that ibis.dense_rank() should work, and that the expression could bind to the table later, when I do the mutate(). Then I could remove the Table argument from my group_id() function.

Or perhaps I am missing something here. Or if you have a suggestion on how to do this task cleaner, I'd appreciate that too.

Describe the solution you'd like

ibis.dense_rank() ? and similar for the other rank functions?

What version of ibis are you running?

4.1.0

What backend(s) are you using, if any?

all

Code of Conduct

  • I agree to follow this project's Code of Conduct
@NickCrews NickCrews added the feature Features or general enhancements label Feb 1, 2023
@cpcloud
Copy link
Member

cpcloud commented Feb 2, 2023

Thanks for the issue.

Your request makes sense. I believe the original intent was that the method would be a convenience function for the following and the top level one just never got implemented for whatever reason (likely that no one ever asked about it!):

t.mutate(x=t.col.dense_rank())
# desugars to
t.mutate(x=ibis.dense_rank().over(ibis.window(order_by=[t.col])))

Of course, this breaks down when you have a compound sort key and we definitely need to support that use case!

+1 for adding (and keeping the existing for convenience if that makes sense) top-level functions for rank/dense_rank and any others whose value are only determined by ordering keys.

@cpcloud cpcloud added this to the 5.0 milestone Feb 2, 2023
@cpcloud cpcloud added the window functions Issues or PRs related to window functions label Feb 2, 2023
@cpcloud cpcloud changed the title feat: add ibis.dense_rank() feat(api): add ibis.dense_rank() Feb 2, 2023
@cpcloud cpcloud changed the title feat(api): add ibis.dense_rank() feat(api): add ibis.dense_rank() top-level function Feb 2, 2023
@NickCrews
Copy link
Contributor Author

Thanks for that breakdown. I'm fairly new to SQL so I wasn't quite sure I was grokking that correctly, but that explanation of the syntactic sugar really helps. Thanks for adding this!

@cpcloud cpcloud modified the milestones: 5.0, 5.1 Mar 10, 2023
@cpcloud cpcloud modified the milestones: 5.1, 6.0 Mar 22, 2023
@cpcloud cpcloud modified the milestones: 6.0, 6.1 Jul 3, 2023
@cpcloud cpcloud added this to the 7.1 milestone Oct 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Features or general enhancements window functions Issues or PRs related to window functions
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants