Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Pandas, CSV, and Parquet backends do not have inclusive preceding window bound #2000

Closed
hjoo opened this issue Oct 18, 2019 · 8 comments
Labels
bug Incorrect behavior inside of ibis pandas The pandas backend window functions Issues or PRs related to window functions

Comments

@hjoo
Copy link
Contributor

hjoo commented Oct 18, 2019

According to documentation, ibis window preceding and following bounds are inclusive.

Pandas, CSV, and Parquet backends have exclusive preceding window boundaries.

Example (taken from ibis/all/test_window.py):

window = ibis.window(
        preceding=2,
        following=0,
        group_by=[alltypes.string_col],
        order_by=[alltypes.id],
    )

expr = alltypes.mutate(val=alltypes.double_col.sum().over(window))
result = expr.execute().set_index('id').sort_index()

result[['string_col', 'double_col', 'val']].groupby('string_col').get_group('1').head()
   string_col  double_col   val
id                             
1           1        10.1  10.1
11          1        10.1  20.2
21          1        10.1  20.2
31          1        10.1  20.2
41          1        10.1  20.2

expected[['string_col', 'double_col', 'val']].groupby('string_col').get_group('1').head()
   string_col  double_col   val
id                             
1           1        10.1  10.1
11          1        10.1  20.2
21          1        10.1  30.3
31          1        10.1  30.3
41          1        10.1  30.3
toryhaavik pushed a commit that referenced this issue Oct 23, 2019
Implemented `preceding` and `following` window boundaries for PySpark
backend.    Also, added tests in `ibis/tests/all/test_window.py` that
tests bounded windows. (note: filed issue #2000 for incorrect behavior
in Pandas, Csv, and Parquet backends)
Author: Hyonjee <[email protected]>

Closes #2001 from hjoo/window-bounds and squashes the following commits:

e4ad7f0 [Hyonjee] add comment explaining why we don't set window bounds for shift operations in pyspark backend
e94945c [Hyonjee] reformatted with black
c4893f5 [Hyonjee] implement window boundaries for pyspark backend and add tests for bounded windows
@hjoo
Copy link
Contributor Author

hjoo commented Oct 24, 2019

Pandas backend behavior is also incorrect for ibis.trailing_window. The argument to ibis.trailing_window() is a preceding inclusive window bound according to the ibis docs. This is also how other sql backends behave. Currently the tests in pandas backend treat the argument as a window size.

Rather, this is the expected behavior:

ibis.trailing_window(preceding=2) is equivalent to pandas.DataFrame.rolling(3) where 3 is the window size.

ibis.trailing_window(preceding=0) is equivalent to pandas.DataFrame.rolling(1) where 1 is the window size.

So on and so forth.

@hjoo
Copy link
Contributor Author

hjoo commented Oct 24, 2019

@jreback @toryhaavik: let me know if you disagree. I'll be working on a PR with updated tests and the behavior described above.

@jreback
Copy link
Contributor

jreback commented Oct 30, 2019

Rather, this is the expected behavior:

ibis.trailing_window(preceding=2) is equivalent to pandas.DataFrame.rolling(3) where 3 is the window size.

ibis.trailing_window(preceding=0) is equivalent to pandas.DataFrame.rolling(1) where 1 is the window size.

This just seems wrong. Why would you expect this? or maybe I am misunderstanding the meaning of preceding. the argument to rolling is the window size.

@hjoo
Copy link
Contributor Author

hjoo commented Oct 30, 2019

That's what's expected in other backends. The argument to pandas rolling is window size. The argument to ibis trailing_window is not size but an inclusive preceding row bound. 0 indicates current row, 1 indicates row before, etc.

@jreback
Copy link
Contributor

jreback commented Oct 30, 2019

ok can u make that clear in comments /
doc strings

also i think a release note should indicate this is an api breaking change

@hjoo
Copy link
Contributor Author

hjoo commented Oct 30, 2019

Updated docs and release notes in PR #2009. Let me know if I used the right convention.

Re: window size vs window boundary - I agree it's a bit confusing, but I'd prefer to separate the issue from this one because I think that will need a longer API discussion. If we want to change the Ibis trailing_window API altogether to take a window size (like pandas) rather than preceding window bound, window(preceding=1, following=0) would no longer be equivalent to trailing_window(1).

@jreback jreback added this to the Next Bugfix Release milestone Nov 1, 2019
@jreback jreback added bug Incorrect behavior inside of ibis pandas The pandas backend window functions Issues or PRs related to window functions labels Nov 1, 2019
@icexelloss
Copy link
Contributor

@hjoo @jreback Can this be closed?

@hjoo
Copy link
Contributor Author

hjoo commented Dec 4, 2019

Yep, will close.

@hjoo hjoo closed this as completed Dec 4, 2019
lostmygithubaccount pushed a commit that referenced this issue Feb 12, 2024
<!--
Thanks for taking the time to contribute to Ibis!

Please ensure that your pull request title matches the conventional
commits
specification: https://www.conventionalcommits.org/en/v1.0.0/
-->

## Description of changes

<!--
Write a description of the changes commensurate with the pull request's
scope.

Extremely small changes such as fixing typos do not need a description.
-->

Port https://github.com/claypotai/ibis-flink-example to a
quickstart/blog post.

## Issues closed

<!--
Please add Resolves #<issue number> (no angle brackets) if this pull
request
resolves any outstanding issues.

For example, if your pull requests resolves issues 1000, 2000 and 3000
write:

* Resolves #1000
* Resolves #2000
* Resolves #3000

If your pull request doesn't resolve any issues, you can delete this
section
entirely, including the `## Issues closed` section header.
-->
Resolves #7739
cpcloud pushed a commit to cpcloud/ibis that referenced this issue Apr 24, 2024
…n feedback (#26)

<!--
Thanks for taking the time to contribute to Ibis!

Please ensure that your pull request title matches the conventional
commits
specification: https://www.conventionalcommits.org/en/v1.0.0/
-->

## Description of changes

<!--
Write a description of the changes commensurate with the pull request's
scope.

Extremely small changes such as fixing typos do not need a description.
-->

## Issues closed

<!--
Please add Resolves #<issue number> (no angle brackets) if this pull
request
resolves any outstanding issues.

For example, if your pull requests resolves issues 1000, 2000 and 3000
write:

* Resolves ibis-project#1000
* Resolves ibis-project#2000
* Resolves ibis-project#3000

If your pull request doesn't resolve any issues, you can delete this
section
entirely, including the `## Issues closed` section header.
-->

---------

Co-authored-by: Chloe He <[email protected]>
lostmygithubaccount pushed a commit that referenced this issue Jul 12, 2024
<!--
Thanks for taking the time to contribute to Ibis!

Please ensure that your pull request title matches the conventional
commits
specification: https://www.conventionalcommits.org/en/v1.0.0/
-->

## Description of changes
Ibis works great within Shiny for Python as well, not just Quarto. Add
link to Shiny to the docs.
<!--
Write a description of the changes commensurate with the pull request's
scope.

Extremely small changes such as fixing typos do not need a description.
-->

## Issues closed

<!--
Please add Resolves #<issue number> (no angle brackets) if this pull
request
resolves any outstanding issues.

For example, if your pull requests resolves issues 1000, 2000 and 3000
write:

* Resolves #1000
* Resolves #2000
* Resolves #3000

If your pull request doesn't resolve any issues, you can delete this
section
entirely, including the `## Issues closed` section header.
-->
cpcloud pushed a commit that referenced this issue Sep 2, 2024
<!--
Thanks for taking the time to contribute to Ibis!

Please ensure that your pull request title matches the conventional
commits
specification: https://www.conventionalcommits.org/en/v1.0.0/
-->

## Description of changes

<!--
Write a description of the changes commensurate with the pull request's
scope.

Extremely small changes such as fixing typos do not need a description.
-->

## Issues closed

<!--
Please add Resolves #<issue number> (no angle brackets) if this pull
request
resolves any outstanding issues.

For example, if your pull requests resolves issues 1000, 2000 and 3000
write:

* Resolves #1000
* Resolves #2000
* Resolves #3000

If your pull request doesn't resolve any issues, you can delete this
section
entirely, including the `## Issues closed` section header.
-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Incorrect behavior inside of ibis pandas The pandas backend window functions Issues or PRs related to window functions
Projects
None yet
Development

No branches or pull requests

3 participants