Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: Series.shift Pyarrow Backend Implementation #590

Merged
merged 73 commits into from
Aug 7, 2024
Merged
Show file tree
Hide file tree
Changes from 64 commits
Commits
Show all changes
73 commits
Select commit Hold shift + click to select a range
bdc6774
Feat: Series.sum test case
mfonekpo Jul 11, 2024
27b74d8
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 11, 2024
df7d353
Merge remote-tracking branch 'upstream/main' into factor_out_testing_…
mfonekpo Jul 13, 2024
bb4d3cf
test_series_sum refactoring
mfonekpo Jul 13, 2024
88c1861
resolved merge conflicts
mfonekpo Jul 13, 2024
523906b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 13, 2024
4fa8943
fixing according to standard
mfonekpo Jul 13, 2024
968822d
resolving merge conflict
mfonekpo Jul 13, 2024
d500498
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 13, 2024
219caed
Merge remote-tracking branch 'upstream/main' into factor_out_testing_…
mfonekpo Jul 13, 2024
668bb30
refactoring code to satisfy code criteria
mfonekpo Jul 13, 2024
a9c9a1e
resolved merge conflict
mfonekpo Jul 13, 2024
b64d960
Merge remote-tracking branch 'upstream/main' into factor_out_testing_…
mfonekpo Jul 15, 2024
1d7eabe
Fixing Series.sum functionality for Pyarrow DF
mfonekpo Jul 15, 2024
49f3f57
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 15, 2024
42f2541
Fixing compare mistake on Series.sum for Pyarrow
mfonekpo Jul 15, 2024
3c62c73
Merge remote-tracking branch 'upstream/main' into factor_out_testing_…
mfonekpo Jul 15, 2024
ac7d829
Merge branch 'factor_out_testing_from_test_common' of https://github.…
mfonekpo Jul 15, 2024
3af9833
cleaning up code
mfonekpo Jul 15, 2024
7d80173
removing 'pyarrow_table' from test_sum_all and test_renamed_taxicab
mfonekpo Jul 15, 2024
d99630d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 15, 2024
6eb346b
making sure CI passes all test cases
mfonekpo Jul 15, 2024
7f16748
resolving merge conflict on sum_all_test.py file
mfonekpo Jul 15, 2024
4998647
all test cases passed on CI
mfonekpo Jul 15, 2024
542fe70
Merge remote-tracking branch 'upstream/main' into factor_out_testing_…
mfonekpo Jul 16, 2024
0af3eb0
drop-nulls-test-case first push
mfonekpo Jul 16, 2024
86a9dd2
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 16, 2024
27f482b
resolve directory changes conflict
mfonekpo Jul 20, 2024
8abec40
Merge branch 'Feat_Series_drop_nulls' of https://github.com/mfonekpo/…
mfonekpo Jul 20, 2024
ff6a616
refactoring everything to a single line
mfonekpo Jul 20, 2024
e06729d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 20, 2024
ab95998
Making corrections to the files listed
mfonekpo Jul 20, 2024
eebf6b2
Merge branch 'Feat_Series_drop_nulls' of https://github.com/mfonekpo/…
mfonekpo Jul 20, 2024
59a7988
Merge remote-tracking branch 'upstream/main' into Feat_Series_drop_nulls
mfonekpo Jul 21, 2024
02e166f
Merge remote-tracking branch 'upstream/main' into Feat_Series_drop_nulls
mfonekpo Jul 21, 2024
5da2f40
full implementation of drop_dulls for Pyarrow
mfonekpo Jul 22, 2024
859315d
Merge remote-tracking branch 'upstream/main' into Feat_Series_drop_nulls
mfonekpo Jul 22, 2024
bf46f22
removed drop_nulls function on test_common, and moved the test file t…
mfonekpo Jul 22, 2024
6123051
Update tests/expr_and_series/drop_nulls_test.py
MarcoGorelli Jul 22, 2024
1b680ed
Update tests/expr_and_series/drop_nulls_test.py
MarcoGorelli Jul 22, 2024
c4de620
Final fix on drop_nulls
mfonekpo Jul 22, 2024
af635f2
Merge branch 'Feat_Series_drop_nulls' of https://github.com/mfonekpo/…
mfonekpo Jul 22, 2024
e0f4be2
Merge remote-tracking branch 'upstream/main' into Feat_Series_drop_nulls
mfonekpo Jul 22, 2024
2556b8a
my first push for series_shift Pyarrow implementation
mfonekpo Jul 22, 2024
617c991
Merge remote-tracking branch 'upstream/main' into Series_shift
mfonekpo Jul 23, 2024
186af5b
Making changes for PR
mfonekpo Jul 23, 2024
8d7439b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 23, 2024
4272980
Almost done; just 1 more push to complete
mfonekpo Jul 24, 2024
59b7bd7
Merge remote-tracking branch 'upstream/main' into Series_shift
mfonekpo Jul 24, 2024
c7d9e84
resolving conflict and checking CI passes
mfonekpo Jul 24, 2024
cb73542
Merge remote-tracking branch 'upstream/main' into Series_shift
mfonekpo Jul 24, 2024
3c0880e
renamed series_shift to series_shift_test
mfonekpo Jul 24, 2024
65f11e5
Merge remote-tracking branch 'upstream/main' into Series_shift
mfonekpo Jul 24, 2024
71863de
editing changes after reseting to my last push
mfonekpo Jul 24, 2024
ef5ea7a
Merge remote-tracking branch 'upstream/main' into Series_shift
mfonekpo Jul 24, 2024
4798dfe
Merge branch 'Series_shift' of https://github.com/mfonekpo/narwhals i…
mfonekpo Jul 24, 2024
68b27d8
Merge remote-tracking branch 'upstream/main' into Series_shift
mfonekpo Jul 25, 2024
e8089eb
Final implementation of series_shift method for pyarrow backend
mfonekpo Jul 25, 2024
1e28004
conflicts resolved and included fix for multi_chunk
mfonekpo Jul 28, 2024
2bd652c
fixing ruff test fail on CI
mfonekpo Jul 28, 2024
4722a8e
resolving conflicts
mfonekpo Jul 28, 2024
f0af84a
fixing pytest failing with TypeError
mfonekpo Jul 28, 2024
dd7a8e7
Merge remote-tracking branch 'upstream/main' into Series_shift
mfonekpo Jul 29, 2024
55bf46b
final fix to series_shift pyarrow test
mfonekpo Jul 29, 2024
19e9eff
Merge remote-tracking branch 'upstream/main' into Series_shift
mfonekpo Jul 31, 2024
0e7981c
Merge remote-tracking branch 'upstream/main' into Series_shift
mfonekpo Aug 5, 2024
d646b83
I hope this works finally
mfonekpo Aug 5, 2024
84086e9
Merge remote-tracking branch 'upstream/main' into Series_shift
mfonekpo Aug 5, 2024
bf0ffe7
changes on shift_test file
mfonekpo Aug 5, 2024
1c326ec
Merge remote-tracking branch 'upstream/main' into Series_shift
mfonekpo Aug 6, 2024
c0ccc1f
series_shit_test complete
mfonekpo Aug 6, 2024
c5ae095
Merge remote-tracking branch 'upstream/main' into Series_shift
MarcoGorelli Aug 7, 2024
5cd0279
fixup multi-chunk case
MarcoGorelli Aug 7, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions narwhals/_arrow/expr.py
Original file line number Diff line number Diff line change
Expand Up @@ -200,6 +200,9 @@ def sum(self) -> Self:
def drop_nulls(self) -> Self:
return reuse_series_implementation(self, "drop_nulls")

def shift(self, n: int) -> Self:
return reuse_series_implementation(self, "shift", n)

def alias(self, name: str) -> Self:
# Define this one manually, so that we can
# override `output_names` and not increase depth
Expand Down
24 changes: 24 additions & 0 deletions narwhals/_arrow/series.py
Original file line number Diff line number Diff line change
Expand Up @@ -221,6 +221,30 @@ def drop_nulls(self) -> ArrowSeries:
pc = get_pyarrow_compute()
return self._from_native_series(pc.drop_null(self._native_series))

def shift(self, n: int) -> Self:
pa = get_pyarrow()
chunks = self._native_series.chunks
dtype = self._native_series.type

# This condition shifts the series by n positions.
if n > 0:
# If n is positive, prepend nulls and truncate the end
shifted_chunks = [pa.array([None] * n, type=dtype)] + [
chunk.slice(0, len(chunk) - n) for chunk in chunks
]
elif n < 0:
# If n is negative, append nulls and truncate the start
shifted_chunks = [chunk.slice(-n, len(chunk) + n) for chunk in chunks] + [
pa.array([None] * -n, type=dtype)
]
else:
# If n is 0, return the original series
shifted_chunks = chunks
MarcoGorelli marked this conversation as resolved.
Show resolved Hide resolved
MarcoGorelli marked this conversation as resolved.
Show resolved Hide resolved

# Flatten the list of chunks to create a single array
shifted_array = pa.concat_arrays(shifted_chunks)
return self._from_native_series(shifted_array)

def std(self, ddof: int = 1) -> int:
pc = get_pyarrow_compute()
return pc.stddev(self._native_series, ddof=ddof) # type: ignore[no-any-return]
Expand Down
48 changes: 48 additions & 0 deletions tests/expr_and_series/series_shift_test.py
MarcoGorelli marked this conversation as resolved.
Show resolved Hide resolved
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
from __future__ import annotations

from typing import Any

import pyarrow as pa

import narwhals as nw
from tests.utils import compare_dicts


def test_shift(constructor_eager: Any) -> None:
data = {
"A": [1, 2, None, 4],
"B": [5, 6, 7, 8],
"C": [None, None, None, None],
"D": [9, 10, 11, 12],
}

df = nw.from_native(constructor_eager(data), eager_only=True)

result_a = df.select(nw.col("A").shift(1))
result_b = df.select(nw.col("B").shift(-1))
result_c = df.select(nw.col("C").shift(1))
result_d = df.select(nw.col("D").shift(2))

expected_a = {"A": [float("nan"), 1.0, 2.0, float("nan")]}
expected_b = {"B": [6.0, 7.0, 8.0, float("nan")]}
expected_c = {"C": [float("nan"), float("nan"), float("nan"), float("nan")]}
expected_d = {"D": [float("nan"), float("nan"), 9, 10]}

compare_dicts(result_a, expected_a)
compare_dicts(result_b, expected_b)
compare_dicts(result_c, expected_c)
compare_dicts(result_d, expected_d)


def test_shift_multi_chunk_pyarrow() -> None:
tbl = pa.table({"a": [1, 2, 3]})
tbl = pa.concat_tables([tbl, tbl, tbl])
tbl = tbl.combine_chunks()
MarcoGorelli marked this conversation as resolved.
Show resolved Hide resolved
df = nw.from_native(tbl, eager_only=True)
result_a = df.select(nw.col("a").shift(1))
result_b = df.select(nw.col("a").shift(-1))
expected_a = {"a": [None, 1, 2, 3, 1, 2, 3, 1, 2]}
expected_b = {"a": [2, 3, 1, 2, 3, 1, 2, 3, None]}

compare_dicts(result_a, expected_a)
compare_dicts(result_b, expected_b)
14 changes: 3 additions & 11 deletions tests/expr_and_series/shift_test.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,5 @@
from typing import Any

import pytest

import narwhals.stable.v1 as nw
from tests.utils import compare_dicts

Expand All @@ -13,11 +11,8 @@
}


def test_shift(request: Any, constructor: Any) -> None:
if "pyarrow_table" in str(constructor):
request.applymarker(pytest.mark.xfail)

df = nw.from_native(constructor(data))
def test_shift(constructor_eager: Any) -> None:
df = nw.from_native(constructor_eager(data), eager_only=True)
Copy link
Member

@MarcoGorelli MarcoGorelli Jul 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these don't need to change in this test - only in test_shift_series did you need to do that, constructor is fine here, because we're testing Expr.shift

Suggested change
def test_shift(constructor_eager: Any) -> None:
df = nw.from_native(constructor_eager(data), eager_only=True)
def test_shift(constructor: Any) -> None:
df = nw.from_native(constructor(data))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you address this comment please?

Copy link
Member

@MarcoGorelli MarcoGorelli Aug 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still needs addressing, could you read over the suggested change more carefully and address it please?

You've removed the eager_only part, but you still need to rename constructor_eager to constructor (just for this test!)

result = df.with_columns(nw.col("a", "b", "c").shift(2)).filter(nw.col("i") > 1)
expected = {
"i": [2, 3, 4],
Expand All @@ -28,10 +23,7 @@ def test_shift(request: Any, constructor: Any) -> None:
compare_dicts(result, expected)


def test_shift_series(request: Any, constructor_eager: Any) -> None:
if "pyarrow_table" in str(constructor_eager):
request.applymarker(pytest.mark.xfail)

def test_shift_series(constructor_eager: Any) -> None:
df = nw.from_native(constructor_eager(data), eager_only=True)
expected = {
"i": [2, 3, 4],
Expand Down
95 changes: 95 additions & 0 deletions utils/check_backend_completeness.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,95 @@
"""
MarcoGorelli marked this conversation as resolved.
Show resolved Hide resolved
Hopefully temporary script which tracks which methods we're missing
for the PyArrow table backend.

If you implement a method, please remove it from the `MISSING` list.
"""

# ruff: noqa
import sys

import narwhals as nw
from narwhals._arrow.dataframe import ArrowDataFrame

MISSING = [
"DataFrame.pipe",
"Series.round",
]


class MockDataFrame:
# Make a little mock object so we can instantiate
# PandasLikeDataFrame without having pandas installed
def __init__(self, dataframe): ...

def __narwhals_dataframe__(self):
return self

@property
def columns(self):
return []

@property
def loc(self):
return self

def __getitem__(self, *args):
return MockSeries(self)


class MockSeries:
# Make a little mock object so we can instantiate
# nw.DataFrame without having dataframe libraries
# installed
def __init__(self, series): ...

def __narwhals_series__(self):
return self

@property
def name(self):
return "a"


if __name__ == "__main__":
missing = []
no_longer_missing = []

df_pa = ArrowDataFrame(MockDataFrame({"a": [1, 2, 3]}), backend_version=(13, 0))
df_nw = nw.DataFrame(
MockDataFrame({"a": [1, 2, 3]}),
level="full",
)
pa_methods = [f"DataFrame.{x}" for x in df_pa.__dir__() if not x.startswith("_")]
nw_methods = [f"DataFrame.{x}" for x in df_nw.__dir__() if not x.startswith("_")]
missing.extend(
[
x
for x in nw_methods
if x not in pa_methods and x not in MISSING and x not in {"level"}
]
)
no_longer_missing.extend([x for x in MISSING if x in pa_methods and x in nw_methods])

ser_pa = df_pa["a"]
ser_pd = df_nw["a"]
pa_methods = [f"Series.{x}" for x in ser_pa.__dir__() if not x.startswith("_")]
nw_methods = [f"Series.{x}" for x in ser_pd.__dir__() if not x.startswith("_")]
missing.extend([x for x in nw_methods if x not in pa_methods and x not in MISSING])
no_longer_missing.extend([x for x in MISSING if x in pa_methods and x in nw_methods])

if missing:
print(
"The following have not been implemented for the Arrow backend: ",
sorted(missing),
)
sys.exit(1)

if no_longer_missing:
print(
"Please remove the following from MISSING in utils/check_backend_completeness.py: ",
sorted(no_longer_missing),
)
sys.exit(1)

sys.exit(0)
Loading