BUG: Passing dataframe to a function creates an inplace update to the original dataframe #35859

Closed
2 of 3 tasks
trenton3983 opened this issue Aug 22, 2020 · 2 comments

trenton3983 commented Aug 22, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.

Code Sample, a copy-pastable example

```python
import pandas as pd
from typing import Tuple

# sample data
data = {'Date': [pd.Timestamp('2019-01-31 00:00:00'), pd.Timestamp('2019-02-01 00:00:00'), pd.Timestamp('2019-02-04 00:00:00'), pd.Timestamp('2019-02-05 00:00:00'), pd.Timestamp('2019-02-06 00:00:00'), pd.Timestamp('2019-02-07 00:00:00'), pd.Timestamp('2019-02-08 00:00:00'), pd.Timestamp('2019-02-11 00:00:00'), pd.Timestamp('2019-02-12 00:00:00'), pd.Timestamp('2019-02-13 00:00:00'), pd.Timestamp('2019-02-14 00:00:00')],
        'Close': [166.44000244140625,  166.52000427246094,  171.25,  174.17999267578125,  174.24000549316406,  170.94000244140625,  170.41000366210938,  169.42999267578125,  170.88999938964844,  170.17999267578125,  170.8000030517578]}

# create dataframe
aapl = pd.DataFrame(data)

def find_trend(data: pd.DataFrame, period: int) -> Tuple[pd.Series, pd.Series]:

    data['sma'] = data['Close'].rolling(period).mean()  # this creates an inplace update to aapl
    diff = data['sma'] - data['sma'].shift(1)  # calculates a series of values
    greater_than_0 = diff > 0  # creates a series of bools
    return diff, greater_than_0


aapl['value'], aapl['trend'] = find_trend(aapl, 4)
```

Current Output

  • Note the creation of the sma column.
  • Is this the expected behavior?

|    | Date                |   Close |     sma |       value | trend   |
|---:|:--------------------|--------:|--------:|------------:|:--------|
|  0 | 2019-01-31 00:00:00 |  166.44 | nan     | nan         | False   |
|  1 | 2019-02-01 00:00:00 |  166.52 | nan     | nan         | False   |
|  2 | 2019-02-04 00:00:00 |  171.25 | nan     | nan         | False   |
|  3 | 2019-02-05 00:00:00 |  174.18 | 169.597 | nan         | False   |
|  4 | 2019-02-06 00:00:00 |  174.24 | 171.548 |   1.95      | True    |
|  5 | 2019-02-07 00:00:00 |  170.94 | 172.653 |   1.105     | True    |
|  6 | 2019-02-08 00:00:00 |  170.41 | 172.443 |  -0.209999  | False   |
|  7 | 2019-02-11 00:00:00 |  169.43 | 171.255 |  -1.1875    | False   |
|  8 | 2019-02-12 00:00:00 |  170.89 | 170.417 |  -0.837502  | False   |
|  9 | 2019-02-13 00:00:00 |  170.18 | 170.227 |  -0.190002  | False   |
| 10 | 2019-02-14 00:00:00 |  170.8  | 170.325 |   0.0974998 | True    |
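
The extra column in the table above can be reproduced on a much smaller frame. The following is a minimal sketch, assuming a hypothetical helper `add_sma` that mirrors only the column assignment inside `find_trend`; the toy `Close` values are made up for illustration.

```python
import pandas as pd


def add_sma(data: pd.DataFrame, period: int) -> pd.Series:
    # Hypothetical helper: same assignment pattern as in find_trend.
    # `data` refers to the caller's DataFrame, so this adds a column to it.
    data['sma'] = data['Close'].rolling(period).mean()
    return data['sma']


df = pd.DataFrame({'Close': [1.0, 2.0, 3.0, 4.0, 5.0]})

columns_before = list(df.columns)
add_sma(df, 2)
columns_after = list(df.columns)

print(columns_before)  # ['Close']
print(columns_after)   # ['Close', 'sma']
```

The caller never assigns `sma` itself; the column appears because the function assigns into the same object the caller holds.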

Problem description

Expected Output

|    | Date                |   Close |       value | trend   |
|---:|:--------------------|--------:|------------:|:--------|
|  0 | 2019-01-31 00:00:00 |  166.44 | nan         | False   |
|  1 | 2019-02-01 00:00:00 |  166.52 | nan         | False   |
|  2 | 2019-02-04 00:00:00 |  171.25 | nan         | False   |
|  3 | 2019-02-05 00:00:00 |  174.18 | nan         | False   |
|  4 | 2019-02-06 00:00:00 |  174.24 |   1.95      | True    |
|  5 | 2019-02-07 00:00:00 |  170.94 |   1.105     | True    |
|  6 | 2019-02-08 00:00:00 |  170.41 |  -0.209999  | False   |
|  7 | 2019-02-11 00:00:00 |  169.43 |  -1.1875    | False   |
|  8 | 2019-02-12 00:00:00 |  170.89 |  -0.837502  | False   |
|  9 | 2019-02-13 00:00:00 |  170.18 |  -0.190002  | False   |
| 10 | 2019-02-14 00:00:00 |  170.8  |   0.0974998 | True    |

Resolves Issue

  • I know that changing the function as follows avoids the in-place update to aapl:

```python
def find_trend(data: pd.DataFrame, period: int) -> Tuple[pd.Series, pd.Series]:

    sma = data['Close'].rolling(period).mean()  # does not create an inplace update to aapl
    diff = sma - sma.shift(1)  # calculates a series of values
    greater_than_0 = diff > 0  # creates a series of bools
    return diff, greater_than_0
```
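
As a side note, the subtraction of the shifted series can also be written with `Series.diff()`, which computes the same first difference. This is only a sketch of an equivalent spelling of the non-mutating version, not a change in behavior; the toy `Close` values are made up for illustration.

```python
import pandas as pd
from typing import Tuple


def find_trend(data: pd.DataFrame, period: int) -> Tuple[pd.Series, pd.Series]:
    # Work on a local Series; `data` is only read, never assigned to.
    sma = data['Close'].rolling(period).mean()
    diff = sma.diff()  # equivalent to sma - sma.shift(1)
    return diff, diff > 0


df = pd.DataFrame({'Close': [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]})
value, trend = find_trend(df, 2)

print(list(df.columns))  # ['Close']
print(trend.tolist())    # [False, False, True, True, True, True]
```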

Output of pd.show_versions()

INSTALLED VERSIONS

commit : d9fff27
python : 3.8.5.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252

pandas : 1.1.0
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.2
setuptools : 49.6.0.post20200814
Cython : 0.29.21
pytest : 6.0.1
hypothesis : None
sphinx : 3.2.1
blosc : None
feather : None
xlsxwriter : 1.2.9
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.16.1
pandas_datareader: 0.9.0
bs4 : 4.9.1
bottleneck : 1.3.2
fsspec : 0.8.0
fastparquet : None
gcsfs : None
matplotlib : 3.3.1
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.4
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.0
sqlalchemy : 1.3.18
tables : 3.6.1
tabulate : 0.8.7
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.50.1

trenton3983 added the Bug and Needs Triage labels Aug 22, 2020

asishm (Contributor) commented Aug 22, 2020

This is unrelated and has been the case since at least v0.22.0 (which is when I started using it).

You are mutating the DataFrame in your function (since Python passes a reference to the same object and DataFrames are mutable). Either pass in a copy of the DataFrame or don't mutate it (as you suggested).
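
To make the distinction concrete: Python hands the function a reference to the caller's object, so mutating that object is visible to the caller, while rebinding the local name is not. The sketch below illustrates both cases plus the "pass in a copy" option; the helper names `rebind` and `mutate` and the `flag` column are hypothetical and exist only for this example.

```python
import pandas as pd


def rebind(data: pd.DataFrame) -> None:
    # Rebinding: the local name now points at a new DataFrame.
    # The caller's object is left untouched.
    data = data.assign(flag=True)


def mutate(data: pd.DataFrame) -> None:
    # Mutation: a column is added to whatever object was passed in.
    data['flag'] = True


original = pd.DataFrame({'Close': [1.0, 2.0, 3.0]})

rebind(original)
after_rebind = list(original.columns)  # ['Close']

mutate(original.copy())                # the copy is mutated, not `original`
after_copy = list(original.columns)    # ['Close']

mutate(original)
after_mutate = list(original.columns)  # ['Close', 'flag']
```

Passing `original.copy()` costs an extra copy of the data, so avoiding the assignment inside the function is usually the cheaper of the two options.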

dsaxton added the Usage Question label and removed the Bug and Needs Triage labels Aug 22, 2020
dsaxton added this to the No action milestone Aug 22, 2020

trenton3983 (Author) commented

> This is unrelated and has been the case since at least v0.22.0 (which is when I started using it).
>
> You are mutating the DataFrame in your function (since Python passes a reference to the same object and DataFrames are mutable). Either pass in a copy of the DataFrame or don't mutate it (as you suggested).

  • Thank you. I haven't passed dataframes to a function in this way, so I wasn't certain if this was the accepted behavior or not, and I couldn't find anything addressing the issue.
  • It came up in connection with a question on Stack Overflow.
