Why does pandas Round method explodes my data frame? #21810

Closed
blucap opened this issue Jul 8, 2018 · 3 comments
Labels
Duplicate Report Duplicate issue or pull request

Comments

blucap commented Jul 8, 2018

I am trying to round all values in a DataFrame. However, the pandas round() method inflates the DataFrame from 150 rows to 7518 rows.

Perhaps there is something odd with the data in the dataframe, but then again, one would not expect a simple rounding function to do this.

Below, I compare 1) simulated data, which rounds correctly, and 2) the data that triggers the error.

This results in 150 rows, which is the correct number:

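import numpy as np
import pandas as pd

# Simulated frame: 150 rows of random floats in four columns.
df = pd.DataFrame(np.random.random([150, 4]), columns=['A', 'B', 'C', 'D'])
df["cat"] = "MID"
# Note: with only 150 rows, each slice below covers every row,
# so all labels end up as "HI".
df.loc[:399, ["cat"]] = "LOW"
df.iloc[-400:, -1] = "HI"
df["cat"].value_counts()
df.set_index("cat", inplace=True)
df.round(3)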

Using the data from my Dropbox folder, round() produces a whopping 7518 rows:

dfb = pd.read_pickle('dfna.pkl')
dfb.round(3)

This is strange. I have worked around it for now with this rather ugly line:

dfb = dfb.reset_index().round({'A': 1, 'B': 2, 'C': 3, 'D': 4}).set_index('tricile')

However, this is not ideal: if round() can change the number of rows unexpectedly, it may silently break future programs.
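The workaround above can be wrapped in a small helper that also checks the row count, so a change in length would fail loudly rather than pass unnoticed. This is only a sketch: the helper name `round_keeping_rows` and the demo frame are made up for illustration, and it assumes that rounding on a temporary default integer index avoids the problem, as the reset_index() line suggests.

```python
import numpy as np
import pandas as pd


def round_keeping_rows(frame, decimals):
    """Round on a temporary default integer index, then restore the original index.

    Hypothetical helper, not part of pandas.
    """
    original_index = frame.index
    rounded = frame.reset_index(drop=True).round(decimals)
    # Fail loudly if rounding changed the number of rows.
    if len(rounded) != len(frame):
        raise ValueError(
            f"round() changed the row count: {len(frame)} -> {len(rounded)}"
        )
    rounded.index = original_index
    return rounded


# Made-up demo data with repeated index labels, loosely mirroring 'tricile'.
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    rng.random((6, 4)),
    columns=list("ABCD"),
    index=pd.Index(["LOW", "LOW", "MID", "MID", "HI", "HI"], name="tricile"),
)
result = round_keeping_rows(demo, 3)
print(result.shape)
```

Passing a dict such as {'A': 1, 'B': 2} as `decimals` should work the same way, since the value is handed straight to round().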

@TomAugspurger (Contributor)

Can you try making a simple, reproducible example that demonstrates the issue? http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports

We can't use pickle files for a unit test.
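A pickle-free report might build a tiny frame inline and print the row counts before and after rounding. The sketch below rests on an assumption that is not confirmed in this thread, namely that the behaviour is tied to repeated labels in the `tricile` index; the values are invented for illustration.

```python
import pandas as pd

# Invented values; the repeated index labels are the assumed trigger.
df = pd.DataFrame(
    {
        "A": [0.12345, 1.23456, 2.34567, 3.45678],
        "B": [9.87654, 8.76543, 7.65432, 6.54321],
    },
    index=pd.Index(["LOW", "LOW", "HI", "HI"], name="tricile"),
)

# Report the pandas version, since the behaviour may differ between releases.
print("pandas version:", pd.__version__)
print("index unique:", df.index.is_unique)
print("rows before round:", len(df))
print("rows after round:", len(df.round(3)))
```

If the two row counts differ, the snippet and its printed output can be pasted directly into the report.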

TomAugspurger added the "Needs Info" label (clarification about behavior needed to assess issue) on Jul 8, 2018
@TomAugspurger (Contributor)

Actually, this looks like #21809

TomAugspurger added the "Duplicate Report" label (duplicate issue or pull request) and removed the "Needs Info" label on Jul 8, 2018
TomAugspurger added this to the "No action" milestone on Jul 8, 2018
blucap commented Jul 8, 2018 via email
