Section 3.16 kaggle_house: a small error #46

Closed
CaiyuZhang opened this issue Oct 23, 2019 · 2 comments

Comments

@CaiyuZhang

CaiyuZhang commented Oct 23, 2019

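Bug description
In the data preprocessing code, the original book has
all_features[numeric_features] = all_features[numeric_features].fillna(0)
and the final output should be all_features.shape = (2919, 331).
In your code, this line becomes
all_features = all_features.fillna(0)
and the final output is all_features.shape = (2919, 354).
I am new to pandas, so I am not sure whether this is an oversight on your part or a misunderstanding of the code on mine. I would appreciate an explanation. Thank you very much.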

Version information
pytorch:
torchvision:
torchtext:
...

@huiget

huiget commented Nov 4, 2019

I agree. Not selecting the numeric columns does cause a problem, as the following example shows.

In [34]: df = pd.DataFrame({'a': [1, 2, None, 3], 'b': ['a', 'b', None, '0']})

In [35]: pd.get_dummies(df, dummy_na=True)
Out[35]:
     a  b_0  b_a  b_b  b_nan
0  1.0    0    1    0      0
1  2.0    0    0    1      0
2  NaN    0    0    0      1
3  3.0    1    0    0      0

In [36]: df_new = df.fillna(0)

In [37]: pd.get_dummies(df_new, dummy_na=True)
Out[37]:
     a  b_0  b_0  b_a  b_b  b_nan
0  1.0    0    0    1    0      0
1  2.0    0    0    0    1      0
2  0.0    1    0    0    0      0
3  3.0    0    1    0    0      0

I checked the original book, and it is written like this:

all_features[numeric_features] = all_features[numeric_features].fillna(0)
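
The column-count difference can also be reproduced on a toy frame. The sketch below is only an illustration, not the repository's code: the four-row data is made up, and the categorical column is given an explicit object dtype (an assumption, so that the example does not depend on how a given pandas version infers string columns). It compares filling NaN in the numeric columns only with filling NaN everywhere before calling pd.get_dummies.

```python
import pandas as pd

# Toy stand-in for all_features: one numeric and one categorical column,
# each with a missing value. The values are invented for illustration.
all_features = pd.DataFrame({
    "LotArea": [100.0, None, 80.0, 120.0],
    "MSZoning": pd.Series(["RL", "RM", None, "RL"], dtype=object),
})

# Columns whose dtype is not object are treated as numeric.
numeric_features = all_features.dtypes[all_features.dtypes != "object"].index

# Book version: fill NaN only in the numeric columns.
book_style = all_features.copy()
book_style[numeric_features] = book_style[numeric_features].fillna(0)
book_dummies = pd.get_dummies(book_style, dummy_na=True)

# Reported version: fill NaN everywhere, so the categorical column
# now holds the integer 0, which get_dummies treats as one more category.
fill_all = all_features.fillna(0)
fill_all_dummies = pd.get_dummies(fill_all, dummy_na=True)

print(list(book_dummies.columns))
print(list(fill_all_dummies.columns))
print(book_dummies.shape, fill_all_dummies.shape)
```

With the fill-everything variant, the categorical column gains an extra MSZoning_0 indicator and its MSZoning_nan indicator is always zero, so the frame ends up one column wider. On the real data the same effect would occur once for every categorical column that contains missing values, which is consistent with the 331 versus 354 columns reported above.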

@ShusenTang
Owner

Thanks for pointing this out. It has been corrected.
