
Clarification regarding data normalization #6

Open
JiahuiSophieHU opened this issue May 25, 2022 · 0 comments

Hello,

I am trying to run N-HiTS on my own data using the shared Colab notebook.

I normalized the original ETTm2 dataset myself and compared the result with the data used in your N-HiTS model.

My df_train has 46641 rows, and I followed the description in Section 4.1: "Each set is normalized with the train data mean and standard deviation."

def normalize(df_csv, df_train):
    # z-score every column with the mean and std of the training rows
    result = df_csv.copy()
    columns_names = list(df_csv.columns)
    for feature_name in columns_names[1:]:  # skip the first (date) column
        result[feature_name] = (df_csv[feature_name] - df_train[feature_name].mean()) / df_train[feature_name].std()
    return result
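For reference, the same idea can be written in vectorized pandas form with the train/test boundary made explicit. This is a minimal sketch, assuming the training set is simply the first `n_train` rows of the chronologically ordered CSV; the function name `normalize_with_train_stats`, the parameter `n_train`, and the toy values are hypothetical. Note that pandas `std()` uses the sample standard deviation (`ddof=1`) by default, unlike NumPy's `np.std` (`ddof=0`); with tens of thousands of training rows the two differ by a relative factor of about sqrt(n/(n-1)), i.e. negligibly.

```python
import pandas as pd


def normalize_with_train_stats(df: pd.DataFrame, n_train: int, date_col: str = "date") -> pd.DataFrame:
    """Z-score every non-date column using statistics from the first n_train rows only.

    Hypothetical helper: assumes df is sorted by time and the training split
    is its first n_train rows.
    """
    features = [c for c in df.columns if c != date_col]
    train = df.iloc[:n_train]
    mean = train[features].mean()
    std = train[features].std()  # pandas default: ddof=1 (sample std)
    out = df.copy()
    # Broadcasting aligns the per-column mean/std Series on column names.
    out[features] = (df[features] - mean) / std
    return out


# Toy example with made-up values (not ETTm2 data).
dates = pd.date_range("2016-07-01", periods=8, freq="15min")
toy = pd.DataFrame({"date": dates, "HUFL": [5.0, 4.0, 3.0, 2.0, 1.0, 6.0, 7.0, 8.0]})
norm = normalize_with_train_stats(toy, n_train=4)
print(norm)
```

By construction, the training rows of the output have mean 0 and standard deviation 1, while later rows are only shifted and scaled by the training statistics, so changing `n_train` changes every normalized value.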

My function returns different values from yours. Mine:
date HUFL
2016-07-01 00:00:00 0.126520
2016-07-01 00:15:00 -0.023339
2016-07-01 00:30:00 -0.098268
2016-07-01 00:45:00 -0.431177
2016-07-01 01:00:00 -0.231432
Name: HUFL, dtype: float64

and yours:
unique_id | ds | y
HUFL | 2016-07-01 00:00:00 | -0.041413
HUFL | 2016-07-01 00:15:00 | -0.185467
HUFL | 2016-07-01 00:30:00 | -0.257495
HUFL | 2016-07-01 00:45:00 | -0.577510
HUFL | 2016-07-01 01:00:00 | -0.385501
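One way to narrow down where the two versions diverge is to check whether one output is an exact linear function of the other. If both are z-scores of the same raw values x, i.e. mine = (x - m1) / s1 and theirs = (x - m2) / s2, then theirs = (s1 / s2) * mine + (m1 - m2) / s2. The sketch below, assuming only NumPy, fits that line to the five HUFL values quoted above with `np.polyfit`.

```python
import numpy as np

# HUFL values for the first five timestamps, copied from the two outputs above.
mine = np.array([0.126520, -0.023339, -0.098268, -0.431177, -0.231432])
theirs = np.array([-0.041413, -0.185467, -0.257495, -0.577510, -0.385501])

# If both are z-scores of the same raw series, then
#   theirs = (s1 / s2) * mine + (m1 - m2) / s2   holds exactly.
slope, intercept = np.polyfit(mine, theirs, deg=1)  # highest degree first
residuals = theirs - (slope * mine + intercept)

print(f"slope     (s1/s2)     = {slope:.5f}")
print(f"intercept ((m1-m2)/s2) = {intercept:.5f}")
print(f"max |residual|        = {np.abs(residuals).max():.2e}")
```

On these rows the fit is essentially exact (slope about 0.961, intercept about -0.163, residuals at the level of the six-decimal rounding). That is consistent with both versions applying the same z-scoring but with a different mean and standard deviation, for example statistics computed over a different set of training rows; the check alone cannot say which rows were used.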

Can you please tell me more about the data normalization process?

Thanks and regards,

Sophie
