Clarification regarding data normalization #6

JiahuiSophieHU · 2022-05-25T10:00:34Z

Hello,

I was trying to run N-HiTS with my own data using the shared Colab notebook.

I tried to normalize the original ETTm2 dataset and compared the result with the data used in your N-HiTS model.

The size of df_train is 46641, and I followed the description given in Section 4.1: each set is normalized with the train data mean and standard deviation.

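def normalize(df_csv, df_train):
    # Standardize every column except the first (date) using train-set statistics.
    result = df_csv.copy()
    columns_names = list(df_csv.columns)
    for feature_name in columns_names[1:]:
        train_mean = df_train[feature_name].mean()
        train_std = df_train[feature_name].std()  # pandas default: sample std (ddof=1)
        result[feature_name] = (df_csv[feature_name] - train_mean) / train_std
    return result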

My function returns different results compared to yours:
date HUFL
2016-07-01 00:00:00 0.126520
2016-07-01 00:15:00 -0.023339
2016-07-01 00:30:00 -0.098268
2016-07-01 00:45:00 -0.431177
2016-07-01 01:00:00 -0.231432
Name: HUFL, dtype: float64

and yours:
unique_id | ds | y
HUFL | 2016-07-01 00:00:00 | -0.041413
HUFL | 2016-07-01 00:15:00 | -0.185467
HUFL | 2016-07-01 00:30:00 | -0.257495
HUFL | 2016-07-01 00:45:00 | -0.577510
HUFL | 2016-07-01 01:00:00 | -0.385501
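
As a quick sanity check, the sketch below fits a straight line between the two sets of values listed above. It assumes only the five HUFL values shown in this issue; the variable names (`mine`, `theirs`, `scale`, `shift`) are illustrative and do not come from the N-HiTS code. If both outputs are standardizations of the same raw series, they should be related by `theirs ≈ scale * mine + shift`, where `scale` is the ratio of the two standard deviations and `shift` is the difference of the two means divided by the standard deviation behind `theirs`.

```python
import numpy as np

# First five normalized HUFL values copied from the two outputs in this issue.
mine = np.array([0.126520, -0.023339, -0.098268, -0.431177, -0.231432])
theirs = np.array([-0.041413, -0.185467, -0.257495, -0.577510, -0.385501])

# Least-squares fit of a degree-1 polynomial: theirs ≈ scale * mine + shift.
scale, shift = np.polyfit(mine, theirs, deg=1)

# Largest deviation of the fitted line from the reference values.
residual = np.max(np.abs(scale * mine + shift - theirs))

print(f"scale={scale:.4f}, shift={shift:.4f}, max residual={residual:.2e}")
```

A residual near the rounding error of the printed values would suggest that the two outputs differ only in the mean and standard deviation used (for example, statistics computed over a different train split), not in the normalization formula itself. This does not identify which statistics the authors used.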

Can you please tell me more about the data normalization process?

Thanks and regards,

Sophie
