data leakage #2

Open
PietroAmin opened this issue Mar 26, 2020 · 9 comments

Comments

@PietroAmin

Please note that there is a possibility of data leakage. The way the data are standardized is dangerous because it shifts future information back in time. Just try deleting (.shift(-num_historical_days)) from your scaling method and you will see the results get worse.

@PietroAmin
Author

The same problem appears in many GitHub repositories that try to forecast future stock prices with GANs.

@nupurdeshpande11

> The same problem appears in many GitHub repositories that try to forecast future stock prices with GANs.

Cool... I'll check out what you are talking about. What kind of data leakage exactly? Also, the GANs are experimental, since they haven't been used extensively for time series.

@PietroAmin
Author

Just try visualizing the data. When you calculate the moving average, min, and max at time t and then move that information back by num_historical_days steps, you are anticipating it: the features at an earlier time already contain values from the future. Indeed, if you plot it, you will see the moving average always predicting the path of the real time series.
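The effect can be checked numerically as well as visually. The snippet below is a minimal sketch, assuming the scaling roughly has the form "rolling mean, max, and min, each followed by .shift(-num_historical_days)"; the function name `shifted_scale`, the toy prices, and the window length of 3 are made up for illustration and are not taken from the repository. It changes a single price that lies strictly after row t and compares the scaled value at row t before and after the change.

```python
import pandas as pd

num_historical_days = 3  # hypothetical window length, for illustration only


def shifted_scale(prices: pd.Series, n: int) -> pd.Series:
    """Scale with rolling statistics that are shifted back n steps.

    Assumed form of the pattern under discussion, not the repository's code.
    After shift(-n), row t holds statistics computed over rows t+1 .. t+n.
    """
    mean = prices.rolling(n).mean().shift(-n)
    high = prices.rolling(n).max().shift(-n)
    low = prices.rolling(n).min().shift(-n)
    return (prices - mean) / (high - low)


# Toy price series (made-up numbers).
prices = pd.Series([10, 11, 12, 11, 13, 14, 13, 15, 16, 15, 17, 18], dtype=float)
t = 4

# Change one price that lies strictly in the future of row t.
tampered = prices.copy()
tampered.iloc[t + 2] += 10.0

before = shifted_scale(prices, num_historical_days).iloc[t]
after = shifted_scale(tampered, num_historical_days).iloc[t]

# If the feature at row t only used past data, these would be identical.
leaks_future = before != after
print(before, after, leaks_future)
```

If the scaled value at row t moves when only a later price was edited, the feature at row t depends on future data, which is the leakage described above.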

@PietroAmin
Author

I'm trying to build the GAN with an LSTM as the generator and a CNN as the discriminator :)

@nupurdeshpande11

nupurdeshpande11 commented Mar 26, 2020 via email

@PietroAmin
Author

Linear scaling:
$$x' = \frac{x - x_{\text{mean}}}{x_{\max} - x_{\min}}$$
Just give me a link to something that explains why the shift is needed.
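For comparison, here is a minimal sketch of the same linear scaling computed without any shift, so that the mean, max, and min at time t come only from the window ending at t. The function name `causal_linear_scale`, the window length, and the toy prices are assumptions for illustration, not code from the repository.

```python
import pandas as pd


def causal_linear_scale(x: pd.Series, window: int) -> pd.Series:
    """Compute x' = (x - mean) / (max - min) over the window ending at each row.

    No shift is applied, so row t only uses rows t-window+1 .. t.
    The first window-1 rows are NaN because the window is not yet full,
    and a flat window (max == min) divides by zero.
    """
    roll = x.rolling(window)
    return (x - roll.mean()) / (roll.max() - roll.min())


# Toy price series (made-up numbers).
prices = pd.Series([10, 11, 12, 11, 13, 14, 13, 15], dtype=float)
scaled = causal_linear_scale(prices, window=3)

# Editing a later price must not change earlier scaled values.
tampered = prices.copy()
tampered.iloc[6] += 10.0
scaled_tampered = causal_linear_scale(tampered, window=3)
unchanged_up_to_5 = scaled.iloc[:6].equals(scaled_tampered.iloc[:6])
print(scaled)
print(unchanged_up_to_5)
```

With this variant, rows before the edited price keep exactly the same scaled values, so any drop in model performance after removing the shift reflects the loss of future information rather than a change in the scaling formula itself.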

@nupurdeshpande11

nupurdeshpande11 commented Mar 26, 2020 via email

@yanbigong2

> I'm trying to build the GAN with an LSTM as the generator and a CNN as the discriminator :)

Have you finished this code? Will you open-source it?

@deshpandenu
Owner

> Please note that there is a possibility of data leakage. The way the data are standardized is dangerous because it shifts future information back in time. Just try deleting (.shift(-num_historical_days)) from your scaling method and you will see the results get worse.

Can you please explain this problem by posting the equation and the code you are referring to? Thank you.

4 participants