zmwaris1/Analytics_Vidhya_Job_A_Thon

Analytics_Vidhya_Job_A_Thon

This repository contains the solution notebook and dataset for the Analytics Vidhya JOB-A-THON. The notebook shows the method I used during the event to get the best solution I could.

Problem Statement:

In this contest, the dataset contains 9 years of hourly green energy consumption data for a country named Green Energy. The task was to train a Machine Learning model on this data that predicts green energy consumption for every hour of the following 3 years.

Dataset:

The dataset contains the following columns:

  1. datetime
  2. energy

The dataset has no null values.
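
The check above can be reproduced with a few lines of pandas. This is only a sketch: the contest file is not loaded here, so a small synthetic frame with the same two columns stands in for it, and the row count and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the contest data (assumption): hourly rows
# with the same two columns, 'datetime' and 'energy'.
rng = np.random.default_rng(0)
index = pd.date_range("2020-01-01", periods=48, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"datetime": index, "energy": rng.normal(1000, 50, len(index))})

# Count missing values per column.
null_counts = df.isnull().sum()

# Confirm that consecutive rows are exactly one hour apart.
is_hourly = df["datetime"].diff().dropna().eq(pd.Timedelta(hours=1)).all()

print(null_counts)
print("hourly spacing:", is_hourly)
```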

EDA and Feature Engineering:

  1. Plotting the values shows that energy consumption grows roughly linearly from year to year.
  2. The dataset also contains some outliers, which were removed.
  3. To predict over a 3-year horizon I needed more features, so I added features to the dataset based on seasonality.
  4. For that I created 8 lag columns based on energy, each shifted by a further 1 year.
  5. Using the datetime column I extracted values such as 'year', 'day', 'dayofweek', 'month', 'weekofyear' etc. and added them to the dataset as features.
  6. Then I used TimeSeriesSplit from sklearn to split the data. Because time series data is sequential, the training and test sets must be chosen in chronological order rather than at random.
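
Steps 4 and 5 could look roughly like the sketch below. It is not the notebook's code: the helper name `add_features`, the lag column names, the 'hour' feature, and the choice to treat one year as 365 * 24 rows (ignoring leap days) are all assumptions made for illustration, and the demo data is synthetic.

```python
import numpy as np
import pandas as pd

# Assumption: one year is treated as 365 * 24 hourly rows (leap days ignored),
# and the data has one row per hour with no gaps.
HOURS_PER_YEAR = 365 * 24


def add_features(df, n_lags=8):
    """Hypothetical helper: add calendar features and yearly lags of 'energy'."""
    out = df.sort_values("datetime").reset_index(drop=True)
    dt = out["datetime"].dt

    # Calendar features extracted from the datetime column.
    out["year"] = dt.year
    out["month"] = dt.month
    out["day"] = dt.day
    out["dayofweek"] = dt.dayofweek
    out["weekofyear"] = dt.isocalendar().week.astype(int)
    out["hour"] = dt.hour  # assumed to be among the "etc." features

    # Lag k holds the energy value observed k years earlier.
    for k in range(1, n_lags + 1):
        out[f"energy_lag_{k}y"] = out["energy"].shift(k * HOURS_PER_YEAR)
    return out


# Synthetic demo: 3 years of hourly data, so only 2 lags can be filled.
idx = pd.date_range("2015-01-01", periods=3 * HOURS_PER_YEAR, freq=pd.Timedelta(hours=1))
demo = pd.DataFrame({"datetime": idx, "energy": np.arange(len(idx), dtype=float)})
features = add_features(demo, n_lags=2)
print(features.tail(3))
```

The earliest rows have no history to look back on, so their lag columns are NaN and need to be dropped or handled by the model.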

Modelling:

For modelling purposes I used two different algorithms:

  1. RandomForest
  2. XGBoost
    The best result was achieved using XGBoost.
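
A comparison of the two models on chronological splits might be set up as in this sketch. The data is synthetic, the hyperparameters are arbitrary, and RMSE is an assumed metric, so the numbers say nothing about the contest result. If xgboost is not installed, the sketch falls back to scikit-learn's GradientBoostingRegressor as a stand-in.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import TimeSeriesSplit

try:
    from xgboost import XGBRegressor
    boosted = XGBRegressor(n_estimators=100, max_depth=4, learning_rate=0.1, random_state=0)
except ImportError:
    # Stand-in only (assumption): used when xgboost is unavailable.
    from sklearn.ensemble import GradientBoostingRegressor
    boosted = GradientBoostingRegressor(n_estimators=100, max_depth=4, random_state=0)

# Synthetic hourly series: linear trend plus a daily cycle plus noise.
rng = np.random.default_rng(0)
t = np.arange(2000)
X = np.column_stack([t % 24, (t // 24) % 7, t])  # hour, day of week, time index
y = 0.05 * t + 10 * np.sin(2 * np.pi * (t % 24) / 24) + rng.normal(0, 1, len(t))

models = {
    "RandomForest": RandomForestRegressor(n_estimators=50, random_state=0),
    "Boosted": boosted,
}

# TimeSeriesSplit keeps every training fold strictly before its test fold.
tscv = TimeSeriesSplit(n_splits=4)
scores = {name: [] for name in models}
folds_in_order = []
for train_idx, test_idx in tscv.split(X):
    folds_in_order.append(bool(train_idx.max() < test_idx.min()))
    for name, model in models.items():
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        rmse = float(np.sqrt(mean_squared_error(y[test_idx], pred)))
        scores[name].append(rmse)

mean_rmse = {name: float(np.mean(vals)) for name, vals in scores.items()}
print(mean_rmse)
```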

Tools used:

scikit-learn, matplotlib, Pandas, NumPy, Python

Conclusion:

Time series data is complex, and anyone using it for prediction should make sure that sufficient data is available. The prediction horizon must be chosen based on the amount of data available. Time series data is not easy to work with, but it gives a lot of insight into real-world trends and patterns. For time series prediction, XGBoost is one of the best algorithms to work with, and it gave the most accurate predictions in this contest.

Rank Secured

Secured 18th rank on the leaderboard.

Thank you for visiting.
