Flight Delay Prediction: A predictive model of flight delays for Indian airlines, built on data prepared from scratch using APIs (JSON) and web scraping. The data is then feature-engineered and used to train Machine Learning algorithms that predict flight delay time. The trained model can be used to predict the flight delay in new test sets.
The dataset was prepared from scratch by scraping and parsing Indian airline websites (namely Indigo, Air India, SpiceJet, GoAir, AirAsia) using the Python package BeautifulSoup.
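As a rough illustration of the scraping step, the sketch below parses a flight table with BeautifulSoup. The HTML fragment, the `flights` table id, and the `parse_flight_table` helper are hypothetical stand-ins; the real airline pages have their own markup and the project's actual parsing code may look quite different.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment standing in for a flight-status page.
SAMPLE_HTML = """
<table id="flights">
  <tr><th>From</th><th>To</th><th>Scheduled Departure</th><th>Departure</th></tr>
  <tr><td>DEL</td><td>BOM</td><td>10:00</td><td>10:25</td></tr>
  <tr><td>BLR</td><td>MAA</td><td>14:30</td><td>14:30</td></tr>
</table>
"""


def parse_flight_table(html):
    """Return one dict per data row, keyed by the table's header cells."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="flights")
    rows = table.find_all("tr")
    # The first row holds the column names.
    headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    records = []
    for row in rows[1:]:
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        records.append(dict(zip(headers, cells)))
    return records


flights = parse_flight_table(SAMPLE_HTML)
```

Each record is a plain dict, so the resulting list can be passed directly to `pandas.DataFrame` to build up the dataset.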
Size of dataset: (10718, 29), i.e. 10718 rows and 29 columns
- Used Date: Date of departure
- From: Departure place
- To: Arrival place
- Airline: Name of the Indian airline
- Scheduled Departure: Time of scheduled departure
- Departure: Actual time of departure
- Scheduled Arrival: Time of scheduled arrival
- Arrival: Time of actual arrival
- Distance: Flight distance between departure and arrival point
- Airline Rating: Average airline ratings (as quoted by www.airlineratings.com)
- Weather attributes:
a) weather__hourly__windspeedKmph
b) weather__hourly__precipMM
c) weather__hourly__humidity
d) weather__hourly__visibility
e) weather__hourly__pressure
f) weather__hourly__cloudcover
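The double-underscore names of the weather attributes suggest they come from flattening a nested JSON response. A minimal sketch of that idea follows, assuming the response nests plain dictionaries; the `flatten` helper and the sample values are hypothetical, and a real weather API may wrap these fields in lists that need extra handling.

```python
def flatten(obj, prefix="", sep="__"):
    """Flatten nested dicts into one dict whose keys are joined by `sep`."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the joined key as prefix.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# Hypothetical, simplified weather response (values are made up).
sample = {
    "weather": {
        "hourly": {
            "windspeedKmph": "12",
            "precipMM": "0.0",
            "humidity": "64",
            "visibility": "10",
            "pressure": "1012",
            "cloudcover": "25",
        }
    }
}

row = flatten(sample)
```

The keys of `row` then match the column names listed above, for example `weather__hourly__windspeedKmph`.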
Two Machine Learning models, RandomForestRegressor and XGradientBoost, were trained on Dataset.csv; the Python code is stored in the FLIGHT DELAY.ipynb and FLIGHT DELAY.py files.
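A minimal sketch of the training step with RandomForestRegressor is shown below. It uses a small synthetic frame instead of Dataset.csv, and it assumes the target is the departure delay in minutes (actual minus scheduled departure); the feature subset, the `Delay` column name, and all hyperparameters are illustrative choices rather than the settings used in the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-in for Dataset.csv (all values are made up).
df = pd.DataFrame({
    "Distance": rng.uniform(200, 2500, n),
    "Airline Rating": rng.integers(3, 8, n),
    "weather__hourly__windspeedKmph": rng.uniform(0, 40, n),
    "weather__hourly__visibility": rng.uniform(1, 10, n),
})

# Scheduled departures spread over one day, in whole minutes.
sched = pd.Timestamp("2019-01-01 06:00") + pd.to_timedelta(
    rng.integers(0, 900, n), unit="min"
)
# Made-up delays that grow with wind speed, rounded and clipped at zero.
true_delay = np.round(
    0.8 * df["weather__hourly__windspeedKmph"].to_numpy() + rng.normal(0, 5, n)
).clip(min=0)

df["Scheduled Departure"] = sched
df["Departure"] = sched + pd.to_timedelta(true_delay, unit="min")

# Assumed target: departure delay in minutes.
df["Delay"] = (
    df["Departure"] - df["Scheduled Departure"]
).dt.total_seconds() / 60

features = [
    "Distance",
    "Airline Rating",
    "weather__hourly__windspeedKmph",
    "weather__hourly__visibility",
]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["Delay"], test_size=0.25, random_state=0
)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

pred = model.predict(X_test)
mae = mean_absolute_error(y_test, pred)
print(f"Mean absolute error on held-out flights: {mae:.1f} minutes")
```

Holding out a test split keeps the error estimate honest; the same split could be reused to compare the second model against the random forest.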
- NumPy
- Pandas
- BeautifulSoup
- Matplotlib
- Scikit-learn