In this project, I extracted more than 40 leading indicators from tweets posted in the UK between 2018 and 2022 and used them to forecast the UK inflation rate.
These efforts are part of my work for the Data-Driven Economics course in my master's at Sapienza University.
The dataset contains all tweets posted in the UK from 2018 to 2022. The task is to extract leading indicators and signals from these tweets and use them to forecast the UK inflation rate over that period. With careful feature engineering to extract the right indicators, the forecasts clearly outperformed the baseline.
The data is available at the following link.
Data Attributes Description:
- |-- id: integer (nullable = false) (hashed id)
- |-- verb: string (nullable = true) (can be post or share if a RT)
- |-- user: integer (nullable = false) (hashed user id)
- |-- inreplyto: struct (nullable = true) (in reply to)
- |-- link: string (nullable = true)
- |-- link: string (nullable = true)
- |-- body: string (nullable = true) (body of tweet)
- |-- retweetbody: string (nullable = true) (body of retweeted tweet)
- |-- date: string (nullable = true) (date)
- |-- postedtime: string (nullable = true) (date and time)
- |-- retweetcount: long (nullable = true) (number of RT)
- |-- favoritescount: long (nullable = true) (number of likes)
- |-- generator: string (nullable = true) (generator)
- |-- twitter_lang: string (nullable = true) (language)
- |-- year: string (nullable = true) (year)
- |-- userLocation: string (nullable = true) (user location)
- |-- userFriends: long (nullable = true) (user friends)
- |-- userFollowers: long (nullable = true) (user followers)
- |-- userNumTweets: long (nullable = true) (number of tweets)
- |-- userVerified: boolean (nullable = true) (verified)
- |-- userLanguage: string (nullable = true) (user language)
- |-- userBio: string (nullable = true) (user bio)
- |-- country: string (nullable = true) (country)
- |-- countrycode: string (nullable = true) (country code)
- |-- locality: string (nullable = true) (locality)
- |-- region: string (nullable = true) (region)
- |-- subregion: string (nullable = true) (sub region)
- |-- locationname: string (nullable = true) (location name)
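As an illustration of how a schema like this can be turned into a monthly indicator, here is a minimal sketch in plain Python. It is not the notebook's actual feature engineering: the keyword list, the `monthly_keyword_share` helper, the `"en"` language code, and the `YYYY-MM-DD` date format are all assumptions made for the example. Only the field names (`verb`, `twitter_lang`, `date`, `body`) come from the schema above.

```python
from collections import Counter

# Hypothetical keywords; the real indicators are not listed in this README.
KEYWORDS = ("price", "rent", "energy bill")


def monthly_keyword_share(tweets, keywords=KEYWORDS, lang="en"):
    """Return {"YYYY-MM": share of original tweets mentioning any keyword}."""
    totals = Counter()
    hits = Counter()
    for tweet in tweets:
        # Keep original posts only (verb == "post") in the chosen language.
        if tweet.get("verb") != "post" or tweet.get("twitter_lang") != lang:
            continue
        body = (tweet.get("body") or "").lower()
        month = tweet["date"][:7]  # assumes dates look like YYYY-MM-DD
        totals[month] += 1
        if any(keyword in body for keyword in keywords):
            hits[month] += 1
    return {month: hits[month] / totals[month] for month in sorted(totals)}


# Made-up sample records, for illustration only.
sample = [
    {"verb": "post", "twitter_lang": "en", "date": "2021-11-03", "body": "Energy bill doubled again"},
    {"verb": "post", "twitter_lang": "en", "date": "2021-11-15", "body": "Lovely weather in Leeds"},
    {"verb": "share", "twitter_lang": "en", "date": "2021-11-20", "body": "RT prices are up"},
    {"verb": "post", "twitter_lang": "en", "date": "2021-12-01", "body": "Rent went up this month"},
]
print(monthly_keyword_share(sample))
```

Each resulting monthly series is one candidate indicator that can be aligned with the monthly inflation rate.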
The code is available in the following notebook, which is divided into sections and subsections for ease of navigation:
After many trials, I identified two models that forecast inflation well: they perform 92.5% and 97.6% better than the baseline of my experiments.
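A relative improvement of this kind can be computed as the percentage reduction in forecast error compared with the baseline. The sketch below assumes RMSE as the error metric, which this README does not confirm, and the numbers are made up purely to show the arithmetic.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def improvement_over_baseline(actual, baseline_pred, model_pred):
    """Percentage reduction in RMSE relative to the baseline forecast."""
    baseline_error = rmse(actual, baseline_pred)
    model_error = rmse(actual, model_pred)
    return 100.0 * (baseline_error - model_error) / baseline_error


# Illustrative values only, not results from the project.
actual = [2.0, 3.0]
baseline_forecast = [4.0, 5.0]  # RMSE = 2.0
model_forecast = [2.5, 3.5]     # RMSE = 0.5
print(improvement_over_baseline(actual, baseline_forecast, model_forecast))  # 75.0
```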
All forecasts use 48 months (4 years) of inflation-rate data for training and predict 12 months in one shot.
- ARIMA (baseline); the optimal ARIMA order is (1,1,1)
- Linear Regression
- VAR
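The 48-month training window and the 12-month one-shot forecast can be sketched for the linear regression case as follows. This is a simplified illustration, not the notebook's code: it uses a single indicator, lags it by the forecast horizon so that every prediction relies only on values observed during training, and fits ordinary least squares by hand. The function names and the synthetic data are assumptions. ARIMA and VAR would typically come from a library such as statsmodels rather than being written by hand.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def one_shot_forecast(inflation, indicator, train_months=48, horizon=12):
    """Fit on the training window, then forecast `horizon` months at once.

    The indicator is lagged by `horizon` months, so the forecast for month t
    uses the indicator value from month t - horizon.
    """
    y_train = inflation[horizon:train_months]
    x_train = indicator[:train_months - horizon]
    intercept, slope = fit_line(x_train, y_train)
    x_future = indicator[train_months - horizon:train_months]
    return [intercept + slope * x for x in x_future]


# Synthetic data: inflation follows the indicator with a 12-month delay.
indicator = [float(i % 7) for i in range(48)]
inflation = [2.0] * 12 + [1.0 + 0.5 * indicator[t - 12] for t in range(12, 48)]

forecast = one_shot_forecast(inflation, indicator)
print(len(forecast))  # 12
```

Because the synthetic relationship is exactly linear, the fitted line recovers it and the 12 forecasts match `1.0 + 0.5 * indicator[t]` for the last 12 training months.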
Extra: