The cryptocurrency market offers attractive investment opportunities that come with both high potential returns and high risks. It is a young market characterized by high volatility and a complex set of factors driving prices, which makes forecasting a real challenge for investors and financial analysts. In this paper, we compare five of the most widely used classification methods in machine learning and deep learning to identify the most efficient one for predicting the sign of crypto returns. The analysis covers three cryptocurrencies: Bitcoin, the market leader; Ethereum, second by market capitalization; and EOS, a top-twenty coin included to test the algorithms' accuracy beyond the market giants. For each algorithm we used the same data source, Binance, which makes the methods directly comparable and allows a general conclusion about the added value of their complexity. In addition to historical trading data from the crypto market, the study also accounts for external social factors such as the most popular Twitter posts. The results show that the accuracy of crypto return predictions is medium to low. We also found that a simple classification algorithm such as Random Forest can perform as efficiently as the more complex Long Short-Term Memory (LSTM) network when 28 factors are taken into account.
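As a flavor of the comparison above, here is a minimal sketch of the simplest benchmarked approach, a Random Forest predicting the sign of the next return. The file path and the all-numeric column layout are assumptions for illustration, not the project's actual schema.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Hypothetical input: one row per period with a "return" column plus numeric features.
df = pd.read_csv("data/processed/bitcoin.csv")

# Binary target: does the NEXT period's return have a positive sign?
df["target"] = (df["return"].shift(-1) > 0).astype(int)
df = df.iloc[:-1]  # the last row has no "next return" to predict

features = [c for c in df.columns if c != "target"]
split = int(len(df) * 0.8)  # chronological split: train on the past only
X_train, y_train = df[features].iloc[:split], df["target"].iloc[:split]
X_test, y_test = df[features].iloc[split:], df["target"].iloc[split:]

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print("out-of-sample accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```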
- Ruben Kempter : [email protected]
- Dimitri André : [email protected]
- Guillaume Pavé : [email protected]
- Clone project
git clone https://github.com/GuillaumePv/data-analysis-crypto.git
- Go into project folder
cd data-analysis-crypto
- Create your virtual environment
python -m venv venv
- Activate your virtual environment
- Mac OS / linux
source venv/bin/activate
- Windows
.\venv\Scripts\activate
- Install libraries
- Python 2
cd twint
pip install . -r requirements.txt
cd ..
pip install -r requirements.txt
- Python 3
cd twint
pip3 install . -r requirements.txt
cd ..
pip3 install -r requirements.txt
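- Example: a minimal twint query (a hedged sketch only; the search term, limit, and Pandas backend below are illustrative, and the project's actual Twitter fetching lives in `scripts/fetchScripts/`)

```python
import twint

# Illustrative configuration: search recent tweets mentioning "bitcoin".
c = twint.Config()
c.Search = "bitcoin"
c.Limit = 100          # stop after roughly 100 tweets
c.Pandas = True        # store results in an in-memory pandas DataFrame
c.Hide_output = True   # don't print every tweet to the console

twint.run.Search(c)
tweets = twint.storage.panda.Tweets_df
print(tweets[["date", "tweet", "nlikes"]].head())
```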
- Using the Makefile to run the project
WARNING: if you want to reproduce the results in the report, do NOT run `make run` or `make fetch`, as these targets update the database. Run `make models` only.
- Show the Makefile help
make
- Run the whole pipeline: fetch, process, then models
make run
- Fetch the data
make fetch
- Process the data
make process
- Run the models on the current data
make models
- Project structure (tree generated with https://github.com/hbast/pyTree)
├── README.md <- The top-level README for developers using this project.
│
├── Makefile <- makefile to run project or each part of project
│
├── data
│ ├── processed <- The final, canonical data sets for modeling.
│ └── raw <- The original, immutable data dump.
│
├── models <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks <- Jupyter notebooks. Naming convention is a number (for ordering),
│ the creator's initials, and a short `-` delimited description, e.g.
│ `1.0-jqp-initial-data-exploration`.
│
├── requirements.txt <- The requirements file for reproducing the analysis environment, e.g.
│ generated with `pip freeze > requirements.txt`
│
├── scripts <- Source code for use in this project.
│ │
│ ├── fetchScripts <- Scripts to download or generate data
│ │ └── make_dataset.py
│ │
│ ├── processScripts <- Scripts to turn raw data into features for modeling
│ │ └── build_features.py
│ │
│   ├── NN_models.py <- Library to create and analyze LSTM and Conv1D-LSTM models (see the sketch below the tree)
│ │
│ ├── data.py <- Library to process our final dataset to provide features for our LSTM and Conv1D-LSTM
│ │
│   ├── models.py <- Script to train the models and then use the trained models to make predictions
│ │
│   ├── descript_stats.py <- Script to compute descriptive statistics of our dataset
│ │
│ ├── getData.py <- Script to create, merge and process our datasets
│ │
│ └── visualization.py <- Script to create exploratory and results oriented visualizations
│
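As a rough illustration of the two network families that `scripts/NN_models.py` covers, here is a hedged Keras sketch. The window length, layer sizes, and optimizer are illustrative guesses, not the project's actual hyperparameters; only the 28-feature input width comes from the abstract.

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv1D, Dense, LSTM

WINDOW, N_FEATURES = 30, 28  # 28 factors per the abstract; window length is a guess

def build_lstm():
    # Plain LSTM classifier: sequence of feature vectors -> P(next return > 0)
    model = Sequential([
        LSTM(64, input_shape=(WINDOW, N_FEATURES)),
        Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

def build_conv1d_lstm():
    # A Conv1D front-end extracts local temporal patterns before the LSTM
    model = Sequential([
        Conv1D(32, kernel_size=3, activation="relu", input_shape=(WINDOW, N_FEATURES)),
        LSTM(64),
        Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```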
- Obtain Twitter data
- Fetch macro data (S&P 500, VIX, and GVZ, the CBOE gold volatility index) from Yahoo Finance (see the sketch below)
- Compute descriptive statistics
- Create a Makefile to run the whole project
- Compare the models' classification performance
- Correct the Excel file name
- Create a structure tree for the report
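For the macro-data item above, a hedged sketch using yfinance (an assumed dependency here, not necessarily the one the project uses); the tickers are Yahoo Finance symbols, while the date range and output path are illustrative.

```python
import yfinance as yf

# ^GSPC = S&P 500, ^VIX = CBOE volatility index, ^GVZ = CBOE gold volatility index
macro = yf.download(["^GSPC", "^VIX", "^GVZ"], start="2020-01-01", end="2022-01-01")

# Keep adjusted closes and give the columns friendlier names
closes = macro["Adj Close"].rename(columns={"^GSPC": "sp500", "^VIX": "vix", "^GVZ": "gvz"})
closes.to_csv("data/raw/macro.csv")  # hypothetical output location
```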