The aim of this project is to create a machine learning model based on Active Learning to identify and filter out spam in YouTube comments.
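Active learning saves labelling effort by asking the annotator to label only the comments the current model is least sure about. Below is a minimal sketch of that query step in plain Python. It assumes a binary spam/ham model that outputs P(spam) for each comment, and it uses least-confidence uncertainty sampling, which is the default query strategy of modAL's `ActiveLearner`. `least_confidence` and `select_most_uncertain` are hypothetical helpers. They do not come from this repository or from modAL.

```python
from typing import List, Sequence


def least_confidence(p_spam: float) -> float:
    """Hypothetical helper: 1 minus the probability of the predicted class (spam or ham)."""
    return 1.0 - max(p_spam, 1.0 - p_spam)


def select_most_uncertain(p_spam: Sequence[float], k: int = 1) -> List[int]:
    """Hypothetical helper: indices of the k comments the model is least confident about."""
    ranked = sorted(range(len(p_spam)), key=lambda i: least_confidence(p_spam[i]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Illustrative spam probabilities for five unlabelled comments
    probs = [0.98, 0.52, 0.10, 0.61, 0.45]
    print(select_most_uncertain(probs, k=2))  # prints [1, 4]
```

With these probabilities, comments 1 and 4 (P(spam) of 0.52 and 0.45) are sent for annotation first. The confident predictions at 0.98 and 0.10 are left unlabelled.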
- modAL and scikit-learn -> pip install modAL scikit-learn matplotlib -qqq # matplotlib is optional
- Rubrix framework -> pip install "rubrix[server]==0.18.0"
- Install Docker Desktop
- Start Elasticsearch, which Rubrix uses as its storage backend -> docker run -d --name elasticsearch-for-rubrix -p 9200:9200 -p 9300:9300 -e "ES_JAVA_OPTS=-Xms512m -Xmx512m" -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch-oss:7.10.2
- Start the Rubrix server -> python -m rubrix
- Open a new terminal in the source directory (the Rubrix server keeps the previous one busy)
- python divide.py (optional: the split datasets are already saved in the data directory)
- python main.py
- Annotate the comments in the Rubrix web app at http://localhost:6900 (or http://0.0.0.0:6900), logging in with username rubrix and password 1234
- python train.py
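The annotate-then-train steps above match the general pool-based active learning loop. You train on a few labelled comments, ask an annotator to label the comment the model is least sure about, retrain, and repeat. The sketch below simulates that loop end to end in plain Python, so it runs without Rubrix, modAL or scikit-learn. It makes two assumptions. First, a hypothetical `NaiveBayesSpam` classifier stands in for the scikit-learn estimator. Second, a dictionary of known labels (`oracle`) stands in for the human annotating in Rubrix. It illustrates the technique and is not the code in main.py or train.py.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens; a deliberately crude tokenizer for the sketch."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpam:
    """Hypothetical multinomial Naive Bayes with Laplace smoothing (1 = spam, 0 = ham)."""

    def fit(self, texts: List[str], labels: List[int]) -> "NaiveBayesSpam":
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def predict_spam_proba(self, text: str) -> float:
        n_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in (0, 1):
            # Smoothed log prior plus smoothed log likelihood of each token
            log_p = math.log((self.doc_counts[label] + 1) / (n_docs + 2))
            total = sum(self.word_counts[label].values())
            for word in tokenize(text):
                log_p += math.log((self.word_counts[label][word] + 1) / (total + len(self.vocab) + 1))
            log_scores[label] = log_p
        # Turn the two log scores into P(spam) without underflow
        m = max(log_scores.values())
        exp_ham = math.exp(log_scores[0] - m)
        exp_spam = math.exp(log_scores[1] - m)
        return exp_spam / (exp_ham + exp_spam)


def active_learning_loop(
    seed: List[Tuple[str, int]], pool: List[str], oracle: Dict[str, int], rounds: int
) -> Tuple[NaiveBayesSpam, List[str]]:
    """Hypothetical loop: query the least confident comment, label it, retrain."""
    labelled = list(seed)
    pool = list(pool)
    queried: List[str] = []
    model = NaiveBayesSpam().fit([t for t, _ in labelled], [y for _, y in labelled])
    for _ in range(min(rounds, len(pool))):
        # Query step: least-confidence uncertainty sampling over the unlabelled pool
        uncertainty = [1 - max(p, 1 - p) for p in (model.predict_spam_proba(t) for t in pool)]
        idx = max(range(len(pool)), key=uncertainty.__getitem__)
        text = pool.pop(idx)
        # Annotation step: in the real workflow a human labels this comment in Rubrix
        labelled.append((text, oracle[text]))
        queried.append(text)
        # Training step: refit on everything labelled so far
        model = NaiveBayesSpam().fit([t for t, _ in labelled], [y for _, y in labelled])
    return model, queried


if __name__ == "__main__":
    seed = [("check out my channel for free money", 1), ("great song, love the chorus", 0)]
    oracle = {
        "subscribe to my channel": 1,
        "this video made my day": 0,
        "free gift cards at my link": 1,
        "the chorus is amazing": 0,
    }
    model, queried = active_learning_loop(seed, list(oracle), oracle, rounds=2)
    print("queried:", queried)
    print("P(spam):", round(model.predict_spam_proba("free money my channel"), 2))
```

Because the model is refit after every label, each new query reflects what the annotator has just taught it. This is the property that lets active learning reach a usable spam filter with fewer annotated comments than labelling the whole pool.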