Great Anime Recommendation System

This project implements a recommendation system for anime using modern machine learning techniques.

Tech Stacks

Python 3, Flask, Spark, Kafka, Flink, Redis, Tensorflow Keras, Docker, Vue.js, Scrapy, Cassandra

Features

Given a user, return a list of anime recommendations.
Given a user, return a list of similar users.
Given an anime, return a list of similar anime.
React in real-time based on user's click actions and adjust the recommendation list accordingly.

Benchmark

Trained model used in rank-service has about 80% accuracy on the test-dataset predicting if a user will like an anime (rating > 7).

Project Structure

dataset: dataset used in the recommendation system
web-service: frontend of the recommendation system
api-service: backend controller of the recommendation system
recall-service: the first filter of the recommendation system
rank-service: the second filter of the recommendation system
kafka-service: monitor user's click actions and send to Kafka
flink-service: consume user's click actions from Kafka and update new feature data in redis

Demo Images

Details & Deployment

Recall Service

This microservice is the first filter of the recommendation system. It is responsible for return a large list of anime recommendations from the original very large dataset. Because the original dataset is too large, it needs to use simple and fast strategies.

The most important strategy here is the Item2Vec Model, works with locality sensitive hashing algorithm that finds close user embedding and anime embedding. The model input is generated based on Deep Walk algorithm and Min-Max Scaler, and the output is stored in Redis.

Setup

First, make sure Python 3.8 & virtualenv is installed first

virtualenv venv --python=python3

source venv/bin/activate

pip install -r requirements.txt

Set correct path of dataset and ports in /recall/config.py, start.sh, and train.sh

Setup redis (docker is the easiest way) and run redis-server first

Setup correct config in /recall/config.py

source venv/bin/activate

run ./train.sh to generate the word2vec model and store it in redis

Run

source venv/bin/activate

./start.sh

API

GET /?user_id=<uid>
- Get recall for a user
- Returns list of anime ids
GET /sim?anime_id=<aid>
- Get similar items for an anime
- Returns list of anime ids

Check examples in /test dir

Rank Service

This microservice is the second filter of the recommendation system. It is responsible for return a small list of anime recommendations based on returns from recall service.

Collaborative filtering algorithm is implemented here to generate recommendations. While, MLP model shows a better performance and replace collaborative filtering algorithm.

Tensorflow is used to train a MLP model to predict user's like animes, based on user features (top preferred tags, avg rating, etc) and anime features (tags, release date, etc). Hparams and Tensorboard are used to monitor the training process and find the best hyperparameters of MLP. The model has 80% accuracy on the test-dataset predicting if a user will like an anime (rating > 7).

End to end embedding strategy is used to for categorical features. Here's how the model looks like:

Setup

First, make sure Python 3.8 & virtualenv is installed first

virtualenv venv --python=python3

source venv/bin/activate

pip install -r requirements.txt

Set correct info (such as port and data directory) in rank/config.py and start.sh

Make sure Redis is running (install and start redis and redis-server, don't forget to close the firewall) Run Feat Eng.ipynb to store initial newest feature data in redis MLP Model is trained already by MLP.ipynb, so no need to train again

Run

source venv/bin/activate

./start.sh

API

GET /?user_id=<uid>
- Get rank results for a user
- Returns list of anime ids

Check examples in /test dir

Kafka Service

This microservice monitors user's click actions and send to Kafka in real-time.

Setup

First, make sure Python 3.8 & virtualenv is installed first

virtualenv venv --python=python3

source venv/bin/activate

pip install -r requirements.txt

Install Kafka with Zookeeper

need to set up kafka and zookeeper, kafka-docker is one of good options:

docker-compose up -d

docker ps # to check if everything is ok (should have 2 containers running for kafka and zookeeper)

Run

flask run

API

POST /clicks
```
{
  user_id: <uid>, 
  anime_id: <aid>
}
```
- Get rank results for a user
- Returns list of anime ids

Check examples in /test dir

Flink Service

This microservice uses Flink to get new clicks from kafka and write new features to redis in real-time.

Setup

First, make sure Python 3.8 & virtualenv is installed first

virtualenv venv --python=python3

source venv/bin/activate

pip install -r requirements.txt

Copy flink-sql-connector-kafka-1.15.0.jar to venv/lib/python3.8/site-packages/pyflink/lib

Run

source venv/bin/activate

./start.sh

API Service

This microservice is the backend controller for GARS

Setup

First, make sure Python 3.8 & virtualenv is installed first

virtualenv venv --python=python3

source venv/bin/activate

pip install -r requirements.txt

Set correct path of dataset and ports in /recall/config.py and start.sh

Run

source venv/bin/activate

./start.sh

API

GET /?user_id=<uid>
- Get anime recommends for a user
GET /sim?anime_id=<aid>
- Get similar animes for a given one

Check examples in /test dir

Web Service

This microservice is the frontend of the recommendation system.

Setup

npm install

Run

npm run serve

Dataset & Spider

The dataset used in this project is collected from MyAnimeList. It contains each user's rating on different animes, and also all information about each anime.

A Scrapy spider is written to collect the data. You can find it in the /dataset/animelist directory.

run command scrapy crawl anime -o output.csv -t csv to start the spider. Make sure the spider is not running too fast, otherwise you may be banned from scraping.

anime.csv rating.csv are original dataset

merged.csv are left join of original dataset and crawler data for more info about each anime

parsed_anime.csv reformat dates in merged.csv

animelist directory contains a scrapy spider that scrapes anime data from animelist.net

Name		Name	Last commit message	Last commit date
Latest commit History 54 Commits
.idea		.idea
api-service		api-service
dataset		dataset
feature-engineer		feature-engineer
flink-realtime-feature		flink-realtime-feature
kafka-service		kafka-service
rank-service		rank-service
recall-service		recall-service
web-service		web-service
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Great Anime Recommendation System

Tech Stacks

Features

Benchmark

Project Structure

Demo Images

Details & Deployment

Recall Service

Setup

Run

API

Rank Service

Setup

Run

API

Kafka Service

Setup

Install Kafka with Zookeeper

Run

API

Flink Service

Setup

Run

API Service

Setup

Run

API

Web Service

Setup

Run

Dataset & Spider

About

Releases

Packages

Languages

WAcry/GARS

Folders and files

Latest commit

History

Repository files navigation

Great Anime Recommendation System

Tech Stacks

Features

Benchmark

Project Structure

Demo Images

Details & Deployment

Recall Service

Setup

Run

API

Rank Service

Setup

Run

API

Kafka Service

Setup

Install Kafka with Zookeeper

Run

API

Flink Service

Setup

Run

API Service

Setup

Run

API

Web Service

Setup

Run

Dataset & Spider

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages