
poonampant/recommendation_system


recommendation_system

OVERVIEW:

This repository is a collection of movie recommendation models based on both collaborative and content-based filtering. In this project, we implement and analyze the performance of one type of collaborative filtering method: neighborhood-based collaborative filtering.

Neighborhood-based collaborative filtering

The basic idea in neighborhood-based methods is to use either user-user similarity or item-item similarity to make predictions from a ratings matrix. There are two basic principles used in neighborhood-based models:

User-based models: Similar users have similar ratings on the same item. Therefore, if John and Molly have rated movies in a similar way in the past, then one can use John’s observed ratings on the movie Terminator to predict Molly’s rating on this movie.
Item-based models: Similar items are rated in a similar way by the same user. Therefore, John’s ratings on similar science fiction movies like Alien and Predator can be used to predict his rating on Terminator.
In this project, we will only implement item-based collaborative filtering (implementation of user-based collaborative filtering is very similar).
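The item-based principle can be sketched as follows. This is a minimal illustration, not the repository's code. It uses a hypothetical toy ratings matrix and cosine similarity between item columns, computed over users who rated both items. It then predicts John's rating on Terminator as a similarity-weighted average of his ratings on Alien and Predator. The helper names `item_cosine` and `predict_item_based` are illustrative.

```python
import numpy as np

# Hypothetical toy ratings matrix: rows are users, columns are movies.
# np.nan marks a missing rating.
movies = ["Alien", "Predator", "Terminator"]
ratings = np.array([
    [5.0, 4.0, np.nan],   # John: has not rated Terminator yet
    [4.0, 5.0, 5.0],
    [1.0, 2.0, 1.0],
    [5.0, 5.0, 4.0],
])

def item_cosine(a, b):
    """Cosine similarity between two item columns over users who rated both."""
    mask = ~np.isnan(a) & ~np.isnan(b)
    if not mask.any():
        return 0.0
    x, y = a[mask], b[mask]
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom else 0.0

def predict_item_based(R, user, target):
    """Similarity-weighted average of the user's ratings on the other items."""
    num, den = 0.0, 0.0
    for j in range(R.shape[1]):
        if j == target or np.isnan(R[user, j]):
            continue  # skip the target item and items the user has not rated
        s = item_cosine(R[:, target], R[:, j])
        if s > 0:  # keep only positively similar items
            num += s * R[user, j]
            den += s
    return num / den if den else np.nan

print(round(predict_item_based(ratings, user=0, target=2), 2))
```

The prediction lands between John's ratings of 4 and 5, because Alien and Predator are both highly similar to Terminator in this toy data.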

Collaborative: item-based filtering is used here. These models recommend movies based on other users' ratings, and the project also measures how well the recommender engines perform.
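The README does not say which accuracy metric is used to judge the recommenders. Root-mean-square error (RMSE) and mean absolute error (MAE) over held-out ratings are common choices and are assumed in this sketch. Ratings the model could not predict (NaN) are skipped. The sample values are hypothetical.

```python
import numpy as np

def _valid(actual, predicted):
    """Return actual/predicted arrays restricted to ratings that have a prediction."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mask = ~np.isnan(predicted)  # the model may fail to score some items
    return actual[mask], predicted[mask]

def rmse(actual, predicted):
    """Root-mean-square error between true and predicted ratings."""
    a, p = _valid(actual, predicted)
    return float(np.sqrt(np.mean((a - p) ** 2)))

def mae(actual, predicted):
    """Mean absolute error between true and predicted ratings."""
    a, p = _valid(actual, predicted)
    return float(np.mean(np.abs(a - p)))

# Hypothetical held-out ratings and model predictions.
actual = [4.0, 3.0, 5.0, 2.0]
predicted = [3.5, 3.0, 4.0, np.nan]
print(rmse(actual, predicted), mae(actual, predicted))
```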

Content-based: models that recommend movies based on genre and find similar movies.
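A genre-based similarity search can be sketched like this. The sketch assumes genres are stored as a `|`-separated string, as in the MovieLens `movies.csv` file. Each movie is one-hot encoded by genre with pandas' `Series.str.get_dummies`, and movies are then compared with scikit-learn's `cosine_similarity`. The sample rows and the helper `similar_movies` are hypothetical.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical sample in the movies.csv layout: genres are "|"-separated.
movies = pd.DataFrame({
    "title": ["Alien (1979)", "Predator (1987)", "Terminator (1984)", "Toy Story (1995)"],
    "genres": ["Horror|Sci-Fi", "Action|Sci-Fi|Thriller",
               "Action|Sci-Fi|Thriller", "Animation|Children|Comedy"],
})

# One column per genre; 1 if the movie has that genre, else 0.
genre_matrix = movies["genres"].str.get_dummies(sep="|")

# Movie-by-movie cosine similarity of the genre vectors.
sim = pd.DataFrame(cosine_similarity(genre_matrix),
                   index=movies["title"], columns=movies["title"])

def similar_movies(title, n=2):
    """Return the n movies whose genre vectors are closest to `title`."""
    return sim[title].drop(title).sort_values(ascending=False).head(n)

print(similar_movies("Terminator (1984)"))
```

Predator shares all three of Terminator's genres, so it ranks first. Alien shares only Sci-Fi, so it ranks second.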

Dataset: The data used is from the MovieLens datasets (https://grouplens.org/datasets/movielens/)
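Neighborhood methods work on a ratings matrix with users as rows and movies as columns. A minimal sketch of building one with pandas follows. It assumes the column names of the MovieLens `ratings.csv` file (`userId`, `movieId`, `rating`). The file path is also an assumption, so a small in-memory sample stands in for the real data.

```python
import pandas as pd

# With the real data this would be something like: ratings = pd.read_csv("ratings.csv")
# (the path is an assumption). A hypothetical sample with the same columns:
ratings = pd.DataFrame({
    "userId":  [1, 1, 2, 2, 3],
    "movieId": [10, 20, 10, 30, 20],
    "rating":  [4.0, 5.0, 3.0, 2.0, 4.5],
})

# Users as rows, movies as columns; cells a user never rated become NaN.
# pivot_table averages duplicate (userId, movieId) pairs by default.
ratings_matrix = ratings.pivot_table(index="userId", columns="movieId", values="rating")
print(ratings_matrix)
```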

Statistical Methods and Models Used:

Pearson's r, cosine similarity, correlation matrix
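An item-item correlation matrix built from Pearson's r can be sketched with pandas' `DataFrame.corr`. The ratings matrix is hypothetical. `corr` excludes missing values pairwise, so each coefficient is computed only over users who rated both movies. Its means are also taken over those co-rating users, which differs slightly from a definition that averages over all of a user's ratings. The `min_periods=3` threshold is an arbitrary illustrative choice.

```python
import numpy as np
import pandas as pd

# Hypothetical user x movie ratings matrix (NaN = not rated).
R = pd.DataFrame({
    "Alien":      [5.0, 4.0, 1.0, 5.0, np.nan],
    "Predator":   [4.0, 5.0, 2.0, 5.0, 3.0],
    "Terminator": [np.nan, 5.0, 1.0, 4.0, 3.0],
    "Toy Story":  [1.0, 2.0, 5.0, np.nan, 4.0],
})

# Pearson's r between every pair of movie columns; a pair with fewer than
# 3 users who rated both movies would be left as NaN.
item_corr = R.corr(method="pearson", min_periods=3)

# Movies most correlated with Terminator, excluding Terminator itself.
print(item_corr["Terminator"].drop("Terminator").sort_values(ascending=False))
```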

Pearson correlation coefficient

The Pearson correlation coefficient between users u and v, denoted by Pearson(u,v), captures the similarity between the rating vectors of users u and v. Before stating the formula for computing Pearson(u,v), let's first introduce some notation:

Iu: Set of item indices for which ratings have been specified by user u
Iv: Set of item indices for which ratings have been specified by user v
µu: Mean rating for user u, computed using her specified ratings
ruk: Rating of user u for item k
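The formula itself is missing from this text. Written with the notation above, the commonly used form is given below. The mean µu is taken over all items in Iu, and the sums in the coefficient run only over items rated by both users, Iu ∩ Iv.

```latex
\mu_u = \frac{\sum_{k \in I_u} r_{uk}}{|I_u|}

\mathrm{Pearson}(u, v) =
\frac{\sum_{k \in I_u \cap I_v} (r_{uk} - \mu_u)(r_{vk} - \mu_v)}
     {\sqrt{\sum_{k \in I_u \cap I_v} (r_{uk} - \mu_u)^2}\,
      \sqrt{\sum_{k \in I_u \cap I_v} (r_{vk} - \mu_v)^2}}
```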

k-Nearest neighborhood (k-NN)

Having defined a similarity metric between users, we can now define a user's neighborhood. The k-nearest neighbors of user u, denoted by Pu, are the set of k users with the highest Pearson correlation coefficient with user u.
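A minimal sketch of Pu follows. It computes Pearson(u,v) as defined above: µu is taken over all of user u's ratings, and the sums run over co-rated items. It then keeps the k users with the highest coefficient. The ratings matrix is hypothetical, and so are the helper names `pearson` and `k_nearest_neighbors`. Pairs with fewer than two co-rated items get similarity 0, which is also an assumption.

```python
import numpy as np

# Hypothetical user x item ratings matrix (rows = users); np.nan = unrated.
R = np.array([
    [5.0, 4.0, np.nan, 1.0],
    [4.0, 5.0, 5.0, 2.0],
    [1.0, 2.0, 1.0, 5.0],
    [5.0, 5.0, 4.0, np.nan],
    [2.0, np.nan, 1.0, 4.0],
])

def pearson(R, u, v):
    """Pearson(u, v): sums run over items both users rated (I_u and I_v);
    each user's mean is taken over all items that user rated."""
    rated_u, rated_v = ~np.isnan(R[u]), ~np.isnan(R[v])
    common = rated_u & rated_v
    if common.sum() < 2:
        return 0.0  # too little overlap to correlate
    mu_u, mu_v = R[u, rated_u].mean(), R[v, rated_v].mean()
    du, dv = R[u, common] - mu_u, R[v, common] - mu_v
    denom = np.sqrt((du ** 2).sum()) * np.sqrt((dv ** 2).sum())
    return float((du * dv).sum() / denom) if denom else 0.0

def k_nearest_neighbors(R, u, k):
    """P_u: the k users with the highest Pearson correlation with user u."""
    sims = [(v, pearson(R, u, v)) for v in range(R.shape[0]) if v != u]
    sims.sort(key=lambda pair: pair[1], reverse=True)
    return sims[:k]

print(k_nearest_neighbors(R, u=0, k=2))
```

Here user 3 ranks first even though it shares only two rated items with user 0. Small overlaps can inflate similarity in this way, which is why practical implementations often require a minimum number of co-rated items or down-weight such pairs.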

Libraries Used: pandas, scikit-learn (sklearn), NumPy
