layout	title
default	Content Recommendation Guide

Movielens 100K Worked Example

A worked step-by-step example using the Movielens 100K dataset is provided here. The dataset is in reality small enough that Spark based models are not the best solution but the small size makes it easy to run and test the steps you would use for larger datasets.

Prerequisites
Running
Detailed Steps

Prerequisites

You have installed Seldon on a Kubernetes cluster with Spark included (Spark is included by default).
You haved added seldon-server/kubernetes/bin to you shell PATH environment variable.

Running

The entire set of steps can be executed by running the Kubernetes deployment in kubernetes/conf/examples/ml100k/ml100k-import.json which should have been created when you followed the install and configuration steps.

Create the kubernetes job:

{% highlight bash %} cd kubernetes/conf/examples/ml100k kubectl create -f ml100k-import.json {% endhighlight %}

The job may take a few minutes to fully run. You can check its status with kubectl get jobs -l name=ml100k-import.

This job will:

Download the movielens 100k data. (N.B. make sure your Kubernetes cluster has access to the internet)
Create a new client "ml100k", create the item meta-data schema and import user, and items.
Create a historical actions dataset from the movie ratings
Run a matrix factorization Spark job to create a model
Setup a matrix factorization runtime scorer

You can then test the recommendations by doing:

{% highlight bash %} seldon-cli api --client-name ml100k --endpoint /js/recommendations --item 50 --limit 4 {% endhighlight %}

The above gets recommendations based on a recent action history for a user being movie 50 which is "Star Wars". The result should look something like:

{% highlight json %} { "size": 4, "requested": 4, "list": [ { "id": "181", "name": "", "type": 1, "first_action": 1460395228000, "last_action": 1460395228000, "popular": false, "demographics": [], "attributes": {}, "attributesName": { "recommendationUuid": "7", "release": "14-Mar-1997", "title": "Return of the Jedi (1983)", "url": "http://us.imdb.com/M/title-exact?Return%20of%20the%20Jedi%20(1983)" } }, { "id": "1", "name": "", "type": 1, "first_action": 1460395228000, "last_action": 1460395228000, "popular": false, "demographics": [], "attributes": {}, "attributesName": { "recommendationUuid": "7", "release": "01-Jan-1995", "title": "Toy Story (1995)", "url": "http://us.imdb.com/M/title-exact?Toy%20Story%20(1995)" } }, { "id": "121", "name": "", "type": 1, "first_action": 1460395228000, "last_action": 1460395228000, "popular": false, "demographics": [], "attributes": {}, "attributesName": { "recommendationUuid": "7", "release": "03-Jul-1996", "title": "Independence Day (ID4) (1996)", "url": "http://us.imdb.com/M/title-exact?Independence%20Day%20(1996)" } }, { "id": "222", "name": "", "type": 1, "first_action": 1460395228000, "last_action": 1460395228000, "popular": false, "demographics": [], "attributes": {}, "attributesName": { "recommendationUuid": "7", "release": "22-Nov-1996", "title": "Star Trek: First Contact (1996)", "url": "http://us.imdb.com/M/title-exact?Star%20Trek:%20First%20Contact%20(1996)" } } ] } } {% endhighlight %}

Detailed Steps

Below are the deatiled steps which can be found here.

Download Data

Hack to ensure we have a namesever for external DNS (seems to be required for local Docker running of Kubernetes)
Download and unzip Movielens data
Convert to UTF-8 the item meta-data

{% highlight bash %} echo "nameserver 8.8.8.8" >> /etc/resolv.conf wget http://files.grouplens.org/datasets/movielens/ml-100k.zip unzip ml-100k.zip iconv -f iso-8859-1 -t utf-8 ml-100k/u.item -o ml-100k/u.item.utf8 {% endhighlight %}

Create Historical Data Files

Create item, user and action CSV files from raw data

{% highlight bash %} echo "create items csv" cat <(echo 'id,title,release,url') <(cat ml-100k/u.item.utf8 | awk -F '|' '{printf("%d,"%s","%s","%s"\n",$1,$2,$3,$5)}') > items.csv

echo "create users csv"
cat <(echo "id") <(cat ml-100k/u.user | cut -d'|' -f1) > users.csv

echo "create actions csv"
cat <(echo "user_id,item_id,value,time") <(cat ml-100k/ua.base | cut -f1,2,3,4 --output-delimiter=,) > actions.csv

{% endhighlight %}

Create Client

We will use a item schema to hold the title, release data and IMDB URL of the movies, as show below

{% highlight json %} { "types": [ { "type_attrs": [ { "name": "title", "value_type": "string" }, { "name": "url", "value_type": "string" }, { "name": "release", "value_type": "string" } ], "type_id": 1, "type_name": "Movies" } ] }

{% endhighlight %}

The steps to setup and import the data can be done via the Seldon CLI

Create a new ml100k client
Setup the schema using above JSON
Import the users and items as defined by CSV files
Create actions file from CSV file

{% highlight bash %}

seldon-cli client --action setup --client-name ml100k --db-name ClientDB
kubectl cp attr.json /seldon-control:/tmp/attr.json && seldon-cli attr --action apply --client-name ml100k --json /tmp/attr.json
kubectl cp items.csv /seldon-control:/tmp/items.csv && seldon-cli import --action items --client-name ml100k --file-path /tmp/items.csv
kubectl cp users.csv /seldon-control:/tmp/users.csv && seldon-cli import --action users --client-name ml100k --file-path /tmp/users.csv
kubectl cp actions.csv /seldon-control:/tmp/actions.csv && seldon-cli import --action actions --client-name ml100k --file-path /tmp/actions.csv

{% endhighlight %}

Build a Recommendation Model

We will build a matrix factorization model using Spark via luigi.

{% highlight bash %} luigi --module seldon.luigi.spark SeldonMatrixFactorization --local-schedule --client ml100k --startDay 1 {% endhighlight %}

Setup Runtime Scorer

We setup a runtime matrix factorization scorer that takes recent activity for users to make recommendations.

{% highlight bash %} cat <<EOF | seldon-cli rec_alg --action create --client-name ml100k -f - { "defaultStrategy": { "algorithms": [ { "config": [ { "name": "io.seldon.algorithm.general.numrecentactionstouse", "value": "1" } ], "filters": [], "includers": [], "name": "recentMfRecommender" } ], "combiner": "firstSuccessfulCombiner" }, "recTagToStrategy": {} } EOF

seldon-cli rec_alg --action commit --client-name ml100k {% endhighlight %}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ml100k.md

ml100k.md

Movielens 100K Worked Example

Prerequisites

Running

Detailed Steps

Download Data

Create Historical Data Files

Create Client

Build a Recommendation Model

Setup Runtime Scorer

Files

ml100k.md

Latest commit

History

ml100k.md

File metadata and controls

Movielens 100K Worked Example

Prerequisites

Running

Detailed Steps

Download Data

Create Historical Data Files

Create Client

Build a Recommendation Model

Setup Runtime Scorer