---
layout: default
title: Offline Recommendation Model Creation
---

Content Recommendation Steps

concepts --> setup server --> logging --> configure data --> realtime activity --> offline model --> runtime configuration --> microservices --> recommendations

Offline Recommendation Model Creation

Seldon provides a variety of item recommendation models that can be created and makes it easy for new custom models to be added.

The current integrated models are:

Models created via Apache Spark
Models created via Semantic Vectors

Configuration is either passed on the command line to the offline jobs or set in ZooKeeper.

Offline Data Store

The Seldon modelling and data manipulation jobs assume a fixed structure for the data store. This structure makes it easy to integrate the jobs into a production environment where models are rebuilt periodically, usually once a day. The directory structure is of the form:

{% highlight bash %} seldon-models/${CLIENT}/${MODEL}/${DAY} {% endhighlight %}

For example, a matrix_factorization model created for client client1 on 26 Jan 2015 (Unix epoch day 16461) would be stored in:

{% highlight bash %} seldon-models/client1/matrix_factorization/16461 {% endhighlight %}
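The day component of the path is the number of whole days since 1 January 1970. A minimal sketch of how such a path could be derived in Python is shown below; the helper names `epoch_day` and `model_dir` are illustrative and are not part of Seldon.

```python
from datetime import date

EPOCH = date(1970, 1, 1)


def epoch_day(d: date) -> int:
    """Return the number of whole days between 1970-01-01 and d."""
    return (d - EPOCH).days


def model_dir(client: str, model: str, d: date, base: str = "seldon-models") -> str:
    """Build a directory of the form seldon-models/${CLIENT}/${MODEL}/${DAY}."""
    return f"{base}/{client}/{model}/{epoch_day(d)}"


print(model_dir("client1", "matrix_factorization", date(2015, 1, 26)))
# seldon-models/client1/matrix_factorization/16461
```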

The store itself can be a network file store or AWS S3; HDFS support is planned.

Jobs that require activity data take a start day and a number of days, and collect the data they need from the filesystem. They gather data from folders of the form:

{% highlight bash %}
${input-path}/${client}/actions/start-day
${input-path}/${client}/actions/start-day-1
${input-path}/${client}/actions/start-day-2
.
.
${input-path}/${client}/actions/start-day-(num-days)
{% endhighlight %}

For example:

{% highlight bash %}
/seldon-models/client1/actions/16461
/seldon-models/client1/actions/16460
/seldon-models/client1/actions/16459
{% endhighlight %}
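The folder selection can be sketched in Python as follows. This is only an illustration: it assumes that the day count includes the start day itself, which matches the three-folder example, and `input_folders` is a hypothetical helper rather than a Seldon function.

```python
from typing import List


def input_folders(input_path: str, client: str, start_day: int, days: int) -> List[str]:
    """List the action folders a job would read, newest day first.

    Assumption: `days` counts the start day itself, so days=3 yields
    start_day, start_day-1 and start_day-2.
    """
    base = input_path.rstrip("/")
    return [f"{base}/{client}/actions/{start_day - offset}" for offset in range(days)]


for folder in input_folders("/seldon-models", "client1", 16461, 3):
    print(folder)
# /seldon-models/client1/actions/16461
# /seldon-models/client1/actions/16460
# /seldon-models/client1/actions/16459
```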

The output path will be of the form:

{% highlight bash %} ${output-path}/${client}/${model}/start-day {% endhighlight %}

For example:

{% highlight bash %} s3://seldon-models/client1/matrix-factorization/16461 {% endhighlight %}

Configuration

Configuration is held in ZooKeeper as JSON, in nodes of the form:

{% highlight bash %} //offline/ {% endhighlight %}

For example, for the client clientname1 and the similar-items job:

{% highlight bash %} /clientname1/offline/similar-items {% endhighlight %}
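As an illustration, a job's configuration could be written to its node with a ZooKeeper client library. The sketch below is written against the `ensure_path` and `set` methods of kazoo's `KazooClient`; the helpers `config_node` and `publish_config` are hypothetical, and the node layout is inferred from the example path above rather than taken from the Seldon code base.

```python
import json


def config_node(client: str, job: str) -> str:
    """Return the node path, e.g. /clientname1/offline/similar-items."""
    return f"/{client}/offline/{job}"


def publish_config(zk, client: str, job: str, config: dict) -> str:
    """Store config as JSON at the job's node and return the node path.

    `zk` is expected to be a started kazoo.client.KazooClient, or any
    object that offers the same ensure_path and set methods.
    """
    path = config_node(client, job)
    zk.ensure_path(path)  # create the node and any missing parents
    zk.set(path, json.dumps(config).encode("utf-8"))  # node data must be bytes
    return path
```

With kazoo installed, `zk` would typically be created with `KazooClient(hosts="localhost:2181")` (the address is an assumption) and started with `zk.start()` before calling `publish_config(zk, "clientname1", "similar-items", config)`.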

Most jobs share a set of basic parameters, including:

inputFolder : the base folder on the local file system, S3 or HDFS of the data needed for the job
outputFolder : the base folder on the local file system, S3 or HDFS where the output will be stored
startDay : the day to start from, as a Unix epoch day number
days : the number of days to go back from startDay (inclusive) to collect data as input
awskey : AWS key (only needed if using S3 for storage)
awssecret : AWS secret (only needed if using S3 for storage)
itemType : restrict activity data to items of this type (-1 allows all types)
activate : whether to activate the model immediately in the Seldon Server so predictions can be provided

An example:

{% highlight json %}
{
  "inputPath": "/seldon-models",
  "outputPath": "/seldon-models",
  "startDay": 1,
  "days": 1,
  "itemType": -1,
  "activate": true
}
{% endhighlight %}
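A job could load and sanity-check such a configuration before running. The sketch below uses the key names from the JSON example; the `JobConfig` class, the `load_config` helper, the default values and the rule that S3 paths require credentials are illustrative assumptions based on the parameter descriptions, not Seldon's actual behaviour.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobConfig:
    inputPath: str
    outputPath: str
    startDay: int
    days: int
    awskey: Optional[str] = None     # only needed for S3 storage
    awssecret: Optional[str] = None  # only needed for S3 storage
    itemType: int = -1               # assumed default: -1 allows all item types
    activate: bool = False           # assumed default: do not activate the model


def load_config(raw: str) -> JobConfig:
    """Parse a JSON job configuration and apply basic checks."""
    cfg = JobConfig(**json.loads(raw))  # unknown keys raise TypeError
    if cfg.days < 1:
        raise ValueError("days must be at least 1")
    uses_s3 = any(p.startswith("s3://") for p in (cfg.inputPath, cfg.outputPath))
    if uses_s3 and not (cfg.awskey and cfg.awssecret):
        raise ValueError("awskey and awssecret are required for S3 paths")
    return cfg


raw = (
    '{"inputPath": "/seldon-models", "outputPath": "/seldon-models", '
    '"startDay": 1, "days": 1, "itemType": -1, "activate": true}'
)
cfg = load_config(raw)
print(cfg.startDay, cfg.days, cfg.activate)
# 1 1 True
```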
