This document helps you get started writing programs that run on the Angel-PS architecture. You are expected to know the following before you start:
- Basic Scala or Java programming
- Basics of vector, matrix, and tensor: definitions and operations
- Good understanding of machine learning algorithms -- however, if you don't have it yet, use this document as a start
Let's first review some basics:

- Most machine learning algorithms break down into operations on vectors, matrices, and tensors. These mathematical objects are also used to represent the data.
- Angel-PS implements matrix operations on the parameter server (PS). It abstracts the parameter matrix, which is distributed across multiple PS servers, as a PSModel. Defining your PSModel and its computation is all a basic algorithm needs to run on the PS.
A simple representation of the Angel-PS architecture is shown below:

- The PS consists of multiple machines that store the parameter matrix, execute pull/push operations between nodes, and update the parameters.
- Each worker is a logical compute node that can run one or more tasks.

Training algorithms in machine learning are usually implemented in an iterative fashion; in each iteration, the worker pulls the latest parameters from the PS, computes updated values, and pushes the updates back to the PS.
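The pull/compute/push loop above can be sketched with a toy, in-memory "PS" written in plain Java (the document's audience knows Scala or Java). `PsSketch`, `pull`, and `push` here are illustrative stand-ins, not Angel's actual API:

```java
// A toy, in-memory stand-in for one row of the parameter matrix on the PS.
// The pull/push names mirror the Angel workflow, but this is NOT Angel's API.
public class PsSketch {
    private final double[] params;

    public PsSketch(int dim) {
        params = new double[dim];        // parameters start at zero
    }

    // "pull": fetch a copy of the current parameters
    public double[] pull() {
        return params.clone();
    }

    // "push": apply a delta to the stored parameters
    public void push(double[] delta) {
        for (int i = 0; i < params.length; i++) {
            params[i] += delta[i];
        }
    }
}
```

A worker iteration then reads as: `double[] w = ps.pull();` compute an update from `w`; `ps.push(delta);`. In real Angel deployments the pushes from many workers are merged on the server side.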
This example guides you through the implementation of a simple logistic regression (LR) algorithm. The code can be found in `example.quickStart`.

Let w denote the 1-by-N parameter matrix (an N-dimensional vector), where N is the number of weights in the LR model.

We train the LR model with the gradient descent algorithm. In each iteration, the task pulls the latest model w from the PS, calculates the model update △w, and pushes △w back to the PS. Realizing this procedure takes the following three steps:
1. **Define the Model (MLModel)**

Define a `QSLRModel` class that inherits the `MLModel` class, add an N-dimensional `PSModel` to it using the `addPSModel` method, and set the save path for the LR model using the `setSavePath` method. The value of N, the save path, and other settings can be configured through `conf`.

```scala
class QSLRModel(ctx: TaskContext, conf: Configuration) extends MLModel(ctx) {
  val N = conf.getInt(MLConf.ML_FEATURE_NUM, MLConf.DEFAULT_ML_FEATURE_NUM)

  // A 1-by-N parameter matrix, averaged across tasks
  val weight = PSModel[DenseDoubleVector]("qs.lr.weight", 1, N).setAverage(true)
  addPSModel(weight)

  setSavePath(conf)
  setLoadPath(conf)
}
```
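The pattern of reading N from `conf` with a default can be illustrated standalone. Here plain Java `Properties` stands in for Hadoop's `Configuration`, and the key names are just examples:

```java
import java.util.Properties;

// Minimal stand-in for conf.getInt(key, defaultValue): look up an integer
// setting and fall back to a default when the key is missing.
public class ConfSketch {
    public static int getInt(Properties conf, String key, int defaultValue) {
        String value = conf.getProperty(key);
        return value == null ? defaultValue : Integer.parseInt(value);
    }
}
```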
2. **Define the Task (TrainTask)**

Angel trains models through tasks, so we define a `QSLRTrainTask` to train the LR model. It needs to inherit the `TrainTask` class and implement the following two methods:

- **Data parsing.** Before training, each line of input text is parsed into a training sample; this is implemented in the `parse` method. Here, we use `DataParser` to parse the data in dummy format:

```scala
override def parse(key: LongWritable, value: Text): LabeledData = {
  DataParser.parseVector(key, value, feaNum, "dummy", negY = true)
}
```

You can access the parsed data through the task's `dataBlock`.
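A standalone sketch of parsing one line of dummy-format data is shown below. It assumes the format is "label idx1 idx2 ...", where every listed feature index has the implicit value 1; this is NOT Angel's `DataParser`, and the format assumption should be verified against your actual data:

```java
import java.util.HashMap;
import java.util.Map;

// Standalone sketch of a dummy-format parser: "label idx1 idx2 ..." where
// each listed feature index carries the implicit value 1 (assumption).
public class DummyParser {
    public static int parseLabel(String line) {
        return Integer.parseInt(line.trim().split("\\s+")[0]);
    }

    public static Map<Integer, Double> parseFeatures(String line) {
        String[] parts = line.trim().split("\\s+");
        Map<Integer, Double> features = new HashMap<>();
        for (int i = 1; i < parts.length; i++) {
            features.put(Integer.parseInt(parts[i]), 1.0);
        }
        return features;
    }
}
```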
- **Training.** Angel automatically runs the `train` method of any `TrainTask` subclass, so we need to implement `train` for `QSLRTrainTask`. In this simple LR example, we create an instance of the `QSLRModel` class and then start iterating. In each iteration:
  - the task pulls the weight parameters from the PS;
  - workers calculate the gradient `grad` and push it to the PS, which then updates the weight parameters automatically;
  - we call `clock()` and `incIteration()` after pushing `grad`.

```scala
def train(ctx: TaskContext): Unit = {
  // A simple logistic regression model
  val lrModel = new QSLRModel(ctx, conf)
  val weightOnPS = lrModel.weight

  // Apply batch gradient descent iteratively
  while (ctx.getIteration < epochNum) {
    // Pull the latest model and calculate the gradient
    val weight = weightOnPS.getRow(0)
    val grad = batchGradientDescent(weight)

    // Push the scaled gradient to the PS and clock
    weightOnPS.increment(grad.timesBy(-1.0 * lr))
    weightOnPS.clock.get

    // Increase the iteration number
    ctx.incIteration()
  }
}
```
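The same pull → gradient → push structure can be exercised end to end on a toy dataset. The sketch below is plain Java with the "PS" reduced to a local array, and labels assumed to be in {0, 1}; it is an illustration of batch gradient descent for LR, not Angel code:

```java
// Self-contained batch gradient descent for logistic regression on a toy
// dataset, mirroring the structure of the train method above. The "PS" is
// just the local array w; labels ys are assumed to be in {0, 1}.
public class ToyLrTrain {
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    public static double[] train(double[][] xs, double[] ys, int epochs, double lr) {
        double[] w = new double[xs[0].length];
        for (int epoch = 0; epoch < epochs; epoch++) {        // ctx.getIteration < epochNum
            double[] grad = new double[w.length];
            for (int n = 0; n < xs.length; n++) {
                double dot = 0.0;
                for (int i = 0; i < w.length; i++) dot += w[i] * xs[n][i];
                double err = sigmoid(dot) - ys[n];            // prediction error
                for (int i = 0; i < w.length; i++) grad[i] += err * xs[n][i] / xs.length;
            }
            // the update step: increment(grad.timesBy(-lr)) in the Angel snippet
            for (int i = 0; i < w.length; i++) w[i] -= lr * grad[i];
        }
        return w;
    }
}
```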
3. **Define the Runner (MLRunner)**

We have defined the LR model and implemented its training method. The next step is to implement a `Runner` class that submits the training job to the cluster. Define a `QSLRRunner` class that inherits `MLRunner` and implement its `train` method, submitting the `QSLRModel` class and the `QSLRTrainTask` class.

```scala
class QSLRRunner extends MLRunner {
  ……
  override def train(conf: Configuration): Unit = {
    train(conf, QSLRModel(conf), classOf[QSLRTrainTask])
  }
}
```
You can submit the application to Yarn using the sample command below:

```shell
./bin/angel-submit \
    --action.type train \
    --angel.app.submit.class com.tencent.angel.example.quickstart.QSLRRunner \
    --angel.train.data.path $input_path \
    --angel.save.model.path $model_path \
    --ml.epoch.num 10 \
    --ml.feature.num 10000 \
    --ml.data.type dummy \
    --ml.learn.rate 0.001 \
    --angel.workergroup.number 3 \
    --angel.worker.memory.mb 8000 \
    --angel.worker.task.number 3 \
    --angel.ps.number 1 \
    --angel.ps.memory.mb 5000 \
    --angel.job.name myLR
```
After submission, follow Running on Yarn if you are not yet familiar with Yarn.
OK. You have just completed a simple Angel job. Want to write more complex machine learning algorithms? Read the complete Angel Programming Guide.
Welcome to Angel's world.