Skip to content

Matrix API Reference

flavianv edited this page Aug 14, 2012 · 45 revisions

Matrix library functions can be divided into three types:

In addition, matrix library contains

Value operations work over individual values in a matrix, usually transforming them in some way.

# mat.mapValues( mapping function ): Matrix

Maps the values of the matrix to new values using a function

// The new matrix contains the elements from the original matrix multiplied by 2
val squaredMatrix = matrix.mapValues{ value : Int => value * 2 }

# mat.filterValues( filter function ) : Matrix

Keep only the values of the matrix that set the function to true

// The new matrix contains only the positive elements from the matrix
val filteredMatrix = matrix.filterValues{ _ > 0 }

# mat.binarizeAs[NewValT] : Matrix

Sets all of the non-zero values of the matrix to the one element of the type

// The new matrix contains ones as integers for all non-zero values from the matrix
val binMatrix = matrix.binarizeAs[Int]

RowVector operations

# mat.getRow( rowNumber ) : RowVector

Returns the row indexed with the specified value from the original matrix

// Returns the 3rd row from the matrix
val row = matrix.getRow(3)
// Returns the row from the matrix that is indexed with “France”
val row = matrix.getRow("France")

# matrix.reduceRowVectors{ reduce function } : RowVector

Reduces all row vectors into a single row vector using a associative pairwise aggregation function

// Produces a row vector containing the products of each column's values
// matrix = 1 1 1
//          2 2 1
//          3 0 1
// rowProd = 6 0 1
val rowProd = matrix.reduceRowVectors { (x,y) => x * y }

# matrix.sumRowVectors : RowVector

Reduces all row vectors into the sum row vector

// Returns the row vector of all per-column sums
val rowSum = matrix.sumRowVectors

# matrix.mapRows{ mapping function } : Matrix

Maps each of the row vectors into a new row vector using a function over the list of non-zero elements in the original rows

// Returns the mean-centered rows of the original matrix
val rowSum = matrix.mapRows{ meanCenter }
def meanCenter[T](vct: Iterable[(T,Double)]) : Iterable[(T,Double)] = {
    val valList = vct.map { _._2 }
    val sum = valList.sum
    val count = valList.size
    val avg = sum / count
    vct.map { tup => (tup._1, tup._2 - avg) }
}

# matrix.topRowElems( numberOfElements ) : Matrix

Returns the matrix containing only the top K greatest elements in each row as specified by the type-specific comparator. The new matrix pipe has the elements sorted in decreasing order per row key.

// Returns the matrix with top 10 elements
// TODO: add example
val topkMatrix = matrix.topRowElems(10)

# matrix.rowL1Normalize : Matrix

Returns the matrix containing the L1 row-normalized elements of the original matrix

// Returns the adjacency matrix of the follow graph normalized by outdegree
// matrix = 1 0 1
//          1 1 1
//          1 0 0
// matrixL1Norm = 0.5 0 0.5
//                0.33 0.33 0.33
//                1 0 0
val matrixL1Norm = matrix.rowL1Normalize

# matrix.rowL2Normalize : Matrix

Returns the matrix containing the L2 row-normalized elements of the original matrix

// Returns the adjacency matrix of the follow graph normalized by outdegree
// TODO: add example
val matrixL2Norm = matrix.rowL2Normalize

# matrix.rowMeanCentering : Matrix

Returns the matrix containing the row-mean centered elements of the original matrix, by computing the mean per each row and substracting it from the row elements, such that the original mean value is now zero. (the code is shown above in the mapRows example)

// Substracts all of the row-wise means from all the elements
val matrixCentered = matrix.rowMeanCentering

# matrix.rowSizeAveStdev : Matrix

Computes the row size, average and standard deviation returning a new kx3 matrix where each row is mapped to the 3 values.

// Computes the row stats
// TODO: add example
val matrixRowStats = matrix.rowSizeAveStdev
def rowColValSymbols : List[Symbol] = List(rowSym, colSym, valSym)

ColVector operations

Column vector operations are similar to the row vector ones, where all function names are renamed from row to col and the return type is in general ColVector.

# matrix1 * matrix2 : Matrix

Computes the product of two matrices

// Computes the product of two matrices
val matrixProd = matrix1 * matrix2

# matrix / scalar(LiteralScalar) : Matrix

Computes the element-wise division of a matrix by a scalar

// Computes the element-wise division of a matrix by 100
val matrixDiv = matrix / 100

# matrix1.elemWiseOp( matrix2 ){ function }

Computes the element-wise merging of two matrices

// Computes the element-wise division of a matrix by another 
matrix1.elemWiseOp(matrix2)((x,y) => x/y)

# matrix1 + matrix2 : Matrix

Computes the sum of two matrices

// Computes the sum of two matrices
matrixSum = matrix1 + matrix2

# matrix1 - matrix2 : Matrix

Computes the difference between two matrices

// Computes the difference between two matrices
matrixDiff = matrix1 - matrix2

# matrix1.hProd( matrix2 ) : Matrix

Computes the element-wise product of two matrices

// Computes the element-wise product of two matrices
matrixProd = matrix1.hProd( matrix2 )

# matrix1.zip( matrix2/row/column ) : Matrix

Merges the elements of the two matrices creating a matrix that has as values the pair tuples. Similarly, when zipping a matrix with a row or a vector, it zips the values of the row/column across all of the rows/columns of the matrix. The resulting value type is a pair tuple that has defined a Monoid trait.

// Returns the matrix with elements of the type ( 0, elem2), ( elem1, 0) and (elem1, elem2)
// matrix = 0 1 1
//          1 2 0
// rowVct = 1 0 1
// matrixVctPairs = (0, 1) (1, 0) (1, 1)
//                  (1, 1) (2, 0) (0, 1)
matrixPairs = matrix1.zip( matrix2 )
matrixVctPairs = matrix.zip ( rowVct )

# matrix.nonZerosWith( Scalar )

Zips the scalar on the right with all non-zeros in this matrix

// Similar to zip, but combine the scalar on the right with all non-zeros in this matrix:
matrixProd = matrix1.nonZerosWith( 100 )

# matrix.trace : Scalar

Computes the trace of a matrix that is equal to the sum of the diagonal values of the matrix.

// Computes the trace of a matrix
trace = matrix.trace

# matrix.sum : Scalar

Computes the sum of the elements of a matrix

// Computes the sum of the elements of a matrix
sum = matrix.sum

# matrix.transpose : Matrix

Computes the transpose of the matrix

// Computes the transpose of the matrix
matrixTranspose = matrix.transpose

# matrix.diagonal : DiagonalMatrix

Returns the diagonal of the matrix

// Returns the diagonal of the matrix
matrixDiag = matrix.diagonal

PipeExtensions

# pipe.toMatrix( fields ) : Matrix

Constructs a matrix from a pipe.

// The matrix will contain all of the data from the pipe and will have the names of the row, column and value dimensions ‘row, ‘col and ‘val. Again, the row and column types do not have to be numeric.
val matrix = pipe.toMatrix(‘row,’col,’val)

# pipe.mapToMatrix( fields ) { mapping function } : Matrix

Constructs a matrix from a pipe by applying first a mapping function

// The new pipe contains all of the triples data in matrix with squared values
val matrix = pipe.mapToMatrix(‘a, ‘b, ‘c){ (x, y, z) => (x, y, z * z) }

# pipe.flatMapToMatrix( fields ) { mapping function } : Matrix

Constructs a matrix from a pipe by applying first a mapping function

// The new pipe contains all of the triples data in matrix with log-values
val matrix = pipe.flatMapToMatrix( ‘a, ‘b, ‘c){ (x, y, z) => (x, y, scala.math.log(z)) }

Companion object methods

# Matrix.readTSV( filename ) : Matrix

Loads a matrix from a TSV file that contains a (row, column, element) triple per line.

// The matrix contains the data from the TSV file
val newMatrix = Matrix.readTSV( "input" )

# matrix.pipe : RichPipe

Returns the pipe associated with the matrix. The pipe contains tuples of the three (row,column,value) elements that can be accessed either using the field names or by using the Matrix class variables matrix.rowSym, matrix.colSym, matrix.valSym.

// The new pipe contains all of the triples data in matrix
val newPipe = matrix.pipe

# matrix.pipeAs( toFields ) : RichPipe

Returns the matrix pipe with the fields renamed

// The new pipe contains all of the triples data in matrix renamed to ('user,'movie,'similarity)
val pipeFields = matrix.pipeAs( ‘user, ‘movie, ‘similarity)

# matrix.write( src, outFields ) : Matrix

Writes to a sink and returns the matrix data for further processing.

// Writes to a sink and returns the matrix data for further processing.
matrix.write( Tsv("output") )

# matrix.fields : List[Symbol]

Returns the list of the fields of the pipe associated with the matrix: matrix.rowSym, matrix.colSym, matrix.valSym.

// The new pipe contains all of the triples data in matrix
val pipeFields = matrix.fields

# matrix.withSizeHint : Matrix

Adds a SizeHint to the matrix. The size hint specifies the approximate value of the dimensions of the matrix and impacts the type of joins that. If set to true, the skewness flag triggers an additional split of the data over a set of random keys to evenly distribute the computation.

// The new matrix has a new SizeHint specifying that there are around 1000 rows, 4000 columns, 1000 non-zeroes and that there is a skew of the values over the indexes.
val newMatrix = matrix.withSizeHint( 1000, 4000, 1000, true )

# matrix.hasHint : SizeHint

Returns true if the matrix has a size hint.

// Check if the matrix has a sizeHint set.
if (!matrix.hasHint) matrix.withSizeHint(newHint)

Contents

Getting help

Documentation

Matrix API

Third Party Modules

Videos

How-tos

Tutorials

Articles

Other

Clone this wiki locally