A simple Self-Organizing Map (SOM) for Scala and Apache Spark.
Make sure you have an implicit SparkSession and your data RDD ready:
implicit val sparkSession: SparkSession = ???
val data: RDD[Vector] = ???
Compose your own SOM instance, with either predefined or custom implementations of decay functions, neighborhood kernels or error metrics...
val SOM = new SelfOrganizingMap with CustomDecay with GaussianNeighborboodKernel with QuantizationErrorMetrics {
  override val shape: Shape = (24, 24)
  override val learningRate: Double = 0.3
  override val sigma: Double = 0.5
}
... or just use an off-the-shelf SOM for your convenience.
val SOM = GaussianSelfOrganizingMap(24, 24, sigma = 0.5, learningRate = 0.3)
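To give a rough idea of what a decay function and a Gaussian neighborhood kernel compute, here is a minimal standalone sketch in plain Scala. The object and method names (`KernelSketch`, `exponentialDecay`, `gaussian`) are hypothetical and are not part of this library's API; the exponential form of the decay is also an assumption, chosen only because it is a common choice for SOMs.

```scala
// Hypothetical sketch, not the library's implementation.
object KernelSketch {
  // Exponentially shrink a parameter (e.g. learning rate or sigma)
  // as training progresses from iteration 0 to totalIterations.
  def exponentialDecay(initial: Double, iteration: Int, totalIterations: Int): Double =
    initial * math.exp(-iteration.toDouble / totalIterations)

  // Gaussian neighborhood weight of a grid cell relative to the
  // best matching unit (BMU): 1.0 at the BMU, falling off with grid distance.
  def gaussian(cell: (Int, Int), bmu: (Int, Int), sigma: Double): Double = {
    val dx = (cell._1 - bmu._1).toDouble
    val dy = (cell._2 - bmu._2).toDouble
    math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
  }
}
```

A smaller `sigma` narrows the neighborhood, so fewer cells around the BMU are pulled toward each data point.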
Initialization and training:
val (som, params) = SOM.initialize(data).train(data, 20)
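As an illustration of what a single training update does conceptually, the following sketch applies the classic online SOM rule `w <- w + learningRate * h * (x - w)` to an in-memory codebook. This is an assumption for illustration only: the library trains on an RDD and may well use a different (for example batch) update scheme. `TrainStepSketch`, `Codebook`, and `step` are hypothetical names.

```scala
// Hypothetical sketch of one online SOM update; not the library's training loop.
object TrainStepSketch {
  // Maps each grid cell to its weight (codebook) vector.
  type Codebook = Map[(Int, Int), Array[Double]]

  private def sqDist(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  def step(codebook: Codebook, x: Array[Double],
           learningRate: Double, sigma: Double): Codebook = {
    // Find the cell whose weight vector is closest to the data point.
    val (bmu, _) = codebook.minBy { case (_, w) => sqDist(w, x) }
    codebook.map { case (cell, w) =>
      // Gaussian neighborhood weight based on grid distance to the BMU.
      val dx = (cell._1 - bmu._1).toDouble
      val dy = (cell._2 - bmu._2).toDouble
      val h = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
      // Move the weight vector toward the data point, scaled by h.
      cell -> w.zip(x).map { case (wi, xi) => wi + learningRate * h * (xi - wi) }
    }
  }
}
```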
Classification of datapoints:
val dataPoint: DenseVector = ???
val (bmu, distance) = som.classify(dataPoint)
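Conceptually, classification looks up the best matching unit (BMU), i.e. the grid cell whose weight vector is closest to the data point. The sketch below shows this with plain Scala collections and Euclidean distance; the name `ClassifySketch`, the `Map`-based codebook, and the choice of distance are assumptions for illustration and do not describe how `som.classify` is implemented.

```scala
// Hypothetical sketch of a BMU lookup; not the library's classify method.
object ClassifySketch {
  // Returns the grid position of the closest weight vector and its distance.
  def classify(codebook: Map[(Int, Int), Array[Double]],
               x: Array[Double]): ((Int, Int), Double) =
    codebook
      .map { case (cell, w) =>
        // Euclidean distance between the weight vector and the data point.
        cell -> math.sqrt(w.zip(x).map { case (a, b) => (a - b) * (a - b) }.sum)
      }
      .minBy(_._2)
}
```

For example, with weight vectors `(0.0, 0.0)` at cell `(0, 0)` and `(3.0, 4.0)` at cell `(1, 1)`, the point `(3.0, 0.0)` is assigned to cell `(0, 0)` at distance 3.0.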
You can find more examples of using the SOM library in the tests, and complete applications in the examples directory.
To publish the library and its macros subproject to your local repository, run publishLocal from sbt:
➜ som git:(master) ✗ sbt
...
> publishLocal
...
> project macros
...
> publishLocal
...
[success] Total time: 10 s, completed Dec 18, 2016 4:02:19 PM
>
Some parts of the implementation are inspired by the spark-som project. Credits to @jxieeducation / PragmaticLab.