Fast Dawid-Skene algorithm in Julia for Turing.jl (Google Summer of Code 2022)
This repository contains the work done for a project of implementing in Julia an expectation maximization version of the Fast Dawid-Skene algorithm (paper) for voting aggregation. The core of the algorithm (in dawidskene.jl) is mostly a translation of the Python implementation published by the authors of that paper.
mixturemodels.jl contains EM implementations of two clustering algorithms which was done as an exercise.
The work was done during the summer of 2022 under Kai Xu's mentorship. (Thanks!)
I managed to translate the Python code into Julia and tested it at RTE and Adult2 datasets (achieving almost the same scores as those reported in the paper). I didn't succeed at integrating it with Turing by using the @model
macro.
However, @model
can be used to implement expectation-maximization algorithms as can be seen in the em-gmm
notebook. The other notebook (em-fds) is a record of my attempt at implementing EM-FDS with @model
, details problems I've encountered and should be a good place to pick up if anybody would like to.
I suggest running them using Julia REPL, e.g. in VSCode. Alternatively, you can run them from the console (but the former solution is preferrable).
julia scripts/test_dawidskene.jl
julia scripts/test_mixturemodels.jl
- test_dawidskene.jl - runs the three variants of the algorithm (Fast Dawid-Skene, normal Dawid-Skene, and Hybrid Dawid-Skene) as well as majority voting algorithm, comparing their time performance (using
@btime
from BenchmarkTools) and result (average negative log-likelihood and mutual information).- One can see that time performance and neg-log-likelihoods are similar to those published in the paper (tables 1 and 2; see also the showcase notebook).
- test_mixturemodels.jl - runs and compares EM implementations of two clustering algorithms, k-means and gaussian mixture models (also with
@btime
).
- runtests.jl - uses the standard library
Test
module to validate that the algorithms work correctly.
- showcase.ipynb - explains this implementation of EM-FDS step-by-step.
- em-gmm.ipynb - implements the gaussian mixture model with Turing's
@model
macro for defining statistical models. - em-fds.ipynb - details my attempt at implementing EM-FDS with
@model
.