Skip to content

Latest commit

 

History

History
38 lines (23 loc) · 3.42 KB

README.md

File metadata and controls

38 lines (23 loc) · 3.42 KB

Fast Dawid-Skene algorithm in Julia for Turing.jl (Google Summer of Code 2022)

This repository contains the work done for a project of implementing in Julia an expectation maximization version of the Fast Dawid-Skene algorithm (paper) for voting aggregation. The core of the algorithm (in dawidskene.jl) is mostly a translation of the Python implementation published by the authors of that paper.

mixturemodels.jl contains EM implementations of two clustering algorithms which was done as an exercise.

The work was done during the summer of 2022 under Kai Xu's mentorship. (Thanks!)

Project scope

Link to the project proposal.

I managed to translate the Python code into Julia and tested it at RTE and Adult2 datasets (achieving almost the same scores as those reported in the paper). I didn't succeed at integrating it with Turing by using the @model macro.

However, @model can be used to implement expectation-maximization algorithms as can be seen in the em-gmm notebook. The other notebook (em-fds) is a record of my attempt at implementing EM-FDS with @model, details problems I've encountered and should be a good place to pick up if anybody would like to.

Scripts

I suggest running them using Julia REPL, e.g. in VSCode. Alternatively, you can run them from the console (but the former solution is preferrable).

julia scripts/test_dawidskene.jl
julia scripts/test_mixturemodels.jl
  • test_dawidskene.jl - runs the three variants of the algorithm (Fast Dawid-Skene, normal Dawid-Skene, and Hybrid Dawid-Skene) as well as majority voting algorithm, comparing their time performance (using @btime from BenchmarkTools) and result (average negative log-likelihood and mutual information).
    • One can see that time performance and neg-log-likelihoods are similar to those published in the paper (tables 1 and 2; see also the showcase notebook).
  • test_mixturemodels.jl - runs and compares EM implementations of two clustering algorithms, k-means and gaussian mixture models (also with @btime).

Tests

  • runtests.jl - uses the standard library Test module to validate that the algorithms work correctly.

Jupyter notebooks