Fast Dawid-Skene algorithm in Julia for Turing.jl (Google Summer of Code 2022)

This repository contains a Julia implementation of the expectation-maximization version of the Fast Dawid-Skene algorithm (paper) for vote aggregation. The core of the algorithm (in dawidskene.jl) is mostly a translation of the Python implementation published by the authors of that paper.
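For readers new to the method, here is a minimal sketch of Dawid-Skene style EM on a toy vote matrix. It is not the code in dawidskene.jl: the function names, the dense layout (every worker labels every item), and the smoothing constant are all assumptions made for illustration. The `hard` flag reflects the Fast Dawid-Skene idea described in the paper, where the E-step assigns each item entirely to its most probable class instead of keeping soft posteriors.

```julia
# Illustrative sketch only; names and data layout are hypothetical.
# votes[i, j] is the label (1..K) that worker j gave to item i.

# Initial class posteriors from vote counts (majority voting).
function majority_vote_posteriors(votes::Matrix{Int}, K::Int)
    T = zeros(size(votes, 1), K)
    for i in axes(votes, 1), j in axes(votes, 2)
        T[i, votes[i, j]] += 1
    end
    return T ./ sum(T, dims=2)
end

# M-step: class priors and one confusion matrix per worker.
# conf[j][k, l] estimates P(worker j answers l | true class k).
function m_step(votes::Matrix{Int}, T::Matrix{Float64}, K::Int; smooth=1e-6)
    n, m = size(votes)
    prior = vec(sum(T, dims=1)) ./ n
    conf = [fill(smooth, K, K) for _ in 1:m]
    for j in 1:m
        for i in 1:n, k in 1:K
            conf[j][k, votes[i, j]] += T[i, k]
        end
        conf[j] ./= sum(conf[j], dims=2)  # normalize each row
    end
    return prior, conf
end

# E-step: posterior over classes for every item.
# hard=true puts all mass on the most probable class (Fast Dawid-Skene style);
# hard=false keeps the soft posterior (standard Dawid-Skene style).
function e_step(votes::Matrix{Int}, prior, conf, K::Int; hard::Bool)
    n, m = size(votes)
    T = zeros(n, K)
    for i in 1:n
        logp = log.(prior .+ 1e-12)
        for j in 1:m, k in 1:K
            logp[k] += log(conf[j][k, votes[i, j]])
        end
        if hard
            T[i, argmax(logp)] = 1.0
        else
            p = exp.(logp .- maximum(logp))
            T[i, :] = p ./ sum(p)
        end
    end
    return T
end

# Run a fixed number of EM iterations and return one label per item.
function aggregate(votes::Matrix{Int}, K::Int; hard::Bool=true, iters::Int=20)
    T = majority_vote_posteriors(votes, K)
    for _ in 1:iters
        prior, conf = m_step(votes, T, K)
        T = e_step(votes, prior, conf, K; hard=hard)
    end
    return [argmax(T[i, :]) for i in axes(T, 1)]
end

# Toy data: 6 items, 3 workers, 2 classes.
votes = [1 1 1; 1 1 2; 2 2 2; 2 2 1; 1 1 1; 2 2 2]
labels_fast = aggregate(votes, 2; hard=true)
labels_soft = aggregate(votes, 2; hard=false)
```

A hybrid variant could, under the same assumptions, run a few iterations with `hard=false` before switching to `hard=true`; the sketch uses a fixed iteration count rather than a convergence check to stay short.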

mixturemodels.jl contains EM implementations of two clustering algorithms, written as an exercise.

The work was done during the summer of 2022 under Kai Xu's mentorship. (Thanks!)

Project scope

Link to the project proposal.

I managed to translate the Python code into Julia and tested it on the RTE and Adult2 datasets (achieving almost the same scores as those reported in the paper). I didn't succeed in integrating it with Turing by using the @model macro.

However, @model can be used to implement expectation-maximization algorithms, as can be seen in the em-gmm notebook. The other notebook (em-fds) records my attempt at implementing EM-FDS with @model and details the problems I encountered; it should be a good starting point for anyone who would like to pick this up.

Scripts

I suggest running them from the Julia REPL, e.g. in VSCode. Alternatively, you can run them from the console (but the former is preferable).

julia scripts/test_dawidskene.jl
julia scripts/test_mixturemodels.jl

test_dawidskene.jl - runs the three variants of the algorithm (Fast Dawid-Skene, standard Dawid-Skene, and Hybrid Dawid-Skene) as well as the majority voting algorithm, comparing their running time (using @btime from BenchmarkTools) and results (average negative log-likelihood and mutual information).
- The running times and negative log-likelihoods are similar to those published in the paper (Tables 1 and 2; see also the showcase notebook).
test_mixturemodels.jl - runs and compares EM implementations of two clustering algorithms, k-means and Gaussian mixture models (also with @btime).
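To make the "k-means as EM" framing concrete, here is a minimal sketch that alternates a hard assignment step with a center update step. It is not the code in mixturemodels.jl; the function name `kmeans_em`, the column-per-point layout, and the fixed iteration count are assumptions chosen for illustration.

```julia
# Illustrative sketch only; not the repository's implementation.
# points:  d x n matrix, one column per observation
# centers: d x k matrix of initial cluster centers
function kmeans_em(points::Matrix{Float64}, centers::Matrix{Float64}; iters::Int=10)
    centers = copy(centers)          # do not mutate the caller's array
    n = size(points, 2)
    k = size(centers, 2)
    labels = zeros(Int, n)
    for _ in 1:iters
        # "E-step": assign every point to its nearest center
        for i in 1:n
            dists = [sum(abs2, points[:, i] .- centers[:, c]) for c in 1:k]
            labels[i] = argmin(dists)
        end
        # "M-step": move every center to the mean of its assigned points
        for c in 1:k
            members = findall(==(c), labels)
            isempty(members) && continue   # keep the old center if a cluster is empty
            centers[:, c] = vec(sum(points[:, members], dims=2)) ./ length(members)
        end
    end
    return labels, centers
end

# Toy data: two well-separated groups in 2-D.
points = [0.0 0.2 0.1 5.0 5.2 5.1;
          0.0 0.1 0.2 5.0 5.1 5.2]
init = points[:, [1, 4]]             # one starting center from each group
labels, centers = kmeans_em(points, init)
```

With BenchmarkTools installed, a call such as `@btime kmeans_em($points, $init)` would time the sketch in the same spirit as the scripts.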

Tests

runtests.jl - uses the standard library Test module to validate that the algorithms work correctly.
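As a rough idea of what a check with the Test standard library looks like, the sketch below tests a tiny majority-vote helper. The helper `majority` and the test cases are hypothetical and are not taken from runtests.jl.

```julia
using Test

# Hypothetical helper: most frequent label in a vector of positive integer labels.
# Ties resolve to the smallest label because argmax returns the first maximum.
majority(votes::Vector{Int}) = argmax([count(==(k), votes) for k in 1:maximum(votes)])

@testset "majority vote (illustrative)" begin
    @test majority([1, 2, 2]) == 2
    @test majority([3, 3, 1, 2]) == 3
    @test majority([1]) == 1
end
```

A failing `@test` inside a `@testset` is reported in the summary, and the testset throws once it finishes if any test failed.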

Jupyter notebooks

showcase.ipynb - explains this implementation of EM-FDS step-by-step.
em-gmm.ipynb - implements the Gaussian mixture model with Turing's @model macro for defining statistical models.
em-fds.ipynb - details my attempt at implementing EM-FDS with @model.


MatthewBaggins/DawidSkeneAlgorithms