Implementation of the Viterbi algorithm (EM) for estimating the parameters of a Hidden Markov Model in a distributed fashion (using PySpark).

ReHoss/HMM_PySpark

HMM_PySpark

Implementation of transition-matrix and emission-matrix estimation (Viterbi algorithm) from the book Data-Intensive Text Processing with MapReduce (Jimmy Lin and Chris Dyer). The approach is map-reduce based. Two distinct implementations are provided: one that uses only Python built-ins and replicates the book's pseudo-code, and one that uses the NumPy library with some optimizations. See the report for a detailed description.
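To make the map-reduce structure concrete, here is a minimal pure-Python sketch of the aggregation step only. It is not taken from the repository: the function names `merge_counts` and `normalize_rows`, the state labels, and the count values are all hypothetical. It assumes each mapper has already produced a dictionary of expected transition counts for one observation sequence; the reducer sums those dictionaries, and a final pass normalizes each row into probabilities.

```python
from collections import Counter
from functools import reduce


def merge_counts(left, right):
    """Reducer: sum two dictionaries of expected counts key by key."""
    merged = Counter(left)
    merged.update(right)  # Counter.update adds values instead of overwriting
    return merged


def normalize_rows(counts):
    """Turn {(row, col): count} into {(row, col): probability}.

    Every row of the resulting matrix sums to 1.
    """
    row_totals = Counter()
    for (row, _col), value in counts.items():
        row_totals[row] += value
    return {
        (row, col): value / row_totals[row]
        for (row, col), value in counts.items()
    }


# Hypothetical expected transition counts, one dictionary per sequence,
# standing in for what a mapper would emit.
partial_counts = [
    {("A", "A"): 1.5, ("A", "B"): 0.5, ("B", "A"): 1.0},
    {("A", "A"): 0.5, ("A", "B"): 1.5, ("B", "B"): 1.0},
]

total = reduce(merge_counts, partial_counts)
transition = normalize_rows(total)
```

With PySpark, the same reducer could plausibly be passed to `RDD.reduce` after an `RDD.map` that computes the per-sequence counts. The emission matrix would be handled the same way, with `(state, symbol)` keys.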

  • hmm_python.py: pure-Python implementation using only built-ins (no extra package needed). It is intended to replicate the map-reduce based implementation from the reference book.

  • hmm_numpy.py: NumPy-based, optimized implementation (recursive forward-backward algorithm).

  • hmm_report.pdf: report explaining the model and the implementation. It also contains a performance comparison and commentary on the book's point of view.
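As an illustration of the forward-backward recursions mentioned above, the following is a minimal pure-Python sketch, not the code from `hmm_python.py` or `hmm_numpy.py`. The function name `forward_backward`, its argument layout, and the example parameters are assumptions made for this example. It computes the posterior state probabilities for one observation sequence and rescales at every step to limit numerical underflow.

```python
def forward_backward(obs, init, trans, emit):
    """Return gamma, where gamma[t][i] = P(state at time t is i | obs).

    obs   : list of observation indices
    init  : init[i] is the initial probability of state i
    trans : trans[i][j] is the probability of moving from state i to j
    emit  : emit[i][k] is the probability of state i emitting symbol k
    Assumes the sequence has non-zero probability under the model.
    """
    n_states = len(init)
    n_steps = len(obs)

    # Forward pass: alpha[t] is proportional to P(obs[0..t], state_t).
    first = [init[i] * emit[i][obs[0]] for i in range(n_states)]
    scale = sum(first)
    alpha = [[value / scale for value in first]]
    for t in range(1, n_steps):
        cur = [
            emit[j][obs[t]]
            * sum(alpha[t - 1][i] * trans[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
        scale = sum(cur)
        alpha.append([value / scale for value in cur])

    # Backward pass: beta[t] is proportional to P(obs[t+1..] | state_t).
    beta = [[1.0] * n_states for _ in range(n_steps)]
    for t in range(n_steps - 2, -1, -1):
        cur = [
            sum(
                trans[i][j] * emit[j][obs[t + 1]] * beta[t + 1][j]
                for j in range(n_states)
            )
            for i in range(n_states)
        ]
        scale = sum(cur)
        beta[t] = [value / scale for value in cur]

    # Posterior: renormalize alpha * beta at each time step.
    gamma = []
    for t in range(n_steps):
        unnorm = [alpha[t][i] * beta[t][i] for i in range(n_states)]
        total = sum(unnorm)
        gamma.append([value / total for value in unnorm])
    return gamma


# Hypothetical two-state, two-symbol model.
example_gamma = forward_backward(
    obs=[0, 1, 0],
    init=[0.5, 0.5],
    trans=[[0.7, 0.3], [0.4, 0.6]],
    emit=[[0.9, 0.1], [0.2, 0.8]],
)
```

Because each step is rescaled independently, the sketch returns only the posteriors and not the sequence likelihood. An EM iteration would also need the pairwise transition posteriors, which are omitted here for brevity.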

2020 - Hosseinkhan Boucher Rémy
