bestLM

Run SRILM with different options to find the best language model (LM) for the given training and test data.

Directory structure:

  • src/: scripts
  • run/: experiment folder
  • run/demo/: sample experiment

How to run?

  • Set SRILM path in run/Makefile
  • Copy your LM training file and name it train.tok.gz
  • Copy your test file and name it test.tok.gz (this file is used to calculate perplexity)
  • Edit the vocabulary threshold according to your needs
  • Run: cd run/demo && make ppl.out
  • To run on 10 CPUs: cd run/demo && make ppl.out NCPU=10
  • Each line of ppl.out corresponds to one LM and its perplexity. The ppl.out format is: training-set test-set vocabulary-thr ngram discount option prob ppl ppl2 time
  • Each LM has its own folder in the experiment folder (i.e., run/demo/).
  • To run experiments on a different dataset, create a new folder under run/, copy the Makefile from demo into it, and repeat the steps above
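
Once ppl.out exists, picking the best LM comes down to finding the line with the lowest perplexity. The following is a minimal Python sketch of that step, not part of bestLM itself. It assumes the fields are whitespace-separated and that the last four fields are always prob, ppl, ppl2 and time, as in the format above; the helper names `parse_line` and `best_model` are hypothetical.

```python
from typing import Iterable, NamedTuple, Optional


class Result(NamedTuple):
    config: tuple   # leading fields: training-set, test-set, vocabulary-thr, ngram, discount, option
    prob: float     # "prob" column
    ppl: float      # "ppl" column
    ppl2: float     # "ppl2" column


def parse_line(line: str) -> Optional[Result]:
    """Parse one ppl.out line; return None for blank or malformed lines."""
    fields = line.split()
    if len(fields) < 4:
        return None
    try:
        # Assumption: the last four fields are prob, ppl, ppl2, time.
        # Counting from the end keeps this working if "option" is empty.
        prob, ppl, ppl2 = (float(x) for x in fields[-4:-1])
    except ValueError:
        return None
    return Result(tuple(fields[:-4]), prob, ppl, ppl2)


def best_model(lines: Iterable[str]) -> Optional[Result]:
    """Return the entry with the lowest ppl, or None if nothing parsed."""
    results = [r for r in map(parse_line, lines) if r is not None]
    return min(results, key=lambda r: r.ppl) if results else None
```

For the demo experiment this could be called as `best_model(open("run/demo/ppl.out"))`; the `config` tuple of the returned entry then identifies the winning settings.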

Supported SRILM options:

  • An experiment runs the following discounting methods: ndiscount wbdiscount ukndiscount kndiscount
  • Every discounting method is run both with and without interpolation
  • All experiments are run for 4-grams and 5-grams
  • To change these settings, edit src/lm-args.pl
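
To make the size of the search concrete: four discounting methods, two interpolation settings and two n-gram orders give 16 configurations per vocabulary threshold. The Python sketch below enumerates them as SRILM ngram-count flags (`-order`, `-interpolate` and the four discount flags). It only illustrates the grid; how src/lm-args.pl actually builds its arguments is not shown here, and the name `experiment_grid` is hypothetical.

```python
from itertools import product
from typing import Iterator, List

# Settings listed above.
DISCOUNTS = ("ndiscount", "wbdiscount", "ukndiscount", "kndiscount")
ORDERS = (4, 5)
INTERPOLATE = (False, True)


def experiment_grid() -> Iterator[List[str]]:
    """Yield one list of ngram-count flags per configuration."""
    for order, discount, interpolate in product(ORDERS, DISCOUNTS, INTERPOLATE):
        args = ["-order", str(order), f"-{discount}"]
        if interpolate:
            args.append("-interpolate")
        yield args


if __name__ == "__main__":
    for args in experiment_grid():
        print(" ".join(args))
```

Adding a further n-gram order or discounting method multiplies the number of LMs to train, which is where the NCPU option becomes useful.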