ErgReleases

StephanOepen edited this page Nov 26, 2011 · 29 revisions

Background

This page (a work in progress, like so many others on this wiki) collects practical and historical information about official snapshots of the ERG, i.e. officially released versions of the grammar.

Re-Generate the Core SEM-I

The bulk of the semantic interface (SEM-I) is auto-generated from the lexicon and written to core.smi:

  (with-open-file (stream "~/src/logon/lingo/terg/core.smi"
                   :direction :output :if-exists :supersede)
    (mt::print-semi (mt:construct-semi) :format :compact :stream stream))

The master file erg.smi is manually maintained and includes the auto-generated entries.

Validate and Update the Head Table

  (tsdb::read-heads "~/src/logon/lingo/terg/erg.hds" :test t)

Generate Maximum Entropy and PCFG Models

By default, the Maximum Entropy training scripts (re-)generate a fresh feature cache; hence, the following two jobs must not run in parallel:

  sbatch ${LOGONROOT}/uio/titan/redwoods \
    --redwoods --run train.wescience.lisp

  sbatch ${LOGONROOT}/uio/titan/redwoods \
    --redwoods --run train.redwoods.lisp
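
One way to keep the two jobs from overlapping is to let the scheduler chain them. The following is only a sketch, not part of the official release procedure: it assumes a Slurm release whose sbatch supports the --parsable and --dependency options, and the function name submit_in_sequence is hypothetical.

```shell
# Hypothetical helper: submit two training jobs so that the second one
# starts only after the first has terminated.
# Assumption: sbatch supports --parsable and --dependency (newer Slurm).
submit_in_sequence() {
  local script="${LOGONROOT}/uio/titan/redwoods"
  local first
  # --parsable makes sbatch print only the job id (optionally "id;cluster")
  first=$(sbatch --parsable "${script}" --redwoods --run "$1") || return 1
  first=${first%%;*}   # drop a trailing ";cluster" part, if present
  # afterany: eligible once the first job has ended, whatever its exit status
  sbatch --dependency="afterany:${first}" "${script}" --redwoods --run "$2"
}
```

It would be called as submit_in_sequence train.wescience.lisp train.redwoods.lisp. Using afterany rather than afterok means the second job still runs if the first one fails; whether that is desirable depends on how the feature cache is left behind after a failure.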

For the time being, there is only one PCFG model, trained on the full (non-testing) Redwoods collection:

  sbatch ${LOGONROOT}/uio/titan/redwoods \
    --redwoods --run pcfg.lisp

Update Summary Statistics of Redwoods Treebanks

Populate the Lexical Type Database
