
ItsdbProfile

StephanOepen edited this page Aug 29, 2011 · 6 revisions

This page is a brief introduction to the profiles used by itsdb (see ItsdbTop). For more information, see the [http://www.delph-in.net/itsdb/publications/manual.pdf User & Reference Manual].


Overview

A Competence and Performance Profile is a test suite instance that includes competence and performance information from a processing system (such as a parser or generator) and a grammar (and possibly a pre-processor).

A profile is typically a directory containing a database stored as several text files. The format of the files is controlled by the relations file. The most up-to-date relations file should be here: [http://svn.emmtee.net/trunk/lingo/lkb/src/tsdb/skeletons/english/Relations]. Within each file, each line is a row, and the columns are separated by the @ character.
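The layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `parse_relations` and `split_row` are hypothetical, the `RELATIONS` text is an abbreviated, made-up excerpt in the style of the schemas shown on this page, and the sketch ignores any escaping of the @ character inside field values.

```python
def parse_relations(text):
    """Map each relation name to its ordered list of field names."""
    schema = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop trailing comments
        if not line.strip():
            continue
        if not line[0].isspace() and line.endswith(":"):
            # An unindented "name:" line starts a new relation.
            current = line[:-1]
            schema[current] = []
        elif current is not None:
            # An indented line declares a field; its name comes first.
            schema[current].append(line.split()[0])
    return schema


def split_row(line, fields):
    """Split one '@'-separated line into a dict keyed by field name."""
    values = line.rstrip("\n").split("@")
    return dict(zip(fields, values))


# Abbreviated, hypothetical relations text (not the full item relation).
RELATIONS = """\
item:
  i-id :integer :key
  i-origin :string
  i-input :string
"""

schema = parse_relations(RELATIONS)
row = split_row("1@example@The dog barks.\n", schema["item"])
print(row["i-input"])
```

Note that `zip` silently truncates when a line has more or fewer columns than the relation declares, so a stricter reader would probably want to check the column count first.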

Profiles can be manipulated within itsdb itself (the default) or with some other [wiki:ToolsTop tools].

Some of the files are described below. Note, however, that the itsdb database schema is subject to occasional revision; the [http://svn.emmtee.net/trunk/lingo/lkb/src/tsdb/skeletons/english/Relations canonical current version] is available through SVN.

item

Schema

item:
  i-id :integer :key
  i-origin :string
  i-register :string
  i-format :string
  i-difficulty :integer
  i-category :string
  i-input :string
  i-tokens :string
  i-gloss :string
  i-translation :string
  i-wf :integer
  i-length :integer
  i-comment :string
  i-author :string
  i-date :date

This is the input file. The actual input is typically in column 7, i-input. If the input is tokenized or otherwise pre-processed before processing, the result should go into column 8, i-tokens.
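As a rough illustration of the column positions mentioned above, the following Python sketch reads item rows and picks out i-id, i-input and i-tokens by name. The helper `read_items` is hypothetical, the sample row (including its date format) is made up, and escaping of @ inside values is not handled.

```python
# Field order copied from the item schema on this page.
ITEM_FIELDS = [
    "i-id", "i-origin", "i-register", "i-format", "i-difficulty",
    "i-category", "i-input", "i-tokens", "i-gloss", "i-translation",
    "i-wf", "i-length", "i-comment", "i-author", "i-date",
]


def read_items(lines):
    """Yield (i-id, i-input, i-tokens) for each non-empty item line."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        record = dict(zip(ITEM_FIELDS, line.split("@")))
        yield int(record["i-id"]), record["i-input"], record["i-tokens"]


# Made-up sample row; i-tokens is empty because nothing was pre-processed.
sample = ["1@unknown@formal@none@1@S@The dog barks.@@@@1@3@@nobody@2011-08-29"]
items = list(read_items(sample))
print(items)
```

In practice the lines would come from the `item` file in the profile directory, for example by passing an open file object to `read_items`.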

run

This gives some information about the configuration of the actual test run, including the processing engine and grammar versions.

run:
  run-id :integer :key                  # unique test run identifier
  run-comment :string                   # descriptive narrative
  platform :string                      # implementation platform (version)
  tsdb :string                          # tsdb(1) (version) used
  application :string                   # application (version) used
  environment :string                   # application-specific information
  grammar :string                       # grammar (version) used
  avms :integer                         # number of avm types in image
  sorts :integer                        # number of sort types in image
  templates :integer                    # number of templates in image
  lexicon :integer                      # number of lexical entries
  lrules :integer                       # number of lexical rules
  rules :integer                        # number of (non-lexical) rules
  user :string                          # user who did the test run
  host :string                          # machine used for this run
  os :string                            # operating system (version)
  start :date                           # start time of this test run
  end :date                             # end time for this test run
  items :integer                        # number of test items in this run
  status :string                        # exit status (PVM only)

parse

This gives information about the entire process for a single input item; it is mainly used for competence/performance profiling. The i-id should match the i-id of the corresponding record in item.

parse:
  parse-id :integer :key                # unique parse identifier
  run-id :integer :key                  # test run for this parse
  i-id :integer :key                    # item parsed
  p-input :string                       # initial (pre-processed) parser input
  p-tokens :string                      # internal parser input: lexical lookup
  readings :integer                     # number of readings obtained
  first :integer                        # time to find first reading (msec)
  total :integer                        # total time for parsing (msec)
  tcpu :integer                         # total (cpu) processing time (msec)
  tgc :integer                          # gc time used (msec)
  treal :integer                        # overall real time (msec)
  words :integer                        # lexical entries retrieved
  l-stasks :integer                     # successful lexical rule applications
  p-ctasks :integer                     # parser contemplated tasks (LKB)
  p-ftasks :integer                     # parser filtered tasks
  p-etasks :integer                     # parser executed tasks
  p-stasks :integer                     # parser succeeding tasks
  aedges :integer                       # active items in chart (PAGE)
  pedges :integer                       # passive items in chart
  raedges :integer                      # active items contributing to result
  rpedges :integer                      # passive items contributing to result
  unifications :integer                 # number of (node) unifications
  copies :integer                       # number of (node) copy operations
  conses :integer                       # cons() cells allocated
  symbols :integer                      # symbols allocated
  others :integer                       # bytes of memory allocated
  gcs :integer                          # number of garbage collections
  i-load :integer                       # initial load (start of parse)
  a-load :integer                       # average load
  date :date                            # date and time of parse
  error :string                         # error string (if applicable |:-)
  comment :string                       # application-specific comment
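The link between parse and item through i-id can be used for simple competence profiling. The sketch below assumes the rows have already been read into dictionaries keyed by field name; the function name `coverage` and the sample records are hypothetical, and counting an item as covered when readings is greater than zero is an assumption of this example.

```python
def coverage(items, parses):
    """Join parse records to items on i-id and compute a coverage rate.

    Returns (joined, rate): joined holds (i-id, i-input, readings) tuples,
    with readings set to None for items that have no parse record; rate is
    the fraction of items with at least one reading.
    """
    readings_by_item = {p["i-id"]: int(p["readings"]) for p in parses}
    joined = [
        (it["i-id"], it["i-input"], readings_by_item.get(it["i-id"]))
        for it in items
    ]
    covered = sum(1 for _, _, n in joined if n is not None and n > 0)
    rate = covered / len(joined) if joined else 0.0
    return joined, rate


# Made-up records containing only the fields this example needs.
items_demo = [
    {"i-id": "1", "i-input": "The dog barks."},
    {"i-id": "2", "i-input": "Dog the barks."},
]
parses_demo = [
    {"parse-id": "1", "run-id": "1", "i-id": "1", "readings": "2"},
    {"parse-id": "2", "run-id": "1", "i-id": "2", "readings": "0"},
]

joined, rate = coverage(items_demo, parses_demo)
print(joined, rate)
```

If a profile holds several parse records for the same i-id, this sketch keeps only the last one it sees.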

result

This stores the actual result of the processing. There may be multiple results for a single input. The parse-id links each result to its record in parse and, through that record's i-id, to the input in item. Typically this file will be used as input by any post-processors, such as exporting, transfer, generation and so on.

result:
  parse-id :integer :key                # parse for this result
  result-id :integer                    # unique result identifier
  time :integer                         # time to find this result (msec)
  r-ctasks :integer                     # parser contemplated tasks
  r-ftasks :integer                     # parser filtered tasks
  r-etasks :integer                     # parser executed tasks
  r-stasks :integer                     # parser succeeding tasks
  size :integer                         # size of feature structure
  r-aedges :integer                     # active items for this result
  r-pedges :integer                     # passive items in this result
  derivation :string                    # derivation tree for this reading
  surface :string                       # surface string (e.g. realization)
  tree :string                          # phrase structure tree (CSLI labels)
  mrs :string                           # mrs for this reading
  flags :string                         # arbitrary annotation (e.g. BLEU)
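Because a single input may have several results, a post-processor will often want them grouped by parse-id. A minimal sketch, assuming the rows have already been read into dictionaries keyed by field name; `group_results` and the sample records (including the placeholder derivation strings) are hypothetical.

```python
from collections import defaultdict


def group_results(results):
    """Group result records by parse-id, ordered by result-id within each group."""
    grouped = defaultdict(list)
    for record in results:
        grouped[record["parse-id"]].append(record)
    for records in grouped.values():
        records.sort(key=lambda r: int(r["result-id"]))
    return dict(grouped)


# Made-up records containing only the fields this example needs.
results_demo = [
    {"parse-id": "1", "result-id": "1", "derivation": "(placeholder-b)"},
    {"parse-id": "1", "result-id": "0", "derivation": "(placeholder-a)"},
    {"parse-id": "2", "result-id": "0", "derivation": "(placeholder-c)"},
]

grouped = group_results(results_demo)
for parse_id, records in grouped.items():
    print(parse_id, [r["result-id"] for r in records])
```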