
module__AttestedTermsProjector

Robert Bossy edited this page Jul 27, 2017 · 1 revision

org.bibliome.alvisnlp.modules.projectors.AttestedTermsProjector

Synopsis

Projects a list of terms given in tree-tagger format.

This module is obsolete, superseded by org.bibliome.alvisnlp.modules.treetagger.TreeTaggerTermsProjector.

Description

org.bibliome.alvisnlp.modules.projectors.AttestedTermsProjector reads a list of terms from termsFile and searches for these terms in sections. The terms must be in tree-tagger format: each line contains a token, its POS tag and its lemma, and each term is terminated by a period tagged SENT. The string searched for each term is the concatenation of its token surface forms, or of its lemmas if lemmaKeys is true, separated by a space character.
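The key construction described above can be illustrated with a short Python sketch. It is not the module's implementation: it assumes the three columns are tab-separated (as in usual TreeTagger output), and the names read_terms and term_key are hypothetical helpers introduced only for this example.

```python
from typing import Iterable, List, Tuple

# One token of a term: (surface form, POS tag, lemma).
Token = Tuple[str, str, str]


def read_terms(lines: Iterable[str]) -> List[List[Token]]:
    """Group token lines into terms; a line tagged SENT closes the current term.

    Assumption: columns are separated by tabs. Tokens that follow the last
    SENT line are dropped in this sketch.
    """
    terms: List[List[Token]] = []
    current: List[Token] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        form, pos, lemma = line.split("\t")
        if pos == "SENT":
            if current:
                terms.append(current)
            current = []
        else:
            current.append((form, pos, lemma))
    return terms


def term_key(term: List[Token], lemma_keys: bool = False) -> str:
    """Join surface forms (or lemmas when lemma_keys is true) with one space."""
    column = 2 if lemma_keys else 0
    return " ".join(token[column] for token in term)


sample = [
    "binding\tNN\tbinding\n",
    "sites\tNNS\tsite\n",
    ".\tSENT\t.\n",
    "genes\tNNS\tgene\n",
    ".\tSENT\t.\n",
]
terms = read_terms(sample)
print(term_key(terms[0]))                   # key built from surface forms
print(term_key(terms[0], lemma_keys=True))  # key built from lemmas
```

With lemma_keys set, the first term is searched as its lemma sequence rather than as the inflected form found in the file.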

The parameters warnDuplicateValues, multipleValueAction, errorDuplicateValues and warnMultipleValues control how org.bibliome.alvisnlp.modules.projectors.AttestedTermsProjector reacts when it encounters duplicate terms.

The parameters normalizeSpace, ignoreCase, ignoreDiacritics and ignoreWhitespace control the matching of entries on the sections.
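As a rough illustration of what these four matching options could mean, the sketch below normalizes a string before comparison. The exact semantics in the module are not specified on this page, so the function normalize and its behavior are assumptions: diacritics are removed through Unicode decomposition, normalizing space collapses runs of whitespace into one space, and ignoring whitespace removes it entirely.

```python
import re
import unicodedata


def normalize(text: str,
              ignore_case: bool = False,
              ignore_diacritics: bool = False,
              normalize_space: bool = False,
              ignore_whitespace: bool = False) -> str:
    """Hypothetical normalization applied to both the entry and the text."""
    if ignore_diacritics:
        # Decompose characters, then drop the combining marks.
        decomposed = unicodedata.normalize("NFD", text)
        text = "".join(c for c in decomposed if not unicodedata.combining(c))
    if ignore_case:
        text = text.lower()
    if normalize_space:
        # Collapse any run of whitespace into a single space.
        text = re.sub(r"\s+", " ", text)
    if ignore_whitespace:
        # Remove whitespace altogether.
        text = re.sub(r"\s+", "", text)
    return text


print(normalize("Caf\u00e9", ignore_diacritics=True))
print(normalize("Binding  Site", ignore_case=True, normalize_space=True))
print(normalize("binding site", ignore_whitespace=True))
```

An actual projector also has to report match offsets in the original, unnormalized text, which this sketch leaves out.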

The subject parameter specifies which text of the section should be matched. There are two options:

  • the entries are matched on the contents of the section; subject can also control whether match boundaries must coincide with word delimiters;
  • the entries are matched on the feature values of the annotations of a given layer, separated by a whitespace; in this way entries can be searched against word lemmas, for instance.
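The two subject options can be contrasted with a small Python sketch. It is only an approximation under stated assumptions: word delimiters are modeled with the regular expression class \w, annotations are plain dictionaries with start and end offsets, and the function names find_in_contents and find_in_features are hypothetical.

```python
import re
from typing import Dict, List, Tuple

Span = Tuple[int, int]


def find_in_contents(contents: str, entry: str,
                     word_boundaries: bool = True) -> List[Span]:
    """Search the entry in the raw section contents."""
    pattern = re.escape(entry)
    if word_boundaries:
        # Reject matches that start or end in the middle of a word.
        pattern = r"(?<!\w)" + pattern + r"(?!\w)"
    return [m.span() for m in re.finditer(pattern, contents)]


def find_in_features(annotations: List[Dict], feature: str,
                     entry: str) -> List[Span]:
    """Search the entry in the sequence of feature values of a layer.

    The entry is split on spaces and compared with consecutive annotations;
    a match spans from the first annotation's start to the last one's end.
    """
    values = [a[feature] for a in annotations]
    wanted = entry.split(" ")
    n = len(wanted)
    return [
        (annotations[i]["start"], annotations[i + n - 1]["end"])
        for i in range(len(values) - n + 1)
        if values[i:i + n] == wanted
    ]


text = "binding sites near a binding site"
print(find_in_contents(text, "binding site"))
print(find_in_contents(text, "binding site", word_boundaries=False))

# A hypothetical word layer carrying lemmas for the first two words.
words = [
    {"start": 0, "end": 7, "lemma": "binding"},
    {"start": 8, "end": 13, "lemma": "site"},
]
print(find_in_features(words, "lemma", "binding site"))
```

Matching on lemmas finds the plural "binding sites", which the contents search with word boundaries rejects.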

org.bibliome.alvisnlp.modules.projectors.AttestedTermsProjector creates an annotation for each matched term and adds these annotations to the layer named targetLayerName. The created annotations have the features termFeatureName, posFeatureName and lemmaFeatureName, containing the concatenation of the term tokens' surface forms, POS tags and lemmas, respectively. In addition, the created annotations have the feature keys and values defined in constantAnnotationFeatures.
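The shape of a created annotation might look like the following Python sketch. Several details are assumptions not stated on this page: the concatenation separator is taken to be a space, the annotation is modeled as a dictionary, and the feature name "term" stands in for a user-supplied termFeatureName (only "pos" and "lemma" are documented defaults). The helper make_annotation is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Token = Tuple[str, str, str]  # (surface form, POS tag, lemma)


def make_annotation(start: int, end: int, term: List[Token],
                    constant_features: Optional[Dict[str, str]] = None,
                    term_feature: str = "term",
                    pos_feature: str = "pos",
                    lemma_feature: str = "lemma") -> Dict:
    """Build a dictionary standing in for an annotation of the target layer."""
    features = {
        # Assumption: token values are concatenated with a single space.
        term_feature: " ".join(t[0] for t in term),
        pos_feature: " ".join(t[1] for t in term),
        lemma_feature: " ".join(t[2] for t in term),
    }
    # Constant features are added to every created annotation.
    features.update(constant_features or {})
    return {"start": start, "end": end, "features": features}


term = [("binding", "NN", "binding"), ("sites", "NNS", "site")]
annotation = make_annotation(0, 13, term, {"source": "attested"})
print(annotation["features"])
```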

Parameters

Optional

Type: String

Name of the layer where to put match annotations.

Optional

Type: SourceStream

Attested terms file.

Optional

Type: Mapping

Constant features to add to each annotation created by this module.

Optional

Type: String

Name of the feature where to write the term form.

Default value: true

Type: Expression

Only process documents that satisfy this filter.

Default value: false

Type: Boolean

Whether to stop when a duplicate entry is seen.

Default value: false

Type: Boolean

Match ignoring case.

Default value: false

Type: Boolean

Match ignoring diacritics.

Default value: false

Type: Boolean

Match ignoring whitespace characters.

Default value: lemma

Type: String

Name of the feature where to write the term lemma.

Default value: true

Type: Boolean

Whether to project lemmas instead of surface forms.

Default value: add

Type: MultipleValueAction

Action to take when multiple entries with the same key are seen.

Default value: false

Type: Boolean

Match normalizing whitespace.

Default value: pos

Type: String

Name of the feature where to write the term POS tags.

Default value: true

Type: Expression

Only process sections that satisfy this filter.

Default value: org.bibliome.alvisnlp.modules.projectors.ContentsSubject

Type: Subject

Subject on which to project the dictionary.
