module__AttestedTermsProjector
#org.bibliome.alvisnlp.modules.projectors.AttestedTermsProjector
Projects a list of terms given in tree-tagger format.
This module is obsolete, superseded by org.bibliome.alvisnlp.modules.treetagger.TreeTaggerTermsProjector.
org.bibliome.alvisnlp.modules.projectors.AttestedTermsProjector reads a list of terms from termsFile and searches for these terms in sections. The terms must be in tree-tagger format: each line contains a token, its POS tag and its lemma, and each term is terminated by a line holding a period tagged SENT. The string searched for each term is the concatenation of the token surface forms, or of the lemmas if lemmaKeys is true, separated by a space character.
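As an illustration of the format described above, the following is a minimal sketch of how a terms file could be read and how the searched string could be built. It is not the module's actual code. It assumes the three columns are tab-separated, as in usual tree-tagger output; the function names `read_terms` and `search_key` and the sample term are hypothetical.

```python
from typing import Iterable, List, Tuple

Token = Tuple[str, str, str]  # (surface form, POS tag, lemma)


def read_terms(lines: Iterable[str]) -> List[List[Token]]:
    """Group token lines into terms; a line tagged SENT closes the current term."""
    terms: List[List[Token]] = []
    current: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # Assumption: columns are separated by a tab character.
        form, pos, lemma = line.split("\t")
        if pos == "SENT":
            if current:
                terms.append(current)
            current = []
        else:
            current.append((form, pos, lemma))
    # Tokens after the last SENT line are ignored: every term must be terminated.
    return terms


def search_key(term: List[Token], lemma_keys: bool) -> str:
    """Join lemmas (lemma_keys=True) or surface forms with a single space."""
    column = 2 if lemma_keys else 0
    return " ".join(token[column] for token in term)


sample = [
    "heat\tNN\theat",
    "shock\tNN\tshock",
    "proteins\tNNS\tprotein",
    ".\tSENT\t.",
]
terms = read_terms(sample)
print(search_key(terms[0], lemma_keys=False))  # heat shock proteins
print(search_key(terms[0], lemma_keys=True))   # heat shock protein
```

With lemma_keys set to true, the same entry would match any text whose lemmas read "heat shock protein", which is the point of projecting lemmas rather than forms.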
The parameters warnDuplicateValues, multipleValueAction, errorDuplicateValues and warnMultipleValues control how org.bibliome.alvisnlp.modules.projectors.AttestedTermsProjector reacts when it encounters duplicate terms.
The parameters normalizeSpace, ignoreCase, ignoreDiacritics and ignoreWhitespace control how entries are matched against the sections.
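One way to picture these four options is as a normalization applied to both the entry and the text before comparison. The sketch below follows that reading. It is an assumption about the intent of each flag, not the module's implementation, which may instead handle these variations during matching to keep character offsets intact. The function name `normalize` and its keyword arguments are hypothetical.

```python
import re
import unicodedata


def normalize(text, ignore_case=False, ignore_diacritics=False,
              normalize_space=False, ignore_whitespace=False):
    """Return a comparison key for text under the selected matching options."""
    if ignore_diacritics:
        # Decompose accented characters, then drop the combining marks.
        decomposed = unicodedata.normalize("NFD", text)
        text = "".join(c for c in decomposed if not unicodedata.combining(c))
    if ignore_case:
        text = text.casefold()
    if ignore_whitespace:
        # Remove every whitespace character.
        text = re.sub(r"\s+", "", text)
    elif normalize_space:
        # Collapse runs of whitespace into a single space.
        text = re.sub(r"\s+", " ", text).strip()
    return text


entry = "cafe au lait"
found = "Café  au\tlait"
options = dict(ignore_case=True, ignore_diacritics=True, normalize_space=True)
print(normalize(found, **options) == normalize(entry, **options))  # True
```

All four flags default to false in this module, so matching is exact unless they are enabled.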
The subject parameter specifies which text of the section should be matched. There are two options:
- the entries are matched on the contents of the section; in this case subject can also control whether match boundaries must coincide with word delimiters;
- the entries are matched on the feature values of the annotations of a given layer, joined with a whitespace; in this way entries can be searched against word lemmas, for instance.
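The difference between the two options can be sketched as follows. This is only an illustration under stated assumptions: annotations are represented as plain dictionaries with hypothetical `start`, `end` and `lemma` keys, word delimiters are approximated with the regular-expression `\b` boundary, and the function names `match_contents` and `match_feature` are not part of AlvisNLP.

```python
import re
from typing import Dict, List, Tuple

Span = Tuple[int, int]


def match_contents(contents: str, entry: str, word_boundaries: bool) -> List[Span]:
    """Match the entry on the raw section contents."""
    pattern = re.escape(entry)
    if word_boundaries:
        pattern = r"\b" + pattern + r"\b"
    return [m.span() for m in re.finditer(pattern, contents)]


def match_feature(annotations: List[Dict], feature: str, entry: str) -> List[Span]:
    """Match the entry on consecutive feature values joined by a space."""
    values = entry.split(" ")
    n = len(values)
    spans = []
    for i in range(len(annotations) - n + 1):
        window = annotations[i:i + n]
        if [a[feature] for a in window] == values:
            # The match covers the text from the first to the last annotation.
            spans.append((window[0]["start"], window[-1]["end"]))
    return spans


# Option 1: section contents, with and without word-boundary checking.
print(match_contents("aftershock shock", "shock", word_boundaries=False))  # [(5, 10), (11, 16)]
print(match_contents("aftershock shock", "shock", word_boundaries=True))   # [(11, 16)]

# Option 2: lemma feature of a word layer for the text "Heat shock proteins".
words = [
    {"start": 0, "end": 4, "lemma": "heat"},
    {"start": 5, "end": 10, "lemma": "shock"},
    {"start": 11, "end": 19, "lemma": "protein"},
]
print(match_feature(words, "lemma", "heat shock protein"))  # [(0, 19)]
```

In the second option the entry never has to appear literally in the text: the plural "proteins" is matched because its lemma is "protein".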
org.bibliome.alvisnlp.modules.projectors.AttestedTermsProjector creates an annotation for each matched term and adds these annotations to the layer named targetLayerName. The created annotations will have the features termFeatureName, posFeatureName and lemmaFeatureName, containing the concatenation of the term tokens' surface forms, POS tags and lemmas respectively. In addition, the created annotations will have the feature keys and values defined in constantAnnotationFeatures.
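The features carried by a created annotation can be pictured with the short sketch below. It is a hypothetical reconstruction: the separator used for the concatenation is assumed to be a space, constant features are assumed to be set before the term features, and the function name `annotation_features`, the `term` feature name and the `source` constant are chosen for illustration only. The defaults `pos` and `lemma` are the documented defaults of posFeatureName and lemmaFeatureName.

```python
def annotation_features(term, term_feature=None, pos_feature="pos",
                        lemma_feature="lemma", constant_features=None):
    """Build the feature dictionary for one matched term.

    term is a list of (surface form, POS tag, lemma) tuples.
    """
    features = dict(constant_features or {})
    forms, tags, lemmas = zip(*term)
    if term_feature is not None:
        # termFeatureName has no default, so it is only written when given.
        features[term_feature] = " ".join(forms)
    features[pos_feature] = " ".join(tags)
    features[lemma_feature] = " ".join(lemmas)
    return features


term = [("heat", "NN", "heat"), ("shock", "NN", "shock"), ("proteins", "NNS", "protein")]
print(annotation_features(term, term_feature="term",
                          constant_features={"source": "attested"}))
# {'source': 'attested', 'term': 'heat shock proteins', 'pos': 'NN NN NNS', 'lemma': 'heat shock protein'}
```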
Optional
Type: String
Name of the layer in which to put match annotations.
Optional
Type: SourceStream
Attested terms file.
Optional
Type: Mapping
Constant features to add to each annotation created by this module.
Optional
Type: String
Name of the feature in which to write the term form.
Default value: true
Type: Expression
Only process documents that satisfy this filter.
Default value: false
Type: Boolean
Whether to stop when a duplicate entry is seen.
Default value: false
Type: Boolean
Match ignoring case.
Default value: false
Type: Boolean
Match ignoring diacritics.
Default value: false
Type: Boolean
Match ignoring whitespace characters.
Default value: lemma
Type: String
Name of the feature in which to write the term lemma.
Default value: true
Type: Boolean
Whether to project lemmas instead of the forms.
Default value: add
Type: MultipleValueAction
What to do when multiple entries with the same key are seen.
Default value: false
Type: Boolean
Match normalizing whitespace.
Default value: pos
Type: String
Name of the feature in which to write the term POS tags.
Default value: true
Type: Expression
Process only sections that satisfy this filter.
Default value: org.bibliome.alvisnlp.modules.projectors.ContentsSubject@6c80d78a
Type: Subject
Subject on which to project the dictionary.