import: level for dominance edges might be not set #2

thomaskrause · 2012-08-17T09:20:00Z

On import, all "real" roots are computed first and the "root" column is set from the result. This "root" column is then used as the starting-point indicator for the level calculation. This behavior is wrong and inconsistent with previous releases. The right approach is to use the real roots for the "root" column, but to set the level for all nodes that have no parent.

Queries affected:
This can only affect queries of the form >secedge m,n.
Since all primary dominance components have a real root, this affects only corpora with secondary edges. These edges normally do not form complex structures by themselves, so querying them this way makes no sense.
Pointing relations are not affected, since they do not have subcomponents and the real-root condition is identical to the "parent is null" condition.
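
The distinction between the "root" column and the level calculation can be illustrated with a minimal sketch. This is not the ANNIS importer code. It assumes that a "real" root is a node with no incoming edge in any component, and that the level starts at 0 for every node that has no parent within its own component. The function name, the edge triples, and the node names are hypothetical.

```python
from collections import deque


def compute_root_and_level(edges):
    """edges: list of (component, parent, child) triples.

    Returns (real_roots, levels), where levels maps
    (component, node) -> depth of the node inside that component.
    """
    # Assumption: a "real" root has no incoming edge in any component.
    all_nodes = {n for _, parent, child in edges for n in (parent, child)}
    all_children = {child for _, _, child in edges}
    real_roots = all_nodes - all_children

    # Group the edges by component.
    by_component = {}
    for comp, parent, child in edges:
        by_component.setdefault(comp, []).append((parent, child))

    levels = {}
    for comp, comp_edges in by_component.items():
        children = {}
        has_parent = set()
        for parent, child in comp_edges:
            children.setdefault(parent, []).append(child)
            has_parent.add(child)
        # Level calculation starts at every node without a parent in this
        # component, whether or not it is a real root.
        starts = set(children) - has_parent
        queue = deque((node, 0) for node in sorted(starts))
        while queue:
            node, depth = queue.popleft()
            if (comp, node) in levels:
                continue  # already reached on a path that is at least as short
            levels[(comp, node)] = depth
            for child in children.get(node, []):
                queue.append((child, depth + 1))
    return real_roots, levels


example_edges = [
    ("dom", "S", "NP"),
    ("dom", "NP", "tok1"),
    ("secedge", "NP", "VP"),
    ("secedge", "VP", "tok2"),
]
real_roots, levels = compute_root_and_level(example_edges)
```

In this example "NP" is not a real root because "S" dominates it, yet it still receives level 0 inside the hypothetical "secedge" component. If the level calculation started only at real roots, the nodes of that component would get no level at all.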

Imported from Launchpad using lp2gh.

date created: 2011-10-07T16:20:27Z
owner: krause
assignee: krause
the launchpad url was https://bugs.launchpad.net/bugs/870108

thomaskrause · 2012-08-17T09:20:20Z

(by x-launchpaduser1153)
Not sure I understand. The original code computes a node's level with reference to the node's component. The component's root (which may or may not be a real root) is assigned level 0, and so on. Does the patch restore this behavior or change it?

…eme. This scheme consists of a separate "annotation_pool" table containing all possible combinations of node and edge annotations. The facts table only holds a bigint reference to the id of the annotation table. In order to let the PostgreSQL optimizer know about the selectivity of a certain node/edge annotation query, the matching annotation ID is calculated by an immutable SQL function whose result is calculated and inserted into the query before the optimizer runs. Selecting a huge number of annotation IDs (lemma=/.*/ on tiger2) did not have a significant impact on query speed, since parsing the IDs is not necessary (they are included in the internal data structures).

It is still possible to use the old pure scheme, which is important a) as a fallback and b) for benchmarking.

These changes are made in order to improve the planner's information about the selectivity of annotation-based subqueries. Previously the planner assumed statistical independence of the columns node/edge_annotation_name and node/edge_annotation_value (the same holds for namespace, but normally people don't query explicitly for it), which could be misleading in cases like "NN as value always means pos as name".

Therefore a really simple index scheme is applied, which only indexes each column together with the corpus_ref or text_ref. This index could be improved, but showed no or only minor disadvantages on the AQL test query set for tiger2. It also made queries like
count pos="ART" & pos="NN" & pos="VAPP" & #1 . #2 & #2 .1,30 #3
where all nodes except one are not really selective at all pass in less than 60 seconds on tiger2. These queries result in a timeout when using the old index scheme.

Also note that currently there are different SQL getter functions for the possible combinations of namespace/name/value/regex annotation queries. This should be much improved, e.g. by using NULL and CASE in the SQL.
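
The pool idea from this commit message can be sketched in memory as follows. This is only an illustration of the lookup order (resolve the annotation query to pool ids first, then scan the facts by id), not the actual PostgreSQL schema or SQL function. The class, the function names, and the "tiger" namespace are hypothetical.

```python
import re


class AnnotationPool:
    """Maps each distinct (namespace, name, value) combination to one integer id."""

    def __init__(self):
        self._ids = {}

    def intern(self, namespace, name, value):
        # Reuse the existing id, or assign the next free one.
        key = (namespace, name, value)
        return self._ids.setdefault(key, len(self._ids) + 1)

    def matching_ids(self, name, value_pattern):
        # Resolve an annotation query to the set of pool ids up front,
        # so the later scan only compares integers.
        regex = re.compile(value_pattern)
        return {
            anno_id
            for (_, anno_name, value), anno_id in self._ids.items()
            if anno_name == name and regex.fullmatch(value)
        }


def find_nodes(pool, facts, name, value_pattern):
    """facts: list of (node_id, annotation_pool_id) rows."""
    ids = pool.matching_ids(name, value_pattern)
    return sorted(node for node, ref in facts if ref in ids)


pool = AnnotationPool()
facts = [
    (1, pool.intern("tiger", "pos", "ART")),
    (2, pool.intern("tiger", "pos", "NN")),
    (3, pool.intern("tiger", "pos", "NN")),
    (3, pool.intern("tiger", "lemma", "Haus")),
]
nn_nodes = find_nodes(pool, facts, "pos", "NN")
all_pos_nodes = find_nodes(pool, facts, "pos", ".*")
```

Because name and value are resolved together into one id, a lookup for pos="NN" no longer treats the two columns as independent, which is the selectivity problem the commit message describes.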

Precedence optimization fails when applied to spans which cover more than one token

Take e.g. this query on pcc2:
NP & NP & NP & #1 . #2 & #2 . #3
In ANNIS 2 this gave us 2 results, but since ANNIS 3 incorrectly applies the precedence optimization, the query gets translated to
NP & NP & NP & #1 . #2 & #2 . #3 & #1 . #3
and has only 1 match. The correct optimization would be
NP & NP & NP & #1 . #2 & #2 . #3 & #1 .* #3
This commit adds proper test cases for this situation and provides a fix.
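
Why only the indirect join is safe for spans can be shown with a small sketch. It assumes spans are modelled as (first_token, last_token) index pairs, that direct precedence means the right span starts on the token immediately after the left span ends, and that indirect precedence means it starts anywhere after that. The function names are hypothetical and this is not the ANNIS query optimizer.

```python
def directly_precedes(a, b):
    # "#a . #b": b starts on the token right after a ends.
    return a[1] + 1 == b[0]


def indirectly_precedes(a, b):
    # "#a .* #b": b starts somewhere after a ends.
    return a[1] < b[0]


def implied_precedence(direct_joins):
    """For every chain #a . #b and #b . #c, return the implied join "#a .* #c"."""
    implied = []
    for a, b in direct_joins:
        for b2, c in direct_joins:
            if b == b2 and a != c:
                implied.append(f"#{a} .* #{c}")
    return implied


# Three adjacent spans; the middle one covers more than one token.
np1, np2, np3 = (0, 1), (2, 4), (5, 5)
chain_holds = directly_precedes(np1, np2) and directly_precedes(np2, np3)
direct_1_3 = directly_precedes(np1, np3)
indirect_1_3 = indirectly_precedes(np1, np3)
extra_joins = implied_precedence([(1, 2), (2, 3)])
```

Here the chain #1 . #2 & #2 . #3 holds, but a direct join between the first and third span does not, so adding it would discard a valid match. The indirect join holds for every such chain and is therefore the only one that can be added without changing the result.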

… node definitions. This removes an ambiguity for the "!=" token. E.g. tok!="the" could be interpreted as "all tokens which don't have "the" as value" or as tok & "the" & #1 != #2. The latter is semantically invalid (no binding), so the ambiguity is resolved by not allowing the AQL operators "!=" and "==" in short AQL definitions. This fixes #494.

This was referenced Aug 17, 2012

Search by "sentence" #6

Closed

WEKA: export metadata #9

Closed

Output corpus position in tokens for hits #11

Closed

gridtree seems to produce wrong output #14

Closed

Add island feature to grid #27

Closed

thomaskrause closed this as completed Aug 17, 2012

amir-zeldes mentioned this issue Nov 2, 2012

Orphan/root tokens are missing in the tiger tree view #47

Closed

This was referenced Apr 2, 2013

Grid is broken in parallel corpora #98

Closed

Corpus explorer does not output alignment edges with no annotations #99

Closed

amir-zeldes mentioned this issue May 15, 2013

Hit marking in HTML visualizations #105

Closed

This was referenced May 24, 2013

Highlighting of matched tokens within matched tokens in a second color doesn't always work #115

Closed

Segmentation precedence operator not working correctly #125

Closed

Hit marking in KWIC for segmentations precedence queries is incorrect #126

Closed

This was referenced Jun 17, 2013

Match highlighting in KWIC is incorrect/missing in parallel corpus query of non-terminal elements #137

Closed

Bug in arity operator #138

Closed

thomaskrause mentioned this issue Sep 26, 2013

Query for frequency does not output any data #220

Closed

amir-zeldes mentioned this issue Oct 23, 2013

Very thin bars in frequency analysis #245

Closed

thomaskrause added a commit that referenced this issue Nov 5, 2013

splitting up listener for query nodes and for joins in order to allow…

a5cebcd

… query like #1 . #2 & node & node

thomaskrause added a commit that referenced this issue Nov 21, 2013

allow to directly use node definitions in precedence operators

f8f655d

like tok . pos="NN" instead of tok & pos="NN" & #1 . #2

thomaskrause added a commit that referenced this issue Nov 21, 2013

use "==" operator for identity in order to solve ambiguity when parsing

3e897e0

tok="abc" . node which could be either tok="abc" & node & #1 . #2 or tok & "abc" & node & #1 = #2 & #2 . #3

amir-zeldes mentioned this issue Mar 13, 2014

New AQL operator: "near" #290

Closed

amir-zeldes pushed a commit that referenced this issue Sep 23, 2014

Merge pull request #2 from korpling/develop

1d3f2f6

Merge into amir-zeldes:develop

Annotation-123 mentioned this issue Nov 20, 2014

Nodes, edges and SecEdges in the constituent structure (tree) are empty or not displayed #367

Closed

This was referenced Nov 24, 2014

Disjunction fails depending on order #372

Closed

Highlighting failure in disjunction #373

Closed

zangsir added a commit that referenced this issue Sep 11, 2015

Merge pull request #2 from korpling/develop

a08905f

korpling changes 04/17/15

thomaskrause mentioned this issue Nov 5, 2015

component normalization fails to generate unique variable name #458

Closed

TFeige mentioned this issue Nov 21, 2016

AQL-editor and bidirectional text #540

Closed

otichy mentioned this issue Mar 21, 2017

edge annotation in frequency analysis #554

Open

otichy mentioned this issue Mar 19, 2019

Find relations with no edge label #604

Closed

LisaEggert mentioned this issue Jan 27, 2021

Complex Search with "OR" #686

Closed

amir-zeldes mentioned this issue Aug 21, 2021

Operator negation in AQL - part 1: negation with existence assumption korpling/graphANNIS#186

Closed

lehmannx mentioned this issue Feb 22, 2023

CSV Export fails at large matches #816

Closed
