import: level for dominance edges might be not set #2
This was referenced Aug 17, 2012
Closed
Closed
(by x-launchpaduser1153)
thomaskrause added a commit that referenced this issue on Aug 17, 2012
…eme. This scheme consists of a separate "annotation_pool" table containing all possible combinations of node and edge annotations. The facts table only holds a bigint reference to the id of annotation_table. To let the PostgreSQL optimizer know the selectivity of a given node/edge annotation query, the matching annotation ID is calculated by an immutable SQL function whose result is computed and inserted into the query before the optimizer runs. Selecting a huge number of annotation IDs (lemma=/.*/ on tiger2) did not have a significant impact on query speed, since parsing the IDs is not necessary (they are included in the internal data structures).

It is still possible to use the old pure scheme, which is important a) as a fallback and b) for benchmarking.

These changes are made to improve the planner's information about the selectivity of annotation-based subqueries. Previously the planner assumed statistical independence of the columns node/edge_annotation_name and node/edge_annotation_value (the same holds for namespace, but people normally don't query for it explicitly), which could be misleading in cases like "NN as value always means pos as name".

Therefore a really simple index scheme is applied, which only indexes each column together with the corpus_ref or text_ref. This index could be improved, but it showed no or only minor disadvantages on the AQL test query set for tiger2. It also made queries like count pos="ART" & pos="NN" & pos="VAPP" & #1 . #2 & #2 .1,30 #3, where all nodes except one are not really selective at all, complete in less than 60 seconds on tiger2. These queries result in a timeout when using the old index scheme.

Also note that there are currently different SQL getter functions for the possible combinations of namespace/name/value/regex annotation queries. This should be much improved, e.g. by using NULL and CASE in the SQL.
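The lookup pattern described in this commit message can be sketched roughly as follows. This is only an illustration under stated assumptions: SQLite stands in for PostgreSQL, the column names (`name`, `val`, `node_id`, `node_anno_ref`) are hypothetical, and a plain Python helper replaces the immutable SQL function mentioned above. The point is that the annotation IDs are resolved first and then placed into the facts query as constants.

```python
import sqlite3

# In-memory stand-in for the real database; all column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE annotation_pool (id INTEGER PRIMARY KEY, name TEXT, val TEXT);
CREATE TABLE facts (
    node_id INTEGER,
    node_anno_ref INTEGER REFERENCES annotation_pool(id)
);
INSERT INTO annotation_pool VALUES (1, 'pos', 'NN'), (2, 'pos', 'ART'), (3, 'lemma', 'Haus');
INSERT INTO facts VALUES (10, 1), (11, 2), (12, 1), (13, 3);
""")


def anno_ids(conn, name, val):
    """Resolve a name/value pair to its annotation_pool IDs (hypothetical helper)."""
    rows = conn.execute(
        "SELECT id FROM annotation_pool WHERE name = ? AND val = ?", (name, val)
    )
    return [row[0] for row in rows]


def nodes_with(conn, name, val):
    """Find nodes by first resolving the IDs, then querying facts with constants."""
    ids = anno_ids(conn, name, val)
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    rows = conn.execute(
        f"SELECT node_id FROM facts WHERE node_anno_ref IN ({placeholders}) "
        "ORDER BY node_id",
        ids,
    )
    return [row[0] for row in rows]


print(nodes_with(conn, "pos", "NN"))  # [10, 12]
```

Because the facts query only ever sees a list of concrete IDs, a planner can estimate selectivity from a single column instead of combining separate name and value statistics.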
This was referenced Apr 2, 2013
This was referenced Jun 17, 2013
Closed
thomaskrause added a commit that referenced this issue on Oct 9, 2013
Precedence optimization fails when applied to spans which cover more than one token

Take e.g. this query on pcc2:

NP & NP & NP & #1 . #2 & #2 . #3

In ANNIS 2 this gave us 2 results, but since ANNIS 3 incorrectly applies the precedence optimization, the query gets translated to

NP & NP & NP & #1 . #2 & #2 . #3 & #1 . #3

and has only 1 match. The correct optimization would be

NP & NP & NP & #1 . #2 & #2 . #3 & #1 .* #3

This commit adds proper test cases for this situation and provides a fix.
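A small sketch of why the added constraint has to be indirect precedence. It assumes that direct precedence compares the last token covered by the first span with the first token covered by the second span; the `Span` type and both helper functions are hypothetical and not taken from the ANNIS code base.

```python
from typing import NamedTuple


class Span(NamedTuple):
    left: int   # index of the first covered token
    right: int  # index of the last covered token


def directly_precedes(a: Span, b: Span) -> bool:
    """Models `a . b`: b starts at the token right after the end of a."""
    return a.right + 1 == b.left


def indirectly_precedes(a: Span, b: Span) -> bool:
    """Models `a .* b`: b starts anywhere after the end of a."""
    return a.right < b.left


# Three spans where the middle one covers two tokens.
np1 = Span(0, 0)
np2 = Span(1, 2)
np3 = Span(3, 3)

# Both constraints of the original query hold ...
assert directly_precedes(np1, np2) and directly_precedes(np2, np3)

# ... but direct precedence between the outer spans does not.
print(directly_precedes(np1, np3))    # False
print(indirectly_precedes(np1, np3))  # True
```

With single-token spans the same chain would place the outer spans exactly two tokens apart, so multi-token spans are what makes the derived distance unknown in advance.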
thomaskrause added a commit that referenced this issue on Nov 21, 2013
like tok . pos="NN" instead of tok & pos="NN" & #1 . #2
thomaskrause added a commit that referenced this issue on Nov 21, 2013
This was referenced Nov 24, 2014
thomaskrause added a commit that referenced this issue on Feb 15, 2016
… node definitions. This removes an ambiguity for the "!=" token. E.g. tok!="the" could be interpreted as "all tokens which don't have "the" as value" or as tok & "the" & #1 != #2. The latter is semantically invalid (no binding), so the ambiguity is resolved by not allowing the AQL operators "!=" and "==" in short AQL definitions. This fixes #494.
On import, all "real" roots are computed first and the "root" column is set accordingly. This "root" column is then used as the starting point indicator for the level calculations. This behavior is wrong and inconsistent with previous releases. The right way of doing this is to use the real roots for the "root" column, but to set the level for all nodes that have no parent.
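The difference can be illustrated with a toy example. This is a sketch under simplifying assumptions, not the importer's actual code: each component is modelled as a plain list of (parent, child) edges, every node has at most one parent per component, and all node names are made up.

```python
from collections import defaultdict, deque


def parentless(edges):
    """Nodes that occur in the edge list but never as a child."""
    parents = {p for p, _ in edges}
    children = {c for _, c in edges}
    return parents - children


def levels_from(starts, edges):
    """Breadth-first level assignment inside one component."""
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
    levels = {}
    queue = deque((node, 0) for node in sorted(starts))
    while queue:
        node, level = queue.popleft()
        if node in levels:
            continue
        levels[node] = level
        for child in children[node]:
            queue.append((child, level + 1))
    return levels


# Hypothetical corpus: a primary dominance tree plus a secondary edge component.
primary = [("S", "NP"), ("S", "VP"), ("NP", "t1"), ("VP", "t2")]
secedge = [("NP", "VP"), ("VP", "t2")]

real_roots = parentless(primary + secedge)          # {"S"}
secedge_nodes = {n for edge in secedge for n in edge}

# Behavior described as wrong: only real roots start the level calculation.
wrong = levels_from(real_roots & secedge_nodes, secedge)

# Behavior described as right: every parentless node of the component starts it.
right = levels_from(parentless(secedge), secedge)

print(wrong)  # {}
print(right)  # {'NP': 0, 'VP': 1, 't2': 2}
```

In this toy corpus the secondary edge component has no real root, so seeding the traversal from the "root" column alone leaves all of its levels unset.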
Queries affected:
This can only affect queries of the form >secedge m,n.
Since all primary dominance components have a real root, this affects only corpora with secondary edges. These edges normally do not form complex structures in themselves, so querying them this way makes little sense.
Pointing relations are not affected, since they do not have subcomponents and the real root is identical to the "parent is null" condition.
Imported from Launchpad using lp2gh.