#4428 - Slow knowledge-base lookups on relation layers #4429

Merged
2 changes: 1 addition & 1 deletion .github/workflows/maven.yml
@@ -28,7 +28,7 @@ jobs:
distribution: 'temurin'
cache: maven
- name: Build with Maven
-      run: mvn --no-transfer-progress -B clean package -T2 --file pom.xml
+      run: mvn --no-transfer-progress -B clean package --file pom.xml

# Fails with error message - no idea why...
# Optional: Uploads the full dependency graph to GitHub to improve the quality of Dependabot alerts this repository can receive

@@ -381,8 +381,7 @@ public List<KBHandle> disambiguate(KnowledgeBase aKB, String aConceptScope,
ConceptFeatureValueType aValueType, String aQuery, String aMention,
int aMentionBeginOffset, CAS aCas)
{
- Set<KBHandle> candidates = generateCandidates(aKB, aConceptScope, aValueType, aQuery,
- aMention);
+ var candidates = generateCandidates(aKB, aConceptScope, aValueType, aQuery, aMention);
return rankCandidates(aQuery, aMention, candidates, aCas, aMentionBeginOffset);
}

@@ -1622,7 +1622,10 @@ private String asRegexp(String aValue)
String value = aValue;
// Escape metacharacters
// value = value.replaceAll("[{}()\\[\\].+*?^$\\\\|]", "\\\\\\\\$0");
- value = value.replaceAll("[{}()\\[\\].+*?^$\\\\|]+", ".+");
+ // Replace metacharacters with a match for any single char (.+ would be too slow)
+ value = value.replaceAll("[{}()\\[\\].+*?^$\\\\|]+", ".");
+ // Drop metacharacters
+ // value = value.replaceAll("[{}()\\[\\].+*?^$\\\\|]+", " ");
// Replace consecutive whitespace or control chars with a whitespace matcher
value = value.replaceAll("[\\p{Space}\\p{Cntrl}]+", "\\\\s+");
return value;
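
To make the effect of this hunk concrete, here is a minimal standalone sketch of the two replacement steps. The class name `AsRegexpSketch` and its `main` method are illustrative only and not part of the PR; the two `replaceAll` calls are copied from the new version of `asRegexp`. It shows that each run of metacharacters now collapses to a single `.` instead of `.+`.

```java
// Hypothetical standalone class; only the two replaceAll calls are taken from the diff.
final class AsRegexpSketch
{
    private AsRegexpSketch()
    {
        // Static helper only
    }

    static String asRegexp(String aValue)
    {
        String value = aValue;
        // Each run of regex metacharacters becomes a match for any single char
        value = value.replaceAll("[{}()\\[\\].+*?^$\\\\|]+", ".");
        // Consecutive whitespace or control chars become a whitespace matcher
        value = value.replaceAll("[\\p{Space}\\p{Cntrl}]+", "\\\\s+");
        return value;
    }

    public static void main(String[] args)
    {
        // prints foo\s+.bar.\s+baz
        System.out.println(asRegexp("foo (bar)  baz"));
    }
}
```

The resulting pattern still matches the original mention text, because the single `.` stands in for the one-character `(` or `)` it replaced. A longer run of metacharacters such as `++` would only be matched by one character.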
@@ -1980,8 +1983,13 @@ public List<KBHandle> asHandles(RepositoryConnection aConnection, boolean aAll)
results = evaluateListQuery(tupleQuery, aAll);
results.sort(comparing(KBObject::getUiLabel, CASE_INSENSITIVE_ORDER));

- LOG.debug("[{}] Query returned {} results in {}ms", queryId, results.size(),
- currentTimeMillis() - startTime);
+ long duration = currentTimeMillis() - startTime;
+ LOG.debug("[{}] Query returned {} results in {}ms {}", queryId, results.size(),
+ duration, duration > 1000 ? "-- SLOW QUERY!" : "");
+
+ if (duration > 1000 && !LOG.isTraceEnabled()) {
+ LOG.debug("[{}] Slow query: {}", queryId, queryString);
+ }

return results;
}
@@ -2026,8 +2034,13 @@ public boolean exists(RepositoryConnection aConnection, boolean aAll)
TupleQuery tupleQuery = aConnection.prepareTupleQuery(queryString);
boolean result = !evaluateListQuery(tupleQuery, aAll).isEmpty();

- LOG.debug("[{}] Query returned {} in {}ms", queryId, result,
- currentTimeMillis() - startTime);
+ long duration = currentTimeMillis() - startTime;
+ LOG.debug("[{}] Query returned {} in {}ms {}", queryId, result, duration,
+ duration > 1000 ? "-- SLOW QUERY!" : "");
+
+ if (duration > 1000 && !LOG.isTraceEnabled()) {
+ LOG.debug("[{}] Slow query: {}", queryId, queryString);
+ }

return result;
}
@@ -2060,8 +2073,14 @@ public Optional<KBHandle> asHandle(RepositoryConnection aConnection, boolean aAl
tupleQuery.setIncludeInferred(includeInferred);
result = evaluateListQuery(tupleQuery, aAll).stream().findFirst();

- LOG.debug("[{}] Query returned a result in {}ms", queryId,
- currentTimeMillis() - startTime);
+ long duration = currentTimeMillis() - startTime;
+ LOG.debug("[{}] Query returned a result in {}ms {}", queryId, duration,
+ duration > 1000 ? "-- SLOW QUERY!" : "");
+
+ if (duration > 1000 && !LOG.isTraceEnabled()) {
+ LOG.debug("[{}] Slow query: {}", queryId, queryString);
+ }

return result;
}
catch (QueryEvaluationException e) {
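
The three hunks above repeat the same timing and slow-query check. As a sketch only (this is not part of the PR), the decision logic could be expressed as a small helper. The class `SlowQueryPolicy` and its method names are hypothetical; the 1000 ms threshold and the `-- SLOW QUERY!` marker are taken from the diff. The sketch deliberately avoids any logging library so that it stays self-contained.

```java
// Hypothetical helper, not part of the PR. Threshold and marker text mirror the diff.
final class SlowQueryPolicy
{
    static final long SLOW_THRESHOLD_MS = 1000;

    private SlowQueryPolicy()
    {
        // Static helper only
    }

    /** Suffix appended to the regular debug message; empty for fast queries. */
    static String marker(long aDurationMs)
    {
        return aDurationMs > SLOW_THRESHOLD_MS ? "-- SLOW QUERY!" : "";
    }

    /**
     * Mirrors the condition in the diff: log the query text only if the query was slow
     * and trace logging is off. The assumption here (not shown in the diff) is that
     * trace logging already prints every query elsewhere.
     */
    static boolean shouldLogQueryText(long aDurationMs, boolean aTraceEnabled)
    {
        return aDurationMs > SLOW_THRESHOLD_MS && !aTraceEnabled;
    }

    public static void main(String[] args)
    {
        long startTime = System.currentTimeMillis();
        // ... the query would be evaluated here ...
        long duration = System.currentTimeMillis() - startTime;
        System.out.printf("Query returned in %dms %s%n", duration, marker(duration));
    }
}
```

Note that the comparison is strict, so a query taking exactly 1000 ms is not flagged.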

@@ -32,7 +32,6 @@
import java.util.stream.Collectors;

import org.apache.commons.lang3.StringUtils;
- import org.apache.uima.cas.CAS;
import org.apache.wicket.MarkupContainer;
import org.apache.wicket.core.request.handler.IPartialPageRequestHandler;
import org.apache.wicket.feedback.IFeedback;
@@ -131,7 +130,7 @@ protected List<KBHandle> getCandidates(IModel<AnnotatorState> aStateModel,
AnnotationFeature feat = getModelObject().feature;

var traits = readFeatureTraits(feat);
- String repoId = traits.getRepositoryId();
+ var repoId = traits.getRepositoryId();
// Check if kb is actually enabled
if (!(repoId == null || kbService.isKnowledgeBaseEnabled(feat.getProject(), repoId))) {
return Collections.emptyList();
@@ -140,12 +139,23 @@ protected List<KBHandle> getCandidates(IModel<AnnotatorState> aStateModel,
// If there is a selection, we try obtaining its text from the CAS and use it as an
// additional item in the query. Note that there is not always a mention, e.g. when the
// feature is used in a document-level annotations.
- CAS cas = aHandler != null ? aHandler.getEditorCas() : null;
- String mention = aStateModel != null ? aStateModel.getObject().getSelection().getText()
- : null;
- int mentionBegin = aStateModel != null
- ? aStateModel.getObject().getSelection().getBegin()
- : -1;
+ var cas = aHandler != null ? aHandler.getEditorCas() : null;
+
+ String mention = null;
+ int mentionBegin = -1;
+
+ if (aStateModel != null) {
+ var selection = aStateModel.getObject().getSelection();
+ if (selection.isSpan()) {
+ mention = selection.getText();
+ mentionBegin = selection.getBegin();
+ }
+
+ if (selection.isArc()) {
+ mention = selection.getOriginText() + " " + selection.getTargetText();
+ mentionBegin = selection.getBegin();
+ }
+ }

choices = clService.getLinkingInstancesInKBScope(traits.getRepositoryId(),
traits.getScope(), traits.getAllowedValueType(), finalInput, mention,
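
The new selection handling in this hunk can be read as a small pure function. The sketch below uses a hypothetical `Selection` stand-in that carries only the accessors appearing in the diff (`isSpan`, `isArc`, `getText`, `getOriginText`, `getTargetText`, `getBegin`). The real class is not shown in this PR and will differ; `MentionSketch`, `Mention` and `fromSelection` are names invented for this illustration.

```java
// Hypothetical stand-ins for illustration; only the accessor names come from the diff.
final class MentionSketch
{
    static final class Selection
    {
        private final boolean span;
        private final boolean arc;
        private final String text;
        private final String originText;
        private final String targetText;
        private final int begin;

        Selection(boolean aSpan, boolean aArc, String aText, String aOriginText,
                String aTargetText, int aBegin)
        {
            span = aSpan;
            arc = aArc;
            text = aText;
            originText = aOriginText;
            targetText = aTargetText;
            begin = aBegin;
        }

        boolean isSpan() { return span; }
        boolean isArc() { return arc; }
        String getText() { return text; }
        String getOriginText() { return originText; }
        String getTargetText() { return targetText; }
        int getBegin() { return begin; }
    }

    /** Result pair: mention text (may be null) and its begin offset (-1 if none). */
    static final class Mention
    {
        final String text;
        final int begin;

        Mention(String aText, int aBegin)
        {
            text = aText;
            begin = aBegin;
        }
    }

    static Mention fromSelection(Selection aSelection)
    {
        String mention = null;
        int mentionBegin = -1;

        if (aSelection != null) {
            if (aSelection.isSpan()) {
                mention = aSelection.getText();
                mentionBegin = aSelection.getBegin();
            }

            if (aSelection.isArc()) {
                // For relations, the texts of both endpoints form the mention
                mention = aSelection.getOriginText() + " " + aSelection.getTargetText();
                mentionBegin = aSelection.getBegin();
            }
        }

        return new Mention(mention, mentionBegin);
    }
}
```

With this shape, a selection that is neither a span nor an arc (for example, on a document-level annotation) yields no mention and an offset of -1, which matches the defaults set in the diff.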

@@ -41,7 +41,7 @@
<wicket:label key="scope"/>
</label>
<div class="col-sm-8">
- <input wicket:id="scope" class="form-control w-100"/>
+ <input wicket:id="scope" class="w-100"/>
</div>
</div>
</form>

@@ -41,6 +41,7 @@ The same is true for the object of a statement: After choosing the property for

4. *KB Resource*: This is provided as an option when the property has a range as a particular concept from the knowledge base. In this option, the user is provided with an auto-complete field with a list of knowledge base entities. This includes the subclass and instances of the range specified for the property.

+ [[sect_concept_features]]
=== Concept features

Concept features are features that allow referencing concepts in the knowledge base during annotation.

@@ -225,7 +225,14 @@ user can choose one of the pre-configured mapping or provide a custom mapping.

=== Root Concepts

- In the advanced settings, the user can leverage this feature of KB settings when one doesn't want the entire knowledge base to be used and rather choose to identify some specific root concepts. This feature specially helps in case of large knowledge bases such as Wikidata.
+ The knowledge base browser displays a class tree. By default, it tries to automatically determine the root classes of
+ this tree. However, for very large KBs this can be slow. Also, you might not be interested in browsing the entire KB
+ but would rather focus on specific subtrees. In such cases, you can define the root concepts explicitly here.
+
+ NOTE: This setting currently affects **only the class tree in the knowledge base browser**. You can still search for concepts
+ that are outside of the subtrees induced by the root concepts using the search field on the knowledge-base page and you
+ can also still link concept features to concepts outside the subtrees. In order to limit a concept feature to a particular
+ subtree, use the **Scope** setting in the <<sect_concept_features,concept feature settings>>.


=== Additional Matching Properties