#4158 - Exception when annotating something after a longer pause #4204

Merged
2 changes: 1 addition & 1 deletion Jenkinsfile
@@ -2,7 +2,7 @@ config = [
agentLabel: '',
maven: 'Maven 3',
jdk: 'Zulu 11',
extraMavenArguments: '',
extraMavenArguments: '-Ddkpro.core.testCachePath="${WORKSPACE}/cache/dkpro-core-datasets" -T 4',
wipeWorkspaceBeforeBuild: true,
wipeWorkspaceAfterBuild: true
]
@@ -30,6 +30,7 @@
import java.util.Optional;

import org.apache.uima.cas.CAS;
import org.apache.uima.cas.FSIterator;
import org.apache.uima.cas.FeatureStructure;
import org.apache.uima.cas.Type;
import org.apache.uima.cas.TypeSystem;
@@ -92,9 +93,40 @@ protected boolean typeSystemInit(TypeSystem aTypeSystem)
@Override
public List<AnnotationFS> selectAnnotationsInWindow(CAS aCas, int aWindowBegin, int aWindowEnd)
{
return aCas.select(type).coveredBy(0, aWindowEnd).includeAnnotationsWithEndBeyondBounds()
.map(fs -> (AnnotationFS) fs)
.filter(ann -> AnnotationPredicates.overlapping(ann, aWindowBegin, aWindowEnd))
// https://github.com/apache/uima-uimaj/issues/345
// return aCas.select(type).coveredBy(0, aWindowEnd).includeAnnotationsWithEndBeyondBounds()
// .map(fs -> (AnnotationFS) fs)
// .filter(ann -> AnnotationPredicates.overlapping(ann, aWindowBegin, aWindowEnd))
// .collect(toList());

List<AnnotationFS> list = new ArrayList<AnnotationFS>();

// withSnapshotIterators() not needed here since we copy the FSes to a list anyway
FSIterator<AnnotationFS> it = aCas.getAnnotationIndex(type).iterator();

// Skip annotations whose start is before the start parameter.
while (it.isValid() && (it.get()).getBegin() < aWindowBegin) {
it.moveToNext();
}

boolean strict = false;
while (it.isValid()) {
AnnotationFS a = it.get();
// If the start of the current annotation is past the end parameter, we're done.
if (a.getBegin() > aWindowEnd) {
break;
}
it.moveToNext();
if (strict && a.getEnd() > aWindowEnd) {
continue;
}

list.add(a);
}

return list.stream() //
.map(fs -> (AnnotationFS) fs) //
.filter(ann -> AnnotationPredicates.overlapping(ann, aWindowBegin, aWindowEnd)) //
.collect(toList());
}

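The iterator loop that replaces the `select(...)` call can be sketched without UIMA as follows. This is a minimal sketch, assuming annotations are reduced to a hypothetical `Span` class holding only begin/end offsets, sorted the way the annotation index sorts them (begin ascending, then end descending), and using a simplified half-open overlap test that may differ from `AnnotationPredicates.overlapping` for zero-width spans.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class WindowSelectionSketch {
    // Hypothetical stand-in for AnnotationFS: only the begin/end offsets matter here.
    public static final class Span {
        public final int begin;
        public final int end;

        public Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + begin + "," + end + "]";
        }
    }

    // Simplified half-open overlap test (assumption; may differ from
    // AnnotationPredicates.overlapping for zero-width spans).
    static boolean overlapping(Span s, int winBegin, int winEnd) {
        return s.begin < winEnd && winBegin < s.end;
    }

    public static List<Span> selectInWindow(List<Span> spans, int winBegin, int winEnd) {
        // Emulate annotation index order: begin ascending, then end descending.
        List<Span> sorted = new ArrayList<>(spans);
        sorted.sort((a, b) -> a.begin != b.begin ? Integer.compare(a.begin, b.begin)
                : Integer.compare(b.end, a.end));

        int i = 0;
        // Skip spans whose begin is before the window begin (as the PR's first loop does).
        while (i < sorted.size() && sorted.get(i).begin < winBegin) {
            i++;
        }

        List<Span> list = new ArrayList<>();
        while (i < sorted.size()) {
            Span s = sorted.get(i);
            // Once a span begins past the window end, no later span can overlap the window.
            if (s.begin > winEnd) {
                break;
            }
            i++;
            list.add(s);
        }

        // Keep only the spans that actually overlap the window.
        return list.stream()
                .filter(s -> overlapping(s, winBegin, winEnd))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Span> spans = List.of(new Span(0, 5), new Span(3, 8), new Span(6, 10),
                new Span(12, 15));
        System.out.println(selectInWindow(spans, 4, 11)); // prints [[6,10]]
    }
}
```

With the window [4, 11], the sketch returns only [6,10]: [3,8] overlaps the window but is skipped by the first loop because it begins before `aWindowBegin`, whereas the commented-out `coveredBy(0, aWindowEnd)` variant starts from offset 0 and would keep it after the overlap filter.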
@@ -20,14 +20,17 @@
import static java.util.Arrays.asList;
import static java.util.stream.Collectors.toList;

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.uima.cas.CAS;
import org.apache.uima.cas.FSIterator;
import org.apache.uima.cas.FeatureStructure;
import org.apache.uima.cas.text.AnnotationFS;
import org.apache.uima.cas.text.AnnotationPredicates;
import org.apache.uima.fit.util.CasUtil;
import org.apache.uima.fit.util.FSUtil;

import de.tudarmstadt.ukp.clarin.webanno.api.annotation.util.WebAnnoCasUtil;
@@ -71,9 +74,41 @@ public SpanDiffAdapter(String aType, Set<String> aLabelFeatures)
@Override
public List<AnnotationFS> selectAnnotationsInWindow(CAS aCas, int aWindowBegin, int aWindowEnd)
{
return aCas.select(getType()).coveredBy(0, aWindowEnd)
.includeAnnotationsWithEndBeyondBounds().map(fs -> (AnnotationFS) fs)
.filter(ann -> AnnotationPredicates.overlapping(ann, aWindowBegin, aWindowEnd))
// https://github.com/apache/uima-uimaj/issues/345
// return aCas.select(type).coveredBy(0, aWindowEnd).includeAnnotationsWithEndBeyondBounds()
// .map(fs -> (AnnotationFS) fs)
// .filter(ann -> AnnotationPredicates.overlapping(ann, aWindowBegin, aWindowEnd))
// .collect(toList());

List<AnnotationFS> list = new ArrayList<AnnotationFS>();

// withSnapshotIterators() not needed here since we copy the FSes to a list anyway
FSIterator<AnnotationFS> it = aCas.getAnnotationIndex(CasUtil.getType(aCas, getType()))
.iterator();

// Skip annotations whose start is before the start parameter.
while (it.isValid() && (it.get()).getBegin() < aWindowBegin) {
it.moveToNext();
}

boolean strict = false;
while (it.isValid()) {
AnnotationFS a = it.get();
// If the start of the current annotation is past the end parameter, we're done.
if (a.getBegin() > aWindowEnd) {
break;
}
it.moveToNext();
if (strict && a.getEnd() > aWindowEnd) {
continue;
}

list.add(a);
}

return list.stream() //
.map(fs -> (AnnotationFS) fs) //
.filter(ann -> AnnotationPredicates.overlapping(ann, aWindowBegin, aWindowEnd)) //
.collect(toList());
}

@@ -27,6 +27,7 @@
import org.apache.uima.cas.CAS;
import org.apache.uima.cas.Type;
import org.apache.uima.cas.text.AnnotationFS;
import org.apache.uima.fit.util.CasUtil;
import org.springframework.util.CollectionUtils;

import de.tudarmstadt.ukp.clarin.webanno.model.AnnotationLayer;
@@ -76,10 +77,19 @@ public boolean check(Project aProject, CAS aCas, List<LogMessage> aMessages)
}

for (AnnotationFS ann : select(aCas, type)) {
var startsOutside = aCas.select(Sentence._TypeName)
.covering(ann.getBegin(), ann.getBegin()).isEmpty();
var endsOutside = aCas.select(Sentence._TypeName)
.covering(ann.getEnd(), ann.getEnd()).isEmpty();
// https://github.com/apache/uima-uimaj/issues/345
// var startsOutside = aCas.select(Sentence._TypeName)
// .covering(ann.getBegin(), ann.getBegin()).isEmpty();
var startsOutside = CasUtil
.selectCovering(ann.getCAS(), CasUtil.getType(ann.getCAS(), Sentence.class),
ann.getBegin(), ann.getBegin())
.isEmpty();
// https://github.com/apache/uima-uimaj/issues/345
// var endsOutside = aCas.select(Sentence._TypeName)
// .covering(ann.getEnd(), ann.getEnd()).isEmpty();
var endsOutside = CasUtil.selectCovering(ann.getCAS(),
CasUtil.getType(ann.getCAS(), Sentence.class), ann.getEnd(), ann.getEnd())
.isEmpty();

if (!startsOutside && !endsOutside) {
continue;
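The sentence-boundary test in this check can be modelled in plain Java. This is a minimal sketch, assuming that `selectCovering` at a zero-width position returns the sentences with `begin <= offset && offset <= end` (inclusive bounds are an assumption here) and using a hypothetical `Span` class instead of UIMA annotations.

```java
import java.util.List;

public class SentenceCoverageSketch {
    // Hypothetical stand-in for a Sentence or any other annotation.
    public static final class Span {
        public final int begin;
        public final int end;

        public Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    // True if some sentence covers the zero-width position at `offset`
    // (inclusive bounds assumed, like selectCovering(cas, type, offset, offset)).
    static boolean coveredBySentence(List<Span> sentences, int offset) {
        for (Span s : sentences) {
            if (s.begin <= offset && offset <= s.end) {
                return true;
            }
        }
        return false;
    }

    // Mirrors the check: flag the annotation if its begin or its end lies outside every sentence.
    public static boolean startsOrEndsOutsideSentences(List<Span> sentences, Span ann) {
        boolean startsOutside = !coveredBySentence(sentences, ann.begin);
        boolean endsOutside = !coveredBySentence(sentences, ann.end);
        return startsOutside || endsOutside;
    }

    public static void main(String[] args) {
        List<Span> sentences = List.of(new Span(0, 10), new Span(11, 20));
        System.out.println(startsOrEndsOutsideSentences(sentences, new Span(2, 5)));   // false
        System.out.println(startsOrEndsOutsideSentences(sentences, new Span(8, 12)));  // false
        System.out.println(startsOrEndsOutsideSentences(sentences, new Span(21, 25))); // true
    }
}
```

An annotation whose begin and end fall into two different sentences, such as [8,12] above, passes this test; only offsets that lie outside every sentence are flagged.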
@@ -0,0 +1,100 @@
/*
* Licensed to the Technische Universität Darmstadt under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The Technische Universität Darmstadt
* licenses this file to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License.
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package de.tudarmstadt.ukp.clarin.webanno.diag.checks;

import static de.tudarmstadt.ukp.clarin.webanno.api.annotation.util.WebAnnoCasUtil.getRealCas;
import static java.util.function.Function.identity;
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;
import static org.apache.uima.cas.impl.Serialization.deserializeCASComplete;
import static org.apache.uima.cas.impl.Serialization.serializeCASComplete;

import java.util.List;
import java.util.Map;

import org.apache.uima.cas.CAS;
import org.apache.uima.cas.impl.CASImpl;
import org.apache.uima.resource.ResourceInitializationException;

import de.tudarmstadt.ukp.clarin.webanno.api.annotation.util.WebAnnoCasUtil;
import de.tudarmstadt.ukp.clarin.webanno.model.Project;
import de.tudarmstadt.ukp.clarin.webanno.support.logging.LogMessage;

public class UnreachableAnnotationsCheck
implements Check
{
@Override
public boolean check(Project aProject, CAS aCas, List<LogMessage> aMessages)
{
var casImpl = (CASImpl) getRealCas(aCas);

var annotationCountsBefore = countFeatureStructures(casImpl);

// Disable the forced retention of all feature structures that were assigned an ID, so that
// any temporary annotations that may have gotten stuck in the CAS are released during serialization.
var dummy = makeDummyCas();
try (var ctx = casImpl.ll_enableV2IdRefs(false);
var ctx1 = dummy.ll_enableV2IdRefs(false)) {
var data = serializeCASComplete(casImpl);
deserializeCASComplete(data, dummy);
}

var annotationCountsAfter = countFeatureStructures(dummy);

var diffTypes = 0;
var totalDiff = 0;
for (var typeName : annotationCountsBefore.keySet().stream().sorted()
.toArray(String[]::new)) {
var before = annotationCountsBefore.getOrDefault(typeName, 0L);
var after = annotationCountsAfter.getOrDefault(typeName, 0L);
var diff = before - after;
totalDiff += diff;
if (diff > 0) {
diffTypes++;
aMessages.add(LogMessage.info(this, "Type [%s] has [%d] unreachable instances",
typeName, diff));
}
}

if (totalDiff > 0) {
if (diffTypes > 1) {
aMessages.add(LogMessage.info(this,
"A total of [%d] unreachable instances were found", totalDiff));
}
}

return true;
}

public static CASImpl makeDummyCas()
{
try {
return (CASImpl) WebAnnoCasUtil.getRealCas(WebAnnoCasUtil.createCas());
}
catch (ResourceInitializationException e) {
throw new IllegalStateException(e);
}
}

public static Map<String, Long> countFeatureStructures(CASImpl casImpl)
{
return WebAnnoCasUtil.findAllFeatureStructures(casImpl).stream() //
.map(fs -> fs.getType().getName()) //
.collect(groupingBy(identity(), counting()));
}
}
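The counting and comparison in `UnreachableAnnotationsCheck` reduce to grouping type names and diffing two maps. The sketch below assumes feature structures are represented only by their type names (a hypothetical simplification; the real check counts the feature structures returned by `findAllFeatureStructures`) and reuses the same `groupingBy(identity(), counting())` collector as `countFeatureStructures`.

```java
import static java.util.function.Function.identity;
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class UnreachableCountSketch {
    // Count instances per type name, like countFeatureStructures(...) does for real FSes.
    public static Map<String, Long> countByType(List<String> typeNames) {
        return typeNames.stream().collect(groupingBy(identity(), counting()));
    }

    // Compare counts before and after the serialization round trip and build messages.
    public static List<String> report(Map<String, Long> before, Map<String, Long> after) {
        List<String> messages = new ArrayList<>();
        long totalDiff = 0;
        int diffTypes = 0;
        // Sorted iteration keeps the message order stable.
        for (String typeName : new TreeSet<>(before.keySet())) {
            long diff = before.getOrDefault(typeName, 0L) - after.getOrDefault(typeName, 0L);
            totalDiff += diff;
            if (diff > 0) {
                diffTypes++;
                messages.add(String.format("Type [%s] has [%d] unreachable instances",
                        typeName, diff));
            }
        }
        // Only summarize when more than one type was affected.
        if (totalDiff > 0 && diffTypes > 1) {
            messages.add(String.format("A total of [%d] unreachable instances were found",
                    totalDiff));
        }
        return messages;
    }

    public static void main(String[] args) {
        Map<String, Long> before = countByType(
                List.of("Token", "Token", "Token", "Sentence", "Lemma"));
        Map<String, Long> after = countByType(List.of("Token", "Token", "Sentence"));
        report(before, after).forEach(System.out::println);
    }
}
```

Using `long` for the running total avoids the implicit narrowing that happens when a `long` difference is added to an `int` accumulator declared with `var totalDiff = 0`.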
@@ -43,6 +43,7 @@
import de.tudarmstadt.ukp.clarin.webanno.diag.checks.RelationOffsetsCheck;
import de.tudarmstadt.ukp.clarin.webanno.diag.checks.TokensAndSententencedDoNotOverlapCheck;
import de.tudarmstadt.ukp.clarin.webanno.diag.checks.UniqueDocumentAnnotationCheck;
import de.tudarmstadt.ukp.clarin.webanno.diag.checks.UnreachableAnnotationsCheck;
import de.tudarmstadt.ukp.clarin.webanno.diag.repairs.CoverAllTextInSentencesRepair;
import de.tudarmstadt.ukp.clarin.webanno.diag.repairs.ReattachFeatureAttachedSpanAnnotationsAndDeleteExtrasRepair;
import de.tudarmstadt.ukp.clarin.webanno.diag.repairs.ReattachFeatureAttachedSpanAnnotationsRepair;
@@ -229,4 +230,10 @@ public TokensAndSententencedDoNotOverlapCheck tokensAndSententencedDoNotOverlapC
{
return new TokensAndSententencedDoNotOverlapCheck();
}

@Bean
public UnreachableAnnotationsCheck unreachableAnnotationsCheck()
{
return new UnreachableAnnotationsCheck();
}
}
@@ -17,11 +17,15 @@
*/
package de.tudarmstadt.ukp.clarin.webanno.diag.repairs;

import static de.tudarmstadt.ukp.clarin.webanno.api.annotation.util.WebAnnoCasUtil.getRealCas;
import static de.tudarmstadt.ukp.clarin.webanno.diag.checks.UnreachableAnnotationsCheck.countFeatureStructures;

import java.io.IOException;
import java.util.List;

import org.apache.uima.UIMAException;
import org.apache.uima.cas.CAS;
import org.apache.uima.cas.impl.CASImpl;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

@@ -51,12 +55,46 @@ public UpgradeCasRepair(AnnotationSchemaService aAnnotationService)
public void repair(Project aProject, CAS aCas, List<LogMessage> aMessages)
{
try {
var casImpl = (CASImpl) getRealCas(aCas);

var annotationCountsBefore = countFeatureStructures(casImpl);

annotationService.upgradeCas(aCas, aProject);
aMessages.add(LogMessage.info(this, "CAS upgraded."));

var annotationCountsAfter = countFeatureStructures(casImpl);

var diffTypes = 0;
var totalDiff = 0;
var totalBefore = 0;
var totalAfter = 0;
for (var typeName : annotationCountsBefore.keySet().stream().sorted()
.toArray(String[]::new)) {
var before = annotationCountsBefore.getOrDefault(typeName, 0L);
var after = annotationCountsAfter.getOrDefault(typeName, 0L);
var diff = before - after;
totalDiff += diff;
totalBefore += before;
totalAfter += after;
if (diff > 0) {
diffTypes++;
aMessages.add(LogMessage.info(this,
"Type [%s] had [%d] unreachable instances that were removed (before: [%d], after: [%d])",
typeName, diff, before, after));
}
}

if (totalDiff > 0) {
if (diffTypes > 1) {
aMessages.add(LogMessage.info(this,
"A total of [%d] unreachable instances were removed (before: [%d], after: [%d])",
totalDiff, totalBefore, totalAfter));
}
}
}
catch (UIMAException | IOException e) {
log.error("Unable to access CAS", e);
aMessages.add(LogMessage.error(this, "Unabled to access CAS", e.getMessage()));
aMessages.add(LogMessage.error(this, "Unable to access CAS: %s", e.getMessage()));
}
}
}
@@ -173,9 +173,10 @@ dependent.
ID:: `CASMetadataTypeIsPresentCheck`
Related repairs:: <<repair_UpgradeCasRepair>>

Checks if the ìnternal type `CASMetadata is defined in the type system of this CAS. If this is
Checks if the internal type `CASMetadata` is defined in the type system of this CAS. If this is
not the case, then the application may not be able to detect concurrent modifications.


[[check_DanglingRelationsCheck]]
=== Dangling relations
[horizontal]
@@ -210,6 +211,16 @@ TSV or CoNLL formats will not include any text and annotations of parts of the d
not covered by sentences or may produce errors during export.


[[check_UnreachableAnnotationsCheck]]
=== Unreachable annotations check
[horizontal]
ID:: `UnreachableAnnotationsCheck`
Related repairs:: <<repair_UpgradeCasRepair>>

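Checks if there are any unreachable feature structures, i.e. feature structures that are neither
indexed nor referenced (directly or indirectly) by an indexed feature structure. Such feature
structures take up memory, but they cannot be accessed through the regular CAS indexes. They may be
created as a result of bugs. For each affected type, the check reports how many unreachable
instances it found. Removing them is harmless and reduces memory and disk space usage.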

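What "unreachable" means can be illustrated as a graph traversal. This is a sketch under stated assumptions: feature structures are hypothetical integer IDs, each mapped to the IDs it references, and the roots stand in for the indexed feature structures. Everything not reached from a root counts as unreachable, which is roughly what a complete serialization round trip with forced retention disabled drops, per the comment in the check's code.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ReachabilitySketch {
    // Mark everything reachable from the roots by following references (depth-first).
    public static Set<Integer> reachable(Map<Integer, List<Integer>> refs,
            Collection<Integer> roots) {
        Set<Integer> seen = new HashSet<>();
        Deque<Integer> todo = new ArrayDeque<>(roots);
        while (!todo.isEmpty()) {
            Integer id = todo.pop();
            if (!seen.add(id)) {
                continue; // already visited
            }
            for (Integer next : refs.getOrDefault(id, List.of())) {
                if (!seen.contains(next)) {
                    todo.push(next);
                }
            }
        }
        return seen;
    }

    // Every known feature structure that the traversal did not reach is unreachable.
    public static Set<Integer> unreachable(Map<Integer, List<Integer>> refs,
            Collection<Integer> roots) {
        Set<Integer> result = new TreeSet<>(refs.keySet());
        result.removeAll(reachable(refs, roots));
        return result;
    }

    public static void main(String[] args) {
        // FS 1 is indexed and references FS 2; FS 5 references FS 1 but nothing references FS 5.
        Map<Integer, List<Integer>> refs = Map.of(
                1, List.of(2),
                2, List.of(),
                3, List.of(4),
                4, List.of(),
                5, List.of(1));
        System.out.println(unreachable(refs, List.of(1))); // prints [3, 4, 5]
    }
}
```

FS 5 references the reachable FS 1, yet it is still unreachable: reachability only flows from the roots along references, never backwards.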

[[sect_repairs]]
== Repairs
@@ -332,6 +343,9 @@ ID:: `UpgradeCasRepair`
Ensures that the CAS is up-to-date with the project type system. It performs the same operation
that is regularly performed when a user opens a document for annotation/curation.

This repair also removes any unreachable feature structures and reports, for each affected type, how
many instances were removed. Such feature structures may be created as a result of bugs. Removing
them is harmless and reduces memory and disk space usage.

This is considered to be a safe repair action as it only garbage-collects data from the CAS that is
no longer reachable anyway.

@@ -354,4 +368,4 @@ ID:: `CoverAllTextInSentencesRepair`

This repair checks if there is any text not covered by sentences. If there is, it creates a new
sentence annotation on this text, starting at the end of the last sentence before it (or the start
of the document text) and ending at the begin of the next sentence (or the end of the document text).
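The gap computation described above can be sketched as follows, assuming sentences are given as a hypothetical `Span` class sorted by begin offset. The sketch does not trim whitespace or apply any other rule the actual repair may use; it only shows how each new sentence runs from the previous sentence end (or 0) to the next sentence begin (or the document length).

```java
import java.util.ArrayList;
import java.util.List;

public class SentenceGapSketch {
    // Hypothetical stand-in for a Sentence annotation.
    public static final class Span {
        public final int begin;
        public final int end;

        public Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + begin + "," + end + "]";
        }
    }

    // Returns one new span per stretch of text not covered by any sentence.
    // Assumes sentences are sorted by begin offset.
    public static List<Span> uncoveredGaps(List<Span> sentences, int docLength) {
        List<Span> gaps = new ArrayList<>();
        int prevEnd = 0; // start of the document text
        for (Span s : sentences) {
            if (s.begin > prevEnd) {
                gaps.add(new Span(prevEnd, s.begin));
            }
            // max() guards against a sentence nested in or overlapping the previous one.
            prevEnd = Math.max(prevEnd, s.end);
        }
        if (docLength > prevEnd) {
            gaps.add(new Span(prevEnd, docLength)); // trailing text up to the document end
        }
        return gaps;
    }

    public static void main(String[] args) {
        List<Span> sentences = List.of(new Span(5, 10), new Span(10, 20), new Span(25, 30));
        System.out.println(uncoveredGaps(sentences, 35)); // prints [[0,5], [20,25], [30,35]]
    }
}
```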