Evaluation Results
Jannik Strötgen edited this page Sep 20, 2016 · 12 revisions
- Introduction
- ACE Tern 2004 Training Corpus
- AncientTimes Arabic
- AncientTimes German
- AncientTimes English
- AncientTimes Spanish
- AncientTimes French
- AncientTimes Italian
- AncientTimes Dutch
- AncientTimes Vietnamese
- ACE Tern 2005 Corpus
- Arabic test-150 Corpus
- Arabic test-50 Corpus
- Arabic test-50-star Corpus
- Arabic test-50-star Corpus evaluated with TE3-Tools
- I-CAB Test Corpus
- TempEval2 Evaluation Corpus
- TempEval2 Spanish Evaluation Corpus
- TempEval2 Italian Evaluation Corpus
- TempEval 2 Italian Training Corpus evaluated with TE3-Tools
- TempEval 2 Italian Test Corpus evaluated with TE3-Tools
- TempEval 2 Chinese Original Training Corpora
- TempEval 2 Chinese CLEAN Training Corpora
- TempEval 2 Chinese IMPROVED Training Corpora
- TempEval 2 Chinese Original Evaluation Corpora
- TempEval 2 Chinese CLEAN Evaluation Corpora
- TempEval 2 Chinese IMPROVED Evaluation Corpora
- TimeBank 1.2 Corpus
- WikiWars Corpus
- WikiWarsDE Corpus
- WikiWarsVN Corpus
- WikiWarsVN Corpus evaluated with TE3-Tools
- WikiWarsHR Corpus evaluated with TE3-Tools
- Time4SCI Corpus
- Time4SMS Corpus
- TempEval 3 AQUAINT Training Corpus
- TempEval 3 TimeBank Training Corpus
- TempEval 3 trainT3 Spanish Training Corpus
- TempEval 3 Platinum English Evaluation Corpus
- TempEval 3 Spanish Evaluation Corpus
- French TimeBank 1.1 Corpus
- EVALITA 2014 Test Corpus
- Portuguese TimeBank 1.0 Corpus (test subset)
Introduction
This page contains the evaluation results of HeidelTime version 2.2.
Operating system: Debian Linux
Java version: 1.8.0_101
Locale: en_GB (unless stated otherwise in the workflow description in ReproduceEvaluationResults)
Tokenization and POS-Tagging:
- TreeTaggerWrapper
- JVnTextProWrapper (Vietnamese corpora: JVnTextPro 2.0, Maxent model)
- StanfordPOSTaggerWrapper (Arabic corpora: Stanford POS Tagger 3.3.1, arabic.tagger model)
- HunPosTaggerWrapper (Croatian WikiWarsHR: HunPos 1.0, Croatian model from 09.05.2013)
ACE Tern 2004 Training Corpus
Measure | Precision | Recall | F-Score |
---|---|---|---|
Extraction (lenient) | 95.8% | 79.0% | 86.6% |
Extraction (strict) | 87.3% | 72.0% | 78.9% |
Normalization (value) | 86.8% | 87.3% | 87.1% |
Extraction & Normalization (lenient + VAL) | 83.1% | 68.6% | 75.1% |
Extraction & Normalization (strict + VAL) | 78.2% | 64.6% | 70.7% |
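The F-Score column in the table above is consistent with the balanced F-score, the harmonic mean of precision and recall. The following is a minimal sketch for re-deriving it from the other two columns. It assumes the balanced (F1) definition, which the page itself does not state, and the helper name `f_score` is hypothetical rather than part of HeidelTime or the evaluation tools.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (harmonic mean); inputs and output are percentages.

    Assumption: the tables report F1, not a weighted F-beta variant.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall taken from the first two rows of the table above.
lenient = f_score(95.8, 79.0)
strict = f_score(87.3, 72.0)
print(f"{lenient:.1f} {strict:.1f}")  # prints "86.6 78.9"
```

Both values match the reported F-Scores after rounding to one decimal place, so the same check can be used to spot transcription errors in the other tables.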
AncientTimes Arabic
Measure | Precision | Recall | F-Score |
---|---|---|---|
Extraction (strict) | 83.33% | 74.26% | 78.53% |
Extraction (relaxed) | 93.33% | 83.17% | 87.96% |
- Attribute value F1: 83.77%
- Attribute type F1: 87.96%
AncientTimes German
Measure | Precision | Recall | F-Score |
---|---|---|---|
Extraction (strict) | 86.75% | 71.98% | 78.68% |
Extraction (relaxed) | 95.36% | 79.12% | 86.49% |
- Attribute value F1: 81.08%
- Attribute type F1: 85.89%
AncientTimes English
Measure | Precision | Recall | F-Score |
---|---|---|---|
Extraction (strict) | 88.85% | 78.88% | 83.57% |
Extraction (relaxed) | 97.03% | 86.14% | 91.26% |
- Attribute value F1: 84.97%
- Attribute type F1: 90.56%
AncientTimes Spanish
Measure | Precision | Recall | F-Score |
---|---|---|---|
Extraction (strict) | 80.85% | 72.04% | 76.19% |
Extraction (relaxed) | 96.28% | 85.78% | 90.73% |
- Attribute value F1: 85.71%
- Attribute type F1: 88.22%
AncientTimes French
Measure | Precision | Recall | F-Score |
---|---|---|---|
Extraction (strict) | 89.07% | 77.19% | 82.71% |
Extraction (relaxed) | 98.38% | 85.26% | 91.35% |
- Attribute value F1: 90.23%
- Attribute type F1: 91.35%
AncientTimes Italian
Measure | Precision | Recall | F-Score |
---|---|---|---|
Extraction (strict) | 79.63% | 75.11% | 77.3% |
Extraction (relaxed) | 91.2% | 86.03% | 88.54% |
- Attribute value F1: 79.55%
- Attribute type F1: 85.84%
AncientTimes Dutch
Measure | Precision | Recall | F-Score |
---|---|---|---|
Extraction (strict) | 81.67% | 78.4% | 80.0% |
Extraction (relaxed) | 94.17% | 90.4% | 92.24% |
- Attribute value F1: 88.16%
- Attribute type F1: 88.16%
AncientTimes Vietnamese
Measure | Precision | Recall | F-Score |
---|---|---|---|
Extraction (strict) | 87.27% | 82.76% | 84.96% |
Extraction (relaxed) | 97.27% | 92.24% | 94.69% |
- Attribute value F1: 92.04%
- Attribute type F1: 93.81%
ACE Tern 2005 Corpus
Measure | Precision | Recall | F-Score |
---|---|---|---|
Extraction (lenient) | 89.3% | 75.5% | 81.8% |
Extraction (strict) | 77.3% | 65.3% | 70.8% |
Normalization (value) | 74.8% | 77.3% | 76% |
Extraction & Normalization (lenient + VAL) | 66.8% | 56.4% | 61.2% |
Extraction & Normalization (strict + VAL) | 62.8% | 53.1% | 57.5% |
Arabic test-150 Corpus
Measure | Precision | Recall | F-Score |
---|---|---|---|
Extraction (lenient) | 80.1% | 90.9% | 85.2% |
Extraction (strict) | 64.9% | 73.7% | 69.0% |
Arabic test-50 Corpus
Measure | Precision | Recall | F-Score |
---|---|---|---|
Extraction (lenient) | 79.7% | 90.4% | 84.7% |
Extraction (strict) | 62.8% | 71.3% | 66.8% |
Arabic test-50-star Corpus
Measure | Precision | Recall | F-Score |
---|---|---|---|
Extraction (lenient) | 91.9% | 91.3% | 91.6% |
Extraction (strict) | 84.8% | 84.2% | 84.5% |
Normalization (value) | 91.9% | 91.9% | 91.9% |
Extraction & Normalization (lenient + VAL) | 84.5% | 83.9% | 84.2% |
Extraction & Normalization (strict + VAL) | 80.1% | 79.5% | 79.8% |
Arabic test-50-star Corpus evaluated with TE3-Tools
Measure | Precision | Recall | F-Score |
---|---|---|---|
Extraction (strict) | 80.99% | 80.99% | 80.99% |
Extraction (relaxed) | 90.91% | 90.91% | 90.91% |
- Attribute value F1: 82.23%
- Attribute type F1: 84.3%
I-CAB Test Corpus
Measure | Precision | Recall | F-Score |
---|---|---|---|
Extraction (lenient) | 92.7% | 81.5% | 86.8% |
Extraction (strict) | 64.1% | 56.4% | 60.0% |
Normalization (value) | 75.6% | 78.3% | 76.9% |
Extraction & Normalization (lenient + VAL) | 70.1% | 61.7% | 65.6% |
Extraction & Normalization (strict + VAL) | 51.4% | 45.2% | 48.1% |
TempEval2 Evaluation Corpus
Precision | Recall | F-Score |
---|---|---|
88.0% | 86.0% | 87.0% |
- Attribute type: 96.0 %
- Attribute value: 86.0 %
TempEval2 Spanish Evaluation Corpus
The TempEval2 Spanish Evaluation Corpus is essentially the same as the TempEval 3 version further down in this document, but with some improvements. Please refer to those results, as they also use our preferred evaluation method.
TempEval2 Italian Evaluation Corpus
Precision | Recall | F-Score |
---|---|---|
93.1% | 89.6% | 91.3% |
- Attribute type: 98.0 %
- Attribute value: 94.0 %
TempEval 2 Italian Training Corpus evaluated with TE3-Tools
Measure | Precision | Recall | F-Score |
---|---|---|---|
Extraction (strict) | 73.3% | 88.72% | 80.28% |
Extraction (relaxed) | 77.41% | 93.69% | 84.78% |
- Attribute value F1: 76.47%
- Attribute type F1: 82.18%
TempEval 2 Italian Test Corpus evaluated with TE3-Tools
Measure | Precision | Recall | F-Score |
---|---|---|---|
Extraction (strict) | 77.93% | 89.68% | 83.39% |
Extraction (relaxed) | 83.45% | 96.03% | 89.3% |
- Attribute value F1: 81.18%
- Attribute type F1: 85.61%
TempEval 2 Chinese Original Training Corpora
Precision | Recall | F-Score |
---|---|---|
96.0% | 93.9% | 94.9% |
- Attribute type: 92.0 %
- Attribute value: 79.0 %
TempEval 2 Chinese CLEAN Training Corpora
Precision | Recall | F-Score |
---|---|---|
80.1% | 95.7% | 87.2% |
- Attribute type: 94.0 %
- Attribute value: 90.0 %
TempEval 2 Chinese IMPROVED Training Corpora
Precision | Recall | F-Score |
---|---|---|
97.4% | 95.6% | 96.5% |
- Attribute type: 94.0 %
- Attribute value: 91.0 %
TempEval 2 Chinese Original Evaluation Corpora
Precision | Recall | F-Score |
---|---|---|
93.8% | 87.5% | 90.5% |
- Attribute type: 93.0 %
- Attribute value: 70.0 %
TempEval 2 Chinese CLEAN Evaluation Corpora
Precision | Recall | F-Score |
---|---|---|
62.4% | 91.8% | 74.3% |
- Attribute type: 96.0 %
- Attribute value: 89.0 %
TempEval 2 Chinese IMPROVED Evaluation Corpora
Precision | Recall | F-Score |
---|---|---|
95.8% | 89.3% | 92.4% |
- Attribute type: 96.0 %
- Attribute value: 86.0 %
TimeBank 1.2 Corpus
Measure | Precision | Recall | F-Score |
---|---|---|---|
Extraction (lenient) | 92.6% | 91.5% | 92.0% |
Extraction (strict) | 86.6% | 85.6% | 86.1% |
Normalization (value) | 87.6% | 87.6% | 87.6% |
Extraction & Normalization (lenient + VAL) | 81.0% | 80.1% | 80.6% |
Extraction & Normalization (strict + VAL) | 77.0% | 76.2% | 76.6% |
WikiWars Corpus
Measure | Precision | Recall | F-Score |
---|---|---|---|
Extraction (lenient) | 98.3% | 86.1% | 91.8% |
Extraction (strict) | 93.3% | 81.8% | 87.2% |
Normalization (value) | 90.5% | 91.1% | 90.8% |
Extraction & Normalization (lenient + VAL) | 89.0% | 78.0% | 83.1% |
Extraction & Normalization (strict + VAL) | 85.9% | 75.3% | 80.2% |
WikiWarsDE Corpus
Measure | Precision | Recall | F-Score |
---|---|---|---|
Extraction (lenient) | 98.7% | 89.3% | 93.8% |
Extraction (strict) | 92.6% | 83.8% | 88.0% |
Normalization (value) | 88.5% | 88.5% | 88.5% |
Extraction & Normalization (lenient + VAL) | 87.4% | 79.1% | 83.0% |
Extraction & Normalization (strict + VAL) | 83.2% | 75.3% | 79.1% |
WikiWarsVN Corpus
Measure | Precision | Recall | F-Score |
---|---|---|---|
Extraction (lenient) | 92.1% | 97.8% | 94.8% |
Extraction (strict) | 72.9% | 77.4% | 75.1% |
Normalization (value) | 95% | 95% | 95% |
Extraction & Normalization (lenient + VAL) | 87.5% | 92.9% | 90.1% |
Extraction & Normalization (strict + VAL) | 69.2% | 73.5% | 71.2% |
WikiWarsVN Corpus evaluated with TE3-Tools
Measure | Precision | Recall | F-Score |
---|---|---|---|
Extraction (strict) | 94.09% | 94.09% | 94.09% |
Extraction (relaxed) | 98.18% | 98.18% | 98.18% |
- Attribute value F1: 91.36%
- Attribute type F1: 93.64%
WikiWarsHR Corpus evaluated with TE3-Tools
Measure | Precision | Recall | F-Score |
---|---|---|---|
Extraction (strict) | 88.93% | 86.86% | 87.88% |
Extraction (relaxed) | 92.62% | 90.46% | 91.53% |
- Attribute value F1: 80.8%
- Attribute type F1: 89.74%
Time4SCI Corpus
Measure | Precision | Recall | F-Score |
---|---|---|---|
Extraction (lenient) | 96.2% | 70.6% | 81.4% |
Extraction (strict) | 88.9% | 65.3% | 75.3% |
Normalization (value) | 88.9% | 88.9% | 88.9% |
Extraction & Normalization (lenient + VAL) | 85.5% | 62.8% | 72.4% |
Extraction & Normalization (strict + VAL) | 80.0% | 58.8% | 67.7% |
Time4SMS Corpus
Measure | Precision | Recall | F-Score |
---|---|---|---|
Extraction (lenient) | 99.4% | 91.3% | 95.2% |
Extraction (strict) | 98.2% | 90.2% | 94.1% |
Normalization (value) | 97.1% | 97.1% | 97.1% |
Extraction & Normalization (lenient + VAL) | 96.5% | 88.7% | 92.4% |
Extraction & Normalization (strict + VAL) | 96.1% | 88.3% | 92.1% |
TempEval 3 AQUAINT Training Corpus
Measure | Precision | Recall | F-Score |
---|---|---|---|
Extraction (strict) | 80.99% | 81.69% | 81.34% |
Extraction (relaxed) | 92.12% | 92.92% | 92.52% |
- Attribute value F1: 73.09%
- Attribute type F1: 84.44%
TempEval 3 TimeBank Training Corpus
Measure | Precision | Recall | F-Score |
---|---|---|---|
Extraction (strict) | 86.4% | 84.31% | 85.34% |
Extraction (relaxed) | 93.08% | 90.83% | 91.94% |
- Attribute value F1: 79.56%
- Attribute type F1: 89.66%
TempEval 3 trainT3 Spanish Training Corpus
Measure | Precision | Recall | F-Score |
---|---|---|---|
Extraction (strict) | 90.83% | 81.44% | 85.88% |
Extraction (relaxed) | 96.33% | 86.38% | 91.08% |
- Attribute value F1: 84.14%
- Attribute type F1: 89.54%
TempEval 3 Platinum English Evaluation Corpus
Measure | Precision | Recall | F-Score |
---|---|---|---|
Extraction (strict) | 83.97% | 79.71% | 81.78% |
Extraction (relaxed) | 93.13% | 88.41% | 90.71% |
- Attribute value F1: 78.07%
- Attribute type F1: 83.27%
TempEval 3 Spanish Evaluation Corpus
Measure | Precision | Recall | F-Score |
---|---|---|---|
Extraction (strict) | 91.48% | 80.9% | 85.87% |
Extraction (relaxed) | 96.02% | 84.92% | 90.13% |
- Attribute value F1: 85.33%
- Attribute type F1: 87.47%
French TimeBank 1.1 Corpus
Measure | Precision | Recall | F-Score |
---|---|---|---|
Extraction (strict) | 86.81% | 85.18% | 85.99% |
Extraction (relaxed) | 91.85% | 90.12% | 90.97% |
- Attribute value F1: 73.63%
- Attribute type F1: 82.66%
EVALITA 2014 Test Corpus
Measure | Precision | Recall | F-Score | Type F1 | Value F1 |
---|---|---|---|---|---|
Strict extraction/normalization | 85.1% | 79.% | 82.% | 78.5% | 71.% |
Relaxed extraction/normalization | 92.7% | 86.1% | 89.3% | 84.% | 75.% |
Portuguese TimeBank 1.0 Corpus (test subset)
Measure | Precision | Recall | F-Score |
---|---|---|---|
Extraction (strict) | 76.98% | 66.9% | 71.59% |
Extraction (relaxed) | 87.3% | 75.86% | 81.18% |
- Attribute value F1: 63.47%
- Attribute type F1: 76.75%