Started initial training data extraction
Extracting sentences for 6 seed terms
......Process finished with 6 seeds and 2546 sentences added for training in cycle number 0
Starting term expansion
Started to extract generic named entity from sentences...
.....Finished processing sentences with 5260 new possible entities
Started term clustering
Added 511 expanded terms
Starting sentence expansion
Finding similar sentences to the 4101 starting sentences
.....Added 2552 expanded sentences to the 4101 original
Labelling sentences in the required format
6582 lines labelled
Creating property file for Stanford NER training
Training the model...
started extraction for the dataset model, in cycle number 0
815
...............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................797
.............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................0
0
Total of 1753 filtered entities added
Filtering 1753 entities with PMI
570 entities are kept from the total of 1753
Filtering 1753 entities by term similarity
755 entities are kept from the total of 1753
Filtering 1753 entities with WordNet and Stopwords
966 entities are kept from the total of 1753
Filtering 1753 entities with knowledge base lookup
1095 entities are kept from the total of 1753
Filtering 1753 entities by vote of selected filter methods
283 entities are kept from the total of 1753
Started initial training data extraction
Extracting sentences for 283 seed terms
...........................................................................................................................................................................................................................................................................................Process finished with 283 seeds and 49985 sentences added for training in cycle number 1
Starting term expansion
Started to extract generic named entity from sentences...
....................................................................................................Finished processing sentences with 106263 new possible entities
Started term clustering
Added 7356 expanded terms
Starting sentence expansion
Finding similar sentences to the 76098 starting sentences
.............................................................................Added 28625 expanded sentences to the 76098 original
Labelling sentences in the required format
103758 lines labelled
Creating property file for Stanford NER training
Training the model...
started extraction for the dataset model, in cycle number 1
815
...............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................797
.............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................0
0
Total of 6834 filtered entities added
Filtering 6834 entities with PMI
1604 entities are kept from the total of 6834
Filtering 6834 entities by term similarity
2008 entities are kept from the total of 6834
Filtering 6834 entities with WordNet and Stopwords
2728 entities are kept from the total of 6834
Filtering 6834 entities with knowledge base lookup
3831 entities are kept from the total of 6834
Filtering 6834 entities by vote of selected filter methods
730 entities are kept from the total of 6834
Started initial training data extraction
Extracting sentences for 730 seed terms
..........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................Process finished with 730 seeds and 149211 sentences added for training in cycle number 2
Starting term expansion
Started to extract generic named entity from sentences...
............................................................................................................................................................................................................................................................................................................Finished processing sentences with 237450 new possible entities
Started term clustering
Added 16495 expanded terms
Starting sentence expansion
Finding similar sentences to the 228396 starting sentences
.....................................................................................................................................................................................................................................Added 66326 expanded sentences to the 228396 original
Labelling sentences in the required format
290413 lines labelled
Creating property file for Stanford NER training
Training the model...
started extraction for the dataset model, in cycle number 2
815
...............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................797
.............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................0
0
Total of 15863 filtered entities added
Filtering 15863 entities with PMI
3079 entities are kept from the total of 15863
Filtering 15863 entities by term similarity
3722 entities are kept from the total of 15863
Filtering 15863 entities with WordNet and Stopwords
5535 entities are kept from the total of 15863
Filtering 15863 entities with knowledge base lookup
9342 entities are kept from the total of 15863
Filtering 15863 entities by vote of selected filter methods
1235 entities are kept from the total of 15863
Started initial training data extraction
Extracting sentences for 1235 seed terms
...................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................Process finished with 1235 seeds and 252298 sentences added for training in cycle number 3
Starting term expansion
Started to extract generic named entity from sentences...
...........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................Finished processing sentences with 345046 new possible entities
Started term clustering
Added 23944 expanded terms
Starting sentence expansion
Finding similar sentences to the 383629 starting sentences
................................................................................................................................................................................................................................................................................................................................................................................................Added 98390 expanded sentences to the 383629 original
Labelling sentences in the required format
473418 lines labelled
Creating property file for Stanford NER training
Training the model...
started extraction for the dataset model, in cycle number 3
815
...............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................797
.............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................0
0
Total of 16898 filtered entities added
Filtering 16898 entities with PMI
3171 entities are kept from the total of 16898
Filtering 16898 entities by term similarity
3991 entities are kept from the total of 16898
Filtering 16898 entities with WordNet and Stopwords
6110 entities are kept from the total of 16898
Filtering 16898 entities with knowledge base lookup
9724 entities are kept from the total of 16898
Filtering 16898 entities by vote of selected filter methods
1404 entities are kept from the total of 16898
Started initial training data extraction
Extracting sentences for 1404 seed terms
............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................Process finished with 1404 seeds and 304113 sentences added for training in cycle number 4
Starting term expansion
Started to extract generic named entity from sentences...
...................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................Finished processing sentences with 398609 new possible entities
Started term clustering
Added 27197 expanded terms
Starting sentence expansion
Finding similar sentences to the 461059 starting sentences
..............................................................................................................................................................................................................................................................................................................................................................................................................................................................................Added 112949 expanded sentences to the 461059 original
Labelling sentences in the required format
562987 lines labelled
Creating property file for Stanford NER training
Training the model...
started extraction for the dataset model, in cycle number 4
815
...............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................797
.............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................0
0
Total of 16857 filtered entities added
/usr/local/lib/python3.5/dist-packages/nltk/tag/stanford.py:183: DeprecationWarning:
The StanfordTokenizer will be deprecated in version 3.2.5.
Please use nltk.tag.corenlp.CoreNLPPOSTagger or nltk.tag.corenlp.CoreNLPNERTagger instead.
super(StanfordNERTagger, self).__init__(*args, **kwargs)
Filtering 16857 entities with PMI
3332 entities are kept from the total of 16857
Filtering 16857 entities by term similarity
4095 entities are kept from the total of 16857
Filtering 16857 entities with WordNet and Stopwords
6452 entities are kept from the total of 16857
Filtering 16857 entities with knowledge base lookup
9758 entities are kept from the total of 16857
Filtering 16857 entities by vote of selected filter methods
1417 entities are kept from the total of 16857