-
Notifications
You must be signed in to change notification settings - Fork 0
/
model_establish.txt
30 lines (21 loc) · 1.06 KB
/
model_establish.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
Techniques to use to establish baseline
--------------------------------------------------------------------------------
1. Word Frequency Analysis
~> TFIDF
2. Frequency based Single Document Keywork extraction
3. Content sensitive Single Document Keywork Extraction
It matches and surpasses TF-IDF
Lexical Graph
Bayes Classifier with TF-IDF
--------------------------------------------------------------------------------
Method described by Xiao-yu Jiang, in A Keyword Extraction Method Based on
Lexical Chains
@ http://download.springer.com/static/pdf/524/chp%253A10.1007%252F978-3-540-92137-0_40.pdf?auth66=1414260742_a37eb16bbe53dea943a4202402153af7&ext=.pdf
--> He has used it mainly for Chinese characters , where there is no boundary
between two characters .
Precision = |A ∩ B| / |A|
Recall = |A ∩ B| / |B|
where |A| is the number of keywords extracted automatically,
|B| is number of keyworkds present in test data,
|A ∩ B| is the size of intersection between A and B
--------------------------------------------------------------------------------