Part-of-Speech (POS) tagging

Background

Part-of-speech tagging is the task of assigning a part-of-speech tag (from a given tag set) to every word in a given sentence.

Example

Input:

快速 的 棕色 狐狸 跳过 了 懒惰 的 狗
(English gloss: "The quick brown fox jumped over the lazy dog.")

Output:

[快速] VA [的] DEC [棕色] NN [狐狸] NN [跳过] VV [了] AS [懒惰] VA [的] DEC [狗] NN
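The output pairs each segmented word with a tag from the Chinese Treebank tag set (for example VA predicative adjective, DEC the 的 that marks a relative clause or nominalization, NN common noun, VV verb, AS aspect marker). The minimal Python sketch below turns the bracketed string back into (word, tag) pairs. It assumes the `[word] TAG` layout is only this page's display convention, not a standard corpus format, and the helper name `parse_tagged` is hypothetical.

```python
import re

# Matches one "[word] TAG" unit of the display format used in the example above.
# Assumption: this layout is just the page's illustration, not a real corpus file format.
PAIR = re.compile(r"\[([^\]]+)\]\s+(\S+)")


def parse_tagged(line: str) -> list[tuple[str, str]]:
    """Split a '[word] TAG [word] TAG ...' string into (word, tag) pairs."""
    return PAIR.findall(line)


output = "[快速] VA [的] DEC [棕色] NN [狐狸] NN [跳过] VV [了] AS [懒惰] VA [的] DEC [狗] NN"
pairs = parse_tagged(output)
for word, tag in pairs:
    print(f"{word}\t{tag}")
```

Concatenating the words recovers the unsegmented input sentence, which is why segmentation and tagging can be evaluated together on the same character string.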

Standard Metrics

F1 score computed from word-level precision and recall on the joint segmentation and tagging task: a predicted word counts as correct only if both its boundaries and its tag match the gold standard.
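Because segmentation and tagging are scored jointly, each word can be represented as a (start, end, tag) character span, and precision and recall are the overlap between the gold and predicted span sets. The sketch below illustrates that idea under stated assumptions. It is not the official scoring script behind the reported numbers. The helper names `to_spans` and `joint_f1` are hypothetical, and counts are pooled over the whole corpus (micro-averaged).

```python
# Score joint word segmentation + POS tagging with word-level precision, recall and F1.
# Each sentence is a list of (word, tag) pairs; word lengths give character offsets.
# Assumption: gold and predicted sentences are aligned and cover the same characters.


def to_spans(pairs: list[tuple[str, str]]) -> set[tuple[int, int, str]]:
    """Map (word, tag) pairs to (start, end, tag) character spans."""
    spans = set()
    start = 0
    for word, tag in pairs:
        end = start + len(word)
        spans.add((start, end, tag))
        start = end
    return spans


def joint_f1(gold_sents, pred_sents) -> tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over aligned gold/predicted sentences."""
    correct = n_gold = n_pred = 0
    for gold, pred in zip(gold_sents, pred_sents):
        g, p = to_spans(gold), to_spans(pred)
        correct += len(g & p)  # same boundaries AND same tag
        n_gold += len(g)
        n_pred += len(p)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [[("快速", "VA"), ("的", "DEC"), ("棕色", "NN"), ("狐狸", "NN")]]
# Hypothetical prediction that wrongly merges 棕色 and 狐狸 into a single word.
pred = [[("快速", "VA"), ("的", "DEC"), ("棕色狐狸", "NN")]]

p, r, f = joint_f1(gold, pred)
print(f"P={p:.4f} R={r:.4f} F1={f:.4f}")  # P=0.6667 R=0.5000 F1=0.5714
```

In this example the single segmentation error hurts both sides: two gold words are missed (recall 2/4) and one spurious word is predicted (precision 2/3), giving F1 = 4/7.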

Chinese Treebank (CTB) Datasets

| Dataset | # words (dev) | # words (test) | Genre |
| --- | --- | --- | --- |
| CTB5 | 6,821 | 8,008 | News |

Metrics

Results

| System | F1 score |
| --- | --- |
| Tian et al. (2020) | 96.92 |
| Meng et al. (2019) (Glyce + BERT) | 96.61 |
| Meng et al. (2019) (BERT) | 96.06 |
| Shao et al. (2017) | 94.38 |

Resources

| Train set | # words | Genre |
| --- | --- | --- |
| CTB5 | 493,935 | News |

Universal Dependencies (UD) Datasets

| Dataset | # words (dev) | # words (test) | Genre |
| --- | --- | --- | --- |
| UD Chinese | 12,663 | 12,012 | Learner essays, news, spoken language, Wiki |

Metrics

Results

| System | F1 score |
| --- | --- |
| Meng et al. (2019) (Glyce + BERT) | 96.14 |
| Tian et al. (2020) | 95.69 |
| Meng et al. (2019) (BERT) | 94.79 |
| Shao et al. (2017) | 89.75 |

Resources

| Train set | # words | Genre |
| --- | --- | --- |
| UD Chinese | 98,608 | Learner essays, news, spoken language, Wiki |

Suggestions? Changes? Please send email to [email protected]