In this work, we present the first transformer-based pre-trained models (PTMs) for the Khmer language. We evaluate our models on two downstream tasks: part-of-speech tagging and news categorization; we construct the dataset for the latter task ourselves. In addition, we find that current Khmer word segmentation tools do not improve downstream performance. For more details on our dataset and models, please see our paper "Pre-trained Models and Evaluation Data for the Khmer Language".
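Below is a minimal sketch of how one of our PTMs could be loaded with the Hugging Face transformers library to obtain contextual embeddings for a Khmer sentence. The model identifier "username/khmer-ptm" is a placeholder, not the actual checkpoint name; replace it with the model you downloaded from this repository.

# Minimal usage sketch (assumes the transformers and torch packages are installed).
# "username/khmer-ptm" is a hypothetical model ID used only for illustration.
from transformers import AutoTokenizer, AutoModel

model_name = "username/khmer-ptm"  # placeholder: substitute the real checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Encode a Khmer sentence ("Hello") and run it through the encoder.
inputs = tokenizer("សួស្តី", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence length, hidden size)

The same checkpoint can then be fine-tuned for the downstream tasks reported in the paper, e.g. with a token-classification head for part-of-speech tagging or a sequence-classification head for news categorization.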
If you use our models or our dataset, please consider citing our paper:
@article{jiang2021khmer,
  author="Jiang, Shengyi and Fu, Sihui and Lin, Nankai and Fu, Yingwen",
  title="Pre-trained Models and Evaluation Data for the Khmer Language",
  journal="Tsinghua Science and Technology",
  year="2021"
}