GujTB is an in-progress treebank of Gujarati (an Indo-Aryan language), written in the Gujarati script.
The treebank currently consists of 187 sentences, of which 100 are doubly annotated by the authors. We plan to update the treebank with proper morphological annotations and features in the upcoming release.
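Universal Dependencies treebanks are distributed in the CoNLL-U format, where each sentence is a block of token lines (optionally preceded by `#` comment lines) and blocks are separated by blank lines. The following is a minimal sketch, not part of the treebank release, of how the sentence count could be checked. The function name `count_sentences` and the sample data are illustrative assumptions; the sample uses placeholder tokens, not actual treebank content.

```python
def count_sentences(conllu_text: str) -> int:
    """Count sentences in CoNLL-U text.

    A sentence is a run of non-comment lines; blank lines separate sentences.
    """
    count = 0
    in_sentence = False
    for line in conllu_text.splitlines():
        if line.strip() == "":
            # A blank line closes the current sentence block.
            in_sentence = False
        elif not line.startswith("#"):
            # First token line of a new block starts a sentence.
            if not in_sentence:
                count += 1
                in_sentence = True
    return count


# Placeholder data (hypothetical, NOT taken from the treebank):
# two sentences with 10 tab-separated CoNLL-U columns per token line.
SAMPLE = (
    "# sent_id = placeholder-1\n"
    "1\tw1\t_\t_\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "# sent_id = placeholder-2\n"
    "1\tw2\t_\t_\t_\t_\t2\t_\t_\t_\n"
    "2\tw3\t_\t_\t_\t_\t0\troot\t_\t_\n"
    "\n"
)

print(count_sentences(SAMPLE))
```

To apply this to a treebank file, read the file as UTF-8 text and pass its contents to `count_sentences`; the file path depends on the local copy of the release.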
Please cite the following paper if you use this treebank in your research:
@inproceedings{jobanputra-etal-2024-universal,
title = "A {U}niversal {D}ependencies Treebank for {G}ujarati",
author = {Jobanputra, Mayank and
Mehta, Maitrey and
{\c{C}}{\"o}ltekin, {\c{C}}a{\u{g}}r{\i}},
editor = {Bhatia, Archna and
Bouma, Gosse and
Do{\u{g}}ru{\"o}z, A. Seza and
Evang, Kilian and
Garcia, Marcos and
Giouli, Voula and
Han, Lifeng and
Nivre, Joakim and
Rademaker, Alexandre},
booktitle = "Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024",
month = may,
year = "2024",
address = "Torino, Italia",
publisher = "ELRA and ICCL",
url = "https://aclanthology.org/2024.mwe-1.9",
pages = "56--62",
abstract = "The Universal Dependencies (UD) project has presented itself as a valuable platform to develop various resources for the languages of the world. We present and release a sample treebank for the Indo-Aryan language of Gujarati {--} a widely spoken language with little linguistic resources. This treebank is the first labeled dataset for dependency parsing in the language and the script (the Gujarati script). The treebank contains 187 part-of-speech and dependency annotated sentences from diverse genres. We discuss various idiosyncratic examples, annotation choices and present an elaborate corpus along with agreement statistics. We see this work as a valuable resource and a stepping stone for research in Gujarati Computational Linguistics.",
}
- 2024-05-15 v2.14
- Initial release in Universal Dependencies.
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.14
License: CC BY-SA 4.0
Includes text: yes
Genre: grammar-examples
Lemmas: manual native
UPOS: manual native
XPOS: not available
Features: manual native
Relations: manual native
Contributors: Mehta, Maitrey; Jobanputra, Mayank
Contributing: here
Contact: [email protected]
===============================================================================