Alksnis-v.3.0 (The Lithuanian dependency treebank)

Summary

ALKSNIS v3.0 is the Lithuanian dependency treebank of Vytautas Magnus University. Version 3.0 was developed from v2.1 during the project "Semantika2" (Nr. 02.3.1-CPVA-V-527-01-0002).

Introduction

This is a new, corrected and enhanced version of the ALKSNIS Lithuanian treebank. It is annotated in a style derived from the Prague Dependency Treebank of Czech.

The previous version, ALKSNIS v2.1, consists of 2,355 syntactically annotated sentences. Each node of a tree corresponds to a word, a punctuation mark or another text element (symbol, digit, etc.) within a sentence. ALKSNIS v2.1 is published in the CLARIN LT repository at http://hdl.handle.net/20.500.11821/10. (Some users experience DNS errors when trying to access the repository; configuring the client machine to use 8.8.8.8 as its DNS server may help. See also http://clarin-lt.lt/?page_id=86.) ALKSNIS v2.1 uses a version of the MULTEXT-East tagset (http://nl.ijs.si/ME/V4/msd/html/index.html). The following information is given for each node: 1) the word form as used in the text; 2) the lemma; 3) a morphological tag; and 4) the syntactic function (subject, object, etc.). Dependencies are represented as links between words.

ALKSNIS v3.0 was developed from v2.1 during the Vytautas Magnus University project “Semantika2” (Nr. 02.3.1-CPVA-V-527-01-0002). It consists of 3,643 syntactically annotated sentences.
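To illustrate how a tree of this kind can be held in memory, here is a minimal Python sketch. The `Node` class, its field names and the `children_map`/`tree_lines` helpers are hypothetical and not part of the ALKSNIS distribution. The toy sentence "Jonas skaito knygą." ("Jonas reads a book.") and its labels are invented for illustration: tags are placeholders (`_`) and the functions are plain-English names, not the treebank's actual annotation. Each node stores the index of its governing word (0 for the root), and the tree is recovered by inverting those links.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Node:
    """One tree node: a word, punctuation mark or other text element (hypothetical layout)."""
    idx: int       # 1-based position in the sentence
    form: str      # the word form as used in the text
    lemma: str
    tag: str       # morphological tag ("_" placeholder here)
    function: str  # syntactic function (illustrative English labels)
    head: int      # index of the governing node; 0 marks the root


def children_map(nodes: List[Node]) -> Dict[int, List[int]]:
    """Invert head links into a mapping head index -> dependent indices."""
    kids: Dict[int, List[int]] = defaultdict(list)
    for n in nodes:
        kids[n.head].append(n.idx)
    return dict(kids)


def tree_lines(nodes: List[Node]) -> List[str]:
    """Render the tree top-down, indenting each dependent under its head."""
    kids = children_map(nodes)
    by_idx = {n.idx: n for n in nodes}
    lines: List[str] = []

    def walk(head: int, depth: int) -> None:
        for i in kids.get(head, []):
            n = by_idx[i]
            lines.append("  " * depth + f"{n.form} [{n.function}]")
            walk(i, depth + 1)

    walk(0, 0)
    return lines


# Invented toy sentence, not taken from ALKSNIS.
nodes = [
    Node(1, "Jonas", "Jonas", "_", "subject", 2),
    Node(2, "skaito", "skaityti", "_", "predicate", 0),
    Node(3, "knygą", "knyga", "_", "object", 2),
    Node(4, ".", ".", "_", "punctuation", 2),
]

print("\n".join(tree_lines(nodes)))
```

Storing only the head index keeps each node self-contained; the dependents of any word are recovered with a single pass over the sentence.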
Modifications from v2.1 to v3.0 (2019-07-08)

  • The previous version underwent a full review of its syntactic annotation, based on improved guidelines, to enhance annotation quality.
  • New layer added: non-compositional multiword expressions (light verbs and idioms).
  • New data added: scientific abstracts and reviews, and additional administrative texts.
  • Schema version changed to 3.0.
  • The human-friendly Jablonskis tagset is used instead of the MULTEXT-East tagset.
  • Some syntactic relations were corrected or modified (details to be published in the improved guidelines).
  • CoNLL-U files are provided alongside the PML files (note: the CoNLL-U files do not keep the MWE field).
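Because the CoNLL-U files are plain tab-separated text, they can be read without special tooling. The following is a minimal Python sketch assuming the standard CoNLL-U layout (ten tab-separated columns per word, `#` comment lines, blank lines between sentences). The `read_conllu` function and the sample sentence are hypothetical illustrations, not taken from the ALKSNIS files, and the XPOS column is left as `_` rather than a real Jablonskis tag. As noted above, the MWE layer is not kept in the CoNLL-U files, so it has to be taken from the PML files.

```python
import io
from typing import Dict, Iterator, List, TextIO

# Column names defined by the CoNLL-U format (10 tab-separated fields).
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def read_conllu(stream: TextIO) -> Iterator[List[Dict[str, str]]]:
    """Yield one sentence at a time as a list of token dicts."""
    sentence: List[Dict[str, str]] = []
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Comment lines (e.g. "# sent_id = ...", "# text = ...") are skipped.
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        # Skip multiword-token ranges ("1-2") and empty nodes ("1.1"),
        # keeping only ordinary syntactic words.
        if "-" in cols[0] or "." in cols[0]:
            continue
        sentence.append(dict(zip(FIELDS, cols)))
    if sentence:
        # Handle input that does not end with a blank line.
        yield sentence


# Hypothetical sample in CoNLL-U layout (not from ALKSNIS).
sample = (
    "# text = Jonas skaito knygą.\n"
    "1\tJonas\tJonas\tPROPN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tskaito\tskaityti\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\tknygą\tknyga\tNOUN\t_\t_\t2\tobj\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)

for sent in read_conllu(io.StringIO(sample)):
    for tok in sent:
        # First line printed: Jonas Jonas nsubj 2
        print(tok["form"], tok["lemma"], tok["deprel"], tok["head"])
```

Multiword-token ranges and empty nodes are dropped here for simplicity; a fuller reader would need to keep them.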

Content:

  • ALKSNIS-3.0.ZIP - The Lithuanian dependency treebank files.
  • Jablonskis-LT.pdf - Morphological annotation standard used in ALKSNIS.
  • ALksnio-3.0_sandara.docx - The structure of ALKSNIS v3.0 files.

Acknowledgments

Version 3.0 was developed from v2.1 during the project "Semantika2" (Nr. 02.3.1-CPVA-V-527-01-0002). The project was funded by the European Structural Funds.

References

For ALKSNIS v2.1:

  • Agnė Bielinskienė, Loïc Boizou, Jolanta Kovalevskaitė, Erika Rimkutė (2016): Lithuanian Dependency Treebank ALKSNIS. In: I. Skadiņa and R. Rozis (Eds.), Human Language Technologies – The Baltic Perspective, pp. 107–114. Amsterdam: IOS Press. doi:10.3233/978-1-61499-701-6-107. Available at http://fcim.vdu.lt/~erika_rimkute/straipsniai/Alksnis_HLT.pdf and http://ebooks.iospress.nl/volumearticle/45523

For v3.0 (2019-10-07):

  • License: CC BY-SA 4.0;
  • Includes text: yes;
  • Genre: news nonfiction legal scientific;
  • Lemmas: manual native;
  • UPOS: converted from manual;
  • XPOS: manual native;
  • Features: converted from manual;
  • Relations: converted from manual;
  • Contributors: Utka, Andrius; Rimkutė, Erika; Bielinskienė, Agnė; Kovalevskaitė, Jolanta; Boizou, Loïc; Aleksandravičiūtė, Gabrielė; Brokaitė, Kristina;
  • Contact: [email protected], [email protected].
