Skip to content

SinaLab/Synonyms

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

16 Commits
 
 
 
 
 
 

Repository files navigation

Synonyms Dataset

The dataset is a set of 500 synsets (extracted from the Arabic Wordnet). Each synset is enriched with a list of candidate synonyms. The total number is 3K candidates. Each candidate synonym is then annotated with a fuzzy value by four linguists (in parallel). The dataset is important for understanding how much linguists (dis/)agree on synonymy (which we found RMSE: 32% and MAE: 27%). In addition, we used the dataset as a baseline to evaluate our algorithm. See the scoring guidelines, figures, and details in the paper section 3.

License: CC-BY-4.0

Online Demo

You can try our algorithm using the demo link below
https://sina.birzeit.edu/synonyms/

Credits

This research is funded by the Research Committee at Birzeit University (No. 2021/49)

Citation

Sana Ghanem, Mustafa Jarrar, Radi Jarrar, Ibrahim Bounhas: A Benchmark and Scoring Algorithm for Enriching Arabic Synonyms. In Proceedings of the Global WordNet Conference (gwc2023), Donostia, January. 2023

@inproceedings{GJJB23,
    title = {A Benchmark and Scoring Algorithm for Enriching Arabic Synonyms},
    author = {Sana Ghanem and Mustafa Jarrar and Radi Jarrar and Ibrahim Bounhas},
    isbn = {978-9-464027-31-0},
    booktitle = {Proceedings of the 12th International Global Wordnet Conference (GWC2023)},
    pages = {},
    location = {San Sebastian},
    month = {Jan},
    year = {2023},
    publisher = {Global Wordnet Association},
    keywords ={Synonyms, Synset, WordNet, Dictionary, Arabic, Multilingual lexicons, Online dictionary, Language resources, Lexical semantics, NLP},
    abstract = {Synonymy relationships are used in many NLP tasks and knowledge organization systems. However, automatic synonym extraction is a challenging task, especially for low-resourced and highly ambiguous languages such as Arabic. This paper addresses the task of extending a given synset with additional synonyms taking into account synonymy strength as a fuzzy value. In other words, given a mono/multilingual synset and a threshold (a fuzzy value $[0-1]$), our goal is to extract new synonyms above this threshold from existing lexicons. We present twofold contributions: an algorithm and a benchmark dataset. The dataset consists of 3K candidate synonyms for 500 synsets. Each candidate synonym is annotated with a fuzzy value by four linguists. The dataset is important for (i) understanding how much linguists (dis/)agree on synonymy, in addition to (ii) using the dataset as a baseline to evaluate our algorithm. Our proposed algorithm extracts synonyms from existing lexicons and computes a fuzzy value for each candidate. Our evaluations (using the one-way ANOVA test) show that the algorithm behaves like a linguist and its fuzzy values are close to those proposed by linguists (using RMSE and MAE).},
    url={http://www.jarrar.info/publications/GJJB23.pdf}
}

Contacts

Mustafa Jarrar: Linkedin | Twitter | Github | Mail

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •