Adding new languages #20

asahi417 · 2023-11-10T11:33:44Z

This thread is for adding more languages to lmqg and https://autoqg.net/. If you would like to contribute, please comment here with a candidate QA dataset that we can use to train a QAG model for that language. We need at least 10k QA pairs for model training.

e.g.:
Language: Turkish
Dataset: https://github.com/TQuad/turkish-nlp-qa-dataset
Size: 8308

asahi417 · 2023-11-10T11:35:57Z

Language: Bengali
Dataset: https://huggingface.co/datasets/csebuetnlp/squad_bn
Size: 127,771/2,502/2,504

asahi417 · 2023-11-10T11:36:30Z

Language: Chinese
Dataset: https://github.com/junzeng-pluto/ChineseSquad

asahi417 · 2023-11-13T10:36:44Z

> Language: Chinese
> Dataset: https://github.com/junzeng-pluto/ChineseSquad

Chinese QAG is available on https://autoqg.net/ and lmqg now! With lmqg, you can use it as below.

from lmqg import TransformersQG

model = TransformersQG(language="zh")
context = "与转导或结合不同，转化依赖于大量的细菌基因产物，这些基因产物专门相互作用来完成这个复杂的过程，因此转化显然是细菌对DNA转移的适应。为了使细菌结合、吸收供体DNA并将其重组为自己的染色体，它必须首先进入一种称为能力的特殊生理状态（见自然能力）。在枯草芽孢杆菌中，大约40个基因是培养能力所必需的。枯草芽孢杆菌转化过程中转移的DNA长度可以在染色体的三分之一到整个染色体之间。转化在细菌物种中似乎很常见，到目前为止，已知至少有60种物种具有自然转化能力。自然界能力的发展通常与应激性环境条件有关，似乎是一种促进受体细胞DNA损伤修复的适应。"
model.generate_qa(context)
# Returns:
# [('在染色体中发现的DNA长度是多少?', '枯草芽孢杆菌转化过程中转移的DNA长度可以在染色体的三分之一到整个染色体之间。')]
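The result shown above is a list of (question, answer) tuples, so it can be post-processed with plain Python. Here is a minimal sketch, assuming that return shape; `qa_pairs_to_records` is a hypothetical helper (not part of lmqg), and the sample list is a hard-coded stand-in for a real model call.

```python
from typing import Dict, List, Tuple


def qa_pairs_to_records(pairs: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Turn (question, answer) tuples into dicts, dropping pairs with an empty side."""
    return [
        {"question": q.strip(), "answer": a.strip()}
        for q, a in pairs
        if q.strip() and a.strip()
    ]


# Hard-coded stand-in for the list returned by model.generate_qa(context).
sample_pairs = [
    ("What is the capital of France?", "Paris"),
    ("  ", "orphan answer"),  # blank question, so this pair is dropped
]
records = qa_pairs_to_records(sample_pairs)
print(records)
```

Records in this form are easy to dump to JSON or load into a table for manual review.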

pawanGithub10 · 2023-11-29T06:00:42Z

Language: Hindi
Dataset: https://github.com/google-deepmind/xquad/blob/master/xquad.hi.json

Please tell me in detail what I need to do to contribute.

asahi417 · 2023-12-25T05:23:26Z

> Language: Hindi
> Dataset: https://github.com/google-deepmind/xquad/blob/master/xquad.hi.json
> Please tell me in detail what activities to be done to contribute.

This dataset is too small. I checked it and there are 1,190 QA pairs in total. Ideally, there should be around 10k pairs, as we are going to train relatively small models (~300M parameters).
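Before proposing a dataset, a quick size check along these lines may help. This is only a sketch, assuming the file follows the SQuAD JSON layout (`data` -> `paragraphs` -> `qas`); `count_qa_pairs` and `MIN_QA_PAIRS` are illustrative names, and the sample dict is hand-made rather than real data.

```python
from typing import Any, Dict

MIN_QA_PAIRS = 10_000  # rough threshold mentioned in this thread


def count_qa_pairs(squad: Dict[str, Any]) -> int:
    """Count question entries in a SQuAD-style dict (data -> paragraphs -> qas)."""
    return sum(
        len(paragraph["qas"])
        for article in squad["data"]
        for paragraph in article["paragraphs"]
    )


# Tiny hand-made sample in SQuAD layout (illustrative, not real data).
sample = {
    "data": [
        {
            "title": "example",
            "paragraphs": [
                {
                    "context": "first passage",
                    "qas": [
                        {"id": "1", "question": "q1", "answers": []},
                        {"id": "2", "question": "q2", "answers": []},
                    ],
                },
                {
                    "context": "second passage",
                    "qas": [{"id": "3", "question": "q3", "answers": []}],
                },
            ],
        }
    ]
}

n_pairs = count_qa_pairs(sample)
print(n_pairs, "QA pairs; enough for training:", n_pairs >= MIN_QA_PAIRS)
```

For a real file, the dict would come from `json.load` on the downloaded dataset instead of the inline sample.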

asahi417 mentioned this issue Nov 10, 2023

Do you have a plan to support Turkish Language? #17

Closed

asahi417 mentioned this issue Nov 10, 2023

Do you have a plan to support Chinese? #11

Closed

PoleGeogry mentioned this issue Sep 29, 2024

How to change self.max_length #25

Open
