Adding new languages #20
Comments
Language: Bengali
Language: Chinese
Chinese QAG is available on https://autoqg.net/ and lmqg now! With lmqg, you can use it as below.
from lmqg import TransformersQG
model = TransformersQG(language="zh")
context = "与转导或结合不同,转化依赖于大量的细菌基因产物,这些基因产物专门相互作用来完成这个复杂的过程,因此转化显然是细菌对DNA转移的适应。为了使细菌结合、吸收供体DNA并将其重组为自己的染色体,它必须首先进入一种称为能力的特殊生理状态(见自然能力)。在枯草芽孢杆菌中,大约40个基因是培养能力所必需的。枯草芽孢杆菌转化过程中转移的DNA长度可以在染色体的三分之一到整个染色体之间。转化在细菌物种中似乎很常见,到目前为止,已知至少有60种物种具有自然转化能力。自然界能力的发展通常与应激性环境条件有关,似乎是一种促进受体细胞DNA损伤修复的适应。"
model.generate_qa(context)
[('在染色体中发现的DNA长度是多少?', '枯草芽孢杆菌转化过程中转移的DNA长度可以在染色体的三分之一到整个染色体之间。')]
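For anyone post-processing the result: the output shown above is a list of (question, answer) tuples. A minimal sketch of turning that list into labeled records follows. The pair is copied from the example output so the snippet runs without lmqg installed, and `to_records` is a hypothetical helper, not part of lmqg.

```python
from typing import Dict, List, Tuple

# Copied from the example output above: a list of (question, answer) tuples.
qa_pairs: List[Tuple[str, str]] = [
    (
        "在染色体中发现的DNA长度是多少?",
        "枯草芽孢杆菌转化过程中转移的DNA长度可以在染色体的三分之一到整个染色体之间。",
    ),
]


def to_records(pairs: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Hypothetical helper (not an lmqg function): label each tuple field."""
    return [{"question": q, "answer": a} for q, a in pairs]


records = to_records(qa_pairs)
for record in records:
    print(record["question"], "->", record["answer"])
```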
Language: Hindi
This is too small. I checked the dataset and there are 1190 QA pairs in total. Ideally, there should be around 10k pairs, as we are going to train relatively small models (~300M).
Here's a thread for adding more languages to lmqg as well as https://autoqg.net/. If you would like to contribute, please comment here with a potential QA dataset we can use to train a QAG model for that language. We need at least 10k QA pairs for model training.
e.g.
Language: Turkish
Dataset: https://github.com/TQuad/turkish-nlp-qa-dataset
Size: 8308
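To check a proposal against the 10k requirement before posting, something like the following may help. It is only a sketch: `DatasetProposal`, `meets_minimum`, and `to_comment` are hypothetical names made up for illustration, and the threshold is the "at least 10k QA pairs" figure stated in this thread.

```python
from dataclasses import dataclass

# Threshold taken from this thread ("at least 10k QA pairs").
MIN_QA_PAIRS = 10_000


@dataclass
class DatasetProposal:
    """Hypothetical container for the three fields requested in this thread."""

    language: str
    url: str
    size: int  # number of QA pairs in the dataset

    def meets_minimum(self, minimum: int = MIN_QA_PAIRS) -> bool:
        # True when the dataset has at least `minimum` QA pairs.
        return self.size >= minimum

    def to_comment(self) -> str:
        # Render the proposal in the comment format used in the example above.
        return f"Language: {self.language}\nDataset: {self.url}\nSize: {self.size}"


turkish = DatasetProposal(
    language="Turkish",
    url="https://github.com/TQuad/turkish-nlp-qa-dataset",
    size=8308,
)
print(turkish.to_comment())
print("Meets 10k minimum:", turkish.meets_minimum())
```

With the numbers quoted in this thread, both the Turkish example (8308 pairs) and the Hindi dataset (1190 pairs) fall below the threshold under this check.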