Daily News Dikgang

Give Feedback 📑: DSFSI Resource Feedback Form

About dataset

The dataset contains annotated, categorised news data from Dikgang - Daily News (https://dailynews.gov.bw/news-list/srccategory/10). The data is in Setswana.

See the Data Statement for full details.

Disclaimer

This dataset contains machine-readable data extracted from online news articles, from https://dailynews.gov.bw/news-list/srccategory/10, provided by the Botswana Government. While efforts were made to ensure the accuracy and completeness of this data, there may be errors or discrepancies between the original publications and this dataset. No warranties, guarantees or representations are given in relation to the information contained in the dataset. The members of the Data Science for Societal Impact Research Group bear no responsibility and/or liability for any such errors or discrepancies in this dataset. The Botswana Government bears no responsibility and/or liability for any such errors or discrepancies in this dataset. It is recommended that users verify all information contained herein before making decisions based upon this information.

Authors

  • Vukosi Marivate - @vukosi
  • Valencia Wagner

Citation

Bibtex Reference

@inproceedings{marivate2023puoberta,
  title   = {PuoBERTa: Training and evaluation of a curated language model for Setswana},
  author  = {Vukosi Marivate and Moseli Mots'Oehli and Valencia Wagner and Richard Lastrucci and Isheanesu Dzingirai},
  year    = {2023},
  booktitle= {Artificial Intelligence Research. SACAIR 2023. Communications in Computer and Information Science},
  url= {https://link.springer.com/chapter/10.1007/978-3-031-49002-6_17},
  keywords = {NLP},
  preprint_url = {https://arxiv.org/abs/2310.09141},
  dataset_url = {https://github.com/dsfsi/PuoBERTa},
  software_url = {https://huggingface.co/dsfsi/PuoBERTa}
}

Licences

The News Categorisation dataset is licensed under CC-BY-SA-4.0. The monolingual data have different licences, depending on the licence of each source news website.