Add Isna Persian Dataset (#3631)
This dataset is less important than the Wikipedia one, so I think
[this pull request](#3629) should be merged first.

I have uploaded the data to
[Hugging Face](https://huggingface.co/datasets/pourmand1376/isna-news)
in Open-Assistant's standard format, so it should not need any further
processing.

---------

Co-authored-by: Oliver Stanley <[email protected]>
pourmand1376 and olliestanley authored Aug 3, 2023
1 parent 65f5c2b commit 0d4adb5
Showing 2 changed files with 3 additions and 0 deletions.
1 change: 1 addition & 0 deletions data/datasets/__init__.py
@@ -4,6 +4,7 @@
"tv_dialogue": "sedthh/tv_dialogue", # TV and Movie dialogues and transcripts
"fd_dialogue": "sedthh/fd_dialogue", # TV and Movie dialogues and transcripts from ForeverDreaming
"tlcv2.0_oa": "pythainlp/tlcv2.0_oa", # Thai classical literature texts
"fa-isna-news": "pourmand1376/isna-news", # Isna Persian News
"fa-wikipedia": "pourmand1376/fa-wikipedia", # Farsi Wikipedia texts
}
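
A minimal sketch of how a registry like this one could be used to map a short dataset name to its Hugging Face Hub id and load it. The names `DATASET_REGISTRY`, `resolve_hub_id`, and `load_registered` are hypothetical and do not come from the repository, and the `train` split is an assumption. Only the dictionary entries are taken from the diff.

```python
# Hypothetical excerpt of the registry touched by this commit. The variable
# name is an assumption; only the key/value pairs come from the diff.
DATASET_REGISTRY = {
    "tlcv2.0_oa": "pythainlp/tlcv2.0_oa",         # Thai classical literature texts
    "fa-isna-news": "pourmand1376/isna-news",     # Isna Persian News
    "fa-wikipedia": "pourmand1376/fa-wikipedia",  # Farsi Wikipedia texts
}


def resolve_hub_id(name: str) -> str:
    """Return the Hugging Face Hub id registered under a short dataset name."""
    try:
        return DATASET_REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown dataset: {name!r}") from None


def load_registered(name: str, split: str = "train"):
    """Download a registered dataset (needs the `datasets` package and network).

    The default split name is an assumption, not something the commit states.
    """
    from datasets import load_dataset  # imported lazily so lookups work offline

    return load_dataset(resolve_hub_id(name), split=split)
```

Calling `load_registered("fa-isna-news")` would fetch the dataset from the Hub. The lookup itself works without network access.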

2 changes: 2 additions & 0 deletions data/datasets/fa-isna-news/README.md
@@ -0,0 +1,2 @@
This text-only dataset is crawled from [Isna news](https://isna.ir/). It is the
biggest Farsi news agency, and thus the text is pretty clean.
