The repository contains a dataset of over 75,000 Polish Wikipedia pages assigned to specific science fields, together with the links between these pages. The dataset can be used as a simple NLP classification task, especially as a benchmark for graph-based methods.
Article information file. Columns:
- title - article title,
- text - article text,
- category - one of 7 main Wikipedia categories related to science fields, chosen as the one closest to the article's own categories in the scraped category tree.
Article categories:
- Astronomia - astronomy,
- Biologia - biology,
- Matematyka - math,
- Psychologia - psychology,
- Fizyka - physics,
- Informatyka - computer science,
- Chemia - chemistry.
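The column layout and the seven labels above can be checked with a short script. This is a minimal sketch, assuming the article file is a comma-separated CSV with a header row naming the `title`, `text` and `category` columns; the helper `category_counts` and the sample rows are made up for illustration and are not part of the dataset.

```python
import csv
import io
from collections import Counter

# The seven category labels listed above.
CATEGORIES = {
    "Astronomia", "Biologia", "Matematyka", "Psychologia",
    "Fizyka", "Informatyka", "Chemia",
}

def category_counts(csv_file):
    """Count articles per category; raise on a label outside the seven."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        label = row["category"]
        if label not in CATEGORIES:
            raise ValueError(f"unexpected category: {label!r}")
        counts[label] += 1
    return counts

# Tiny invented sample standing in for the real file (header row is assumed).
sample = io.StringIO(
    "title,text,category\n"
    "Mars,Czwarta planeta od Słońca.,Astronomia\n"
    "Liczba pierwsza,Liczba naturalna większa od 1.,Matematyka\n"
    "Jowisz,Piąta planeta od Słońca.,Astronomia\n"
)
counts = category_counts(sample)
```

For the real data, the same function could be given a file object opened with `open("wiki_pages.csv", newline="", encoding="utf-8")`, assuming UTF-8 encoding. Very long article texts may require raising the limit with `csv.field_size_limit`.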
File with links between pages. The first column is the source article title and the second column is the target article title. Note that this file includes links to pages that are not present in wiki_pages.csv.
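Because some links point to pages missing from wiki_pages.csv, graph-based methods will usually need the link list restricted to known articles first. The following is a minimal sketch, assuming the link file is a comma-separated CSV with two columns and no header row; `filter_links` and the sample rows are hypothetical and only illustrate the filtering step.

```python
import csv
import io

def filter_links(link_rows, known_titles):
    """Keep only (source, target) pairs where both titles are known articles."""
    edges = []
    for row in link_rows:
        source, target = row[0], row[1]
        if source in known_titles and target in known_titles:
            edges.append((source, target))
    return edges

# Invented sample: titles that would come from the article file.
known_titles = {"Mars", "Jowisz", "Układ Słoneczny"}

# Invented sample links; "Rzym" is not among the known articles.
sample_links = io.StringIO(
    "Mars,Układ Słoneczny\n"
    "Mars,Rzym\n"
    "Jowisz,Mars\n"
)
edges = filter_links(csv.reader(sample_links), known_titles)
```

Checking the source title as well as the target is a defensive choice; the description only states that some link targets are absent from the article file.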