This is the git repository for Jeremie Bogaert's master's thesis. Some appendices, judged useful but too large to include in the submission, are provided here. You can find:
- The LDA html file that helps to find the different topics
- The database used during the human evaluation, in the OriginalText and Generated/ files
- The URLs used to collect the news, in urlCategory3.txt and urlCategory4.txt
- The database used during the automated evaluation. The original texts are in databaseNorig.txt, while the generated ones are in databaseNgen.jsonl. N is the category number, between 1 and 4.
- The code used to crawl the data via commoncrawl
If the participants in the human experiment allow me to share their answers, these will also be added here in the future.
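
As a rough illustration of how the automated-evaluation files could be read, here is a minimal Python sketch. It relies only on the naming scheme described above (databaseNorig.txt and databaseNgen.jsonl, with N between 1 and 4) and on the usual JSON Lines convention of one JSON value per line. The helper names `database_paths` and `load_generated` are hypothetical and not part of this repository, and no assumption is made about the fields inside each record or about the internal layout of the .txt files.

```python
import json
from pathlib import Path


def database_paths(category, root="."):
    """Build the two file paths for one category (hypothetical helper).

    Assumes the files sit directly under `root`; adjust if they live elsewhere.
    """
    if category not in (1, 2, 3, 4):
        raise ValueError("category must be between 1 and 4")
    root = Path(root)
    return (
        root / f"database{category}orig.txt",
        root / f"database{category}gen.jsonl",
    )


def load_generated(path):
    """Read a JSON Lines file: one JSON value per non-empty line."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records
```

For example, `database_paths(3)` gives the paths for category 3, and the second path can be passed to `load_generated` to get a list of parsed records.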