This repository holds the code for the project for Natural Language Processing (S2023). It contains the script of the analysis.
thought | original_label |
---|---|
Someone I trusted stole something valuable of mine, I was extremely angry and wanted justice | emotional reasoning |
She doesn't respect me. | overgeneralizing |
Total | 921 |
The code was developed and tested on a MacBook Pro with macOS (Sonoma 14.1.2, python v3.9.6).
To reproduce the results , follow the steps below. All terminal commands should be run from the root directory of the repository.
- Clone the repository
- Create a virtual environment and install requirements
bash setup.sh
Think about replacing the different path with your own repertoire, which contains "reframing_dataset.csv" and "thinking_traps.jsonl"
- Run the
preprocessing.ipybn
script to:- merge the two dataset
- some preprocessing
- Run the
analysis.ipybn
script to:- Do the embedding with sentence-transformer
- reduction of dimension with UMAP
- Clustering with k-mean and hdbscan
- export of results
- Run the
plot.ipybn
script to: - plot the different clusters - Run the
label.ipybn
script to: - generate label - Run the
dashboard.ipybn
script to: - generate dashboard
├── code
│ ├── analysis.ipynb
│ ├── clean.py
│ ├── clust.py
│ ├── dashboard.ipynb
│ ├── label.ipynb
│ ├── label.py
│ ├── plot.ipynb
│ ├── preprocessing.ipynb
│ ├── result.ipynb
├── env <- Not included in repo
├── data
│ ├── corpus_disto.csv
│ ├── corpus_embedding.csv
│ ├── corpus_hdbscan_bayesian_optimisation.csv
│ ├── corpus_kmean.csv
│ ├── label_hdbscan_all-miniLM.csv
│ ├── label_hdbscan_roberta.csv
│ ├── reframing_dataset.csv
│ ├── thinking_traps.jsonl
├── doc
│ ├──
│ ├──
│ ├──
│ └──
├── export # all the image and html interactive plot
├── .gitignore
├── README.md
├── dash_deploy.py # script used for the website dashboard
├── README.md
├── setup.sh
├── requirements.txt
To display the results of the exploratory approach of cognitive distortion follow the following links:
These can also be found in the export
folder of the repository.