NLP Project: Data-Driven taxonomy for cognitive distortion

This repository holds the code for the Natural Language Processing (S2023) project, including the analysis scripts and notebooks.

⏭ Workflows

Description of the dataset

thought	original_label
Someone I trusted stole something valuable of mine, I was extremely angry and wanted justice	emotional reasoning
She doesn't respect me.	overgeneralizing
Total	921
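
As a minimal sketch of how this two-column layout can be inspected, the snippet below parses tab-separated rows and counts the thoughts per label. The inline sample reuses the two example rows above. The helper name `count_labels` is hypothetical, and the tab delimiter is an assumption rather than something taken from the repository code.

```python
import csv
import io
from collections import Counter

# Two example rows copied from the table above.
# Assumption: columns are separated by a tab character.
SAMPLE = (
    "thought\toriginal_label\n"
    "Someone I trusted stole something valuable of mine, "
    "I was extremely angry and wanted justice\temotional reasoning\n"
    "She doesn't respect me.\tovergeneralizing\n"
)


def count_labels(handle):
    """Count how many thoughts carry each original_label (hypothetical helper)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row["original_label"] for row in reader)


counts = count_labels(io.StringIO(SAMPLE))
for label, n in sorted(counts.items()):
    print(f"{label}: {n}")
```

To run the same count on a file, pass an open file handle instead of the `io.StringIO` sample.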

Usage and reproducibility

The code was developed and tested on a MacBook Pro running macOS Sonoma 14.1.2 with Python 3.9.6.

To reproduce the results, follow the steps below. All terminal commands should be run from the root directory of the repository.

Clone the repository
Create a virtual environment and install requirements

bash setup.sh

Remember to replace the paths in the code with the path to your own directory, which must contain "reframing_dataset.csv" and "thinking_traps.jsonl".
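
Before running the notebooks, it can help to check that both input files are where the code expects them. The following is only a sketch: the helper `missing_inputs` is hypothetical, and `data` is an assumed location that should be replaced with your own directory.

```python
from pathlib import Path

# The two input files named in this README.
REQUIRED_FILES = ("reframing_dataset.csv", "thinking_traps.jsonl")


def missing_inputs(data_dir):
    """Return the required file names that are absent from data_dir (hypothetical helper)."""
    data_dir = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (data_dir / name).is_file()]


# Assumption: the inputs live in ./data; point this at your own directory.
DATA_DIR = Path("data")
missing = missing_inputs(DATA_DIR)
if missing:
    print(f"Missing from {DATA_DIR}: {', '.join(missing)}")
```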

Run the preprocessing.ipynb notebook to:
- merge the two datasets
- apply some preprocessing
Run the analysis.ipynb notebook to:
- compute the embeddings with sentence-transformers
- reduce the dimensionality with UMAP
- cluster with k-means and HDBSCAN
- export the results
Run the plot.ipynb notebook to:
- plot the different clusters
Run the label.ipynb notebook to:
- generate the labels
Run the dashboard.ipynb notebook to:
- generate the dashboard
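
The first step, merging the two datasets, could look roughly like the sketch below. It is not the notebook's actual code: the input field names (`thought`, `label`) are hypothetical, the sample rows are toy data built from the examples in this README, and the cleaning shown (trimming whitespace, lowercasing labels, dropping blanks and exact duplicates) is an assumed stand-in for the unspecified preprocessing.

```python
import csv
import io
import json


def merge_corpora(csv_handle, jsonl_handle):
    """Merge a CSV source and a JSONL source into one list of rows.

    Each output row has the keys 'thought' and 'original_label'.
    The input field names 'thought' and 'label' are hypothetical.
    """
    pairs = []
    for record in csv.DictReader(csv_handle):
        pairs.append((record["thought"], record["label"]))
    for line in jsonl_handle:
        if line.strip():  # skip empty lines in the JSONL source
            record = json.loads(line)
            pairs.append((record["thought"], record["label"]))

    seen = set()
    merged = []
    for thought, label in pairs:
        key = (thought.strip(), label.strip().lower())
        if not key[0] or key in seen:
            continue  # drop blank thoughts and exact duplicates
        seen.add(key)
        merged.append({"thought": key[0], "original_label": key[1]})
    return merged


# Toy inputs, for illustration only.
CSV_SAMPLE = "thought,label\nShe doesn't respect me.,Overgeneralizing\n"
JSONL_SAMPLE = "\n".join(
    json.dumps(row)
    for row in [
        {"thought": "She doesn't respect me.", "label": "overgeneralizing"},
        {
            "thought": "Someone I trusted stole something valuable of mine, "
            "I was extremely angry and wanted justice",
            "label": "emotional reasoning",
        },
    ]
)

merged = merge_corpora(io.StringIO(CSV_SAMPLE), io.StringIO(JSONL_SAMPLE))
print(len(merged))
```

Normalizing the label before deduplicating means that "Overgeneralizing" and "overgeneralizing" are treated as the same label, so the repeated thought is kept only once.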

Repository structure

├── code 
│   ├── analysis.ipynb
│   ├── clean.py
│   ├── clust.py
│   ├── dashboard.ipynb
│   ├── label.ipynb
│   ├── label.py
│   ├── plot.ipynb
│   ├── preprocessing.ipynb
│   ├── result.ipynb
├── env                                         <- Not included in repo
├── data
│   ├── corpus_disto.csv
│   ├── corpus_embedding.csv
│   ├── corpus_hdbscan_bayesian_optimisation.csv
│   ├── corpus_kmean.csv
│   ├── label_hdbscan_all-miniLM.csv
│   ├── label_hdbscan_roberta.csv
│   ├── reframing_dataset.csv
│   ├── thinking_traps.jsonl
├── doc                                   
│   ├──
│   ├── 
│   ├── 
│   └──
├── export   # all images and interactive HTML plots
├── .gitignore
├── README.md
├── dash_deploy.py # script used for the website dashboard
├── setup.sh 
├── requirements.txt

Results

To view the results of the exploratory analysis of cognitive distortions, follow the link below:

dashboard

These can also be found in the export folder of the repository.

SylvainEstebe/cognitive_distortion_project
