Code and data for the paper "Active Use of Latent Constituency Representation in both Humans and Large Language Models". See the paper and the supplementary information for details.
Requires Python 3.9.12. No non-standard hardware is required.
- Clone the repository.
- Construct a virtual environment for this project:
  conda env create -f environment.yml
  # change the prefix in environment.yml to your own anaconda path (takes about 30 mins)
- Download the treebank data to treebank/ptb and treebank/ctb.
Notice: Experiments 1 & 2 require annotations from the Penn Treebank and the Chinese Treebank. Because these treebanks are not publicly available, we cannot release the annotation files. Experiments 3 & 4 can be replicated without the treebanks (see the readme files in the corresponding directories).
The preprocessing of the treebanks relies on the TreebankProcessing library. To replicate the preprocessing procedure in our work, download TreebankProcessing and run:
python ptb.py --output treebank/ptb/extract --task par
python ctb.py --output treebank/ctb/extract --task par
Then run:
python scripts/process_ptb.py --process filter --input_path treebank/ptb/extract --output_path treebank/ptb/processed
python scripts/process_ctb.py --process filter --input_path treebank/ctb/extract --output_path treebank/ctb/processed
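The processed files contain parses in the standard bracketed (Penn-Treebank-style) format. As a rough illustration only (this reader is not part of the repository; the actual file handling is done by the scripts above), a minimal pure-Python reader for such bracketings might look like:

```python
# Minimal sketch of a reader for bracketed (Penn-Treebank-style) parse
# strings, e.g. "(S (NP (DT The) (NN cat)) (VP (VBZ sits)))".
# Illustrative only; the repository's own scripts handle the real files.

def tokenize(bracketing):
    """Split a bracketed parse string into '(', ')' and word tokens."""
    return bracketing.replace("(", " ( ").replace(")", " ) ").split()

def parse_tree(tokens):
    """Recursively build a nested-list tree: [label, child, child, ...]."""
    token = tokens.pop(0)
    if token == "(":
        node = [tokens.pop(0)]          # constituent label, e.g. "NP"
        while tokens[0] != ")":
            node.append(parse_tree(tokens))
        tokens.pop(0)                   # consume the closing ")"
        return node
    return token                        # a terminal word

def read_bracketing(s):
    return parse_tree(tokenize(s))

tree = read_bracketing("(S (NP (DT The) (NN cat)) (VP (VBZ sits)))")
print(tree)
# ['S', ['NP', ['DT', 'The'], ['NN', 'cat']], ['VP', ['VBZ', 'sits']]]
```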
Run run_gpt.py in each directory.
Follow and run the code in analysis.ipynb in each directory (including visualization).
All LSTM experiments are in the lstm directory.
To reconstruct a tree based on the deletion task, please switch to the exp3 directory.
To reconstruct trees of the sentences employed in our study, please refer to exp3/analysis.ipynb.
If you want to reconstruct the tree for your own sentence, please provide your API key and run:
python run_chatgpt.py --sentence 'your own sentence' --output_path ./output
python tree_reconstruction.py --sentence 'your own sentence' --response ./output/response.csv
This script constructs multiple deletion tests for your sentence, runs them on ChatGPT, and reconstructs the tree from the responses.
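The idea behind the deletion-based reconstruction is that word strings the model treats as deletable tend to be constituents, so nesting the deleted spans yields a tree. As a toy sketch only (the span format, the whole-sentence fallback, and the nesting rule here are illustrative assumptions, not the actual algorithm in tree_reconstruction.py), the nesting step could look like:

```python
# Toy sketch: nest deletable spans (assumed non-crossing, given as
# half-open (start, end) word-index pairs) into a bracketed tree.
# Illustrative only; not the algorithm used by tree_reconstruction.py.

def nest_spans(words, spans):
    """Return a bracketed string in which each span becomes a constituent."""
    full = (0, len(words))
    # sort by start position, longer spans first, and add the root span
    spans = sorted(set(spans) | {full}, key=lambda s: (s[0], -(s[1] - s[0])))

    def build(start, end, children):
        parts, i = [], start
        while i < end:
            # take the largest sub-span starting at position i, if any
            sub = next((s for s in children if s[0] == i and s[1] <= end), None)
            if sub:
                inner = [s for s in children
                         if sub[0] <= s[0] and s[1] <= sub[1] and s != sub]
                parts.append(build(sub[0], sub[1], inner))
                i = sub[1]
            else:
                parts.append(words[i])
                i += 1
        return "(" + " ".join(parts) + ")"

    return build(0, len(words), [s for s in spans if s != full])

words = "the cat sat on the mat".split()
print(nest_spans(words, [(3, 6), (4, 6)]))
# (the cat sat (on (the mat)))
```

Here the spans (3, 6) and (4, 6) stand in for strings the model judged deletable; nesting them recovers the prepositional-phrase structure.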
The syntactically ambiguous sentences with adjunct or PP attachment are provided in the exp4/stimulus directory.
If you make use of the code in this repository, please cite the following paper:
@article{liu2024active,
  title={Active Use of Latent Constituency Representation in both Humans and Large Language Models},
  author={Liu, Wei and Xiang, Ming and Ding, Nai},
  journal={arXiv preprint arXiv:2405.18241},
  year={2024}
}