Code for the paper "Assessing the ability of LSTMs to learn syntax-sensitive dependencies".

Dependencies:
- numpy
- keras
- theano (though the TensorFlow backend is likely to also work)
- pandas
For the Google LM evaluation, you will also need to install TensorFlow and download the trained model.
If you're just looking for the subject-verb dependency data in a simple format and are not planning to run the code in this repository, download our simple dependency dataset.
Follow this section if you'd like to run the code on the same set of dependencies we used in the paper.
All of the functions should accept all relevant filenames as arguments, but in general the easiest thing to do is to set the environment variable RNN_AGREEMENT_ROOT to wherever you cloned this repository.
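One way to set the variable from within Python is sketched below. The helper name set_agreement_root is hypothetical and not part of this repository, and the sketch assumes the variable is read when the rnnagr modules are imported, so it should run before any such import.

```python
import os


def set_agreement_root(path):
    """Point RNN_AGREEMENT_ROOT at a local clone of this repository.

    Hypothetical helper for illustration; it is not part of rnnagr.
    """
    root = os.path.abspath(os.path.expanduser(path))
    os.environ['RNN_AGREEMENT_ROOT'] = root
    return root


# Assumption: the current directory is the clone; replace '.' as needed.
root = set_agreement_root('.')
print(root)
```

Setting the variable in your shell profile instead has the same effect and persists across sessions.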
After cloning the repository, download the dependency dataset into the data subdirectory and unzip the file.
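If you prefer to unzip from Python rather than the command line, a minimal sketch using the standard library follows. The function name unzip_into_data and the archive filename deps.zip are placeholders, not names used by this repository; substitute the name of the file you actually downloaded.

```python
import os
import zipfile


def unzip_into_data(archive_path, repo_root):
    """Extract a downloaded archive into <repo_root>/data and list that directory."""
    data_dir = os.path.join(repo_root, 'data')
    if not os.path.isdir(data_dir):
        os.makedirs(data_dir)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(data_dir)
    return sorted(os.listdir(data_dir))


# Assumption: RNN_AGREEMENT_ROOT is set; otherwise fall back to the current directory.
repo_root = os.environ.get('RNN_AGREEMENT_ROOT', '.')
archive_path = os.path.join(repo_root, 'data', 'deps.zip')  # placeholder filename
if os.path.exists(archive_path):
    print(unzip_into_data(archive_path, repo_root))
```

The existence check only keeps the sketch from failing when the placeholder filename does not match your download.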
from rnnagr.agreement_acceptor import PredictVerbNumber
from rnnagr import filenames
pvn = PredictVerbNumber(filenames.deps, prop_train=0.1)
pvn.pipeline()
After running this code, pvn.test_results will be a pandas data frame with all of the relevant results.
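Since the results are an ordinary pandas data frame, they can be summarized with the usual pandas tools. The sketch below computes accuracy within groups. The column names correct and n_intervening are assumptions made for illustration (check pvn.test_results.columns for the real ones), and the toy frame is a made-up stand-in, not output from the model.

```python
import pandas as pd


def accuracy_by(results, group_col, correct_col='correct'):
    """Mean of a boolean correctness column within each group of group_col."""
    return results.groupby(group_col)[correct_col].mean()


# Made-up stand-in for pvn.test_results; column names and values are hypothetical.
toy = pd.DataFrame({
    'n_intervening': [0, 0, 1, 1],
    'correct': [True, True, True, False],
})
print(accuracy_by(toy, 'n_intervening'))
```

With real results you would pass pvn.test_results in place of toy, and pvn.test_results.to_csv(...) saves them for later analysis.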
The file experiments.py contains the code used to run the experiments reported in the paper; see that file for examples of tasks and training regimes other than PredictVerbNumber.
If you'd like to regenerate the set of dependencies from the corpus (and perhaps modify our criteria), download the subset of the parsed Wikipedia corpus we used (1.7 GB).
from rnnagr.collect_agreement import CollectAgreement
from rnnagr.utils import deps_to_tsv
from rnnagr import filenames
ca = CollectAgreement(filenames.parsed_wiki_subset_50, modes=('infreq_pos',),
                      skip=0, most_common=10000)
ca.collect_agreement()
deps_to_tsv(ca.deps, 'agr_mostcommon_10K.tsv')
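To sanity-check the regenerated file, you can look at its first few rows. The sketch below uses only the standard library and assumes Python 3 and that the file starts with a header row; the helper name peek_tsv is hypothetical.

```python
import csv
import os
from itertools import islice


def peek_tsv(path, n=5):
    """Return the first n rows of a tab-separated file as dicts keyed by the header."""
    with open(path, newline='') as handle:
        reader = csv.DictReader(handle, delimiter='\t')
        return list(islice(reader, n))


# Only runs if the file written by the snippet above is in the current directory.
if os.path.exists('agr_mostcommon_10K.tsv'):
    for row in peek_tsv('agr_mostcommon_10K.tsv', n=3):
        print(row)
```

If the file turns out to have no header row, csv.reader with delimiter='\t' is the closer fit.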
If you use our data or code for academic work, please cite:
@article{linzen2016assessing,
Author = {Linzen, Tal and Dupoux, Emmanuel and Goldberg, Yoav},
Journal = {Transactions of the Association for Computational Linguistics},
Title = {Assessing the ability of {LSTMs} to learn syntax-sensitive dependencies},
Volume = {4},
Pages = {521--535},
Year = {2016}
}