This repository contains the code for the analyses reported in the Cognitive Science paper 'Words with consistent diachronic usage patterns are learned earlier: A computational analysis using temporally aligned word embeddings'.
@article{https://doi.org/10.1111/cogs.12963, author = {Cassani, Giovanni and Bianchi, Federico and Marelli, Marco}, title = {Words with Consistent Diachronic Usage Patterns are Learned Earlier: A Computational Analysis Using Temporally Aligned Word Embeddings}, journal = {Cognitive Science}, volume = {45}, number = {4}, pages = {e12963}, keywords = {Age of acquisition, Language change, Temporally aligned word embeddings, Computational psycholinguistics}, doi = {https://doi.org/10.1111/cogs.12963}, url = {https://onlinelibrary.wiley.com/doi/abs/10.1111/cogs.12963}, eprint = {https://onlinelibrary.wiley.com/doi/pdf/10.1111/cogs.12963}, year = {2021} }
The project code is split into three folders:
- data/ contains all the necessary datasets;
- src/ contains Python code that reads the relevant resources and computes OLD20 for the target words, as well as R code that preprocesses the raw data into input files for the statistical analyses, runs the linear models, and generates the plots included in the paper;
- measures/ contains the code that computes the semantic change measures.
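OLD20 is the mean Levenshtein (edit) distance between a word and its 20 closest orthographic neighbors in a reference lexicon. Lower values indicate a denser orthographic neighborhood. The requirements list the old20 package for this computation. The sketch below is a from-scratch illustration of the definition only. It is not that package's API: `levenshtein` and `old20` are hypothetical helpers, and the reference lexicon is left as a parameter.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def old20(word: str, lexicon, n: int = 20) -> float:
    """Mean edit distance from `word` to its n closest other lexicon entries."""
    distances = sorted(levenshtein(word, other) for other in lexicon if other != word)
    if len(distances) < n:
        raise ValueError("lexicon has fewer than n words other than the target")
    return sum(distances[:n]) / n
```

With a toy lexicon and `n=2`, `old20("cat", ["bat", "cut", "cast", "dog"], n=2)` averages the two smallest distances (1 and 1). A real computation would use a large lexicon and the default `n=20`.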
To create the aligned embeddings, you first need to obtain the COHA corpus (Corpus of Historical American English). The TWEC alignment algorithm can then be used to align the time slices. It is enough to split the COHA data into five slices: 1800-1840, 1840-1880, 1880-1920, 1920-1960, and 1960-2000. You need to preprocess the text yourself before running TWEC (we used spaCy for this).
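The slicing step can be sketched as follows. Documents are grouped into the five periods, and one text file is written per slice. A `compass.txt` file is also written that concatenates all slices, since TWEC trains its shared "compass" on the union of the slices. Several details are assumptions and not taken from the repository: the boundary rule (half-open intervals, with 2000 placed in the last slice), the file names, and the `(year, text)` input format. In particular, the regex tokenizer only stands in for the spaCy pipeline the authors actually used.

```python
import re
from pathlib import Path

# Slice boundaries from the README. ASSUMPTION: intervals are half-open,
# [start, end), except the last one, which also includes the year 2000.
SLICES = [(1800, 1840), (1840, 1880), (1880, 1920), (1920, 1960), (1960, 2000)]


def slice_label(year: int):
    """Return the label of the slice containing `year`, or None if outside."""
    for i, (start, end) in enumerate(SLICES):
        last = i == len(SLICES) - 1
        if start <= year < end or (last and year == end):
            return f"{start}-{end}"
    return None


def preprocess(text: str) -> str:
    """Stand-in for spaCy preprocessing: lowercase, keep ASCII letter tokens."""
    return " ".join(re.findall(r"[a-z]+", text.lower()))


def write_slices(docs, out_dir):
    """Write one file per slice plus a compass file covering all slices.

    `docs` is an iterable of (year, raw_text) pairs (hypothetical format);
    files are named after the slice label (hypothetical naming scheme).
    Returns the number of documents assigned to each slice.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    buckets = {f"{s}-{e}": [] for s, e in SLICES}
    for year, text in docs:
        label = slice_label(year)
        if label is not None:  # documents outside 1800-2000 are dropped
            buckets[label].append(preprocess(text))
    for label, lines in buckets.items():
        (out / f"{label}.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")
    # The compass is trained on the concatenation of every slice.
    all_lines = [line for lines in buckets.values() for line in lines]
    (out / "compass.txt").write_text("\n".join(all_lines) + "\n", encoding="utf-8")
    return {label: len(lines) for label, lines in buckets.items()}
```

The resulting files would then be passed to TWEC. The compass file goes to compass training, and each slice file goes to its own slice training.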
Follow the instructions on the TWEC page to install the tool.
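TWEC aligns every slice to a shared compass, so the vectors a word receives in different slices live in one coordinate space. They can be compared directly, with no extra rotation step. The sketch below shows one simple, illustrative way to quantify how consistent a word's usage is over time: the cosine similarity between its vectors in adjacent slices, averaged over the whole period. This measure was chosen for the example. The measures actually used in the paper are implemented in the measures folder and may differ. Here the per-slice vectors are plain NumPy arrays. With the trained TWEC models they might be obtained through a gensim lookup such as `model.wv[word]`, but that wiring is an assumption.

```python
import numpy as np


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def consecutive_similarities(vectors_by_slice):
    """Cosine similarity of one word's vectors in each pair of adjacent slices.

    `vectors_by_slice` is a chronologically ordered list of 1-D arrays,
    one per aligned slice, all for the same word.
    """
    return [cosine(a, b) for a, b in zip(vectors_by_slice, vectors_by_slice[1:])]


def mean_similarity(vectors_by_slice) -> float:
    """Hypothetical summary: higher values suggest more stable usage over time."""
    return float(np.mean(consecutive_similarities(vectors_by_slice)))
```

Averaging over adjacent pairs treats every transition between periods equally. Other summaries, such as similarity to the compass vector or the variance across slices, would capture different notions of consistency.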
The code requires the following Python packages:
- pandas
- scipy
- numpy
- cade (refer to the TWEC page)
- old20 (https://github.com/stephantul/old20)
The following external resources are also used:
- SUBTLEX-US (https://www.ugent.be/pp/experimentele-psychologie/en/research/documents/subtlexus/subtlexus2.zip)
- Age of Acquisition norms by Kuperman et al. (http://crr.ugent.be/archives/806)
- Concreteness norms by Brysbaert et al. (http://crr.ugent.be/papers/Concreteness_ratings_Brysbaert_et_al_BRM.txt)
Authors:
- Giovanni Cassani, Tilburg University
- Federico Bianchi, Bocconi University
- Marco Marelli, University of Milano-Bicocca