Semantic Shift & Age of Acquisition

This repository contains the code for the analyses reported in the Cognitive Science paper 'Words with Consistent Diachronic Usage Patterns are Learned Earlier: A Computational Analysis Using Temporally Aligned Word Embeddings'.

Reference

@article{https://doi.org/10.1111/cogs.12963,
author = {Cassani, Giovanni and Bianchi, Federico and Marelli, Marco},
title = {Words with Consistent Diachronic Usage Patterns are Learned Earlier: A Computational Analysis Using Temporally Aligned Word Embeddings},
journal = {Cognitive Science},
volume = {45},
number = {4},
pages = {e12963},
keywords = {Age of acquisition, Language change, Temporally aligned word embeddings, Computational psycholinguistics},
doi = {https://doi.org/10.1111/cogs.12963},
url = {https://onlinelibrary.wiley.com/doi/abs/10.1111/cogs.12963},
eprint = {https://onlinelibrary.wiley.com/doi/pdf/10.1111/cogs.12963},
year = {2021}
}

Content

The project is organized into three folders. The folder data/ provides all the necessary datasets. The folder src/ contains Python code that reads the resources needed to compute OLD20 for the target words, as well as R code that preprocesses the raw data into input files for the statistical analyses, runs the linear models, and generates the plots included in the paper. Finally, the code to compute the semantic change measures is in the folder measures/.
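OLD20 is the mean Levenshtein distance between a word and its 20 closest orthographic neighbours in a reference lexicon. The following is a minimal pure-Python sketch of that definition, not the code in src/ and not the API of the old20 package listed under Requirements. The function names `levenshtein` and `old_n` are hypothetical, and the reference lexicon is whatever word list you pass in.

```python
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def old_n(word: str, lexicon: Iterable[str], n: int = 20) -> float:
    """Mean edit distance from `word` to its n closest neighbours in `lexicon`.

    The target word itself is excluded from the neighbour set.
    """
    distances = sorted(levenshtein(word, other) for other in lexicon if other != word)
    closest = distances[:n]
    if not closest:
        raise ValueError("lexicon contains no word other than the target")
    return sum(closest) / len(closest)
```

This brute-force version compares the target against every lexicon entry, so it is only practical for small word lists; for a full lexicon, the old20 package is the better choice.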

Temporal Word Embeddings with a Compass (TWEC)

To create the aligned embeddings, you first need to obtain the COHA corpus. The TWEC embedding alignment algorithm can then be used to align the slices. It is enough to split the COHA data into five slices: 1800-1840, 1840-1880, 1880-1920, 1920-1960, and 1960-2000. You need to pre-process the text yourself before running TWEC (we used spaCy for this).
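Assigning each COHA document to one of the five slices can be sketched as follows. The stated ranges share their end points (1840 appears in two slices, for example), so this sketch assumes half-open intervals that include the start year and exclude the end year; that choice, and the names `SLICES` and `slice_label`, are assumptions rather than something this repository specifies.

```python
from typing import Optional

# Slice boundaries as listed in this README. Treating each range as
# half-open [start, end) is an assumption made to resolve the shared end points.
SLICES = [(1800, 1840), (1840, 1880), (1880, 1920), (1920, 1960), (1960, 2000)]


def slice_label(year: int) -> Optional[str]:
    """Return a label such as '1840-1880' for the slice containing `year`.

    Returns None when the year falls outside every slice.
    """
    for start, end in SLICES:
        if start <= year < end:
            return f"{start}-{end}"
    return None
```

Under this assumption a document dated 2000 falls outside every slice, so adjust the last boundary if your copy of the corpus contains such documents.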

Follow the instructions on the TWEC page to install the tool.
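Once the tool is installed, training proceeds in two stages: a compass model is trained on the concatenation of all slices, and one model per slice is then trained against that compass. The sketch below assumes one pre-processed plain-text file per slice. The helper names `build_compass` and `train_aligned`, the file layout, and the embedding size of 100 are all hypothetical, and the cade calls follow the usage shown in the cade README as we understand it, so check the TWEC page for the current API before relying on them.

```python
from pathlib import Path
from typing import Iterable, List


def build_compass(slice_paths: Iterable[str], compass_path: str) -> None:
    """Concatenate the pre-processed slice files into a single compass file."""
    with open(compass_path, "w", encoding="utf-8") as out:
        for path in slice_paths:
            text = Path(path).read_text(encoding="utf-8")
            # Make sure consecutive slices are separated by a newline.
            out.write(text if text.endswith("\n") else text + "\n")


def train_aligned(slice_paths: List[str], compass_path: str, size: int = 100):
    """Train one aligned model per slice (requires the cade package).

    The value of `size` is a placeholder, not the setting used in the paper.
    """
    # Imported lazily so that build_compass can be used without cade installed.
    from cade.cade import CADE

    aligner = CADE(size=size)
    aligner.train_compass(compass_path, overwrite=False)
    return [aligner.train_slice(path, save=True) for path in slice_paths]
```

A typical call would first run `build_compass` on the five slice files and then pass the same paths, together with the compass path, to `train_aligned`.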

Requirements

pandas
scipy
numpy
cade (refer to the TWEC page)
old20 (https://github.com/stephantul/old20)
SUBTLEX-US (https://www.ugent.be/pp/experimentele-psychologie/en/research/documents/subtlexus/subtlexus2.zip)
Age of Acquisition norms by Kuperman et al. (http://crr.ugent.be/archives/806)
Concreteness norms by Brysbaert et al. (http://crr.ugent.be/papers/Concreteness_ratings_Brysbaert_et_al_BRM.txt)

Authors

Giovanni Cassani, Tilburg University
Federico Bianchi, Bocconi University
Marco Marelli, University of Milano-Bicocca
