In this project, I explore Wordbank, an open database of children's vocabulary development. The database contains data from 75,000+ children across 25+ languages. I use machine learning algorithms (regression trees, random forests, and linear regressions) to investigate potential relationships between demographic and linguistic variables (the predictors) and vocabulary growth, as measured by productive vocabulary (the outcome measure) on the MacArthur-Bates Communicative Development Inventories. All analyses are exploratory; no hypotheses or predictions are made. I first curate the Wordbank dataset, then move to descriptive analyses and visualizations, and finally apply the machine learning algorithms.
This repository contains:
- PDF report (knitted from Rmd)
- Rmd script
- R script
- Reference list (BibTeX)
For more information or questions, please e-mail me at [email protected]