HarvardX PH125.9x - Data Science: Capstone

In this project, I explore Wordbank, an open database of children's vocabulary development. The database contains data from more than 75,000 children across more than 25 languages. I use machine learning algorithms (regression trees, random forests, and linear regression) to investigate potential relationships between demographic and linguistic variables (the predictors) and vocabulary growth, measured as productive vocabulary (the outcome) on the MacArthur-Bates Communicative Development Inventories. All analyses are exploratory: no hypotheses or predictions are made. I first curate the Wordbank dataset, then move to descriptive analyses and visualizations, and finally to the machine learning algorithms.
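The modelling step described above can be sketched roughly as follows. This is a minimal illustration, not the code in r_wordbank_ml_script.R: the data are simulated rather than drawn from Wordbank, the column names (`age`, `sex`, `production`), the 80/20 split, and the use of RMSE as the comparison metric are all assumptions made for the example.

```r
# Toy illustration only: simulated data, NOT Wordbank records.
library(rpart)  # recommended package, shipped with standard R installations

set.seed(1)
n <- 500

# Hypothetical predictors: age in months and sex
toy <- data.frame(
  age = sample(16:30, n, replace = TRUE),
  sex = factor(sample(c("Female", "Male"), n, replace = TRUE))
)

# Hypothetical outcome: words produced rises with age, plus noise
toy$production <- pmax(
  0,
  round(-300 + 25 * toy$age + 20 * (toy$sex == "Female") + rnorm(n, sd = 60))
)

# Hold out 20% of the rows for testing (assumed split)
test_idx <- sample(n, size = n / 5)
train <- toy[-test_idx, ]
test <- toy[test_idx, ]

# Root mean squared error, used here to compare models (assumed metric)
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# Linear regression and a regression tree on the same formula
fit_lm <- lm(production ~ age + sex, data = train)
fit_tree <- rpart(production ~ age + sex, data = train, method = "anova")

rmse_lm <- rmse(test$production, predict(fit_lm, newdata = test))
rmse_tree <- rmse(test$production, predict(fit_tree, newdata = test))

print(c(lm = rmse_lm, tree = rmse_tree))
```

A random forest could be fitted with the same formula interface using the randomForest package; it is left out here only to keep the sketch free of extra dependencies.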

This repository contains:

PDF report, knitted from the Rmd (r_wordbank_ml.pdf)
Rmd script (r_wordbank_ml.Rmd)
R script (r_wordbank_ml_script.R)
Reference list in BibTeX format (references.bib)

For more information or questions, please e-mail me at [email protected]

Files:

.gitignore
README.md
prod_dist.png
prod_lang.png
r_wordbank_ml.Rmd
r_wordbank_ml.pdf
r_wordbank_ml_script.R
references.bib

RodDalBen/edx_wordbank

Folders and files

Latest commit

History

Repository files navigation

HarvardX PH125.9x - Data Science: Capstone

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages