This script creates a pandas dataframe and a csv file from the U.S. Census Bureau's Decennial Census API, which offers access to population data by sex, age, race, etc. and housing data by occupancy, vacancy status, and tenure.
In a few quick steps, you'll be querying to your heart's content.
You need a Census API key. It's easy! And fast! Request yours at census.gov/developers/
You need a csv file of the variables you want to gather. It should look like this:
year | variable | column_name |
---|---|---|
2010 | H0110004 | housing_renter |
2000 | H011003 | housing_renter |
Download the census_variables.csv template.
The template has columns for label and concept, which cut and paste nicely from the Census variable reference pages (links below). Extra columns are ignored by the script. Feel free to delete them! Or add more! The script only uses the first three columns: year, variable and column_name.
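As a minimal sketch of how a variable file like this can be read (not necessarily how the script does it), pandas can load just the first three columns by position and skip anything extra. The inline csv text, its placeholder label and concept values, and the name `variables` are assumptions made for illustration.

```python
import io

import pandas as pd

# Illustrative stand-in for census_variables.csv, including two extra columns
csv_text = """year,variable,column_name,label,concept
2010,H0110004,housing_renter,example label,example concept
2000,H011003,housing_renter,example label,example concept
"""

# usecols=[0, 1, 2] keeps only the first three columns, whatever else is present
variables = pd.read_csv(
    io.StringIO(csv_text),
    usecols=[0, 1, 2],
    dtype={"year": int, "variable": str, "column_name": str},
)

print(variables)
```

Reading the variable codes as strings keeps them from being reinterpreted if a code ever looks numeric.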
Make sure you list the correct year for each variable. The variables change year to year, even for the same data.
In the column_name column of your csv file, provide the header you want for each column of data. Don't need human-readable headers? Just reuse your variables as your column names.
Add the year to your column names? If the option add_year is True (line 39 of the script), the year will be appended to the column name:
year | variable | column_name | column_name (dataframe) |
---|---|---|---|
2010 | H0110004 | housing_renter | housing_renter_2010 |
2000 | H011003 | housing_renter | housing_renter_2000 |
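
The naming rule in the table can be sketched as a small function. This is an illustration of the behavior described above, not the script's own code; the function name `dataframe_column_name` is hypothetical, and only the `add_year` option comes from the script.

```python
def dataframe_column_name(column_name, year, add_year=True):
    """Return the dataframe header for one row of the variable file."""
    # With add_year on, the year is appended after an underscore
    if add_year:
        return f"{column_name}_{year}"
    return column_name


print(dataframe_column_name("housing_renter", 2010))
print(dataframe_column_name("housing_renter", 2000, add_year=False))
```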
You can run this script for hundreds of variables! The script will divide your csv file into batches of 50 variables (the API limit) and run multiple requests to gather your data.
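To make the batching concrete, here is a rough sketch of splitting a long list of variables into groups of at most 50. The function name `split_into_batches` and the placeholder codes are assumptions for illustration; the script may organize this step differently.

```python
def split_into_batches(variables, batch_size=50):
    """Split a list of variable codes into lists of at most batch_size items."""
    return [
        variables[start:start + batch_size]
        for start in range(0, len(variables), batch_size)
    ]


# 120 made-up placeholder codes, used only to show the split
codes = [f"VAR{n:03d}" for n in range(120)]
batches = split_into_batches(codes)
print([len(batch) for batch in batches])  # [50, 50, 20]
```

Because variable codes differ from year to year, a real run would presumably build its batches separately for each year.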
You have options! This script can gather data for 4 types of locations:
- 'state' returns data for the 50 U.S. states
- 'county' returns data for 3,142 counties in U.S. states
- 'metro' returns data for 685 metropolitan areas (50,000+ population) in the U.S.
- 'metro-micro' returns data for the 685 metros plus 564 micropolitan areas (10,000 - 50,000 population)
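
For a sense of what one request might look like, the sketch below builds a query URL for the two simplest location types. It rests on several assumptions: the endpoint layout `https://api.census.gov/data/{year}/dec/sf1`, the wildcard geographies `state:*` and `county:*`, and the hypothetical helper name `build_url`. The script itself may build its requests differently, and the metro options are left out because they depend on the script's own source files.

```python
from urllib.parse import urlencode

# Assumed mapping from the script's location option to an API geography clause
GEOGRAPHY = {
    "state": "state:*",
    "county": "county:*",
}


def build_url(year, variables, location, api_key):
    """Build one request URL for a batch of variables (illustrative only)."""
    params = {
        "get": ",".join(variables),  # one batch, at most 50 variables
        "for": GEOGRAPHY[location],
        "key": api_key,
    }
    # safe=":*," leaves the geography clause and the comma list readable
    query = urlencode(params, safe=":*,")
    return f"https://api.census.gov/data/{year}/dec/sf1?{query}"


print(build_url(2010, ["H0110004"], "state", "YOUR_KEY"))
```

No request is sent here; the function only assembles the string, so you can inspect it before spending any of your API calls.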
Source files and documentation available here: https://github.com/laurakurup/data
You may get a few errors since cities and counties have formed, merged, dissolved, etc. These data files work well for 2010 and 2000 with only a handful of inconsistencies. The script prints errors so you'll see what doesn't work. If you want to query 1990, you may want to find FIPS codes that were accurate for 1990. For more info, see https://www.census.gov/geo/reference/county-changes.html.