Adds number of <18, 18-64, >64 people per household sorted by head age and income bracket, using PSID and CPS data #37

prrathi · 2021-05-06T03:22:18Z

The psidhousehold.py and cpshousehold.py files added to the ogusa_calibrate folder contain the scripts that read the data (from psid_data_setup.py for PSID, and from PSL's CPS dataset for CPS) and output csv files and images for each source. Per the suggestion by @rickecon and @MaxGhenis, the psid.csv and cps.csv files are in their respective folders within ogusa_calibrate/data. Ordered by head age and income bracket, they contain the average number of people in each age group, first as computed from the raw data and then after smoothing. The images depict these transformations for each age group and each data source, and are written to ogusa_calibrate/data/images.
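As a rough illustration of the table described above, the sketch below averages the number of people in each age group by head age and income bracket. The column names (`age_head`, `income_bin`, `nu18`, `n1864`, `n65`) follow those used in the review code further down this thread; the toy data and the helper name `average_by_head_age_and_income` are assumptions, and the averages are unweighted for brevity, whereas the actual scripts may apply survey weights.

```python
import pandas as pd


def average_by_head_age_and_income(df):
    """Mean household counts per (age_head, income_bin) cell (unweighted)."""
    return (
        df.groupby(["age_head", "income_bin"])[["nu18", "n1864", "n65"]]
        .mean()
        .reset_index()
    )


# Toy household records; the values are made up for illustration.
households = pd.DataFrame(
    {
        "age_head": [30, 30, 30, 70],
        "income_bin": ["low", "low", "high", "low"],
        "nu18": [2, 0, 1, 0],
        "n1864": [2, 1, 2, 0],
        "n65": [0, 0, 0, 2],
    }
)
averages = average_by_head_age_and_income(households)
print(averages)
```

Each output row corresponds to one head-age and income-bracket cell, which is the shape the smoothing step would then operate on.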

Update 3/15

Returns the smoothed number of <18, 18-64, >64 by age/income

Updating fork

…ps.csv

…foo.txt

codecov-commenter · 2021-05-06T03:33:10Z

Codecov Report

Merging #37 (f70a4f3) into master (93cab3d) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master      #37   +/-   ##
=======================================
  Coverage   63.13%   63.13%           
=======================================
  Files           8        8           
  Lines        1188     1188           
=======================================
  Hits          750      750           
  Misses        438      438

Flag	Coverage Δ
unittests	`63.13% <ø> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 93cab3d...f70a4f3. Read the comment docs.

MaxGhenis

Thanks @prrathi. Beyond the inline code suggestions, here are the other suggestions from when we went through this today:

- Run black code formatting.
- Move the csv outputs into a new folder, e.g. ogusa_calibrate/outputs/household_calibration/, with subfolders for csv and images (though @rickecon and @jdebacker, I'm not sure how you want these produced, or even whether they should be part of the repo rather than just functions that can be called to produce them).

MaxGhenis · 2021-05-07T02:45:09Z

ogusa_calibrate/cpshousehold.py

+import microdf as mdf
+import matplotlib.pyplot as plt
+import statsmodels.api as sm
+lowess = sm.nonparametric.lowess


I'd drop this and refer to it directly for clarity (should only need to be referenced once)

MaxGhenis · 2021-05-07T02:47:08Z

ogusa_calibrate/cpshousehold.py

+                           # from taxcalc output.
+                           ])
+
+def add65(age_spouse):


See suggestions in #30 on replacing these functions and lambdas with one-liner vectorized functions
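The suggestions in #30 are not reproduced in this thread. As a generic illustration only, and assuming `add65` flags whether a spouse is 65 or older (its body is not shown above), a row-wise helper of that kind can usually be replaced by a comparison on the whole column. The toy data and the `spouse_65` column name are hypothetical.

```python
import pandas as pd

# Hypothetical spouse ages; 0 stands for "no spouse" in this toy example.
cps = pd.DataFrame({"age_spouse": [0, 40, 65, 72]})


def add65(age_spouse):
    """Row-wise helper (assumed behaviour): 1 if the spouse is 65+, else 0."""
    return 1 if age_spouse >= 65 else 0


# Row-wise version: calls the Python function once per record.
rowwise = cps["age_spouse"].apply(add65)

# Vectorized one-liner: compares the whole column at once.
cps["spouse_65"] = (cps["age_spouse"] >= 65).astype(int)

print(cps)
```

Both versions give the same flags; the vectorized form avoids the per-row Python call and removes the need for a separate helper.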

MaxGhenis · 2021-05-07T03:24:22Z

ogusa_calibrate/cpshousehold.py

+cps2.reset_index(inplace = True)
+cps2[cps2['age_head'].between(20,80)]
+
+smoothed18 = []


Suggested replacement for the remainder of the script (see colab):

# Assumes pd, plt, and lowess are already imported at the top of the script.
def smooth(x, y, frac=0.4):
    """Produces LOESS smoothed data."""
    return pd.Series(lowess(y, x, frac=frac)[:, 1], index=x)


def smooth_all(data):
    """Return smoothed versions of nu18, n1864, n65."""
    return data.groupby(["income_bin", "age_group"]).apply(
        lambda x: smooth(x.age_head, x.n)
    )


cps_long = cps4.drop(columns="index").melt(
    ["age_head", "income_bin"], var_name="age_group", value_name="n"
)
smoothed_wide = smooth_all(cps_long).reset_index()
smoothed_long = smoothed_wide.melt(
    ["age_group", "income_bin"], var_name="age_head", value_name="n"
)

# Stack with raw.
cps_long["smoothed"] = False
smoothed_long["smoothed"] = True
combined_long = pd.concat([cps_long, smoothed_long])

# Add the household head. NB: age_head starts at 20 so no need to do for nu18.
combined_long["add_head"] = (
    # n1864 and head age between 18 and 64.
    ((combined_long.age_group == "n1864") & combined_long.age_head.between(18, 64))
    |
    # n65 and head age exceeds 64.
    ((combined_long.age_group == "n65") & (combined_long.age_head > 64))
)
combined_long.n += combined_long.add_head


def plot(data):
    """
    Produces and exports a plot of household size by age_head, with lines for
    each income bin. The title and filename reflect the age group and whether
    the data is smoothed based on the first record.
    """
    age_group = data.age_group.iloc[0]
    smoothed = data.smoothed.iloc[0]
    title = "Average number of people aged "
    # TODO: Add folder.
    fname = "cps_" + age_group
    if age_group == "nu18":
        title += "0 to 17"
    elif age_group == "n1864":
        title += "18 to 64"
    else:
        title += "65 or older"
    if smoothed:
        title += " (smoothed)"
        fname += "_smoothed"
    data.pivot_table("n", "age_head", "income_bin").plot()
    plt.title(title)
    plt.savefig(fname + ".png")


# Create and export all plots.
combined_long.groupby(["age_group", "smoothed"]).apply(plot)

I'd stack the PSID data with this too and then add data_source as a groupby everywhere to minimize the code. Then just export combined_long to a csv.
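One way the stacking idea might look, as a sketch under assumptions: the two toy frames stand in for the CPS and PSID long-format data, `psid_long` is a hypothetical name, and the export is returned as text here rather than written to a specific path.

```python
import pandas as pd

# Hypothetical long-format frames with the same columns as combined_long.
cps_long = pd.DataFrame(
    {
        "age_head": [20, 21],
        "income_bin": ["low", "low"],
        "age_group": ["nu18", "nu18"],
        "n": [0.5, 0.6],
        "smoothed": [False, False],
    }
)
psid_long = cps_long.assign(n=[0.4, 0.7])

# Tag each source, then stack them into one frame.
combined_long = pd.concat(
    [cps_long.assign(data_source="cps"), psid_long.assign(data_source="psid")],
    ignore_index=True,
)

# data_source becomes one more groupby key, so each step runs once per source.
means = combined_long.groupby(["data_source", "age_group", "smoothed"])["n"].mean()
print(means)

# With no path, to_csv returns the CSV text; pass a path to write it to disk.
csv_text = combined_long.to_csv(index=False)
```

With the source carried as a column, the smoothing and plotting steps would only need `data_source` added to their groupby keys instead of being duplicated per dataset.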

jdebacker · 2021-05-10T16:28:02Z

I would suggest that the csv and image files not be a part of this repo, but it'd be good to share useful images in this discussion.

BTW, here's a study talking about creating tax units from the PSID.

rickecon · 2021-06-05T20:27:55Z

This PR has been superseded by PR #39. Closing.

prrathi added 19 commits March 15, 2021 08:00

Merge pull request #2 from PSLmodels/master

9a6f3c4

Update 3/15

Add files via upload

94fc6df

Returns the smoothed number of <18, 18-64, >64 by age/income

Delete household_structure.py

6f28b15

Add files via upload

3f1a12b

Update household_structure.py

1a04e2c

Merge pull request #3 from PSLmodels/master

b9d18cd

Updating fork

Delete household_structure.py

3552f78

Create foo.txt

de7e1dd

Add files via upload

8111dcb

Rename cps (3).csv to cps.csv

9489953

Delete foo.txt

870ec2e

Add files via upload

340bef5

Rename psid2.csv to psid.csv

9ac4887

Create foo.txt

e4d93fc

Add files via upload

d7f81ec

Rename ogusa_calibrate/data/cps/cps.csv to ogusa_calibrate/data/CPS/c…

c16edf5

…ps.csv

Rename ogusa_calibrate/data/images/foo.txt to ogusa_calibrate/images/…

e6aef50

…foo.txt

Delete foo.txt

1fdba4f

Add files via upload

f70a4f3

prrathi mentioned this pull request May 6, 2021

Adds household_structure.py with smoothed averages of number of <18, 18-64, and 65+ people per family sorted by head age and income bracket #30

Closed

MaxGhenis reviewed May 7, 2021

jdebacker mentioned this pull request May 10, 2021

Add Jupyter Book docs #38

Merged

MaxGhenis mentioned this pull request May 22, 2021

Add CPS-based household composition files for UBI #39

Closed

rickecon closed this Jun 5, 2021
