
Adds household_structure.py with smoothed averages of number of <18, 18-64, and 65+ people per family sorted by head age and income bracket #30

Closed
wants to merge 19 commits

Conversation


@prrathi prrathi commented Mar 15, 2021

This code first determines the average number of people aged <18, 18-64, and 65+ per household by head age and income bracket, then uses a KDE for smoothing. The outputs are currently saved as NumPy arrays titled nu18, n1864, and n65. For example, nu18 is a two-dimensional array whose rows are head ages from 20 to 80 and whose columns are the income brackets used throughout OG-USA; the value of each cell is the smoothed average number of people under 18 in a household with that head age and income bracket.
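For reference, a minimal sketch of how the saved arrays could be loaded and inspected (the .npy extension comes from np.save's default; the row-to-age mapping shown here is an assumption):

import numpy as np

# Load the smoothed household composition arrays saved by household_structure.py.
nu18 = np.load('nu18.npy')    # average people under 18 per household
n1864 = np.load('n1864.npy')  # average people aged 18-64 per household
n65 = np.load('n65.npy')      # average people 65+ per household

print(nu18.shape)  # rows = head ages (20 to 80), columns = OG-USA income brackets

# Hypothetical lookup: household with a 35-year-old head in the third income bracket,
# assuming row 0 corresponds to head age 20.
print(nu18[35 - 20, 2])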

Per line 16 in the code, the distribution, particularly the number of people aged 18-64, depends on the new variable num_family that is pulled in from the R dataset. Because all the variables from the R dataset are carried through and saved in the pickle output by psid_data_setup.py, num_family would be included in this data. For testing, however, I assumed there to be 4 people in every household, so line 16 read panel_li.insert(len(panel_li.columns),"num_family",4). Here are the results of the smoothing for each of <18, 18-64, and 65+ under this assumption: Results.zip. There were definitely some irregular results; here are a few observations for the nu18 array:

  • 142 of the 7*60=420 total results had a difference between the smoothed and actual values greater than 1 in magnitude
  • 15 of the smoothed values were extreme, at least 5 more than the averages from the data, like 7 or 8 people under 18 for households of that type
  • there were significantly more values close to 0 than expected, so overall the smoothing produced a lot more extreme values

Again, the nu18 array wasn't affected by the assumed 4 people per household; these results are purely the product of smoothing. On the topic of smoothing, I used the same function that was suggested for consolidation in #25. Looking forward to everyone's thoughts!
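A rough sketch of how the comparison counts above could be reproduced, assuming the pre- and post-smoothing averages are available as same-shaped arrays named nu18init and nu18final (the names used for the CSVs attached later in the thread):

import numpy as np

# nu18init: raw averages from the PSID panel; nu18final: KDE-smoothed values.
# Both assumed to be head-age-by-income-bracket arrays (60 x 7 = 420 cells).
diff = nu18final - nu18init

# Cells where smoothing moved the value by more than 1 person (reported above as 142 of 420).
print(np.sum(np.abs(diff) > 1))

# Cells where the smoothed value is at least 5 above the raw average (reported above as 15).
print(np.sum(diff >= 5))

# Share of smoothed cells near zero; the 0.1 cutoff here is illustrative only.
print(np.mean(nu18final < 0.1))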

@prrathi prrathi changed the title Adds household_structure.py outputting the smoothed averages of number of <18, 18-64, and 65+ year olds per family sorted by head age and income bracket Adds household_structure.py with smoothed averages of number of <18, 18-64, and 65+ people per family sorted by head age and income bracket Mar 15, 2021

@MaxGhenis MaxGhenis left a comment


Thanks @prrathi, I left some suggestions to simplify the code. If you can drop the files in as CSVs, that would help diagnose the issues you brought up.

I think a Jupyter notebook visualizing the raw and smoothed values would also be helpful. For example, a plot of nu18 by head_age, with dots for actuals and a line for the smoothed value, and the same by lifetime income bucket.
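A rough sketch of such a plot, with nu18init and nu18final standing in for the raw and smoothed head-age-by-income-bracket arrays (names assumed):

import numpy as np
import matplotlib.pyplot as plt

head_ages = np.arange(20, 20 + nu18init.shape[0])  # assuming row 0 is head age 20

# Average across income brackets so each head age has one actual and one smoothed value.
plt.scatter(head_ages, nu18init.mean(axis=1), s=12, label='actual')
plt.plot(head_ages, nu18final.mean(axis=1), label='smoothed')
plt.xlabel('head_age')
plt.ylabel('average people under 18 (nu18)')
plt.legend()
plt.title('Raw vs. smoothed nu18 by head age')
plt.show()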

    if(spouse_age>=65):
        count += 1
    return count #assumes only head or spouse of head can be 65+
panel_li['n65'] = panel_li.apply(lambda x: add65(x['head_age'],x['spouse_age']), axis=1)


Suggested change
panel_li['n65'] = panel_li.apply(lambda x: add65(x['head_age'],x['spouse_age']), axis=1)
panel_li['n65'] = np.where(panel_li.head_age > 64, 1, 0) + np.where(panel_li.spouse_age > 64, 1, 0)

can replace the add65 function


panel_li = pickle.load(open('psid_lifetime_income.pkl', 'rb')) #created by psid_data_setup.py

panel_li.insert(len(panel_li.columns),"weight",1) #create column of only 1s which is used as weights for taking microdf average

@MaxGhenis MaxGhenis Mar 15, 2021


microdf doesn't require weights; you can leave the weights argument empty in any function to have it be unweighted (or, for this file, skip the microdf import entirely since it's unnecessary).

@@ -0,0 +1,146 @@



remove these empty lines

    return count
panel_li['n1864'] = panel_li.apply(lambda x: add1864(x['head_age'],x['spouse_age'],x['num_family'],x['num_children_under18']), axis=1)

panel_li['nu18'] = panel_li.apply(lambda x: x['num_children_under18'], axis=1) #assumes only children can be <18


Suggested change
panel_li['nu18'] = panel_li.apply(lambda x: x['num_children_under18'], axis=1) #assumes only children can be <18
panel_li['nu18'] = panel_li.num_children_under18

    if(spouse_age>=65):
        count -= 1
    return count
panel_li['n1864'] = panel_li.apply(lambda x: add1864(x['head_age'],x['spouse_age'],x['num_family'],x['num_children_under18']), axis=1)


Suggested change
panel_li['n1864'] = panel_li.apply(lambda x: add1864(x['head_age'],x['spouse_age'],x['num_family'],x['num_children_under18']), axis=1)
panel_li['n1864'] = panel_li.num_family - panel_li.n65 - panel_li.nu18

after moving the nu18 line above this, or just use num_children_under18


panel_li['nu18'] = panel_li.apply(lambda x: x['num_children_under18'], axis=1) #assumes only children can be <18

panel_li2 = panel_li.reset_index()


Suggested change
panel_li2 = panel_li.reset_index()
panel_li.reset_index(inplace=True)
panel_li_20_80 = panel_li[panel_li.head_age.between(20, 80)]

replacing the lines below too

panel_li2 = panel_li.reset_index()
panel_li3 = panel_li2[panel_li2['head_age'] <= 80]
panel_li3 = panel_li3[panel_li3['head_age'] >= 20]
panel_li4 = panel_li3.groupby(['head_age', 'li_group']).apply(


Suggested change
panel_li4 = panel_li3.groupby(['head_age', 'li_group']).apply(
panel_li_group = panel_li_20_80.groupby(['head_age', 'li_group'])[["nu18", "n1864", "n65"]].mean()

This takes care of much of the below code. Don't need microdf since nothing's weighted.
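Putting these suggestions together, a minimal sketch of the simplified construction (column names taken from the diff context; a sketch rather than a drop-in replacement for the full script):

import pickle
import numpy as np

# Panel produced by psid_data_setup.py.
panel_li = pickle.load(open('psid_lifetime_income.pkl', 'rb'))

# Household composition counts without the apply-based helper functions.
panel_li['n65'] = np.where(panel_li.head_age > 64, 1, 0) + np.where(panel_li.spouse_age > 64, 1, 0)
panel_li['nu18'] = panel_li.num_children_under18
panel_li['n1864'] = panel_li.num_family - panel_li.n65 - panel_li.nu18

# Restrict to head ages 20-80 and take unweighted means by head age and lifetime income group.
panel_li.reset_index(inplace=True)
panel_li_20_80 = panel_li[panel_li.head_age.between(20, 80)]
panel_li_group = panel_li_20_80.groupby(['head_age', 'li_group'])[['nu18', 'n1864', 'n65']].mean()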

result65 = MVKDE(60, 7, temp65)
result65 = result65*panelFinal3

np.save('nu18', result18)


Could you include these files, as well as temp18 etc., in the PR?


prrathi commented Mar 16, 2021

@MaxGhenis Thanks for the edits, I will go through those. Here are the average pre- and post-smoothing values for each of the age groups, again by head age and income bracket, as CSVs: for example, nu18init is pre and nu18final is post. I think the visualizations for comparison are a good idea; I can work on that.
distributions.zip


codecov-commenter commented May 2, 2021

Codecov Report

Merging #30 (1fdba4f) into master (93cab3d) will decrease coverage by 0.08%.
The diff coverage is n/a.

❗ Current head 1fdba4f differs from pull request most recent head f70a4f3. Consider uploading reports for the commit f70a4f3 to get more accurate results

@@            Coverage Diff             @@
##           master      #30      +/-   ##
==========================================
- Coverage   63.13%   63.04%   -0.09%     
==========================================
  Files           8        8              
  Lines        1188     1188              
==========================================
- Hits          750      749       -1     
- Misses        438      439       +1     
Flag        Coverage Δ
unittests   63.04% <ø> (-0.09%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files                          Coverage Δ
ogusa_calibrate/tests/test_txfunc.py    53.50% <0.00%> (-0.44%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 93cab3d...f70a4f3.


prrathi commented May 6, 2021

Moving this to PR #37
