Is it valid to aggregate taxcalc data to households? #1961

Closed
MaxGhenis opened this issue Apr 8, 2018 · 6 comments

@MaxGhenis
Contributor

I'm looking to aggregate taxcalc data to the household level using h_seq, and I'm having trouble working out the appropriate weight for each household record. Because s006 varies across tax units within a household, it seems to me that no single weight can be assigned to a household that reproduces the same aggregates as the tax-unit totals. The ideal approach might instead be linear programming to match tax-unit-level totals, or administrative totals directly.

In this notebook I tried two other simple approaches that I'd expect would produce similar ballpark weights:

  1. Assigning a household's weight as the sum of its tax units' s006, divided by the number of tax units.

  2. Assigning different weights for households for different variables, e.g. for XTOT it would be the sum of XTOT * s006 across its tax units, divided by the total XTOT for the household. I also do this for expanded_income, and it could be done for any summable variable. This is really a generalization of (1), where (1) does this for a count.
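For concreteness, the two approaches can be sketched in pandas as below. This is a minimal illustration on made-up toy records, not the notebook's code: the column names h_seq, s006, and XTOT come from the discussion above, while the function names `mean_weight` and `variable_weight` and all data values are assumptions introduced here.

```python
import pandas as pd

# Toy tax-unit records (hypothetical values, not taken from cps.csv).
tax_units = pd.DataFrame({
    "h_seq": [1, 1, 2],              # household identifier
    "s006": [1000.0, 500.0, 800.0],  # tax-unit weight
    "XTOT": [2, 1, 3],               # people in the tax unit
})

def mean_weight(df):
    """Approach (1): sum of s006 over a household's tax units,
    divided by the number of tax units (i.e. the mean s006)."""
    return df.groupby("h_seq")["s006"].mean()

def variable_weight(df, var):
    """Approach (2): sum(var * s006) / sum(var) within each household."""
    weighted = (df[var] * df["s006"]).groupby(df["h_seq"]).sum()
    total = df.groupby("h_seq")[var].sum()
    return weighted / total

hh_w1 = mean_weight(tax_units)              # household 1 -> 750.0
hh_w2 = variable_weight(tax_units, "XTOT")  # household 1 -> 2500 / 3

# Approach (2) reproduces the tax-unit weighted total of `var` by construction.
hh_xtot = tax_units.groupby("h_seq")["XTOT"].sum()
xtot_from_households = (hh_w2 * hh_xtot).sum()
xtot_from_tax_units = (tax_units["XTOT"] * tax_units["s006"]).sum()
```

Note that approach (2) is undefined for a household whose total of the chosen variable is zero, which can happen for a variable like expanded_income, so a real implementation would need to handle that case.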

Each of these approaches ((1), (2) for XTOT, and (2) for expanded_income) yields similar total household counts of between 51.5M and 52.9M (2017 data). This is ~60% below Census's 2017 estimate of 126M households. Breaking down by both h_seq and ffpos yields ~68M.

Is this discrepancy due to top-coding? Any other ideas on doing this?

Related: did the NYT TCJA calculator analyze by tax unit or somehow aggregate to households? They mention households several times.

@ernietedeschi
Contributor

ernietedeschi commented Apr 18, 2018

Upfront, I'm not sure it's valid to reaggregate to the household level. The cps.csv methodology peels off tax units from household data but then ages them and projects them based on assumptions that are purely at the tax unit level. I would worry they've lost their explanatory identity as subparts of a larger household at that point.

All that said, an alternative approach would be to reverse engineer the households only insofar as the underlying tax units share a lowest common weight.

Let's say that you use the CPS to isolate two tax units in the cps.csv that are plucked from the same household. Unit 1 has an s006 weight of 1,000 and Unit 2 has a weight of 500.

Together, these records represent 1,500 tax units. So far, so good.

My interpretation of your approach is that you would convert these two records into a single household with weight 750.

The alternative would be to recognize that the underlying methodology has judged that one of these units, Unit 1, is more common than the other.

So instead, create a household record made up of Units 1 and 2 that has a weight of 500 (the minimum weight of all the units in the CPS-matched household). Then create an additional household record made up solely of Unit 1 that captures the residual weight of 500 you have left over.

Ultimately, then, the number of households you create in s006 space won't be the average s006; it will be the maximum s006 in your CPS-matched household, though you would allocate it across multiple household records if the tax unit s006 values were not all identical.

In this example, you'd be creating 1,000 households across two records, rather than 750 under just one.
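The peeling procedure described above can be sketched as a short loop. This is only an illustration of the idea under stated assumptions: the function name `split_household` and the dict-based input are hypothetical, and the weights are the ones from the two-unit example.

```python
def split_household(weights):
    """Split one CPS-matched household into household records by repeatedly
    peeling off the lowest remaining tax-unit weight.

    weights: dict mapping tax-unit id -> s006 weight (hypothetical input shape).
    Returns a list of (member_unit_ids, household_weight) tuples.
    """
    remaining = {unit: w for unit, w in weights.items() if w > 0}
    records = []
    while remaining:
        # The smallest remaining weight is shared by every unit still present.
        layer = min(remaining.values())
        records.append((sorted(remaining), layer))
        # Subtract the layer and drop units whose weight is used up.
        remaining = {
            unit: w - layer for unit, w in remaining.items() if w - layer > 0
        }
    return records

# Unit 1 has weight 1,000 and Unit 2 has weight 500, as in the example.
records = split_household({1: 1000, 2: 500})
# records == [([1, 2], 500), ([1], 500)]
total_households = sum(weight for _, weight in records)  # 1000, the max s006
```

The layer weights always sum to the largest s006 in the household, which is why this method yields 1,000 households here rather than 750. With real floating-point s006 values, the `w - layer > 0` test may need a small tolerance to avoid spurious records with tiny residual weights.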

@MaxGhenis
Contributor Author

@evtedeschi3 Thanks for this idea. It makes sense, but summing the max s006 per household (added to my notebook) also comes up short, at 75M total households vs. Census's 126M estimate. By household-family it's 91M.

I asked the NYT calculator team about their household reporting on Twitter here.

@ernietedeschi
Contributor

ernietedeschi commented Apr 21, 2018 via email

@MaxGhenis
Contributor Author

MaxGhenis commented Apr 21, 2018 via email

@martinholmer
Collaborator

The last comment in issue #1961 was made on April 21, about three and a half months ago.
Are there any reasons to keep issue #1961 open any longer?

@MattHJensen @MaxGhenis @evtedeschi3

@MaxGhenis
Contributor Author

It seems like the answer to the question is no, so I'm closing this. If anyone thinks of other ideas, please share them here, as I continue to be interested.


3 participants