Is it valid to aggregate taxcalc data to households? #1961

MaxGhenis · 2018-04-08T18:40:21Z

I'm looking to aggregate taxcalc data to the household level using h_seq, and am having trouble working out what the appropriate weight would be for each household record. It seems to me that because s006 varies across tax units within a household, no single weight can be assigned to a household that reproduces the tax-unit-level aggregates. Instead, the ideal approach might be to use linear programming to choose household weights that match tax-unit-level totals, or administrative totals directly.

In this notebook I tried two other simple approaches that I'd expect would produce similar ballpark weights:

1. Assigning a household's weight as the sum of its tax units' s006, divided by the number of tax units.
2. Assigning different weights for households for different variables, e.g. for XTOT it would be the sum of XTOT * s006 across its tax units, divided by the total XTOT for the household. I also do this for expanded_income, and it could be done for any summable variable. This is really a generalization of (1), where (1) does this for a count.
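As a rough illustration of the two approaches, here is a minimal sketch on made-up records. Only the column names h_seq, s006, and XTOT come from this thread; the helper names `mean_weight` and `variable_weight` and all the values are hypothetical, and this is not the notebook's actual code.

```python
from collections import defaultdict

# Toy tax-unit records; the values are invented for illustration only.
units = [
    {"h_seq": 1, "s006": 1000.0, "XTOT": 2},
    {"h_seq": 1, "s006": 500.0, "XTOT": 1},
    {"h_seq": 2, "s006": 800.0, "XTOT": 4},
]

def mean_weight(records):
    """Approach (1): average s006 across each household's tax units."""
    total, count = defaultdict(float), defaultdict(int)
    for r in records:
        total[r["h_seq"]] += r["s006"]
        count[r["h_seq"]] += 1
    return {h: total[h] / count[h] for h in total}

def variable_weight(records, var):
    """Approach (2): s006-weighted sum of var over the household's plain sum of var."""
    weighted, plain = defaultdict(float), defaultdict(float)
    for r in records:
        weighted[r["h_seq"]] += r[var] * r["s006"]
        plain[r["h_seq"]] += r[var]
    # A household whose var sums to zero has no defined weight here, so skip it.
    return {h: weighted[h] / plain[h] for h in weighted if plain[h] != 0}

w1 = mean_weight(units)
w2 = variable_weight(units, "XTOT")
print(w1)
print(w2)
```

In this toy data, household 1 gets a weight of 750 under (1) and about 833.3 under (2) for XTOT. Only the second reproduces the tax-unit-level weighted XTOT total of 2,500 for that household (3 people times 833.3), which is why (2) yields a different weight for each variable.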

Each of these approaches ((1), (2) for XTOT, and (2) for expanded_income) yields similar total household counts of between 51.5M and 52.9M (2017 data). This is ~60% below Census's 2017 estimate of 126M households. Grouping by both h_seq and ffpos instead of h_seq alone yields ~68M.

Is this discrepancy due to top-coding? Any other ideas on doing this?

Related: did the NYT TCJA calculator analyze by tax unit or somehow aggregate to households? They mention households several times.


ernietedeschi · 2018-04-18T15:11:48Z

Upfront, I'm not sure it's valid to reaggregate to the household level. The cps.csv methodology peels off tax units from household data but then ages them and projects them based on assumptions that are purely at the tax unit level. I would worry they've lost their explanatory identity as subparts of a larger household at that point.

All that said, an alternative approach would be to reverse engineer the households only insofar as the underlying tax units share a lowest common weight.

Let's say that you use the CPS to isolate two tax units in the cps.csv that are plucked from the same household. Unit 1 has an s006 weight of 1,000 and Unit 2 has a weight of 500.

Together, these records represent 1,500 tax units. So far, so good.

My interpretation of your approach is that you would convert these two records into a single household with weight 750.

The alternative would be to recognize that the underlying methodology has judged that one of these units, Unit 1, is more common than the other.

So instead, create a household record made up of Units 1 and 2 that has a weight of 500 (the minimum weight of all the units in the CPS-matched household). Then create an additional household record made up solely of Unit 1 that captures the residual 500 weight you have left over.

Ultimately then, the number of households you create in s006 space won't be the average s006. It will be the maximum s006 in your CPS-matched household, though you would allocate it across multiple household records if the tax unit s006 values were not all identical.

In this example, you'd be creating 1,000 households across two records, rather than 750 under just one.
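The splitting procedure described above can be sketched as follows. This is only one reading of the idea, generalized to any number of tax units; the function name `split_household` and the unit labels are hypothetical, and non-negative weights are assumed.

```python
def split_household(unit_weights):
    """Split one CPS-matched household into nested household records.

    unit_weights maps a tax-unit label to its s006 weight (assumed >= 0).
    Returns a list of (members, weight) records.
    """
    records = []
    remaining = dict(unit_weights)
    while remaining:
        # The lowest weight still outstanding is shared by every remaining unit.
        tier = min(remaining.values())
        if tier > 0:
            records.append((sorted(remaining), tier))
        # Subtract the shared tier and drop units whose weight is used up.
        remaining = {u: w - tier for u, w in remaining.items() if w - tier > 0}
    return records

records = split_household({"unit1": 1000.0, "unit2": 500.0})
for members, weight in records:
    print(members, weight)
```

For the two-unit example this gives one record containing both units with weight 500 and one containing only unit1 with weight 500. The record weights sum to 1,000, the maximum s006 in the household.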

MaxGhenis · 2018-04-21T18:28:02Z

@evtedeschi3 Thanks for this idea. It makes sense, but summing the max s006 per household (added to my notebook) also comes up short at 75M total households, vs. Census' 126M estimate. By household-family it's 91M.

I asked the NYT calculator team about their household reporting on Twitter here.

ernietedeschi · 2018-04-21T18:44:23Z

Ah, I figured it would generate more. We’ll see what the NYT folks say, but I’ll bet that “household” in their piece was just a journalistic term of art and they really performed the analysis at the tax unit level.

MaxGhenis · 2018-04-21T20:34:11Z

Yes, Ben Casselman confirmed that it was actually at the tax unit level.


martinholmer · 2018-08-06T14:03:18Z

The last comment in issue #1961 was made on April 21, about three and a half months ago.
Are there any reasons to keep issue #1961 open any longer?

@MattHJensen @MaxGhenis @evtedeschi3

MaxGhenis · 2018-08-06T14:56:00Z

It seems like the answer to the question is no, so closing. If anyone thinks of other ideas, please share here as I continue to be interested.

martinholmer added the question label Jun 10, 2018

MaxGhenis closed this as completed Aug 6, 2018

MaxGhenis mentioned this issue Jul 10, 2019

Weight and age CPS tax units in the same household together PSLmodels/taxdata#323

Open
