Extrapolate welfare data #106
I have been collecting targets and historical data, and will keep updating this spreadsheet.
While clarifying this extrapolation problem for @hdoupe last Friday, I realized there might be a way to do the extrapolation at the tax-unit level without participation targets at that level. I thought it through again and haven't found any loophole yet. So I would love to hear how everyone feels about this plan:
Looks feasible? Any thoughts? @MattHJensen @martinholmer @andersonfrailey P.S. the spreadsheet has been updated to include SSI, SNAP, VB, Social Security, Medicare, and Medicaid.
@Amy-Xu, overall I think this sounds feasible. A couple of questions.
So during the tax-unit creation process, there will be one additional variable for each person in the household containing their participation probability. Would we also be keeping the total amount received in separate variables, as well as the one aggregate, so we can add/subtract benefits on an individual basis?
Is your idea to have a column in the CPS dedicated to each of the variables for each of the years in the weights file, and then add a function in Tax-Calculator that specifies which one will be used in each year?
@andersonfrailey asked:
Good question! I hadn't thought about this part. Certainly it would be easier to have the individual-level dollar amount on the side, as long as the workload is not too much. So for each program, in addition to one aggregate, we'll have an individual-level participation probability and the dollar amount of benefit received. Depending on how many people a tax unit has, we might have twice the unit size in additional variables for each program. If that's too much work, I think we can also make it work without the individual dollar amounts -- just subtracting an evenly split amount should be fine as well.
That's right. I would prefer those new variables in a separate file, though.
@Amy-Xu said:
Thanks, but I haven't been following this issue at all. Can you step back and explain what the implications of this advance are for the forthcoming cps.csv file?
@martinholmer asked:
The forthcoming cps.csv file will include welfare program data, and this new script will create extrapolated participation and benefits for each year all the way to 2026. All the outputs, I imagine, will be saved in a separate file and transferred to Tax-Calculator once cps.csv is ready. The separate file will work similarly to puf_weights.csv: in each year, the benefits of each tax unit will be replaced with the extrapolated values generated by the new script here.
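To make the mechanism concrete, here is a minimal sketch of how such a separate benefit file could be applied, in the spirit of puf_weights.csv. The function and column names (`apply_extrapolated_benefits`, `snap_2020`) are my own illustration, not actual taxdata or Tax-Calculator code:

```python
# Hypothetical sketch: overwrite 2014 benefit columns with the values
# extrapolated for a given year, read from a separate benefits file.
# All names here are illustrative, not actual taxdata/Tax-Calculator code.
import pandas as pd

def apply_extrapolated_benefits(records, benefits, year, programs):
    """Replace each program's benefit column in `records` with the
    column for `year` from the separate extrapolated-benefits file."""
    for prog in programs:
        col = f"{prog}_{year}"          # e.g. "snap_2020"
        records[prog] = benefits[col].values
    return records
```

A Records-style class could call something like this right after advancing its current year, much as it already applies per-year weights.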
@Amy-Xu said in taxdata issue #106:
When I look at this spreadsheet, it contains projections of federal average benefit amounts for SSI. Those don't seem like very good extrapolation targets given that most states supplement the federal SSI benefit. This reference has this to say about state supplementation: And most SSI beneficiaries are disabled. Here is the SSI beneficiary count for 2015 from SSA:

Given these very rough extrapolation targets (and the targets for the other programs will be even more speculative), why are you considering such an elaborate extrapolation method? The script developed by @hdoupe is impressive in its logic, but it seems way too ambitious given the weak information available to serve as extrapolation targets. Isn't there a simpler method? It seems that if we have limited and biased information, there is little point in processing that information with elaborate algorithms. Is there a less elaborate method that is in better balance with the rough extrapolation targets?

Looking at the other benefit variables, I can't even imagine how speculative any SNAP projection would be. The cost of the future program depends on possible legislative changes and the state of the macro economy. Who knows what either of those will be like through 2026?
@martinholmer said:
Right, I definitely agree with you that SSI includes both federal and state components. We have not started documentation for this routine, but what we applied for SSI in extrapolation is not the federal targets; instead, we applied the federal benefit growth rates to the adjusted federal and state benefits in 2014. In other words, we currently assume state benefits grow at the same pace as federal benefits, which is certainly not a perfect assumption; however, it might be the best assumption available given the scarcity of state-level benefit information. Martin also asked:
This is also a question I have been asking myself for a while. Surely, other than a few humongous programs like Social Security or Medicare, projections for most welfare programs are highly speculative. Given the rough targets, what are the pros and cons of implementing an elaborate versus a simple extrapolation routine?

In my mind, the pros of an elaborate routine are 1) it matches the assumed targets better, and 2) if the targets are improved in the future, we can still use the routine without revamping it much, so the replacement cost is lower. At the same time, the cons might be 1) it might take longer to develop than a simpler one, 2) it takes longer to run, and 3) it is possibly more difficult to maintain. But the cons have not entirely materialized. It took me one day to write the draft for SSI, and a day or so for Hank to improve the algorithm. Originally the script took ~5 min to run, and now it takes ~1 min after Hank revamped it. I imagine it needs a few more tweaks to fit the data of all the other programs, and the major portion of time will be spent on the CPS tax-unit side making sure the totals aggregate correctly, rather than on modifying the scripts.

I'm happy to discuss this more. In particular, if you have a simpler routine in mind, I would love to hear more about it.
@Amy-Xu said:
It's good that you think about the pros and cons of the method you're using now, but you forgot to include the con having to do with all the extra work that would be required in Tax-Calculator. I admit I might not completely understand all your goals, but I would like to suggest a simpler approach that would reduce work in C-TAM and taxdata and reduce work in Tax-Calculator.

You've done a good job getting CPS dollar benefit amounts to add up to administrative totals for 2014. And you have aggregate benefit totals you would like to come close to in years after 2014. For YEAR (ranging from 2015 through 2026), tabulate the weighted benefit total for YEAR using the 2014 benefit amounts and the weights for YEAR. Call this total R, for raw. Let the administrative target for this benefit in YEAR be called T, for target. Then the extrapolation factor for YEAR for that benefit is F = T / R. Each of your non-OASDI benefit variables can have its own personalized factors, and we can add the annual values of those factors into an expanded ratios file.

Let's talk about the pros and cons of this simpler approach.
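For concreteness, the F = T / R computation described above fits in a few lines. The names below (`benefit_factors`, `WT2015`-style weight columns) are my own illustration, not actual taxdata code:

```python
# Hypothetical sketch of the simple factor method: for each year,
# F = T / R, where R is the 2014 benefit amounts reweighted to that year
# and T is the administrative aggregate target for that year.
import pandas as pd

def benefit_factors(cps, weights, targets, years):
    """Return {year: F} for one benefit variable.

    cps     : DataFrame with the 2014 'benefit' column (dollars).
    weights : DataFrame with one weight column per year, e.g. 'WT2015'.
    targets : dict mapping year -> administrative aggregate T (dollars).
    """
    factors = {}
    for year in years:
        raw = (cps["benefit"] * weights[f"WT{year}"]).sum()  # R
        factors[year] = targets[year] / raw                  # F = T / R
    return factors
```

Tax-Calculator would then multiply the 2014 benefit amounts by the year's factor, much as it already scales other dollar amounts with per-year ratios.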
I fixed a mistake in my earlier comment on taxdata issue #106 by adding the following sentence to that comment: Actually, what goes into the ...
Martin (@martinholmer) proposed a simpler routine for welfare extrapolation in the comment above. If I understand it right, I can see two big pros for this simpler routine. First, as Martin mentioned, it doesn't need as much work prior to the TC stage. Potentially, if any users have their own targets, they could replace the factors easily in TC, without turning to taxdata or C-TAM. Second, in TC, this routine doesn't need significant extra space to store factors, while the elaborate one would add a chunk of data for the benefits of each year.

My biggest concern is about participation. It seems this simpler method would peg the participation growth rate to the tax-unit growth rate, while I have always assumed the total participant count is quite important for C-TAM, and presumably for extrapolation as well. But I have never confirmed that with anyone. I would love to hear input from @MattHJensen regarding this issue.

Regarding the workload for adding this extrapolation to TC, I don't see much difference programming-wise (maybe I'm not knowledgeable enough about the latest TC). Since this is part of deploying cps.csv to TC, we have to add facilities in TC to read in a separate cps_weight.csv and cps_ratio.csv, I assume. It doesn't seem to me that the simpler routine would be significantly superior to the elaborate one, as each should need just a few extra lines of code.
@Amy-Xu said:
My concern is not so much about the size of the extra "chunk of data" as about the extra code that reads that extra data and then applies it only when using CPS input data. @Amy-Xu continued:
Most of the results that come out of Tax-Calculator are dollar amounts. If you really want accurate beneficiary head counts, then you could always use Tax-Calculator to conduct a simulation for 2014. I don't see how you can expect Tax-Calculator extrapolation to work differently for a handful of new benefit variables. It may be simplistic, but it is absolutely standard operating procedure in the tax simulation world to extrapolate in the manner we already extrapolate social security benefits.

In fact, your complex method is likely to lead to unrealistic longitudinal results, as you change filing-unit participation from year to year in an ad hoc manner. To change program participation from year to year in an empirically plausible manner, you would have had to use longitudinal data to estimate transition probabilities on and off each benefit program. Nobody is saying you should have done that. My point is that your method, which changes program participation without any guidance from longitudinal data, is very likely to introduce unrealistic patterns of program participation for a filing unit over the years after 2014.

If you want super-accurate beneficiary head counts after 2014, then you would have to do this outside of Tax-Calculator. But I think the notion that you are going to get super-accurate beneficiary counts in years after 2014 is a near fantasy given the subjectiveness of your extrapolation targets. Again, the subjectiveness of your extrapolation targets is not your fault. It is inherent in trying to forecast what these programs will look like over a decade into the future. @Amy-Xu continued:
Well, you've been away from Tax-Calculator for a long time, so you are forgetting that the names of those files are just arguments of the Records class constructor. So, in fact, there is no extra work in reading those files. It is your proposal --- creating a different kind of file, one that is not used when reading the PUF-related input files --- that creates all the extra work. @Amy-Xu concluded:
As you can see from my earlier comments, I beg to differ with that conclusion.
@martinholmer said:
A UBI reform that involves removing all major welfare programs is actually not a common tax reform. In fact, connecting the welfare world to tax analysis is rarely done, let alone welfare extrapolation, and it is very difficult since few people know what the tax-unit welfare distribution looks like given an individual or household welfare distribution. As you have probably seen in this working paper we released earlier this year, we care not only about the count of tax units in each income class but also about the average number of people in each tax unit.

I have already acknowledged that the targets are not perfect; however, the progressive direction, in my opinion, is to see how we could improve those targets. Simply giving up participation targets may leave us stranded when the tax-unit counts or the individual distribution look nonsensical -- we could have done a better job but didn't give it a go. I don't think it's best to go backward on the argument that the targets are flawed. @martinholmer also said:
I'm aware of that, and it is very convenient to plug in a new dataset without modifying any code. However, it seems to me that cps.csv is going to be one of the default options in TC, and eventually one of the default options in TaxBrain. I imagine we have to specify whether the input is PUF or CPS, which would require uploading cps_weights.csv and cps_ratios.csv to TC, and would require extra variables to label whether the input is PUF or CPS. This extra code would make integration into TB easier. Of course, my specialty is not webapp development, and this might not be necessary. I would love to hear more thoughts on this. @MattHJensen

If the upcoming change to TC for UBI simulation seems absolutely unacceptable to you, I offer to execute the welfare extrapolation outside TC in a notebook, which I think is feasible, as long as it doesn't block the web application development at a later stage.
@Amy-Xu said:
@Amy-Xu, do most programs, other than SSI, also have official projections for participation? |
Also, @Amy-Xu, have you tried running Tax-Calculator with cps.csv and the weights file produced by your and Hank's work?
@MattHJensen asked:
Social Security, Medicare, and Medicaid do, but SNAP and VB don't.
Not yet, but it isn't a weights file; it is an extrapolated-benefit file that would potentially work similarly to the weights file -- replace the benefit column with a future-year extrapolated benefit. To make it work with TC, we will need to add a few lines of code.
It seems there's a consensus on how to proceed on this extrapolation-routine issue, per the discussion in PR #1500 in TC. Closing.
The third task outlined in this issue is to develop an extrapolation routine for the welfare data in the CPS tax-unit dataset. An initial thought is to assume that, for each program, participation and benefits grow at X and Y percent each year, respectively, where X and Y are derived from historical data. (If official projection targets are available, then we could use those targets directly.) Then we could use the same logit regression used for imputation to meet the participation growth targets, and then apply a uniform ratio to everyone in order to blow up the total benefit.
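A minimal sketch of that two-step idea for a single program and year follows. All names are hypothetical, and ranking nonparticipants by an imputed probability stands in for the logit-regression imputation described above:

```python
# Hypothetical sketch: hit a participation target by adding the
# nonparticipants with the highest imputed probabilities, then apply a
# uniform ratio so the weighted aggregate hits the benefit target.
# Names are illustrative, not actual taxdata code.
import pandas as pd

def extrapolate_program(cps, prob, part_target, ben_target):
    """cps  : DataFrame with 'benefit' (0 for nonparticipants) and 'weight'.
    prob : Series of imputed participation probabilities (e.g. from a logit).
    """
    participants = cps["benefit"] > 0
    current = cps.loc[participants, "weight"].sum()
    if part_target > current:
        # Rank nonparticipants by probability and add them until the
        # weighted participant count reaches the target.
        cand = cps[~participants].assign(p=prob[~participants])
        cand = cand.sort_values("p", ascending=False)
        need = part_target - current
        new = cand[cand["weight"].cumsum() <= need].index
        cps.loc[new, "benefit"] = cps.loc[participants, "benefit"].mean()
    # Uniform blow-up ratio so the aggregate matches the benefit target.
    total = (cps["benefit"] * cps["weight"]).sum()
    cps["benefit"] *= ben_target / total
    return cps
```

Assigning new participants the mean benefit is one of the many details still to be worked out; a real routine might instead impute amounts from similar households.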
Many details need to be considered, but for now the trickiest part is whether to do this extrapolation at the tax-unit level or at the original program benefit-unit level (individual/household) in the raw CPS. The individual or household level is natural, since all projection and historical data are available at those levels; however, it will create enormous difficulty afterwards, because the raw CPS needs to go through the tax-unit creation process. The weights of records do not stay the same over time, so an extrapolation based on 2014 raw CPS weights cannot guarantee hitting the targets in later years. On the other hand, extrapolating the data at the tax-unit level would make later steps easier, but there aren't any targets or historical welfare data at the tax-unit level.
Other things to consider:
Any thoughts? @MattHJensen @martinholmer @andersonfrailey @hdoupe