Filing units with high values of percentage change in (1-MTR) under TCJA #1852

Closed
GoFroggyRun opened this issue Jan 30, 2018 · 29 comments

Comments

@GoFroggyRun
Contributor

GoFroggyRun commented Jan 30, 2018

This issue is related to #1827. In #1827, @MattHJensen pointed out that the partial equilibrium results here look odd. I have a similar feeling: from 2020 to 2022, the behavioral response (the substitution effect, to be specific) almost doubled the combined liability.

To sort out the issue, I created a notebook here. This notebook compares results from two tax-calculator releases, namely 0.14.2 and 0.15.1.

For the trial with version 0.14.2, I tried to re-create the situation described in #1827: I used the TCJA_Reconciliation.json reform (the one that comes with the 0.14.2 release) and implemented the behavior assumption "_BE_sub": {"2018": [0.25]}. The result in cell [8] closely matches the static results and dynamic results for 2020.

For the trial with version 0.15.1, I tried the same thing with a different approach: I applied the 2017_law.json reform to the base calculator and the behavior assumption "_BE_sub": {"2018": [0.25]} to the reform calculator. Cell [7] shows the results for this trial.

Comparing the two trials:

  1. the baselines are exactly the same
  2. the static reform results are very close (the difference might come from modifications to the TCJA_Reconciliation.json in release 0.14.2, which, after being modified, became current law in the 0.15.1 release)
  3. for the dynamic part, the latter result is almost triple its counterpart in the former trial

It seems to me that nothing regarding behavioral logic has changed since 0.14.2. But why are the dynamic results so different while the static results are almost identical? Did I miss something?

cc @MattHJensen @martinholmer @Amy-Xu

@GoFroggyRun
Contributor Author

GoFroggyRun commented Jan 31, 2018

I updated the notebook here by adding another example, where I still worked with tax calculator version 0.14.2. My primary rationale for using 0.14.2 is that buggy logic, if any, would be clearer if we don't have to change the baseline (in 0.15.1 both the baseline and reform have to be changed in order to replicate #1827).

The latest example partitions TCJA_Reconciliation.json (the version that comes with the TC 0.14.2 release) into two parts, where the (most likely) buggy part is:

{
    "policy": {
        "_PT_exclusion_rt":
            {"2018": [0.2],
             "2026": [0.0]},
        "_PT_exclusion_wage_limit":
            {"2018": [0.5],
             "2026": [9e+99]}
    }
}

and the remaining provisions are in the other reform file.

For both calculators, I implemented "_BE_sub": {"2018": [0.25]} and got the following combined liability result for year 2020:

('Static Base:', 3154.0587592757793)
('Static Reform (Normal):', 2990.1350445729731)
('Dynamic Reform (Normal):', 3245.6943394126324)
('Static Reform (Buggy):', 3127.8674546434095)
('Dynamic Reform (Buggy):', 5480.1338552578964)

The buggy reform, although it has only a minor static impact, seems to produce a huge change in combined liability in the presence of the substitution effect assumption. In other words, the results seem to be very responsive to the behavior assumptions when _PT_exclusion_rt and _PT_exclusion_wage_limit are included in the reform.
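
To make the size of the gap concrete, here is a minimal sketch that recomputes each run's change relative to the static base from the 2020 figures printed above. The dictionary and variable names are illustrative only and are not part of Tax-Calculator.

```python
# Combined liability for 2020, copied from the printed results above.
results = {
    "Static Base": 3154.0587592757793,
    "Static Reform (Normal)": 2990.1350445729731,
    "Dynamic Reform (Normal)": 3245.6943394126324,
    "Static Reform (Buggy)": 3127.8674546434095,
    "Dynamic Reform (Buggy)": 5480.1338552578964,
}

base = results["Static Base"]

# Change in combined liability relative to the static base for each run.
changes = {
    name: value - base
    for name, value in results.items()
    if name != "Static Base"
}

for name, change in changes.items():
    print(f"{name}: {change:+.1f}")
```

Under these figures the "buggy" part moves static liability by roughly -26 but dynamic liability by roughly +2,326, which is the disproportion in question.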

Since we don't have a strict benchmark when behavior assumptions are included, it's hard for me to solely determine whether the results make sense or not. In particular, I'm wondering:

  1. Does the 'Dynamic Reform (Normal)' result seem sensible?
  2. Does the 'Dynamic Reform (Buggy)' result seem sensible?

@GoFroggyRun
Contributor Author

GoFroggyRun commented Jan 31, 2018

I think I got some clues that might explain the situation in #1827.

Apparently, marginal tax rates matter when calculating behavioral responses. Surprisingly (at least to me), the choice of the nearone parameter affects tax liability significantly in the presence of behavior assumptions.

On current master of tax-calculator, nearone is set to nearone = 0.999999. For testing purposes, on my local copy of the Tax-Calculator 0.15.2 release, I tweaked the behavior function a bit to leave nearone as a free parameter, and did the following calculations (more details can be found here):

[Screenshot, 2018-01-31 5:37:45 pm: results for different values of nearone]

where we observe that fewer 9s weaken the effect of the behavioral response, while more 9s amplify it, which is expected. What seems unexpected to me is that the impact of nearone is really huge (when mtr1 takes the value nearone, pch can be huge). I couldn't find much background on why 0.999999 is our choice of nearone, and I'm wondering why we aren't using 0.9999999999 or 0.999.
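
A rough sketch of why the number of 9s matters so much: when the baseline MTR is replaced by nearone, the denominator of pch becomes 1 - nearone. The reform MTR of 0.2 and the function name below are assumptions chosen for illustration, not values from the notebook.

```python
def pch_at_cap(nearone, mtr2=0.2):
    """Percentage change in (1 - MTR) when the baseline MTR is capped at nearone."""
    mtr1 = nearone  # the raw baseline MTR exceeded the cap, so nearone replaces it
    return (1.0 - mtr2) / (1.0 - mtr1) - 1.0

# Each extra 9 in nearone scales pch by roughly a factor of ten.
for nearone in (0.999, 0.999999, 0.9999999999):
    print(f"nearone={nearone!r}: pch is about {pch_at_cap(nearone):.3g}")
```

With these assumed inputs, pch grows from about 8e2 to about 8e5 to about 8e9, so every capped filing unit's response scales with the arbitrary precision of the cap.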

@GoFroggyRun
Contributor Author

GoFroggyRun commented Jan 31, 2018

Moreover, given the fact how nearone could affect the result, it can be inferred that there are considerable amount of cases where wage_mtr1 > nearone such that mtr1 = nearone, which would make pch very large. I'm looking into the mtr calculations for more details.

GoFroggyRun changed the title from "Inconsistent behavior response?" to "Units with High (above 1) Marginal Tax Rates" on Jan 31, 2018
@martinholmer
Collaborator

In issue #1852, @GoFroggyRun said:

I think I got some clues that might explain the situation in #1827.

Apparently, marginal tax rates matter when calculating behavioral responses. Surprisingly (at least to me), the choice of the nearone parameter affects tax liability significantly in the presence of behavior assumptions.

and Sean also said:

Moreover, given the fact how nearone could affect the result, it can be inferred that there are considerable amount of cases where wage_mtr1 > nearone such that mtr1 = nearone, which would make pch very large. I'm looking into the mtr calculations for more details.

Sean, you're on the right track. Keep going.

@martinholmer
Collaborator

In issue #1852, @GoFroggyRun said:

Moreover, given the fact [that] nearone could affect the result, it can be inferred that there are [a] considerable [number] of cases where wage_mtr1 > nearone [implying] that mtr1 = nearone, which would make pch very large. I'm looking into the mtr calculations for more details.

Sean, you seem to be on the right track here.

But look back at the history on the issue of unexpectedly large behavioral responses to TCJA, which was first raised by @rickecon and @jdebacker in issue #1668 last November. If you read all the comments on #1668 you'll see that there were only a few filing units with wage_mtr1 > nearone and handling them differently in pull request #1698 did not change the large behavioral response to TCJA as it was characterized in November.

For those of you following this issue, look at this code fragment from the static Behavior method response:

            # calculate magnitude of substitution effect
            if calc2.behavior('BE_sub') == 0.0:
                sub = np.zeros(calc1.array_len)
            else:
                # proportional change in marginal net-of-tax rates on earnings
                nearone = 0.999999
                mtr1 = np.where(wage_mtr1 > nearone, nearone, wage_mtr1)
                mtr2 = np.where(wage_mtr2 > nearone, nearone, wage_mtr2)
                pch = ((1. - mtr2) / (1. - mtr1)) - 1.

This representation of behavioral response allows very large values of pch even for filing units whose marginal tax rates are less than one. In fact, the nearone logic sets pch to zero for the few filing units with marginal tax rates greater than one.
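
The three situations described here can be illustrated with a scalar sketch of the fragment above. For a single value, `min(x, nearone)` gives the same result as the `np.where` cap; the MTR pairs are made-up examples, not values from the data.

```python
NEARONE = 0.999999

def pch(wage_mtr1, wage_mtr2, nearone=NEARONE):
    """Scalar version of the fragment: cap both MTRs at nearone, then compute pch."""
    mtr1 = min(wage_mtr1, nearone)
    mtr2 = min(wage_mtr2, nearone)
    return (1.0 - mtr2) / (1.0 - mtr1) - 1.0

# Both raw MTRs above one: both are capped at nearone, so pch is zero.
both_above_one = pch(1.2, 1.1)

# Both raw MTRs below one, with a large drop: pch is large without any capping.
large_drop = pch(0.9, 0.4)

# Only the baseline MTR above one: the cap produces an enormous pch.
capped_baseline = pch(1.1, 0.2)

print(both_above_one, large_drop, capped_baseline)
```

The second case is the one emphasized in this comment: a unit moving from an MTR of 0.9 to 0.4 has pch of about 5 even though neither rate is near the cap.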

The real issue is the number of filing units whose marginal tax rates (both before and after TCJA) are less than one, but mtr2 is much lower than mtr1 implying a large value for pch. I described this phenomenon in this November 24th comment, which included this:

So, the TCJA_Senate.json reform leaves the MTR on taxpayer earnings unchanged (that is, pch is essentially zero) for 72,597 filing units and raises the MTR (that is, pch is negative) for 14,484 filing units. The remaining 128,444 filing units experience a reduction in MTR (that is, pch is positive). And some of those experience very large MTR reductions: 28 filing units see their MTR drop so much that pch is greater than one (that is, the after-tax marginal rate doubles or more) and another 609 see their MTR drop so that pch is between 0.5 and 1.0 (where 1.0 is doubling).

Given that those with large positive pch values are typically high-income, the 0.4 substitution elasticity generates large increases in taxable income. At the extreme end of the sub distribution, we have 1,316 filing units who are simulated to have an increase in taxable income of one million dollars or more. And there are another 9,941 filing units who are simulated to have an increase in taxable income of between $100,000 and $1,000,000.
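
The grouping used in the quoted comment can be sketched as a simple classification of pch values. The thresholds (0.5 and 1.0) come from the quote; the sample MTR pairs, the tolerance, and the function names are hypothetical.

```python
from collections import Counter

def pch(mtr1, mtr2):
    """Proportional change in the marginal net-of-tax rate."""
    return (1.0 - mtr2) / (1.0 - mtr1) - 1.0

def pch_group(value, tol=1e-9):
    """Assign a pch value to one of the groups described in the quoted comment."""
    if abs(value) <= tol:
        return "MTR unchanged"
    if value < 0.0:
        return "MTR raised"
    if value > 1.0:
        return "pch above 1.0"
    if value >= 0.5:
        return "pch between 0.5 and 1.0"
    return "pch below 0.5"

# Hypothetical (baseline MTR, reform MTR) pairs, one per group.
sample = [(0.25, 0.25), (0.25, 0.30), (0.35, 0.30), (0.60, 0.35), (0.70, 0.30)]
counts = Counter(pch_group(pch(m1, m2)) for m1, m2 in sample)
print(counts)
```

A pch above 1.0 means the after-tax share of a marginal dollar at least doubles, which is why those units dominate the simulated income response.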

It seems to me the questions @rickecon and @jdebacker posed are: "Are these results reasonable? And, if not, what is the matter with Tax-Calculator?" If you think these results are unreasonable, you need to demonstrate one or more of the following things:

  • The changes in MTR under the TCJA_Senate.json are incorrect because either that reform is not specified quite right or the code that implements that reform is not quite correct.
  • The logic of the Behavior.response function is flawed.

It seems to me that it is still one of the above two. Right now your thinking is focused on the choice of value for nearone, and you may be correct that 0.999999 is not the best value to assign to nearone.

But don't forget your finding in #1852 that

the (most likely) buggy part [is]:

{
   "policy": {
       "_PT_exclusion_rt":
           {"2018": [0.2],
            "2026": [0.0]},
       "_PT_exclusion_wage_limit":
           {"2018": [0.5],
            "2026": [9e+99]}
   }
}

Your finding here is very important and is probably related to issue #1816 and its resolution in pull request #1819, which was first included in Tax-Calculator 0.15.0.

Keep up the good work and solve this behavioral response puzzle.

martinholmer changed the title from "Units with High (above 1) Marginal Tax Rates" to "Filing units with high values of percentage change in (1-MTR) under TCJA" on Feb 1, 2018
@feenberg
Contributor

feenberg commented Feb 1, 2018 via email

@feenberg
Contributor

feenberg commented Feb 1, 2018 via email

@GoFroggyRun
Contributor Author

GoFroggyRun commented Feb 1, 2018

@martinholmer thanks for your thoughtful remarks and thorough background information.

I agree with you that, based on my current findings in #1852, the behavioral response puzzle is most likely due to MTRs, the _PT-related reforms, or both. I started by investigating the MTR part. To be specific, I modified the nearone logic so that we can compare the effect before and after #1698 (i.e. the nearone logic).

The MTR calculation now looks like:

                if use_nearone:
                    nearone = 0.999999
                    mtr1 = np.where(wage_mtr1 > nearone, nearone, wage_mtr1)
                    mtr2 = np.where(wage_mtr2 > nearone, nearone, wage_mtr2)
                else:
                    mtr1 = wage_mtr1
                    mtr2 = wage_mtr2
                pch = ((1. - mtr2) / (1. - mtr1)) - 1.

where use_nearone is a flag to determine whether nearone should be used or not.

And I got the following result when comparing 2017_law.json and TCJA_Reconciliation.json:

[Screenshot, 2018-02-01 12:35:47 pm: liability differences with and without the nearone logic]

The table shows total liability differences for the TCJA reform against 2017 law in the presence of the substitution assumption (BE_sub = 0.25), where the Reg column contains results that do not use the nearone approximation while the other column does. The notebook used can be found here.

Looking at the table, nearone logic seems to affect the result significantly. My guess is that capping high MTRs at 0.999999 might cause pch to become very large, and thus lead to results that are unreasonably sensitive to substitution effect.

@GoFroggyRun
Contributor Author

GoFroggyRun commented Feb 1, 2018

Using a toy example similar to those in #1668, assume the following cases:

MTR_base   MTR_ref   _BE_sub   Income   change in income
1.1        0.2       0.2       100      -180
0.999999   0.2       0.2       100      +15,999,980
where the calculation is based on

change in income = [(1 - MTR_ref) / (1 - MTR_base) - 1] x _BE_sub x Income.

This example might help explain what's happening when nearone is used.
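
The two toy cases can be reproduced with a few lines of arithmetic. This is only a sketch of the formula as written in this comment; the function name is illustrative.

```python
def change_in_income(mtr_base, mtr_ref, be_sub, income):
    """change in income = [(1 - MTR_ref) / (1 - MTR_base) - 1] x _BE_sub x Income"""
    pch = (1.0 - mtr_ref) / (1.0 - mtr_base) - 1.0
    return pch * be_sub * income

# Raw baseline MTR above one: the denominator is negative, so the sign flips.
uncapped = change_in_income(1.1, 0.2, 0.2, 100.0)

# Baseline MTR capped at nearone = 0.999999: the denominator is tiny, so the result explodes.
capped = change_in_income(0.999999, 0.2, 0.2, 100.0)

print(f"uncapped: {uncapped:,.0f}")
print(f"capped:   {capped:,.0f}")
```

The same filing unit goes from an income change of about -180 to about +16 million purely because its baseline MTR was replaced by nearone.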

I mentioned earlier that the problem might be due to the reform file used. To see whether this is the case, I introduced similar logic (use_nearone) in TC version 0.14.2 and reproduced my earlier results:

[Screenshot, 2018-02-01 2:35:34 pm: version 0.14.2 results with and without the nearone logic]

When nearone logic is not used, even the results from "buggy" reform now seem sensible to me. In fact, the "buggy" reform

{
   "policy": {
       "_PT_exclusion_rt":
           {"2018": [0.2],
            "2026": [0.0]},
       "_PT_exclusion_wage_limit":
           {"2018": [0.5],
            "2026": [9e+99]}
   }
}

would cause more units to have MTRs (mtr_base/mtr_1) raised above one. And when it happens, those units could become too responsive, as shown in the toy example.

To sum up, the "buggy" reform hikes the MTRs so that more units' MTRs are capped at 0.999999, which could lead to unreasonably large behavioral effects. As a result, the root cause of the phenomenon is the nearone logic. In particular, 0.999999 might not be a good value to assign to nearone.

@jdebacker
Member

jdebacker commented Feb 1, 2018

@GoFroggyRun Nice work digging into this.

One point - it still seems to me that the MTR values should be capped at some value < 1. As your toy example shows, if the MTR can exceed 1, then you can run into cases where the MTR falls, but the change in income is negative (even with a positive _BE_sub effect).

@feenberg
Contributor

feenberg commented Feb 1, 2018 via email

@feenberg
Contributor

feenberg commented Feb 1, 2018 via email

@GoFroggyRun
Contributor Author

@jdebacker If we'd rather have MTRs capped, then my comment here suggests that other (less precise) choices of nearone might work (e.g. 0.999): it would cap the MTRs below one and would avoid cases like my toy example. Does such a choice of nearone make sense to you?

@GoFroggyRun
Contributor Author

@feenberg said:

Typically this formula would be:

  change in income = [(MTR_ref-MTR_base) / (1 - MTR_base) - 1] x elasticity x Income.

Is _BE_sub an elasticity? I think the two formulas are similar for small changes in small MTRs but I am not sure why the more traditional formula isn't used. nearone doesn't appear in either formula - why does it affect the result?

  1. Yes. _BE_sub is an elasticity.

  2. I'm not sure either about why we are not using the other formula.

  3. Regarding nearone, the full logic when calculating changes in income in current tax-calculator is something like:

# cap each raw MTR at nearone before computing the percentage change
nearone = 0.999999
mtr1 = np.where(wage_mtr1 > nearone, nearone, wage_mtr1)
mtr2 = np.where(wage_mtr2 > nearone, nearone, wage_mtr2)
pch = ((1. - mtr2) / (1. - mtr1)) - 1.
change_in_income = pch * elasticity * income

where wage_mtr1 and wage_mtr2 are "raw" MTRs.

Does the logic make sense to you?

@jdebacker
Member

@GoFroggyRun Rounding to something below, but further from, 1 makes sense.

But @feenberg makes a very good suggestion w.r.t. trying negative and positive finite differences and comparing the two. calculate.py has a flag to try negative finite differences, so this would only be a small change in TC logic.
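
To illustrate why comparing the two one-sided differences helps, here is a toy sketch. The tax schedule, the notch, and both function names are hypothetical; this is not the logic or the flag in calculate.py.

```python
def toy_tax(income):
    """Hypothetical schedule: 20% flat rate plus a $1,000 notch above $50,000."""
    tax = 0.20 * income
    if income > 50_000.0:
        tax += 1_000.0  # e.g. a credit lost entirely once income crosses the threshold
    return tax

def mtr(income, finite_diff=0.01, negative=False):
    """Marginal tax rate from a one-sided finite difference."""
    if negative:
        return (toy_tax(income) - toy_tax(income - finite_diff)) / finite_diff
    return (toy_tax(income + finite_diff) - toy_tax(income)) / finite_diff

at_notch = 50_000.0
forward = mtr(at_notch)                  # steps over the notch
backward = mtr(at_notch, negative=True)  # stays below the notch

print(f"forward MTR:  {forward:.1f}")
print(f"backward MTR: {backward:.3f}")
```

In this toy case the forward difference reports an MTR far above one while the backward difference reports the statutory 0.2, so a large disagreement between the two flags a unit sitting at a notch.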

@jdebacker
Member

@feenberg The partial equilibrium simulation on TaxBrain cites Creedy and the formula used in Tax-Calc for the change in income appears to come from Equation (1) there.

@feenberg
Contributor

feenberg commented Feb 1, 2018 via email

@GoFroggyRun
Contributor Author

GoFroggyRun commented Feb 2, 2018

I tried @feenberg 's suggestion

Actually, any implausible MTR, such as >.7 should probably be ignored.

by adding the following logic to tax-calculator behavior.py:

mtr1 = wage_mtr1
mtr2 = wage_mtr2
pch = ((1. - mtr2) / (1. - mtr1)) - 1.
if drop_mtr:
    mtr_cap = 0.7
    pch = np.where(wage_mtr1 > mtr_cap, 0.0, pch)
    pch = np.where(wage_mtr2 > mtr_cap, 0.0, pch)

such that, whenever an implausible MTR occurs, pch is set to zero (i.e. the change in income is zero). I got the following combined liability results when comparing 2017 tax law against the TCJA reconciliation reform, under the behavior assumption "_BE_sub": {"2018": [0.25]} (the baseline is 2017 law).

Year  Baseline ($B)  Static Reform ($B)  Drop High MTR, Dynamic ($B)  Regular (No nearone logic), Dynamic ($B)
2018  2937.654024    2751.341975         2814.440428                  2814.954574
2019  3045.538637    2858.180907         2922.470867                  2922.706948
2020  3154.058759    2965.196661         3030.732779                  3030.891285
2021  3273.926547    3083.281800         3150.574925                  3150.750985
2022  3409.445729    3214.712215         3284.244253                  3284.429588
2023  3556.676833    3358.180660         3430.294160                  3430.548037
2024  3711.096056    3509.007804         3583.895902                  3584.170679
2025  3872.563762    3665.658179         3743.266276                  3743.563506
2026  4042.536042    4069.803691         4067.557936                  4067.571816
2027  4201.887042    4233.488832         4230.809872                  4230.942892

It seems to me that ignoring behavioral response from high MTR filers has very minor impact. @feenberg does the difference (between drop vs not drop) make sense to you?

Does such treatment make sense to you? @jdebacker @MattHJensen
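
As a quick check on "very minor impact", the sketch below computes the relative gap between the two dynamic columns for each year, using the values copied from the table. The variable names are illustrative.

```python
# Dynamic combined liability ($B) from the table: (drop high MTR, regular).
dynamic = {
    2018: (2814.440428, 2814.954574),
    2019: (2922.470867, 2922.706948),
    2020: (3030.732779, 3030.891285),
    2021: (3150.574925, 3150.750985),
    2022: (3284.244253, 3284.429588),
    2023: (3430.294160, 3430.548037),
    2024: (3583.895902, 3584.170679),
    2025: (3743.266276, 3743.563506),
    2026: (4067.557936, 4067.571816),
    2027: (4230.809872, 4230.942892),
}

# Relative difference of the regular result against the drop-high-MTR result.
rel_diff = {
    year: (regular - dropped) / dropped
    for year, (dropped, regular) in dynamic.items()
}

worst_year = max(rel_diff, key=rel_diff.get)
print(worst_year, f"{100 * rel_diff[worst_year]:.4f}%")
```

The largest gap is in 2018 and is under 0.02% of liability, so on these figures dropping high-MTR filers barely moves the aggregate.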

@jdebacker
Member

@GoFroggyRun These results look good to me. I'd probably favor the cap at 0.7 since it would catch any of the (rare) instances of computing the finite difference at a notch. It also eliminates the issue that arises from MTRs > 1 in the denominator of the pch calculation.

Your results also suggest Issue #1668 is solved, right? These dynamic effects seem quite reasonable.

@feenberg
Contributor

feenberg commented Feb 2, 2018 via email

@GoFroggyRun
Contributor Author

@jdebacker I'm not sure whether such treatment will resolve #1668 or not. In fact, I don't think it would. The issue in #1668 is that "Tax-Calculator results are too sensitive to substitution effect elasticity", while the intention of my comparison is to show that the before and after effect of this treatment doesn't change our current behavior responses much. If my results do explain #1668, then that means we have considerable numbers of filers with high MTRs that would affect aggregate liability, which doesn't seem to be the case.

@GoFroggyRun
Contributor Author

@feenberg said:

Anyway, it looks like 2018 income is increased by about 3%. Is this
reasonable? It seems high to me for an elasticity of .25.

Dan, the numbers to look at in 2018 are 2814.440428 (drop high MTRs) and 2814.954574 (do not drop), so the percentage change here is less than 0.02%. 2937.654024 is the liability for the baseline (i.e. with neither reform nor behavior), which I should not have included.

@jdebacker
Member

@GoFroggyRun You posted aggregate results above. My comment was just referencing the fact that those aggregate responses with behavior seem much more reasonable than what was noted in #1668.

You should check this, but I think the behavioral responses you find above are quite reasonable. I was finding a drop in MTRs (average over all taxpayers with positive TI, weighted by TI) on the order of 3 to 4 percentage points from the TCJA. This comes to a percentage change of about 10-12%. With an elasticity of 0.25, a 3% change in income is about right.

@GoFroggyRun
Contributor Author

@jdebacker said:

My comment was just referencing the fact that those aggregate responses with behavior seem much more reasonable than what was noted in #1668.

Ahh. Yup, I definitely agree.

You should check this, but I think the behavioral responses you find above are quite reasonable. I was finding a drop in MTRs (average over all taxpayers with positive TI, weighted by TI) on the order of 3 to 4 percentage points from the TCJA. This comes to a percentage change of about 10-12%. With an elasticity of 0.25, a 3% change in income is about right.

Thanks for your explanation regarding change in income. I'll check how such treatment would affect average MTRs and change in average MTRs.

@feenberg
Contributor

feenberg commented Feb 2, 2018 via email

@GoFroggyRun
Contributor Author

GoFroggyRun commented Feb 5, 2018

@jdebacker I double-checked the average MTRs: for the positive taxable income group, the weighted average MTR is 0.329 for the baseline (pre-TCJA) and 0.299 for TCJA. These numbers are very close to what you mentioned. So the effect of TCJA on income is about 2.3% with an elasticity of 0.25.
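
A back-of-the-envelope sketch of this arithmetic, using only the two averages quoted above. It is an assumption that the 2.3% figure comes from the proportional change in the average MTR itself; the second calculation applies the pch formula from Behavior.response to the same averages for comparison. Neither replaces the unit-level calculation.

```python
mtr_base = 0.329   # weighted average MTR, baseline (pre-TCJA)
mtr_tcja = 0.299   # weighted average MTR, TCJA
elasticity = 0.25

# (a) proportional change in the average MTR itself
pch_mtr = (mtr_base - mtr_tcja) / mtr_base

# (b) proportional change in the net-of-tax rate, as in the pch formula
pch_net = (1.0 - mtr_tcja) / (1.0 - mtr_base) - 1.0

income_effect_a = elasticity * pch_mtr
income_effect_b = elasticity * pch_net

print(f"(a) {100 * income_effect_a:.1f}%")
print(f"(b) {100 * income_effect_b:.1f}%")
```

Calculation (a) gives about 2.3% and calculation (b) about 1.1%, so the choice of base for the percentage change roughly halves the implied aggregate response.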

@MattHJensen
Contributor

Over at #1856, @martinholmer said:

Whatever the problems in the Behavior.response method (and, in my view, there are more than what we've been discussing since issue #1668 was raised in November), I don't see this as a sensible solution. We are allowing Tax-Calculator users to set tax rates as high as 1.0 in each taxable-income bracket, so it would be quite likely that a user who, say, wanted to study the behavioral effects of a move from Eisenhower-era income tax rates to present-day tax rates, would have a baseline policy in which a relatively large number of filing units had a MTR in excess of 0.70. And it would be among those filing units that the user would expect to see the largest behavioral responses. So, this approach is going to generate results that make no sense to that user, and that user would likely file a Tax-Calculator bug report. And that user would be right, in my view.

I agree that capping MTRs at 0.7 is unsatisfactory for the same reasons Martin describes. Capping at 0.99 would be better.
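
A small sketch of the objection, assuming the drop-style treatment tried in this issue (pch set to zero when either raw MTR exceeds the threshold). The filing unit is hypothetical: a 0.91 baseline MTR, matching the top statutory rate of the late 1950s, and 0.37 after the reform.

```python
def pch_drop(wage_mtr1, wage_mtr2, mtr_cap):
    """Zero out pch when either raw MTR exceeds mtr_cap (the drop_mtr treatment)."""
    if wage_mtr1 > mtr_cap or wage_mtr2 > mtr_cap:
        return 0.0
    return (1.0 - wage_mtr2) / (1.0 - wage_mtr1) - 1.0

# Hypothetical unit moving from a 91% MTR to a 37% MTR.
at_070 = pch_drop(0.91, 0.37, mtr_cap=0.7)
at_099 = pch_drop(0.91, 0.37, mtr_cap=0.99)

print(at_070, at_099)
```

With a 0.7 threshold this unit shows no response at all, while a 0.99 threshold keeps its pch of about 6, which is the kind of unit where the largest response would be expected.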

@martinholmer, do you think that the other problems that you see in the behavior.response method might be associated with the problem we are facing here?

@martinholmer
Collaborator

@MattHJensen asked in issue #1852:

do you think that the other problems that you see in the behavior.response method might be associated with the problem we are facing here?

The main "other problems" I see with Behavior.response logic are not central to the discussion in #1668, #1827 and #1852.

However, I've included my ideas about how to handle the #1668, #1827, #1852 problems in pull request #1858.

@martinholmer
Collaborator

Issue #1852 has been resolved by the merge of pull request #1858.

5 participants