compatible_data field(s) in current_law_policy.json #1480

MattHJensen · 2017-07-14T15:42:11Z

Now that the cps.csv file is much further along, I am opening an issue to renew discussion of adding a compatible_data field in current_law_policy.json.

In #1074 I suggested:

Add a "compatible_data" attribute to each parameter after the "notes" field.

The acceptable values that I anticipate in the short to medium term are:
- "taxdata_puf", which would refer to the PUF based datafile produced by TaxData.
- "taxdata_cps", which would refer to the CPS based datafile produced by TaxData.
- "taxcalc_filings", which would refer to the Luca-based taxpayer inputted data that @zrisher is developing.
- any comma-separated combination of those three, such as "taxdata_puf, taxdata_cps, taxcalc_filings".

We need this attribute because not every parameter will be relevant to every data source. For some parameters, all of the relevant variables from the data source might be zero. Implementing reforms to these parameters will not influence the results. Given that our users implement reforms to many provisions at once, many will not notice that there is a problem. I expect this will lead to quite a few silly mistakes and some resentment.
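As a rough illustration of the proposal quoted above, here is a minimal Python sketch of how a comma-separated "compatible_data" value might be read and checked against the three anticipated data sources. The parameter name `_EXAMPLE_PARAM`, its notes text, and the helper `parse_compatible_data` are hypothetical and are not taken from current_law_policy.json or from Tax-Calculator.

```python
import json

# The three values anticipated in the proposal.
ALLOWED_SOURCES = {"taxdata_puf", "taxdata_cps", "taxcalc_filings"}


def parse_compatible_data(value):
    """Split a comma-separated compatible_data string into a set of names.

    Hypothetical helper: raises ValueError if any name is not one of the
    anticipated data sources.
    """
    sources = {name.strip() for name in value.split(",") if name.strip()}
    unknown = sources - ALLOWED_SOURCES
    if unknown:
        raise ValueError(f"unknown data source(s): {sorted(unknown)}")
    return sources


# Hypothetical parameter entry; the real file's layout may differ.
policy_json = """
{
  "_EXAMPLE_PARAM": {
    "notes": "Illustrative entry only.",
    "compatible_data": "taxdata_puf, taxdata_cps"
  }
}
"""

params = json.loads(policy_json)
sources = parse_compatible_data(params["_EXAMPLE_PARAM"]["compatible_data"])
```

A front end such as TaxBrain could then test `chosen_dataset in sources` before showing a parameter, though how that is wired up is left open in this thread.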

Based on a conversation with @martinholmer, I now believe that we would want to make an explicit rule about what "compatible" means, as there will be gray areas. One possible rule would be, "does this parameter affect results with this given dataset". This is something that we could even test relatively simply, albeit with significant computational expense (as @zrisher notes below).

Here is the rest of @zrisher's response to this proposal from #1074.

It will be a bit of extra maintenance work. When changing the way policy is coded, a contributor must remember to update this field if usable data changes, and the relationships may not be easy to trace. These fields could also be affected by changes to any of the listed data sets. It's also computationally expensive to enforce via automated testing. However, the value prop you described is definitely there.

MattHJensen · 2017-07-14T19:22:49Z

I am happy to take a first stab in this direction if others are agreeable to the idea.

martinholmer · 2017-07-14T19:40:34Z

@MattHJensen said about issue #1480:

I am happy to take a first stab in this direction if others are agreeable to the idea.

Don't you think it makes sense to begin this work only after we have settled on the contents of the new CPS file?

MattHJensen · 2017-07-14T19:50:35Z

I am thinking about opening a PR to demonstrate how this would look applied to puf.csv alone. It would consist of a test and simple changes to current_law_policy.json.

Likely I won't get to it until I return on Aug 18.

martinholmer · 2017-07-14T19:53:15Z

@MattHJensen said:

I am thinking about opening a PR to demonstrate how this would look applied to puf.csv alone. It would consist of a test and simple changes to current_law_policy.json.

Likely I won't get to it until I return on Aug 18.

OK, that makes sense. And who knows, maybe we'll have made some progress on the CPS file by the time you return to the office.

feenberg · 2017-07-14T22:26:19Z

On Fri, 14 Jul 2017, Matt Jensen wrote:

Now that the cps.csv file is much further along, I am opening an issue to renew discussion of adding a compatible_data field in current_law_policy.json.

In #1074 I suggested:

(a) Add a "compatible_data" attribute to each parameter after the "notes" field.
- The acceptable values that I anticipate in the short to medium term are:
  - "taxdata_puf", which refers to the PUF based datafile produced by TaxData.
  - "taxdata_cps", which refers to the CPS based datafile produced by TaxData.
  - "taxcalc_filings", which refers to the Luca-based taxpayer inputted data that @zrisher is developing.
  - any comma-separated combination of those three, such as "taxdata_puf, taxdata_cps, taxcalc_filings".
- The reason why we need this attribute is that every parameter will not be relevant to every dataset because all of the relevant variables in a dataset might be zero. The calculator will still run just fine with the dataset and even with parameter modifications, but the parameter modifications won't affect results, and I expect this to be quite confusing to users.
- The TaxBrain rule would be to display all of the parameters that are compatible with the chosen dataset (currently taxdata_puf is the only option).

Based on a conversation with @martinholmer, I now believe that we would want to make an explicit rule about what "compatible" means, as there will be gray areas. One possible rule would be, "does this parameter affect results with this given dataset". This is something that we could even test relatively simply, albeit with significant computational expense (as @zrisher also points out with his comment below), so we wouldn't want it to be a test that is run frequently.

Here is the rest of @zrisher's response to this proposal from #1074.

It will be a bit of extra maintenance work. When changing the way policy is coded, a contributor must remember to update this field if usable data changes, and the relationships may not be easy to trace. These fields could also be affected by changes to any of the listed data sets. It's also computationally expensive to enforce via automated testing. However, the value prop you described is definitely there.

I agree that the benefits are significant, but that doing this manually introduces a likely source of error. Done mechanically, does it need to be such a great computational expense? For each parameter we would need to calculate revenue with the parameter changed from the default to the maximum and minimum permitted, stopping at the first record showing a change in liability. The small sample would be sufficient. The rest of the model (dropq, formatting tables, etc.) need not be invoked. That is going to be less than the cost of a single run of the model on the full sample.

BTW, rather than omit the parameters that have no effect, why not gray them out, so that the user would see an indication that by switching datasets he could work with that parameter.

dan
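The mechanical check described in this comment could look roughly like the Python sketch below: move one parameter to its minimum and maximum permitted values and stop at the first record whose liability changes. Everything here is an assumption for illustration only. `toy_liability` is a stand-in, not Tax-Calculator logic, and the parameter names, bounds, and records are invented.

```python
def toy_liability(record, params):
    """Stand-in tax function (NOT Tax-Calculator logic).

    Flat tax on wages minus a credit that scales with a variable
    that may be all zeros in a given dataset.
    """
    tax = params["rate"] * record["wages"]
    credit = params["credit_per_unit"] * record.get("rare_expense", 0.0)
    return tax - credit


def affects_results(name, baseline, bounds, records, liability=toy_liability):
    """Return True at the first record whose liability changes when the
    parameter `name` is set to its minimum or maximum permitted value."""
    for record in records:
        base = liability(record, baseline)
        for extreme in bounds[name]:
            reform = dict(baseline, **{name: extreme})
            if liability(record, reform) != base:
                return True  # early exit: one changed record is enough
    return False


# Invented baseline values, (min, max) bounds, and a tiny sample in which
# the rare_expense variable is zero on every record.
baseline = {"rate": 0.10, "credit_per_unit": 0.5}
bounds = {"rate": (0.0, 1.0), "credit_per_unit": (0.0, 1.0)}
records = [
    {"wages": 30000.0, "rare_expense": 0.0},
    {"wages": 52000.0, "rare_expense": 0.0},
]

compatible = {
    name: affects_results(name, baseline, bounds, records) for name in baseline
}
```

In this toy sample `rate` changes liability on the first record, while `credit_per_unit` never does because its only input variable is zero everywhere, which is the situation the compatible_data field is meant to flag. A parameter that matters only in combination with another would be missed by a one-at-a-time check like this.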

martinholmer · 2017-07-18T20:04:38Z

The ideas described in issue #1480 have been favorably received, so I'm changing the label from Question to Enhancement. I'm also taking @MattHJensen at his word and assigning the enhancement task to him.

martinholmer · 2017-07-19T17:22:03Z

@martinholmer said in issue #1480:

To be more consistent with the other JSON fields in the current_law_policy.json file, the value of the "compatible_data" key should probably be a list (not a comma separated string).

After experience adding an "availability" field to the records_variables.json file, I think this was a poor suggestion. So, I'm going to remove all traces of my poorly thought-out suggestion.

@MattHJensen

martinholmer · 2017-12-05T22:01:20Z

Issue #1480 is being resolved by pending pull request #1614.

MattHJensen added the question label Jul 14, 2017

martinholmer added enhancement and removed question labels Jul 18, 2017

martinholmer assigned MattHJensen Jul 18, 2017

MattHJensen mentioned this issue Aug 21, 2017

Adding UBI Functionality ospc-org/ospc.org#614

Closed

MattHJensen mentioned this issue Nov 2, 2017

Compatible data fields #1614

Merged

MattHJensen added the in progress label Nov 2, 2017

martinholmer closed this as completed in #1614 Jan 13, 2018

martinholmer removed the WIP label Jan 13, 2018
