compatible_data field(s) in current_law_policy.json #1480

Closed
MattHJensen opened this issue Jul 14, 2017 · 8 comments

@MattHJensen
Contributor

MattHJensen commented Jul 14, 2017

Now that the cps.csv file is much further along, I am opening an issue to renew discussion of adding a compatible_data field in current_law_policy.json.

In #1074 I suggested:

Add a "compatible_data" attribute to each parameter after the "notes" field.

The acceptable values that I anticipate in the short to medium term are:
- "taxdata_puf", which would refer to the PUF based datafile produced by TaxData.
- "taxdata_cps", which would refer to the CPS based datafile produced by TaxData.
- "taxcalc_filings", which would refer to the Luca-based taxpayer inputted data that @zrisher is developing.
- any comma-separated combination of those three, such as "taxdata_puf, taxdata_cps, taxcalc_filings".

We need this attribute because not every parameter will be relevant to every data source. For some parameters, all of the relevant variables from the data source might be zero. Implementing reforms to these parameters will not influence the results. Given that our users implement reforms to many provisions at once, many will not notice that there is a problem. I expect this will lead to quite a few silly mistakes and some resentment.
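To make the proposal concrete, here is a minimal sketch of how such a field might look and be read. The parameter name `_EXAMPLE_PARAM`, its notes text, and the helper functions are hypothetical and are not taken from current_law_policy.json or from Tax-Calculator; only the three data-source names come from the list above.

```python
import json

# Hypothetical excerpt in the style of current_law_policy.json.
# "_EXAMPLE_PARAM" and its notes are made up for illustration only.
POLICY_JSON = """
{
  "_EXAMPLE_PARAM": {
    "notes": "Illustrative parameter, not a real entry.",
    "compatible_data": "taxdata_puf, taxdata_cps"
  }
}
"""

# The three values proposed in the issue text.
ALLOWED_SOURCES = {"taxdata_puf", "taxdata_cps", "taxcalc_filings"}


def parse_compatible_data(value):
    """Split a comma-separated compatible_data string into a set of names."""
    sources = {item.strip() for item in value.split(",") if item.strip()}
    unknown = sources - ALLOWED_SOURCES
    if unknown:
        raise ValueError("unknown data source(s): {}".format(sorted(unknown)))
    return sources


def is_compatible(params, name, source):
    """Return True if parameter `name` is marked compatible with `source`."""
    return source in parse_compatible_data(params[name]["compatible_data"])


params = json.loads(POLICY_JSON)
print(is_compatible(params, "_EXAMPLE_PARAM", "taxdata_cps"))
print(is_compatible(params, "_EXAMPLE_PARAM", "taxcalc_filings"))
```

A check like `is_compatible` is what would let a front end warn a user who reforms a parameter that cannot affect results for the dataset in use.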

Based on a conversation with @martinholmer, I now believe that we would want to make an explicit rule about what "compatible" means, as there will be gray areas. One possible rule would be: "Does this parameter affect results with this given dataset?" This is something that we could even test relatively simply, albeit at significant computational expense (as @zrisher notes below).
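As a rough illustration of that rule, the sketch below perturbs one parameter at a time and checks whether an aggregate result changes for a given dataset. Everything in it is an assumption for illustration: `toy_tax` is a stand-in for a full tax calculation, and the record fields and parameter names are invented rather than drawn from puf.csv, cps.csv, or current_law_policy.json.

```python
def toy_tax(records, params):
    """Stand-in for a full tax calculation: total liability over all records."""
    total = 0.0
    for rec in records:
        taxable = max(rec["wages"] - params["std_deduction"], 0.0)
        credit = params["childcare_credit_rate"] * rec["childcare_expense"]
        total += params["rate"] * taxable - credit
    return total


def affects_results(records, params, name, bump=1.1):
    """Return True if scaling params[name] by `bump` changes the aggregate."""
    reformed = dict(params)
    reformed[name] = params[name] * bump
    return toy_tax(records, reformed) != toy_tax(records, params)


# Hypothetical dataset in which the childcare variable is zero for everyone,
# so reforms to the childcare credit cannot move the results.
records = [
    {"wages": 50000.0, "childcare_expense": 0.0},
    {"wages": 20000.0, "childcare_expense": 0.0},
]
params = {"rate": 0.2, "std_deduction": 10000.0, "childcare_credit_rate": 0.3}

# One extra full calculation per parameter: this is the computational cost.
report = {name: affects_results(records, params, name) for name in params}
print(report)
```

A single perturbation can miss parameters that only bind past a threshold, so a real test would probably need more than one reform value per parameter, which adds to the expense.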

Here is the rest of @zrisher's response to this proposal from #1074.

It will be a bit of extra maintenance work. When changing the way policy is coded, a contributor must remember to update this field if usable data changes, and the relationships may not be easy to trace. These fields could also be affected by changes to any of the listed data sets. It's also computationally expensive to enforce via automated testing. However, the value prop you described is definitely there.

@MattHJensen
Contributor Author

I am happy to take a first stab in this direction if others are agreeable to the idea.

@martinholmer
Collaborator

@MattHJensen said about issue #1480:

I am happy to take a first stab in this direction if others are agreeable to the idea.

Don't you think it makes sense to begin this work only after we have settled on the contents of the new CPS file?

@MattHJensen
Contributor Author

MattHJensen commented Jul 14, 2017

I am thinking about opening a PR to demonstrate how this would look applied to puf.csv alone. It would consist of a test and simple changes to current_law_policy.json.

Likely I won't get to it until I return on Aug 18.

@martinholmer
Collaborator

@MattHJensen said:

I am thinking about opening a PR to demonstrate how this would look applied to puf.csv alone. It would consist of a test and simple changes to current_law_policy.json.

Likely I won't get to it until I return on Aug 18.

OK, that makes sense. And who knows, maybe we'll have made some progress on the CPS file by the time you return to the office.

@feenberg
Contributor

feenberg commented Jul 14, 2017 via email

@martinholmer
Collaborator

The ideas described in issue #1480 have been favorably received, so I'm changing the label from Question to Enhancement. I'm also taking @MattHJensen at his word and assigning the enhancement task to him.

@martinholmer
Collaborator

@martinholmer said in issue #1480:

To be more consistent with the other JSON fields in the current_law_policy.json file, the value of the "compatible_data" key should probably be a list (not a comma separated string).

After experience adding an "availability" field to the records_variables.json file, I think this was a poor suggestion. So, I'm going to remove all traces of my poorly thought-out suggestion.

@martinholmer
Collaborator

Issue #1480 is being resolved by pending pull request #1614.
