compatible_data field(s) in current_law_policy.json #1480
Comments
I am happy to take a first stab in this direction if others are agreeable to the idea.
@MattHJensen said about issue #1480:
Don't you think it makes sense to begin this work only after we have settled on the contents of the new CPS file?
I am thinking about opening a PR to demonstrate how this would look applied to Likely I won't get to it until I return on Aug 18.
@MattHJensen said:
OK, that makes sense. And who knows, maybe we'll have made some progress on the CPS file by the time you return to the office.
On Fri, 14 Jul 2017, Matt Jensen wrote:
Now that the cps.csv file is much further along, I am opening an issue to renew discussion of adding a
compatible_data field in current_law_policy.json.
In #1074 I suggested:
(a) Add a "compatible_data" attribute to each parameter after the "notes" field.
- The acceptable values that I anticipate in the short to medium term are:
- "taxdata_puf", which refers to the PUF based datafile produced by TaxData.
- "taxdata_cps", which refers to the CPS based datafile produced by TaxData.
- "taxcalc_filings", which refers to the Luca-based taxpayer inputted data that @zrisher is
developing.
- any comma-separated combination of those three, such as "taxdata_puf, taxdata_cps,
taxcalc_filings".
o The reason we need this attribute is that not every parameter will be relevant to every
dataset, because all of the relevant variables in a dataset might be zero. The calculator
will still run just fine with the dataset, even with parameter modifications, but those
modifications won't affect results, and I expect this to be quite confusing to
users.
o The TaxBrain rule would be to display all of the parameters that are compatible with the
chosen dataset (currently taxdata_puf is the only option).
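A minimal sketch of how the proposed field and the TaxBrain display rule might look, in Python. The parameter names (`_param_a`, `_param_b`), their field values, and both helper functions are hypothetical and exist only for illustration; only the three dataset names and the comma-separated format come from the proposal above.

```python
import json

# Hypothetical excerpt in the style of current_law_policy.json.
# Parameter names and compatible_data values are invented for illustration.
POLICY_JSON = """
{
  "_param_a": {
    "notes": "",
    "compatible_data": "taxdata_puf, taxdata_cps, taxcalc_filings"
  },
  "_param_b": {
    "notes": "",
    "compatible_data": "taxdata_puf"
  }
}
"""

# The three values named in the proposal.
VALID_DATASETS = {"taxdata_puf", "taxdata_cps", "taxcalc_filings"}


def parse_compatible_data(value):
    """Split a comma-separated compatible_data string into a set of names."""
    names = {name.strip() for name in value.split(",") if name.strip()}
    unknown = names - VALID_DATASETS
    if unknown:
        raise ValueError("unknown dataset name(s): {}".format(sorted(unknown)))
    return names


def parameters_for(policy, dataset):
    """Return the sorted names of parameters compatible with the dataset."""
    return sorted(
        name for name, info in policy.items()
        if dataset in parse_compatible_data(info["compatible_data"])
    )


policy = json.loads(POLICY_JSON)
print(parameters_for(policy, "taxdata_puf"))
print(parameters_for(policy, "taxdata_cps"))
```

Validating the names at parse time is one way to catch a typo in the field early, which matters if the field is maintained by hand.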
Based on a conversation with @martinholmer, I now believe that we would want to make an explicit rule about
what "compatible" means, as there will be gray areas. One possible rule would be, "does this parameter
affect results with this given dataset". This is something that we could even test relatively simply,
albeit with significant computational expense (as @zrisher also points out with his comment below), so we
wouldn't want it to be a test that is run frequently.
Here is the rest of @zrisher's response to this proposal from #1074.
It will be a bit of extra maintenance work. When changing the way policy is coded, a
contributor must remember to update this field if usable data changes, and the relationships
may not be easy to trace. These fields could also be affected by changes to any of the listed
data sets. It's also computationally expensive to enforce via automated testing. However, the
value prop you described is definitely there.
I agree that the benefits are significant, but that doing this manually
introduces a likely source of error. Done mechanically, does it need to be
such a great computational expense? For each parameter we would need to
calculate revenue with the parameter changed from the default to the
maximum and minimum permitted, stopping at the first record showing a
change in liability. The small sample would be sufficient. The rest of the
model (dropq, table formatting, etc.) need not be invoked. That is going
to be less than the cost of a single run of the model on the full sample.
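The mechanical check described above could be sketched roughly as follows. This is an illustration under stated assumptions, not Tax-Calculator code: `toy_liability` is a made-up stand-in for the real liability calculation, and the parameter names, record fields, and sample records are all hypothetical.

```python
def affects_results(compute_liability, records, param, default, minimum, maximum):
    """Return True as soon as one record's liability changes when `param`
    moves from its default to the minimum or maximum permitted value."""
    for record in records:
        baseline = compute_liability(record, {param: default})
        for value in (minimum, maximum):
            if compute_liability(record, {param: value}) != baseline:
                return True  # stop at the first record showing a change
    return False


# Toy stand-in for the tax calculation (an assumption for illustration only).
def toy_liability(record, params):
    rate = params.get("wage_rate", 0.10)
    credit = params.get("farm_credit", 0.0)
    return rate * record["wages"] - credit * record["farm_income"]


# Hypothetical samples: the second has the relevant variable zero everywhere.
sample_with_farm = [
    {"wages": 50000.0, "farm_income": 0.0},
    {"wages": 20000.0, "farm_income": 3000.0},
]
sample_without_farm = [
    {"wages": 50000.0, "farm_income": 0.0},
    {"wages": 20000.0, "farm_income": 0.0},
]

print(affects_results(toy_liability, sample_with_farm, "farm_credit", 0.0, 0.0, 1.0))
print(affects_results(toy_liability, sample_without_farm, "farm_credit", 0.0, 0.0, 1.0))
```

The exact `!=` comparison is adequate for this toy; a real check would probably want a small tolerance. The early return is what keeps the cost low for compatible parameters, while an incompatible parameter still requires a pass over the whole sample.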
BTW, rather than omit the parameters that have no effect, why not gray
them out, so that the user would see an indication that switching
datasets would let them work with that parameter?
dan
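The gray-out suggestion could be sketched as below, assuming each parameter's compatible datasets are already available as a set. The function name, the state labels, and the parameter names are hypothetical.

```python
def display_state(compatible, chosen_dataset):
    """Return ("enabled", []) if the parameter works with the chosen dataset,
    otherwise ("grayed", [datasets that would enable it])."""
    if chosen_dataset in compatible:
        return ("enabled", [])
    return ("grayed", sorted(compatible))


# Hypothetical parameters and their compatible datasets.
compatibility = {
    "_param_a": {"taxdata_puf", "taxdata_cps"},
    "_param_b": {"taxdata_puf"},
}

for name in sorted(compatibility):
    state, hint = display_state(compatibility[name], "taxdata_cps")
    print(name, state, hint)
```

Returning the enabling datasets alongside the state gives the interface what it needs to tell the user which dataset to switch to.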
The ideas described in issue #1480 have been favorably received, so I'm changing the label from Question to Enhancement. I'm also taking @MattHJensen at his word and assigning the enhancement task to him.
@martinholmer said in issue #1480:
After experience adding an "availability" field to the