DCHousing is missing total unit counts #495

Open
NealHumphrey opened this issue Aug 26, 2017 · 4 comments
@NealHumphrey
Collaborator

The DCHousing data set (from DMPED, available on opendata.dc.gov) contains a "Report_units_affordable" field, which we have mapped to the proj_units_assist fields. However, it does not have data for total units, i.e. proj_units_tot. Therefore, projects added from this data set have missing data for this field.

It also happens that the project with the most subsidized units ("BARRY FARM - ONSITE: 2580 FIRTH STERLING AVENUE SE") comes from this data set. Several user testers were concerned / confused why the maximum total units was smaller than the maximum subsidized units.

I have reached out to DMPED re: the data source to see if it is possible to add more fields from their source data, including this field. However, assuming we can't get this data updated from the source, what's the best way to handle this? Options:

  1. When proj_units_tot is missing, use proj_units_assist_max instead. This will be partially accurate insofar as there will be "at least" that many units. However, this will mean that the calculated field of percent subsidized could be inaccurate if the real proj_units_tot is higher.
  2. Leave it as-is, and address the discrepancy with some short text in the proj_units_tot description i.e. "Missing data may mean that the highest total units value is less than the highest number of subsidized units." Note that users often do not read these descriptions.
  3. Get this data from another source. It's not clear where to get this, though. It's not in the tax or mar tables. It is theoretically in CAMA, but for the Barry Farms example the one-to-many nature of project to parcel makes it hard to find all the unit counts - we have not (yet?) calculated parcels for projects that do not come from the Preservation Catalog.

@ptatian your input on this would be especially useful. How do you get the parcel mapping in the PresCat? Which of the 3 options would you see as the best path as a user?

@ptatian
Collaborator

ptatian commented Aug 29, 2017

Hi @NealHumphrey. Thanks for the question. Unless you have all of the buildings accounted for, you won't get the right unit count from MAR or CAMA. We've been recently trying to tackle this problem of multi-building projects in the Preservation Catalog. In a GitHub issue for PresCat, I describe using owner names in the real property data to try to find all of the related parcels and addresses for a project. This may be more than you want to get into here, however.

Of your three options, I would probably go for 1. Note that it is not a problem that the total units > assisted units. That's likely to be the case in newer developments, many of which are mixed income and therefore have both assisted and nonassisted units.

I hope that helps. Please let me know if you have more questions.

@NealHumphrey
Collaborator Author

My concern with 1) is that when we have a mixed-income project but don't have the actual total units, we would falsely report that it is 100% subsidized. What are your thoughts on that issue?

One option for this is to only calculate the "percent_subsidized" field when we aren't doing that substitution - perhaps we could use an additional 'estimated_units_tot' field for the filtering in the map view, and then report the various sources of total units side by side until we have better data to be confident in a way to arrive at the single actual number of units.
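A rough sketch of that idea, in Python. The function and class names here are hypothetical, and it assumes proj_units_assist_max is the subsidized count we would substitute; it is not the project's actual implementation. The point is just that the stand-in total feeds the filter while percent_subsidized stays empty whenever the total was substituted.

```python
from typing import NamedTuple, Optional


class UnitSummary(NamedTuple):
    estimated_units_tot: Optional[int]   # intended for map filtering only
    percent_subsidized: Optional[float]  # None when the total is a stand-in


def summarize_units(proj_units_tot: Optional[int],
                    proj_units_assist_max: Optional[int]) -> UnitSummary:
    """Hypothetical helper: substitute the subsidized count for a missing
    total, but never compute a percentage from a substituted total."""
    if proj_units_tot is not None:
        pct = None
        # Only a real, non-zero total gives a meaningful percentage.
        if proj_units_assist_max is not None and proj_units_tot > 0:
            pct = proj_units_assist_max / proj_units_tot
        return UnitSummary(proj_units_tot, pct)
    # Total is missing: "at least" this many units, percentage unknown.
    return UnitSummary(proj_units_assist_max, None)
```

With this, a project that has only a subsidized count still shows up in the filter, but it is never reported as 100% subsidized.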

On a related note - I have been conversing with Open Data about some related data sets to use for calculating the total number of residential units by zone, so that we can report statistics like the percent of units in Ward 1 that are subsidized. They recently released a new dataset of row-by-row condo and rental units. This might provide another pathway to getting total unit counts, though it too will suffer from the problem of undercounting in cases where the catalog does not have all of the buildings properly accounted for w/ a list of mar_ids.

That data set currently has corrupted address_id fields in the csv download which I've asked them to fix; when I get a corrected data set I can do a quick comparison like the one you did in the prescat issue to see how far off the numbers are. I'll report back on that.

@ptatian
Collaborator

ptatian commented Aug 30, 2017

It's a fair point. Flagging the total unit count as "unknown" is also a legitimate option. Perhaps that's better, rather than putting up a number that we aren't sure about.

@NealHumphrey
Collaborator Author

In discussing w/ the maintainer of the data at the MAR, it sounds like the active_address_unit_count field in the MAR can be pretty much trusted to be accurate - but as Peter notes, only in cases where we have actually captured all of the relevant addresses for the project.

I am going to move forward with adding an extra field, 'proj_units_tot_mar', that captures the total units as best we know it in the MAR. Then we can use an additional field, 'proj_units_tot_recommended':

  • If there is a proj_units_tot value, use that
  • If the MAR count is greater than or equal to the subsidized unit count, assume we've got most/all of the addresses and use the MAR count
  • If the subsidized unit count is null, use the MAR count
  • If these conditions aren't met, treat the proj_units_tot_recommended as null.
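The rules above could be sketched roughly like this in Python. The function name is hypothetical, and using proj_units_assist_max as the "subsidized unit count" is an assumption; treat it as an illustration of the rule order rather than the final code.

```python
from typing import Optional


def recommend_units_tot(proj_units_tot: Optional[int],
                        proj_units_tot_mar: Optional[int],
                        proj_units_assist_max: Optional[int]) -> Optional[int]:
    """Hypothetical sketch of proj_units_tot_recommended."""
    # Rule 1: a reported total always wins.
    if proj_units_tot is not None:
        return proj_units_tot
    # Without a MAR count there is nothing left to fall back on.
    if proj_units_tot_mar is None:
        return None
    # Rule 3: no subsidized count to compare against, so take the MAR count.
    if proj_units_assist_max is None:
        return proj_units_tot_mar
    # Rule 2: a MAR count at least as large as the subsidized count suggests
    # that most/all of the project's addresses were captured.
    if proj_units_tot_mar >= proj_units_assist_max:
        return proj_units_tot_mar
    # Rule 4: the MAR count looks like an undercount, so leave it null.
    return None
```

For example, a project with no reported total, a MAR count of 80, and 100 subsidized units would get a null recommendation, since the MAR count is evidently missing some addresses.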

When working through this I'll see if there are any other stipulations that make sense to put on this.
