Fix zipcode search - limit match to left(5) of parameter value. #2980
Comments
For context: The ZIP code data in the itemized transactions of schedules is inconsistent. Some filers provide 5-digit ZIP codes and some filers provide 9-digit ZIP codes. Currently, if a user is searching for ZIP code 20009-2415, the app's behavior varies depending on whether the user searches for 20009 or for 20009-2415. The underlying data might be |
My thoughts: first, make sure any non-numeric character is removed from the zip argument (by the endpoint). When a user enters a 5-digit zip (or 6, 7, or 8 digits by error), the endpoint should return all transactions whose first 5 digits match, no matter how many digits the zip code has in the database (could be 5, 6, 7, 8, 9, or more in the table). When a user enters a 9-digit zip, the endpoint should return all transactions with an exact 9-digit match. The reason for this approach is that in this scenario the user is actually looking for contributors from a narrower geographical area. If the endpoint returns broader results, it does not serve what the user wants here. |
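A minimal sketch of the matching rule proposed in this comment, written in plain Python rather than the project's actual filter code. The name `build_zip_matcher` and the decision to reject inputs with fewer than 5 digits are assumptions made for illustration; the comment does not say what should happen in that case.

```python
import re


def build_zip_matcher(raw_zip):
    """Return a predicate implementing the rule proposed in this comment.

    Hypothetical helper: the name and the ValueError for short input are
    assumptions, not existing openFEC code.
    """
    digits = re.sub(r"\D", "", raw_zip)  # drop dashes, spaces, letters
    if len(digits) < 5:
        raise ValueError("need at least 5 digits to search by zip")

    if len(digits) >= 9:
        # 9-digit search: only rows whose first 9 digits match exactly.
        target = digits[:9]
        return lambda stored: re.sub(r"\D", "", stored)[:9] == target

    # 5- to 8-digit search: match on the first 5 digits only.
    prefix = digits[:5]
    return lambda stored: re.sub(r"\D", "", stored).startswith(prefix)


by_zip5 = build_zip_matcher("20009")
by_zip9 = build_zip_matcher("20009-2415")
```

Under this rule `by_zip5` accepts rows stored as either `20009` or `200092415`, while `by_zip9` accepts `200092415` but rejects a row stored as `20009`.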
Is it simply a matter of always using "starts like" and requiring at least 5 characters? 22655 would get any zip that starts with 22655. |
This could be another approach to consider. So 22655 gets the most results, 226551111 gets the fewest, and others fall between the two. The database probably has most zip codes with either 5 or 9 digits since these two are the standard. |
We could make a new multi-starts-with filter or use the standard filters and have a 5 digit column for matching. Also, we should be able to handle non-numeric zips because there are citizens and green card holders that can legally donate when they live in other countries. |
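As a rough sketch of what a multi-starts-with filter could do, here is a plain-Python version that accepts several search values and keeps non-numeric postal codes. The function name, its signature, the `min_length` parameter, and the sample values are all hypothetical; the project's real filters operate on database queries, not lists.

```python
def multi_starts_with(stored_zips, search_values, min_length=5):
    """Keep stored zips that start with any of the search values.

    Hypothetical helper: name, signature, and min_length are assumptions.
    Values are compared as upper-cased text with spaces removed, so
    non-numeric postal codes are not discarded.
    """
    def clean(value):
        return value.replace(" ", "").upper()

    prefixes = tuple(
        clean(v) for v in search_values if len(clean(v)) >= min_length
    )
    # str.startswith accepts a tuple and returns True if any prefix matches;
    # with an empty tuple it returns False, so nothing is selected.
    return [z for z in stored_zips if clean(z).startswith(prefixes)]


stored = ["22655", "226551111", "20463", "K1A 0B1"]
result = multi_starts_with(stored, ["22655", "k1a 0b"])
```

Here `result` keeps `22655`, `226551111`, and the Canadian-style code `K1A 0B1`, and drops `20463`.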
I think we should make a new multi-starts-with filter, and SQLAlchemy has a like() method.
@hcaofec - that won't fix the problem with the existing search not working properly. Searching for a 5-digit zip should return all zipcodes that start with those 5 digits. Currently, it's excluding 9-digit zips that match on the first 5, which is problematic. Searching for 20463 should return: Instead, it's only returning: |
Let's start with matching on the first 5 digits (characters). Matching on the first 5 digits might return more results than a user is expecting, but that's better than returning nothing, which happens now because the search does an exact match. |
Right. I think there are a number of ways we can consider improving on that later but we really need to fix it now and that is the easiest approach. |
The column.like() method works similarly to the LIKE operator in SQL, so when searching for a 5-digit zip, the SQL will be: The match() method serves the purpose here too.
@hcaofec that would solve our 5-digit search problem, but I'm not sure it's a good solution for our 9-digit search problem. It's my understanding that when a user searches "204630923" they're looking for all contributions from that area. But the data isn't consistent. Data User searching for 204630923 would expect it to return: ..But because of inconsistent data, it only returns: and is missing |
Agreed. Even if a user enters a 9-digit ZIP code, we should return rows that match on the first 5 digits. Our data is too inconsistent and the results returned would be misleading. We can explain this with a tooltip on the user interface. |
When a user enters a 9-digit zip, the query would return rows that match on the first 5 digits using SQL like this: How about foreign zip codes, like Canada's? If a user enters |
For now I would say don't worry about anything that doesn't conform. We can take another pass at this later. I think it's best to make it always like 20163%
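To illustrate the "always LIKE the first five characters" approach, here is a small sketch using Python's built-in `sqlite3` module in place of the project's real database. The table name `schedule_a_demo`, the function `search_by_zip`, and the rows are made up for illustration.

```python
import sqlite3

# In-memory table standing in for the real schedule data. The table name
# and the rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE schedule_a_demo (contributor_zip TEXT)")
conn.executemany(
    "INSERT INTO schedule_a_demo VALUES (?)",
    [("20463",), ("204630923",), ("204631111",), ("20009",)],
)


def search_by_zip(zip_value):
    """Always search with the pattern '<first 5 characters>%'."""
    pattern = zip_value[:5] + "%"
    rows = conn.execute(
        "SELECT contributor_zip FROM schedule_a_demo "
        "WHERE contributor_zip LIKE ? ORDER BY contributor_zip",
        (pattern,),
    )
    return [row[0] for row in rows]


five_digit = search_by_zip("20463")
nine_digit = search_by_zip("204630923")
```

Both calls return the same three rows starting with `20463`, whether the user typed 5 or 9 digits. A real implementation would also need to decide how to treat `%` or `_` in user input, since both are LIKE wildcards.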
Should we merge fecgov/fec-cms#1391 without this? Or wait? |
Thanks @AmyKort for linking this to the new PR. Since @patphongs marked this as a blocker for fecgov/fec-cms#1391, it's probably better to wait till this is merged. |
Thanks @hcaofec ! |
PR has been merged. |
See related cms issue: fecgov/fec-cms#1391
When a 9-digit zip is passed to the `/schedules/schedule_a/` endpoint, we should truncate it to 5 characters. Not all filers report 9-digit zips, so searching by zip+4 returns unreliable data. In addition, searching by 5-digit zips currently excludes records that have 9-digit zips. Also, we should add some automated tests to test_itemized to confirm this behavior.

`schedule_a.py`
- match the left 5 characters of the inputted zip value against the left(5) of the `contributor_zip` column in the `ScheduleA` model
- do this in the resource with `if kwargs.get('contributor_zip'):...`

`test_itemized.py`
- add some automated tests for zip+4 searches

Should all schedules work this way? Only the ones that search by zip. Maybe next look at `ScheduleAByZip` - looks like it doesn't return results when you pass +4.
Example: https://api.open.fec.gov/v1/schedules/schedule_a/by_zip/?api_key=DEMO_KEY&zip=200162409&page=1&per_page=20
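The truncation step described in the issue body could look roughly like the sketch below, together with the kind of checks a zip+4 test might make. The helper `truncate_contributor_zip` is hypothetical, and it assumes `contributor_zip` arrives either as a single string or as a list of strings; the real resource and its test fixtures may differ.

```python
def truncate_contributor_zip(kwargs, length=5):
    """Return a copy of kwargs with contributor_zip cut to `length` chars.

    Hypothetical helper. Assumes contributor_zip arrives either as a single
    string or as a list of strings; the real resource may differ.
    """
    updated = dict(kwargs)
    if updated.get("contributor_zip"):
        value = updated["contributor_zip"]
        if isinstance(value, str):
            updated["contributor_zip"] = value[:length]
        else:
            updated["contributor_zip"] = [z[:length] for z in value]
    return updated


def test_zip_plus_four_is_truncated():
    out = truncate_contributor_zip({"contributor_zip": ["200162409", "20463"]})
    assert out["contributor_zip"] == ["20016", "20463"]


def test_missing_zip_is_untouched():
    assert truncate_contributor_zip({"page": 1}) == {"page": 1}


test_zip_plus_four_is_truncated()
test_missing_zip_is_untouched()
```

The truncated value would then be matched against the start of the stored column, so a search for `200162409` and a search for `20016` select the same rows.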