-
Notifications
You must be signed in to change notification settings - Fork 1
Validate latitude/longitude on Locations by reverse geocoding #388
base: main
Are you sure you want to change the base?
Conversation
17caec4
to
ec1a024
Compare
ec1a024
to
b2c2ad6
Compare
The author of this PR, shajith, is not an activated member of this organization on Codecov. |
I don't think this is the right approach here, for a couple of reasons. Firstly, this is a really heavy new dependency - it pulls in the whole of SciPy and NumPy, which is a big increase in our overall dependency size. I don't think it's worth adding that to our application server footprint for this single use-case. More importantly, having this as a hard validation check is going to cause us problems for any location that is close to a border. 41.9950887,-123.7249003 for example is a location in California right next to the Oregon border - but when I run this: reverse_geocoder.search((41.9950887,-123.7249003)) I get back:
Finally, there's quite a performance overhead in running this function:
Running the validator in the model itself means this will add 800ms to each save we make for a |
Rather than implementing this as validation logic in VIAL, I suggest we run this as a separate process. We could set something up that runs on a schedule to verify that locations in VIAL aren't dramatically outside their suggested boundaries - using either this library or some other mechanism, such as the API I built at https://us-counties.datasette.io/counties/county_for_latitude_longitude?longitude=-123.7249003&latitude=41.9950887 for resolving a location to a county and state based on geographic polygons from the U.S. census. |
This will be a lot easier once I add the "stream all results" mechanism to the search API in #367 - since you'll be able to do things like retrieve all 10,000 CA locations in one stream of newline-delimited GeoJSON and feed that into a script that geocodes them all. |
All that makes sense. This IS a terribly large dependency for this use case, and I had my misgivings about that. I do want to catch wildly off values during manual data entry though, but I'm happy wait to see whether this becomes a larger problem. How should I go about setting up a separate process to verify or geocode lat/lng values on locations? Do we have any precedent for that in the codebase? It seems generally useful to be able to run that sort of thing on a schedule over the data. |
We could definitely make an improvement here for manual data entry - especially if we could show people a warning there but let them override it (for weird edge cases like a cruise ship offering vaccinations just off the coast of Puerto Rico or whatever) For that I'd suggest some element of client-side validation in the admin interface itself. I'd like us to show the lat/Lon there on a map and let people drag to adjust the marker, which could relate into this too. |
Validating client-side seems sufficient for my purposes. I can go look for ways to inject JS into django admin. Would it be appropriate to use the datasette API you linked above for this purpose (as opposed to google maps or something)? |
@simonw: here's where I'm heading with the JS validation, as a stopgap: validating-locations.movThere's probably a better way to use that API? |
Solving this:
Uses: https://github.com/thampiman/reverse-geocoder
Checks that the latitude/longitude, when geocoded, produces a spot that is in the state that the location is claimed to be in.
Does this if not: