DBS-derived organizations without geo coordinates #89
I have found 148 institutions without geonames, and subsequently(?) without lat&lon: Related? |
Just as a note: in the old lobid-organisations, most (all?) of the entries that have an ISIL do have geo data, e.g. DE-198 and DE-MUS-430817. |
To fix this, we need to
|
Re.
|
Some confusion here:
So what I wrote in the comment above about the steps we need to take is nonsense. The original issue remains (geo coordinates for DBS entries), and geonames are what they are and we have no plans to change anything. For Sigel-Orgs everything seems to be fine, thus I'll remove the nwbib-launch label, unassign myself and move to backlog. @hauschke If there's anything you need, get in contact (perhaps in a new issue). |
Example of an organisation without lat/long and without a GeoNames ID: Should have: |
Generally, only DBS entries contain "Regionalschlüssel" which are then mapped to GeoNames. There can be three reasons why entries don't have a GeoNames link.
|
At least 739 entries don't have See also #76. |
@hauschke wrote:
How did you create that data? It seems that many (or all) of the ISILs mentioned there do in fact have lat/lon and GeoNames data. |
My mistake. Geonames is missing, lat&long is mostly there. I used #55 (comment) to gather the data and extracted lat&long from geoname.rdf. |
Checking DBS entries as we did for ISIL entries in #93 (comment):
=> 2461 (still ~20 %) DBS-derived entries without geo data |
\o/ +1 |
For checking accuracy of OSM lookups, we can at some point automatically compare geo coordinates with regionalschlüssel. E.g. I can easily see querying for libraries in NRW via field |
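A minimal sketch of such a plausibility check, assuming the first two digits of the Regionalschlüssel identify the federal state (e.g. `05` for NRW). The bounding boxes are rough illustrative values, and the names `STATE_BOXES` and `plausible` are hypothetical, not part of the lobid code base:

```python
# Approximate bounding boxes per federal state, keyed by the first two
# digits of the Regionalschlüssel: (lat_min, lat_max, lon_min, lon_max).
# The numbers are rough assumptions for illustration only.
STATE_BOXES = {
    "01": (53.3, 55.1, 7.8, 11.4),  # Schleswig-Holstein
    "05": (50.3, 52.6, 5.8, 9.5),   # Nordrhein-Westfalen
}


def plausible(rs, lat, lon):
    """Return True/False if the coordinates fall inside/outside the box
    for the state encoded in `rs`, or None if no box is known."""
    box = STATE_BOXES.get(rs[:2])
    if box is None:
        return None
    lat_min, lat_max, lon_min, lon_max = box
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max
```

A bounding box only catches gross errors (e.g. a Cologne library located in Bavaria); entries near state borders would need real polygons.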
Use correct separation of street address and city in query. This commit also fixes a UI issue for organisations without classification (discovered with updated test data).
Here are some checks based on rs that currently don't look that good:
|
To improve lookup result quality and avoid false positives.
Deployed a new version to staging where these queries look better. The confidence threshold used is too high though, resulting in missing location data for 3056 organisations. Plus, due to a bug, these organisations are skipped entirely. Will continue by fixing the bug and lowering the threshold. |
Processed with threshold of http://test.lobid.org/organisations/search?q=dbsID:*+AND+_missing_:location.geo I think we should move the threshold down a little further and check the results again. |
Yes, the results look very good. We might even test a lower threshold. |
Accept result independent of threshold if street and city match.
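The acceptance rule from this commit message could look roughly like the sketch below. It is not the actual lobid implementation: the function name, the dictionary keys (`street`, `city`, `confidence`) and the normalisation are assumptions made for illustration.

```python
def accept_lookup(result, query_street, query_city, threshold):
    """Decide whether a geo lookup result should be used.

    `result` is a hypothetical dict with 'street', 'city' and
    'confidence' keys; the real lookup response may look different.
    """
    def norm(value):
        # Compare case-insensitively and ignore extra whitespace.
        return " ".join(value.lower().split())

    street_and_city_match = (
        norm(result["street"]) == norm(query_street)
        and norm(result["city"]) == norm(query_city)
    )
    # An exact street and city match is accepted regardless of confidence;
    # everything else has to reach the threshold.
    return street_and_city_match or result["confidence"] >= threshold
```

The exact match acts as an override, so lowering the threshold further only affects results whose address differs from the query.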
Reprocessed with slightly lower threshold ( Results in 1218 missing locations (1921 in beta): http://test.lobid.org/organisations/search?q=dbsID:*+AND+_missing_:location.geo False positives look good too: http://test.lobid.org/organisations/search?q=rs:01* |
Log output for the 1218 missing locations: missing-geo.txt |
+1 |
Resolves #89 See: http://lobid.org/organisations/search?q=dbsID:*+AND+_missing_:location.geo http://lobid.org/organisations/search?q=rs:01* http://lobid.org/organisations/search?q=rs:03* http://lobid.org/organisations/search?q=rs:05* http://lobid.org/organisations/search?q=rs:06* http://lobid.org/organisations/search?q=rs:07* http://lobid.org/organisations/search?q=rs:08*
As said in #49 (comment), there are a lot of DBS-derived entries without geo coordinates. See, e.g., this list: http://beta.lobid.org/organisations/search?q=@id=DBS&size=20&from=250
At least for some entries, the reason for the missing geo data is that no street address is provided by DBS. From the CSV file, I can see that 124 of the entries.
First of all, I would like to know how many entries in total are lacking geo coordinates. Can I find this out with Elasticsearch, i.e. can I get all entries where a particular field is missing?
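The search links used elsewhere in this thread answer this with the query-string clause `_missing_:location.geo`. A small sketch that builds such a URL; the helper name `missing_field_url` is hypothetical, and whether `_missing_` is supported depends on the Elasticsearch version behind the service (newer versions, as far as I know, use `NOT _exists_:field` instead):

```python
from urllib.parse import quote_plus


def missing_field_url(base_url, field, extra_clause=None):
    """Build a search URL for entries where `field` is missing,
    optionally combined with another query-string clause."""
    clauses = [extra_clause] if extra_clause else []
    clauses.append(f"_missing_:{field}")
    query = " AND ".join(clauses)
    # Keep ':' and '*' readable, encode spaces as '+'.
    return f"{base_url}?q={quote_plus(query, safe=':*')}"


url = missing_field_url(
    "http://lobid.org/organisations/search", "location.geo", "dbsID:*"
)
print(url)
```

The total hit count reported for that query is then the number of DBS-derived entries lacking geo coordinates.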