Use Overture POIs as a shop data source #5199

Closed
3 of 5 tasks
tsmock opened this issue Aug 16, 2023 · 18 comments


@tsmock

tsmock commented Aug 16, 2023

General

Affected tag(s) to be modified/added: Depends upon the business. We will need a mapping of Overture POI type -> OSM tags. Some people have already started work on that here: https://wiki.openstreetmap.org/wiki/Overture_Categories .
Question asked: Does this <business type> named <business name> exist here?
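
A minimal sketch of what such a category mapping could look like. The entries below are hand-picked illustrations, not taken from the wiki page, and `OVERTURE_TO_OSM` and `to_osm_tags` are hypothetical names; a real table would be generated from the mapping linked above.

```python
# Hypothetical, hand-picked entries for illustration only.
# A real table would be derived from the wiki mapping page.
OVERTURE_TO_OSM = {
    "coffee_shop": {"amenity": "cafe"},
    "restaurant": {"amenity": "restaurant"},
    "pharmacy": {"amenity": "pharmacy"},
    "bakery": {"shop": "bakery"},
    "supermarket": {"shop": "supermarket"},
}


def to_osm_tags(category, name):
    """Return OSM tags for an Overture category, or None if it is unmapped."""
    base = OVERTURE_TO_OSM.get(category)
    if base is None:
        # Unmapped categories are skipped rather than guessed.
        return None
    tags = dict(base)  # copy so the shared table is never mutated
    if name:
        tags["name"] = name
    return tags
```

Returning `None` for unmapped categories means no quest would be generated for a POI whose type cannot be translated.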

Checklist

Checklist for quest suggestions (see guidelines):

  • 🚧 To be added tag is established and has a useful purpose
  • 🤔 Any answer the user can give must have an equivalent tagging (Quest should not reappear to other users when solved by one)
    We would need a server for this. Or some kind of feedback API provided by the overture folks.
  • 🐿️ Easily answerable by any pedestrian from the outside but a survey is necessary
  • 💤 Not an overwhelming percentage of quests have the same answer (No spam)
    This is doable, but would need some kind of deduplication, e.g., searchNodes(bbox(point, expandByXMeters)).stream().filter(OsmPrimitive::isTagged).noneMatch(p -> p.containsTags(point.getTags())). From what I understand, SC downloads an area now.
    Server-side deduplication would be better, or we can download smaller areas for the POIs (e.g., only a z17 tile instead of a z15 tile).
  • 🕓 Applies to a reasonable number of map data (Worth the effort)
    Depends upon who you ask; most people would say that POIs are kind of important.
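
The client-side deduplication check mentioned in the checklist could be sketched as follows. This is a standalone illustration, not StreetComplete or JOSM code: the dictionary layout of POIs and OSM elements, the function names and the 50 m default are all assumptions.

```python
import math

METERS_PER_DEGREE_LAT = 111_320.0  # rough average, good enough for small boxes


def expand_bbox(lat, lon, meters):
    """Return (min_lat, min_lon, max_lat, max_lon) around a point.

    Uses a flat-earth approximation, so it is not valid near the poles.
    """
    dlat = meters / METERS_PER_DEGREE_LAT
    dlon = meters / (METERS_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    return (lat - dlat, lon - dlon, lat + dlat, lon + dlon)


def in_bbox(bbox, lat, lon):
    min_lat, min_lon, max_lat, max_lon = bbox
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


def contains_tags(osm_tags, wanted):
    """True if every wanted key/value pair is present in osm_tags."""
    return all(osm_tags.get(key) == value for key, value in wanted.items())


def is_already_mapped(poi, osm_elements, meters=50):
    """True if a tagged OSM element near the POI carries all of its tags."""
    bbox = expand_bbox(poi["lat"], poi["lon"], meters)
    return any(
        element["tags"]  # skip untagged elements
        and in_bbox(bbox, element["lat"], element["lon"])
        and contains_tags(element["tags"], poi["tags"])
        for element in osm_elements
    )
```

A quest would only be shown for POIs where `is_already_mapped` returns `False`. An exact tag match is strict; it misses POIs whose name is spelled differently in OSM.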

Ideas for implementation

Element selection:

Metadata needed:

Proposed UI:

[pin on map]


Does this <business type> named <business name> exist here?
| Other Answers... | No | Yes |


Other Answers...

  • Misnamed (offer text entry)
  • Mistyped (ask what type of business it is)
  • Permanently closed (Add, but with opening_hours=closed)
  • Exists, located nearby (offer user chance to correct location)
@westnordost
Member

westnordost commented Aug 16, 2023

Hm, I have a few reservations against that, I tend towards closing this as will-not-fix:

  1. Overture POI data seems to be of somewhat low quality. At least, many POIs will be wrong, duplicated, non-existent, maybe-existent-but-not-visible-from-the-outside, etc. (from what I overheard in the chat). To dump Overture data into the app and let StreetComplete / OpenStreetMap users play the trash collection for them sounds a bit ... hmm... disrespectful towards our users. I understand there'd be a lot of those POIs. It sounds like "Long live TIGER cleanup!" (@1e5 😅 ) a bit all over again, only world-wide.

  2. You mentioned it yourself: The de-duplication and "merging" of the data is going to be a complex task. Not only as a preprocessing step, but also during answering of the quest, too: What if the POI is mapped at a wrong location but at the right location, there is already that same POI? When the mapper specifies the correct location of that Overture shop POI, it would lead to a duplicate node being created next to the POI on OSM unless accounted for. Surely many more such cases would need to be accounted for when dealing with merging a different data source into OSM.

  3. You also mentioned it: When the user answers "no", it would either be contributed upstream somehow (to Overture?) or the information that indeed a certain POI in Overture is wrong needs to be stored on another server. Sounds a bit complex.

  4. Also, the users are already on-site and already have a tool to view current POIs and complete or correct them in the most simple manner possible if they so choose - the shops overlay. So, it seems to me that actively asking whether this or that POI is here or there, and providing all the UI flow for what to do if it is not, would be more effort - not only for the developer but also for the StreetComplete mapper.

Of course, the last point could also be said for any quest as it would probably be more efficient to record all the things with EveryDoor et al. 🤷‍♀️

@tsmock
Author

tsmock commented Aug 16, 2023

  1. That wasn't my intent -- when I've used StreetComplete in the past, I've found that I tend to not want to input new businesses. To be fair, this might be due to not having keyboard swipe available. So having something I can look at and use/modify would be better.
  2. Realistically, we could just restrict the maximum distance to something reasonable that would already be checked server-side.
  3. Very much so. If I were to implement something, it would be a "simple" service that takes a lat/lon and name, and returns it via a bbox query or vector tiles.
  4. We could (alternatively) use the overture POI list to show possible nearby businesses, prefilling the name and (maybe) prefilling the business type. This would remove the need to keep track of known-bad POIs.
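
As a rough illustration of point 3, the "simple" service could be reduced to an in-memory store like the one below. Everything here is hypothetical (class name, method names, tuple layout); a real service would add persistence, an HTTP or vector-tile interface and some abuse protection.

```python
class RejectedPoiStore:
    """Keeps reports of Overture POIs that a surveyor could not find."""

    def __init__(self):
        self._reports = []  # list of (lat, lon, name) tuples

    def report(self, lat, lon, name):
        """Record that the POI with this name was not found at lat/lon."""
        self._reports.append((lat, lon, name))

    def query(self, min_lat, min_lon, max_lat, max_lon):
        """Return all reports that fall inside the bounding box."""
        return [
            (lat, lon, name)
            for (lat, lon, name) in self._reports
            if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon
        ]
```

A client would call `query` for the area it has just downloaded and drop every Overture POI whose position and name match a stored report.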

With that said, if you don't think this would be helpful, feel free to close it.

@mnalis
Member

mnalis commented Aug 17, 2023

@tsmock you might be interested in SCEE (the "expert edition" fork of StreetComplete), which has a Custom quest feature that works on a locally uploaded .csv file and can do (together with other SCEE features) most of what you want: adding a POI, moving it around, adding/modifying/removing tags like name / amenity etc., and recording [locally] what you solved and what remains to be solved.

You would still need to deduplicate Overture/OSM (optional; you don't need to do that if you prefer extra manual verification work over writing deduplication code) and convert Overture data to that .csv format for the region you're interested in editing, but that seems like trivial effort compared to the idea above with servers, a two-way API to Overture etc.

If you do decide to try that, do share your experiences (and conversion scripts) afterward; other people might be interested in that too.

@matkoniecz
Member

@tsmock In which areas would Overture data overall save time on inputting data?

In the areas I checked and know, data quality is low enough that repairing the data would take more time than mapping from scratch.

It seems, at most, to be useful as a detector of undermapped areas. Or as a data source if you prefer its deficiencies over OSM's coverage issues.

@matkoniecz
Member

Also, has anyone even managed to decently deduplicate OSM and Overture data? Looking at it I have no idea how to do it reliably given quality of data there.

@Helium314
Collaborator

@tsmock such a quest could be added in SCEE. But having a way to communicate back to the server when something was added or not found at the indicated location is definitely necessary.

There was a previous request of doing something similar using Osmose fixables, but they can only be downloaded for individual issues.

@matkoniecz
Member

https://bdon.github.io/overture-tiles/places.html#0.92/0/0 can be used to judge data quality

in my area at least (Kraków, Poland) data quality is laughably bad

@Helium314
Collaborator

Wow... there really is room for improvement.

@westnordost
Member

westnordost commented Aug 17, 2023

https://bdon.github.io/overture-tiles/places.html#0.92/0/0 can be used to judge data quality

I selected a confidence of >= 0.6 and checked some POIs on a random location near me that I personally know (Hamburg).

  • 11 were correct and at the right position
  • 3 were less than 15 meters wrongly placed
  • 5 were more than 15 meters wrongly placed
  • 8 were completely wrong (never existed, existed many years ago or at completely different places)

At confidence >= 0.9, only a few places remain:

  • 5 were correct and at the right position
  • 1 was more than 15 meters wrongly placed

(The OSM data has 43 POIs at the same map excerpt of which all are currently correct)
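
To make the two samples easier to compare, the counts above can be turned into shares. This is only a sketch of the arithmetic; the dictionary keys and the `share_correct` helper are made-up names, and the samples are far too small to generalise from.

```python
# Counts copied from the two lists above, keyed by confidence threshold.
survey = {
    0.6: {"correct": 11, "off_lt_15m": 3, "off_gt_15m": 5, "wrong": 8},
    0.9: {"correct": 5, "off_gt_15m": 1},
}


def share_correct(counts):
    """Fraction of checked POIs that were correct and at the right position."""
    return counts["correct"] / sum(counts.values())


for threshold, counts in survey.items():
    total = sum(counts.values())
    print(f"confidence >= {threshold}: {counts['correct']} of {total} "
          f"({share_correct(counts):.0%}) correct")
```

Raising the threshold from 0.6 to 0.9 lifts the correct share from 11 of 27 (about 41%) to 5 of 6 (about 83%), but it leaves only 6 POIs against the 43 already in OSM.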

@mnalis
Member

mnalis commented Aug 17, 2023

5 were correct and at the right position
1 was more than 15 meters wrongly placed

Are those numbers for POIs which exist in Overture but are missing from OSM?
Or are they only about POIs which are in Overture, regardless if they are already mapped in OSM or not?

@westnordost
Member

No, see my last sentence. Of the map excerpt I looked at, all correct POIs of Overture also existed in OSM. Such a random example in a well-mapped neighbourhood is of course not representative.

@tsmock
Author

tsmock commented Aug 17, 2023

I did the same thing for Main Street near where I live, although I only looked at 3 blocks on the south side. I've gone up and down it several times with SC (mostly north side), but I almost never add new businesses via SC. I've done it once, but I didn't enjoy the experience. I am much more likely to take a picture and make a note for later mapping.

  • Correct: 18 (as compared to OSM)
  • <15m: 2
  • >15m: 2 (15m isn't that bad; I would have used 100m as my threshold -- Esri World Imagery is off by ~6m in my location, and nominal GPS accuracy is ~5m)
  • Intermittent: 1 (farmers market, only happens on specific days during the summer)
  • Duplicates: 3
    • Same name, different order
    • Names from businesses (mostly mortgage brokers)
  • No clue: 10
    • I need to take another walk with a 360 camera to filter these. For now, let us assume they are all wrong.

So ~2/3 (23/38) of the POIs in my test area are "good" before filtering based off of the confidence score.
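
The <15m / >15m split depends on the distance threshold, so a small sketch of the classification may help. It uses the standard haversine formula; the function names, the tuple layout and the sample coordinates are assumptions for illustration. With imagery off by ~6m and GPS accuracy of ~5m, up to roughly 11m of offset could come from the reference position alone in the worst case.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def classify_offset(overture, surveyed, threshold_m=15):
    """Compare an Overture (lat, lon) with the surveyed (lat, lon)."""
    distance = haversine_m(*overture, *surveyed)
    return "within threshold" if distance <= threshold_m else "beyond threshold"


# Made-up coordinates about 56 m apart.
overture_position = (40.0, -105.0)
surveyed_position = (40.0005, -105.0)
print(classify_offset(overture_position, surveyed_position))
print(classify_offset(overture_position, surveyed_position, threshold_m=100))
```

The same pair of points counts as misplaced at 15m and as acceptable at 100m, which is why the threshold should be reported along with the counts.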

Also, has anyone even managed to decently deduplicate OSM and Overture data? Looking at it I have no idea how to do it reliably given quality of data there.

I'll note that I'm currently planning on making the POIs available in JOSM (through the MapWithAI plugin), and I'll probably be doing some client-side conflation and translation. For conflation, I think doing the translation first and then conflating based off of primary tags (primarily the name) will work the best.

@westnordost
Member

So anyway, I think I will close this here but feel free to continue commenting. If this is implemented, it may make more sense to put this into SCEE, at least unless the concerns I posted are to a degree disproven in actual use.

@westnordost closed this as not planned on Aug 17, 2023
@mnalis
Member

mnalis commented Aug 17, 2023

Also, has anyone even managed to decently deduplicate OSM and Overture data? Looking at it I have no idea how to do it reliably given quality of data there.

I guess one could check whether within a nearby radius there is either a POI with the exact same name, or a POI with a similar name and the same type, and remove such an Overture POI as a (possible) duplicate (note that luckily one doesn't have to be very precise here: a wrong guess won't result in wrong data in OSM, just in reducing the potential benefit slightly).
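
A minimal sketch of that heuristic, assuming the OSM POIs have already been filtered to the nearby radius and that Overture categories have already been translated to a comparable type. The dictionary layout, the function names and the 0.8 cutoff are all assumptions; `difflib.SequenceMatcher` is just one readily available similarity measure.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lower-case the name and collapse runs of whitespace."""
    return " ".join(name.casefold().split())


def similar(a, b, cutoff=0.8):
    """True if the normalized names are at least `cutoff` similar."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= cutoff


def is_possible_duplicate(overture_poi, nearby_osm_pois, cutoff=0.8):
    """Apply the two rules: exact same name, or similar name and same type."""
    for osm_poi in nearby_osm_pois:
        if normalize(osm_poi["name"]) == normalize(overture_poi["name"]):
            return True
        if (osm_poi["type"] == overture_poi["type"]
                and similar(osm_poi["name"], overture_poi["name"], cutoff)):
            return True
    return False
```

Erring towards `True` only hides a quest, so a fairly loose cutoff is tolerable here.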


I have doubts whether importing from the Overture dataset is a good idea though (from a business perspective, helping Overture might negatively impact OpenStreetMap, and there is potential legal contamination with Overture data - ODbL notwithstanding).

@matkoniecz
Member

POI with exact same name, or POI with similar name and same type

The problem is that names are very often divergent (only some due to garbage data - sometimes a place has an interpretable name). And categorization in Overture extremely often utterly mismatches reality.

@mnalis
Member

mnalis commented Aug 19, 2023

The problem is that names are very often divergent (only some due to garbage data - sometimes a place has an interpretable name). And categorization in Overture extremely often utterly mismatches reality.

I agree, but that is kind of the point of this idea: such ambiguous Overture POIs should be verified by an on-the-ground mapper (instead of just doing a regular data import with some conflator, which we could do if the data were high-quality instead)


But as said, I do not see support for such a dataset as a priority (I'd rank it a little below showing Osmose errors in SCEE).
After all, if I'm on the ground and I see missing shops/amenities, I'd just switch to EveryDoor (or the SC shops overlay) and add all of them in that area if I have the time (and if I don't have time, then the existence of an Overture quest is irrelevant, as I won't be solving them anyway 😄 )

@matkoniecz
Member

I agree, but that is kind of the point of this idea: such ambiguous Overture POIs should be verified by an on-the-ground mapper (instead of just doing a regular data import with some conflator, which we could do if the data were high-quality instead)

My point is that filtering and verifying it is laborious enough to make it not useful at all (at this point, in the areas I tested, it has pretty low quality)

@pnorman

pnorman commented Aug 24, 2023

I find the slow part of entering POIs to be all the typing. If I have to do that anyways because Overture names need changing, it doesn't seem worth it, even aside from all the other quality issues.
