Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove public_transport from features list #556

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

missinglink
Copy link
Member

@missinglink missinglink commented Jul 22, 2021

When searching a Berlin street, Pelias returned a bus stop with the same name:

https://pelias.github.io/compare/#/v1/autocomplete?text=gleimstrasse
https://www.openstreetmap.org/relation/6378676

Screenshot 2021-07-22 at 16 42 54

When I dug a little deeper I realised that this is actually quite a common issue, many of these public_transport tags are named literally the same as a nearby street or the containing neighbourhood.

I checked the nominatim and osm2pgsql source to see how they handle it, and... they don't, from what I can tell they don't target the tag public_transport for the gazetteer style export from osm2pgsql at all 🤔

This PR removes public_transport:* from our builds, the diff of which returned a bunch of results like I mentioned above, where the name was easily confused with a nearby street, neighbourhood.

It seems counterintuitive to remove public_transport however many of the things we were intending to target with that are actually being covered by other more descriptive tags such as railway~station+name.

I believe this will be very beneficial for users as there will be less duplication and confusion with similarly named entities in the results.

It's possible that we end up removing some entries which some users would like, although I suspect this will be rare, if at all, and we can always add more specific tags again in the future as required.

note: as part of this work I rewrote some of the end-to-end test code since the diff output was basically impossible to read.

"_type": "_doc",
"data": {
"name": {
"default": "Vancouver Pacific Central Station"
Copy link
Member Author

@missinglink missinglink Jul 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one would have been nice to keep since the name is fairly unique.
Fortunately that node has since been deleted and replaced with https://www.openstreetmap.org/way/32603955 which we will bring in via building~name

@missinglink
Copy link
Member Author

I'm actually considering dropping highway:bus_stop+name too, it produces weird labels, like this one from nominatim:

Screenshot 2021-07-22 at 17 15 56

@missinglink
Copy link
Member Author

removed highway:bus_stop and railway:tram_stop, these smaller local stops generally don't have a unique name.

@orangejulius
Copy link
Member

Hm, I am torn on this one. On the one hand, I can absolutely see how this can in many cases improve results, especially for "plain" queries without any filtering on layers.

But I also think there's a good chance this will impact some queries. Lots of Pelias users are particularly interested in transit stops, so ensuring we don't break anything is worth some thought.

Here are some scenarios to think about

  • If we have people using layers=venue to filter down to only transit stop POIs as best as is currently possible (we don't yet have a transit layer), would this break things?
  • If we did add a transit layer in the future, would this break queries that use it?
  • I can never remember where we're at with the categories filter, but could this break any current queries that use it, or if we enable categories functionality in the future, could it break that?
  • Does this possibly affect queries in one country or even metro region more than others? I think transit tagging is often very different from place to place. We have a few test cases to help detect regressions, maybe we should try to build a bunch more that cover several cities and continents?
  • How hard would it be to achieve the same goals through a different means, like deduplication?

All that said, I 100% support the goal of cleaner query results in cases like this, so investigating solutions to the problem are worth it.

@missinglink
Copy link
Member Author

missinglink commented Jul 23, 2021

Based on the changes in fixture diff in this PR it appears to be positive, although I can understand some hesitation.

I've generated a much larger diff for the whole of Berlin (1MB) which shows all the records which would be removed as part of this feature.

This was generated by running the end-to-end tests twice, once with the original config and once with the config proposed in this PR, I then diff'd it with the following command to produce a summary of only the lines which were deleted:

comm -2 -3 <(jq -cM -s 'sort_by(._id) | .[]' berlin-latest.osm.pbf.jsonl) <(jq -cM -s 'sort_by(._id) | .[]' berlin-latest.osm.pbf.no_public_transport.jsonl) > berlin-latest.osm.pbf.diff.jsonl

cat berlin-latest.osm.pbf.diff.jsonl | jq -r '(._id + " " + .data.name.default)'

I chose a popular transit stop Potsdamer Platz as a test case since it has a variety of public transport options, a couple streets with the same name and it's also the name of the surrounding neighbourhood:

grep -i 'potsdamer' berlin.osm.public_transport.diff.jsonl | grep -i 'platz'

openstreetmap:venue:node/1874617251 S Potsdamer Platz/Voßstraße
openstreetmap:venue:node/1877576992 Varian-Fry-Straße/Potsdamer Platz
openstreetmap:venue:node/1877576997 Varian-Fry-Straße/Potsdamer Platz
openstreetmap:venue:node/1877848269 S+U Potsdamer Platz
openstreetmap:venue:node/1877848274 S+U Potsdamer Platz
openstreetmap:venue:node/1877848279 S+U Potsdamer Platz
openstreetmap:venue:node/1877848283 S+U Potsdamer Platz
openstreetmap:venue:node/2093326467 S Potsdamer Platz
openstreetmap:venue:node/2098466340 U Potsdamer Platz
openstreetmap:venue:node/2451831547 S Potsdamer Platz
openstreetmap:venue:node/2456043650 Berlin Potsdamer Platz
openstreetmap:venue:node/2456043651 Berlin Potsdamer Platz
openstreetmap:venue:node/2456043652 Berlin Potsdamer Platz
openstreetmap:venue:node/27424921 Berlin Potsdamer Platz
openstreetmap:venue:node/29495049 U Potsdamer Platz
openstreetmap:venue:node/298516304 S Potsdamer Platz/Voßstraße
openstreetmap:venue:node/30353079 S Potsdamer Platz
openstreetmap:venue:node/322796203 S Potsdamer Platz
openstreetmap:venue:relation/2364886 Berlin, S Potsdamer Platz/Voßstraße
openstreetmap:venue:relation/2368542 Berlin, Varian-Fry-Straße/Potsdamer Platz
openstreetmap:venue:relation/3198309 S Potsdamer Platz
openstreetmap:venue:relation/3200537 S Potsdamer Platz
openstreetmap:venue:relation/5686702 Berlin Potsdamer Platz, Bahnhof
openstreetmap:venue:relation/5686709 Berlin, S+U Potsdamer Platz [Bus Stresemannstraße]
openstreetmap:venue:relation/6865726 U Potsdamer Platz
openstreetmap:venue:relation/8435158 Berlin, S+U Potsdamer Platz [Bus Leipziger Straße]
openstreetmap:venue:way/237532199 Berlin Potsdamer Platz
openstreetmap:venue:way/237532200 Berlin Potsdamer Platz
openstreetmap:venue:way/382265010 S Potsdamer Platz/Voßstraße
openstreetmap:venue:way/382265011 S Potsdamer Platz/Voßstraße
openstreetmap:venue:way/382265012 S+U Potsdamer Platz
openstreetmap:venue:way/382265013 S+U Potsdamer Platz
openstreetmap:venue:way/382265014 S+U Potsdamer Platz
openstreetmap:venue:way/382265015 S+U Potsdamer Platz
openstreetmap:venue:way/382265023 Varian-Fry-Straße/Potsdamer Platz
openstreetmap:venue:way/382265024 Varian-Fry-Straße/Potsdamer Platz

The most popular public transit option is "U Potsdamer Platz" (the underground) and we can see that 3x records with U Potsdamer Platz are removed and 10x in some form of "S+U Potsdamer Platz".

So a couple things worth mentioning here:

  • There's a fair amount of duplication in what's listed above
  • They're all venues (as expected)
  • They all contain the name of a nearby street/neighbourhood, in some cases the city name too

And looking at what didn't get removed:

cat berlin-latest.osm.pbf.no_public_transport.jsonl | jq -r '(._id + " " + .data.name.default)' | grep -i 'potsdamer' | grep -i 'platz' | grep 'venue'

openstreetmap:venue:node/84644779 Suite Novotel Berlin City Potsdamer Platz
openstreetmap:venue:node/340388625 Dalí - Die Ausstellung am Potsdamer Platz
openstreetmap:venue:node/365726272 Mövenpick Hotel Berlin am Potsdamer Platz
openstreetmap:venue:node/394948618 U Potsdamer Platz
openstreetmap:venue:node/394948620 U Potsdamer Platz
openstreetmap:venue:node/396004773 U Potsdamer Platz
openstreetmap:venue:node/1251201121 U-Bhf. Potsdamer Platz/Linkstraße
openstreetmap:venue:node/1259545063 ibis Berlin City Potsdamer Platz
openstreetmap:venue:node/1318190070 Potsdamer Platz
openstreetmap:venue:node/1336883756 Stage Theater am Potsdamer Platz
openstreetmap:venue:node/1573193294 S+U Potsdamer Platz
openstreetmap:venue:node/1573193317 S+U Potsdamer Platz
openstreetmap:venue:node/1576240058 S+U Potsdamer Platz
openstreetmap:venue:node/1788588105 Potsdamer-Platz-Apotheke
openstreetmap:venue:node/2451831603 S Potsdamer Platz
openstreetmap:venue:node/2451831708 S Potsdamer Platz
openstreetmap:venue:node/2454155640 S+U Potsdamer Platz
openstreetmap:venue:node/2456291607 S Potsdamer Platz
openstreetmap:venue:node/2491792013 S+U Potsdamer Platz
openstreetmap:venue:node/2491824683 S Potsdamer Platz
openstreetmap:venue:node/2491846867 S Potsdamer Platz
openstreetmap:venue:node/2491848735 S Potsdamer Platz
openstreetmap:venue:node/2491871589 S+U Potsdamer Platz
openstreetmap:venue:node/2491871593 S+U Potsdamer Platz
openstreetmap:venue:node/2491872658 S Potsdamer Platz
openstreetmap:venue:node/2581112669 Hotel Potsdamer Hof Berlin Am Potsdamer Platz
openstreetmap:venue:node/2749149916 Potsdamer Platz/Ebertstraße
openstreetmap:venue:node/3493177833 Berlin Potsdamer Platz
openstreetmap:venue:node/3519366793 Motel One Potsdamer Platz
openstreetmap:venue:node/3780937605 NH Berlin Potsdamer Platz
openstreetmap:venue:node/3837620167 Alper Hotel am Potsdamer Platz
openstreetmap:venue:node/3837620172 B&B Hotel Berlin-Potsdamer Platz
openstreetmap:venue:node/3854590996 S Potsdamer Platz
openstreetmap:venue:node/3854591000 U Potsdamer Platz
openstreetmap:venue:node/5333707421 WeWork Potsdamer Platz
openstreetmap:venue:node/6143046477 Potsdamer-Platz-Apotheke
openstreetmap:venue:way/22989670 Potsdamer Platz Arkaden
openstreetmap:venue:way/81456696 Scandic Berlin Potsdamer Platz
openstreetmap:venue:way/92335625 Crowne Plaza Berlin Potsdamer Platz
openstreetmap:venue:relation/3200536 Potsdamer Platz

There's still 4x "U Potsdamer Platz" and 7x with some form of "S+U Potsdamer Platz" 😆

Some entries like "Varian-Fry-Straße/Potsdamer Platz" have been completely removed, which is to be expected since this PR removes all bus/tram stops too.

Nominatim does actually include one version of that bus stop and I guess it's a product decision about whether to include those or not and how much value they provide, it's most likely being included from being tagged highway:bus_stop.

@missinglink
Copy link
Member Author

missinglink commented Jul 23, 2021

Public transit stops are actually quite tricky, this is what Nominatim returns for "S+U Potsdamer Platz"

Screenshot 2021-07-23 at 16 43 25

And what Pelias autocomplete returns (we do a good job of deduping down from 17 hits):

Screenshot 2021-07-23 at 16 45 31

The /v1/search endpoint returns garbage because libpostal doesn't understand the input 🤷

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants