Define standard: segments & intersections #66

terryf82 · 2018-03-04T01:05:48Z

Segments & intersections are now to be based off OSM data (nodes, ways and relations) to allow any city with OSM coverage to onboard into the project easily.

We need to define an initial standard of how the segments & intersections will be stored once built. Some questions that have been raised which may still be open -

what are the required & optional characteristics of a segment/intersection, including data typing?
are segments/intersections stored separately, or together? If stored together, how do we differentiate?
are we storing foreign key data for segments/intersections that connects them back to the OSM nodes/ways/relations from which they were built, so that we can detect & respond to changes in the source and possibly pass information back to OSM at some point?

j-t-t · 2018-03-07T17:25:44Z

To clarify what we've done thus far: a 'segment' is either an intersection segment or a non-intersection segment. At the moment they are stored separately:

A non-intersection segment is a linestring stored in a shapefile (currently processed/maps/non_inters_segments.shp). These segments carry features drawn from OpenStreetMap and can also carry features from an additional city-specific map.
An intersection 'segment' is a multilinestring made up of the portions of OSM ways that fall within a buffer around a set of close-together intersection nodes. Currently the shapes of these segments are stored in processed/maps/inters_segments.shp, and their features are stored in a JSON file holding a list of feature dicts for each intersection id (intersection ids are currently generated arbitrarily and only used internally).

The current setup doesn't need to be the final say on how things are stored!

j-t-t · 2018-03-07T17:34:27Z

With regard to connecting back to OpenStreetMap, it's not currently done but I agree that it needs to be. I've been scoping this out, and here's what I've found for non-intersection segments:

OpenStreetMap has a whole bunch of extraneous nodes that we don’t care about. We’re using the OSMnx package to simplify the road network. OSMnx has two main ways to simplify the network:

Taking out everything but intersections and end points (they call this ‘strict’)
Taking out everything but intersections, end points, and where the osmids change
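
To make the difference between the two modes concrete, here is a toy sketch in plain Python. It is not OSMnx's implementation; the function name `simplify_path`, the `degree` lookup and the sample data are all made up for illustration, and it only handles a single linear stretch of road.

```python
from typing import Dict, List


def simplify_path(
    nodes: List[str],
    degree: Dict[str, int],
    edge_osmids: List[int],
    strict: bool = True,
) -> List[str]:
    """Return the nodes of one linear path that survive simplification.

    nodes: ordered node ids along the path.
    degree: number of street edges meeting at each node (2 = pass-through).
    edge_osmids: OSM way id of the edge between nodes[i] and nodes[i + 1].
    strict: if True keep only intersections and end points; if False also
        keep nodes where the OSM way id changes.
    """
    kept = []
    last = len(nodes) - 1
    for i, node in enumerate(nodes):
        is_endpoint = i == 0 or i == last
        is_intersection = degree[node] > 2
        # The way id "changes" at an interior node when the edge before it
        # and the edge after it belong to different OSM ways.
        osmid_changes = 0 < i < last and edge_osmids[i - 1] != edge_osmids[i]
        if is_endpoint or is_intersection or (not strict and osmid_changes):
            kept.append(node)
    return kept


# Hypothetical path: "d" is a real intersection, the way id changes at "c".
nodes = ["a", "b", "c", "d", "e"]
degree = {"a": 1, "b": 2, "c": 2, "d": 3, "e": 1}
edge_osmids = [100, 100, 200, 200]

strict_nodes = simplify_path(nodes, degree, edge_osmids, strict=True)
relaxed_nodes = simplify_path(nodes, degree, edge_osmids, strict=False)
print(strict_nodes)   # ['a', 'd', 'e']
print(relaxed_nodes)  # ['a', 'c', 'd', 'e']
```

In the strict mode the stretch from "a" to "d" becomes one segment spanning two way ids; in the second mode it is cut at "c".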

There are a number of reasons osm ids can change at a point that isn’t obviously an intersection:

There is an intersection with a private road (or a service road). Sometimes these don’t show up on Boston’s map, and sometimes they do (there’s a bit of a mismatch sometimes between service/private roads appearing on osm vs. Boston). Sometimes these intersections even have traffic lights.
The number of lanes changes because a turn lane is added
Roads change names (at points where there is no intersection)
The highway type can change: it can go from a trunk to a trunk link, or other similar transitions

It seems to me that only the first case, and possibly only when there’s an actual traffic light, should be treated as two segments separated by an intersection. But it might be interesting to eventually look at some of the other nodes. For example, is the likelihood of a crash higher where the number of lanes changes?

Thus, a non-intersection segment (as this project has historically defined them) can have more than one osm way id. So I think we need to decide whether we just want to store a list of osm ids for each non-intersection segment, or whether we want to redefine non-intersection segments to correspond 1-to-1 to an osmid. That would mean that we'd have an additional set of nodes represented somewhere that are not intersections. I can see pluses and minuses to redefining them:

Plus: we'll get more granular feature information (number of lanes, for example)
Plus: we won't have non-intersection segments have varying features, e.g. a different number of lanes in different places
Minus: having an additional set of nodes that aren't intersections will be one more thing to keep track of
Potential Minus: some osm ways might fall solely within an intersection, which might have annoying unforeseen consequences that I'm not thinking about
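
A rough sketch of the two options, to show what each would store. The `Segment` record, both function names and the sample edges are hypothetical, and it assumes each OSM way carries a single lane count.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Tuple

# An edge is (osm_way_id, lanes); both values are illustrative.
Edge = Tuple[int, int]


@dataclass
class Segment:
    segment_id: str
    osm_way_ids: List[int]
    lanes: List[int]  # distinct lane counts found along the segment


def as_single_segment(segment_id: str, edges: List[Edge]) -> Segment:
    """Option A: keep one segment and store the list of osm ids it covers."""
    way_ids: List[int] = []
    lanes: List[int] = []
    for osmid, lane_count in edges:
        if osmid not in way_ids:
            way_ids.append(osmid)
        if lane_count not in lanes:
            lanes.append(lane_count)
    return Segment(segment_id, way_ids, lanes)


def split_by_osmid(segment_id: str, edges: List[Edge]) -> List[Segment]:
    """Option B: one sub-segment per run of consecutive edges sharing an osmid."""
    pieces = []
    for i, (osmid, run) in enumerate(groupby(edges, key=lambda e: e[0])):
        run_lanes = sorted({lane_count for _, lane_count in run})
        pieces.append(Segment(f"{segment_id}_{i}", [osmid], run_lanes))
    return pieces


# Hypothetical stretch of road where a turn lane is added partway along.
edges = [(100, 2), (100, 2), (200, 3)]

option_a = as_single_segment("001", edges)
option_b = split_by_osmid("001", edges)
print(option_a)  # one segment, two way ids, two lane counts
print(option_b)  # two segments, each with one way id and one lane count
```

Option A leaves the segment with mixed lane counts; option B gives each piece a single value at the cost of an extra non-intersection node between the pieces.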

j-t-t · 2018-03-07T17:39:31Z

For intersection segments we will likely always want to have it represented in a way where each intersection corresponds to one or more osm nodes. This is because an intersection is by definition all the portions of ways that fall inside a bounding box of a set of close together intersection nodes.
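
As an illustration of the "one intersection, one or more osm nodes" relationship, the sketch below groups nearby intersection nodes into clusters and keeps the node ids for each cluster. It is not the project's buffering code: the function name `cluster_nodes`, the 10-unit threshold and the planar coordinates are assumptions.

```python
import math
from typing import Dict, List, Tuple


def cluster_nodes(
    nodes: Dict[int, Tuple[float, float]], max_dist: float
) -> List[List[int]]:
    """Group node ids whose points are chained together within max_dist.

    nodes maps an OSM node id to projected (x, y) coordinates in the same
    unit as max_dist. Uses a small union-find over all node pairs.
    """
    parent = {n: n for n in nodes}

    def find(n: int) -> int:
        # Follow parent links to the cluster root, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    ids = sorted(nodes)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if math.dist(nodes[a], nodes[b]) <= max_dist:
                parent[find(a)] = find(b)

    clusters: Dict[int, List[int]] = {}
    for n in ids:
        clusters.setdefault(find(n), []).append(n)
    return sorted(clusters.values())


# Hypothetical intersection nodes: 1, 2 and 4 sit close together, 3 is far away.
intersection_nodes = {1: (0.0, 0.0), 2: (5.0, 0.0), 3: (100.0, 0.0), 4: (9.0, 0.0)}
clusters = cluster_nodes(intersection_nodes, max_dist=10.0)
print(clusters)  # [[1, 2, 4], [3]]
```

Each inner list would be the set of osm node ids stored against one intersection.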

alicefeng · 2018-08-27T23:36:03Z

Here are the features I'm currently seeing that have different names for the same thing:

SPEEDLIMIT vs osm_speed
F_F_Class vs hwy_type (not identical but seem to be in the same vein)

terryf82 · 2018-08-28T10:42:20Z

As discussed in the channel - if we're going to implement a segments data standard to make viz development much simpler and more consistent, it makes sense for it to be done when the segment data is being created, which is quite early on in the pipeline process (data_generation).

This phase retrieves OSM segments, with optional overloading taking place if a city-supplied map is available (@j-t-t have I got this right?). It writes the segment data to file as part of make_canon_dataset, which in turn is used to train the model and generate predictions.

I'm happy to have a go at standardizing the segments and following the pipeline through to update any references/uses of the current feature names, but I wanted to check first whether those who actually wrote these scripts (@bpben @j-t-t @andhint @alicefeng) had any concerns about this issue or wanted to handle specific parts? Thanks.

j-t-t · 2018-08-28T23:26:54Z

This is an interesting question. The reason we did the Boston-specific features was that performance of the model was better for them than for the osm features. I understand the desire to standardize, certainly from a viz perspective, but also for cities to be able to share information. But I do think that there's value in cities being able to use their own features, and that's certainly something that we told Boston we could do when we met with them in July.

So if your plan is just to map features that we know about: speed limit and F_F_Class to the osm features, that seems fine. But I don't think we should drop existing Boston features, or prevent other cities from adding their features.

There's not really any overloading happening. Right now, the only reason Boston doesn't use osm features is that we specify which features to use in config_boston.yml. We could use both easily enough. If we think that Boston's speed limit/F_F_Class info is more useful or informative than osm's info, we might want to switch to overriding that, but it seems worth looking into before doing so.

If we do choose to override osm data, we should probably have this be something configurable, where cities can specify not just which features to use, but which features override other features.
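
One possible shape for that configuration, sketched in Python rather than YAML. The `priority` structure and the `resolve_features` name are hypothetical, as is the `lanes` key; SPEEDLIMIT and osm_speed are the existing feature names mentioned in this thread.

```python
from typing import Any, Dict, List


def resolve_features(
    raw: Dict[str, Any], priority: Dict[str, List[str]]
) -> Dict[str, Any]:
    """Pick one value per standard feature name.

    priority maps a standard name to source columns in order of preference;
    the first source with a non-missing value wins.
    """
    resolved = {}
    for name, sources in priority.items():
        for source in sources:
            if raw.get(source) is not None:
                resolved[name] = raw[source]
                break
    return resolved


# Hypothetical per-city config: the city's speed limit overrides osm's.
priority = {
    "speed": ["SPEEDLIMIT", "osm_speed"],
    "lanes": ["lanes"],
}

with_city_speed = resolve_features(
    {"SPEEDLIMIT": 30, "osm_speed": 25, "lanes": 2}, priority
)
without_city_speed = resolve_features(
    {"SPEEDLIMIT": None, "osm_speed": 25, "lanes": 2}, priority
)
print(with_city_speed)     # {'speed': 30, 'lanes': 2}
print(without_city_speed)  # {'speed': 25, 'lanes': 2}
```

A city that prefers osm's value would just reverse the order of the sources for that feature.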

terryf82 · 2018-08-28T23:37:36Z

I agree on having a configurable setup where each city can optionally provide its own features and, where necessary, override OSM features.

I like the idea of saying to cities "bring your own features" but I wonder if actual use of those features needs to wait until a release after 2.0. In the short term at least, when generating predictions I can't see that we're going to make use of features beyond a fairly common set - speed, signals, lighting, lanes etc (@bpben please let us know if this isn't the case). These are all likely to be available via OSM, but if they aren't for a city and the city has them, we can use those. Either way, we can map their feature names to a common vocabulary, which has immediate pay-off for viz development and should only require minimal work to integrate into the existing modelling work (the longer we leave it the worse this could get).

alicefeng · 2018-08-28T23:48:02Z

Oh definitely cities should be able to add in whatever custom data they have. But I think that should extend the list of features rather than, say, drop all of the osm features and only use city-provided data. If cities can entirely pick and choose what they want in their model that's going to limit how much the viz can display in a human-friendly manner.

At any rate, these are the segment features the viz is currently using:

prediction
display_name
segment_id
center_x, center_y
SPEEDLIMIT/osm_speed
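
If this list became the required part of a standard, a check along these lines could run when the segment data is written. It is a sketch only: the function name `missing_fields` and the sample record are invented, and whether either speed field should be accepted is an assumption.

```python
from typing import Any, Dict, List

# Fields the viz currently reads, taken from the list above.
REQUIRED_FIELDS = ["prediction", "display_name", "segment_id", "center_x", "center_y"]
# Either of these is assumed to satisfy the speed requirement.
SPEED_FIELDS = ["SPEEDLIMIT", "osm_speed"]


def missing_fields(record: Dict[str, Any]) -> List[str]:
    """Return the names of required fields that are absent or None."""
    missing = [f for f in REQUIRED_FIELDS if record.get(f) is None]
    if all(record.get(f) is None for f in SPEED_FIELDS):
        missing.append("SPEEDLIMIT/osm_speed")
    return missing


# Hypothetical segment record with made-up values.
segment = {
    "prediction": 0.12,
    "display_name": "Example St between A St and B St",
    "segment_id": "001",
    "center_x": -71.06,
    "center_y": 42.36,
    "osm_speed": 25,
}

complete = missing_fields(segment)
incomplete = missing_fields({"segment_id": "002", "center_x": -71.0})
print(complete)    # []
print(incomplete)  # ['prediction', 'display_name', 'center_y', 'SPEEDLIMIT/osm_speed']
```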

terryf82 · 2018-08-28T23:56:53Z

Great, I think we have the beginnings of a segment standard then.

I'll try and get something together before the next meeting, where we can discuss this further and get Ben's input too.

j-t-t · 2018-08-28T23:57:31Z

Okay, as far as I see it there are two straightforward things to do:

Just use osm speed limit instead of Boston's. Super simple, only requires a change to the config_boston.yml
Or, populate the resulting merged data with the osm speed. It never gets dropped from the maps, it's just not added to the set of features the prediction uses. So the postprocessing step could merge in osm_speed.

bpben · 2018-08-29T02:40:46Z

Echoing @j-t-t and @alicefeng on this: We definitely need to allow cities to add their own custom features. I think that's what @j-t-t is working on with the point based feature additions. I think it would also help me and other people working on the modeling to test out new features.

Also: I don't see a problem with including both in the model. Generally, there's an issue of features giving the same information with parametric models (e.g. logistic regression), but our logistic model testing at the moment doesn't really take that into account. You could imagine the case that a segment's speed according to Boston and speed according to OSM both provide some kind of interesting information for the model.

So, adding a third option to @j-t-t's list: just include everything. And maybe do smarter feature selection.

bpben · 2023-08-21T13:43:14Z

#318 should address this by allowing customization of features based on points, and we have other capabilities for pulling in other maps. I do still think we need some standardization here, but I will remove the help wanted label, as it's more a job for a core group of volunteers.

terryf82 added the data management label Mar 4, 2018

j-t-t self-assigned this Mar 4, 2018

bpben mentioned this issue Mar 26, 2018

Create standard input / output standards between feature gen / modeling / viz tracks #32

Closed

bpben added the help wanted label Mar 17, 2019

bpben removed the help wanted label Aug 21, 2023
