Define standard: segments & intersections #66

Open
terryf82 opened this issue Mar 4, 2018 · 12 comments

@terryf82
Collaborator

terryf82 commented Mar 4, 2018

Segments & intersections are now to be based on OSM data (nodes, ways and relations) to allow any city with OSM coverage to onboard into the project easily.

We need to define an initial standard for how the segments & intersections will be stored once built. Some questions that have been raised and may still be open:

  • what are the required & optional characteristics of a segment/intersection, including data typing?

  • are segments/intersections stored separately, or together? If stored together, how do we differentiate?

  • are we storing foreign key data for segments/intersections that connects them back to the OSM nodes/ways/relations from which they were built, so that we can detect & respond to changes in the source and possibly pass information back to OSM at some point?
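To make the questions above concrete, here is a minimal sketch of what a single stored record could look like if segments and intersections were kept together and differentiated by a `kind` field. Every field name below is a hypothetical placeholder rather than an agreed standard; `segment_id` and `display_name` are borrowed from names mentioned later in this thread.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

# Values a feature may take in this sketch (an assumption, not a decision).
FeatureValue = Union[int, float, str, bool, None]


@dataclass
class SegmentRecord:
    """Hypothetical record covering both segment kinds."""

    segment_id: str
    kind: str  # "intersection" or "non_intersection"
    display_name: str
    # Foreign keys back to the OSM elements the record was built from.
    osm_node_ids: List[int] = field(default_factory=list)
    osm_way_ids: List[int] = field(default_factory=list)
    # Optional characteristics, keyed by feature name.
    features: Dict[str, FeatureValue] = field(default_factory=dict)

    def validate(self) -> None:
        """Raise ValueError if a required characteristic is missing."""
        if self.kind not in ("intersection", "non_intersection"):
            raise ValueError(f"unknown kind: {self.kind!r}")
        if self.kind == "intersection" and not self.osm_node_ids:
            raise ValueError("an intersection needs at least one OSM node id")
        if self.kind == "non_intersection" and not self.osm_way_ids:
            raise ValueError("a non-intersection segment needs an OSM way id")


# Made-up example values, for illustration only.
example = SegmentRecord("seg-1", "non_intersection", "Example St", osm_way_ids=[101])
example.validate()
```

Keeping the OSM ids as plain lists leaves room for a segment that spans several ways or an intersection that covers several nodes.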

@j-t-t j-t-t self-assigned this Mar 4, 2018
@j-t-t
Collaborator

j-t-t commented Mar 7, 2018

To clarify what we've done thus far: a 'segment' is either an intersection segment or a non-intersection segment. At the moment they are stored separately:

  • A non-intersection segment is a linestring, stored in a shapefile (at the moment in processed/maps/non_inters_segments.shp). These segments carry features drawn from OpenStreetMap and can also carry features from an additional city-specific map.
  • An intersection 'segment' is a multiline string consisting of the portions of osm ways that fall within a buffer around a set of close-together intersection nodes. At the moment, the shapes of these segments are stored in processed/maps/inters_segments.shp, and their features are stored in a json file that holds a list of feature dicts for each intersection id (intersection ids are at the moment arbitrarily generated and only used internally).

The current setup doesn't need to be the final say on how things are stored!

@j-t-t
Collaborator

j-t-t commented Mar 7, 2018

With regard to connecting back to open street map, it's not currently done but I agree that it needs to be. I've been scoping this out, and here's what I've found for non-intersection segments:

OpenStreetMap has a whole bunch of extraneous nodes that we don’t care about. We’re using the osmnx package to simplify the road network. OSMnx has two main ways to simplify the network:

  • Taking out everything but intersections and end points (they call this ‘strict’)
  • Taking out everything but intersections, end points, and where the osmids change

There are a number of reasons osm ids can change even where there is no obvious intersection:

  • There is an intersection with a private road (or a service road). Sometimes these don’t show up on Boston’s map, and sometimes they do (there’s a bit of a mismatch sometimes between service/private roads appearing on osm vs. Boston). Sometimes these intersections even have traffic lights.
  • The number of lanes changes because a turn lane is added
  • Roads change names (at points where there is no intersection)
  • The highway type can change: it can go from a trunk to a trunk link, or other similar transitions

It seems to me that only the first case, and possibly only when there’s an actual traffic light, should be treated as two segments separated by an intersection. But it might be interesting to eventually look at some of the other nodes. For example, is the likelihood of a crash higher where the number of lanes changes?

Thus, a non-intersection segment (as this project has historically been defining them) can have more than one osm way id. So I think we need to decide if we just want to store a list of osm ids for each non-intersection segment, or if we want to redefine non-intersection segments to correspond 1-to-1 to an osmid. That would mean that we'd have an additional set of nodes represented somewhere that are not intersections. I can see pluses and minuses to redefining them:

  • Plus: we'll get more granular feature information (number of lanes, for example)
  • Plus: we won't have non-intersection segments have varying features, e.g. a different number of lanes in different places
  • Minus: having an additional set of nodes that aren't intersections will be one more thing to keep track of
  • Potential Minus: some osm ways might fall solely within an intersection, which might have annoying unforeseen consequences that I'm not thinking about
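The two options can be compared with a small sketch. It assumes a non-intersection segment is available as an ordered list of pieces, each tagged with the osm way id it came from; the `pieces` structure, the ids and the function names are all made up for illustration and are not taken from the project code.

```python
from itertools import groupby

# Hypothetical pieces of one non-intersection segment, in travel order.
pieces = [
    {"osmid": 101, "lanes": 2},
    {"osmid": 101, "lanes": 2},
    {"osmid": 202, "lanes": 3},
]


def osm_ids_for_segment(pieces):
    """Option 1: keep one segment and store its distinct way ids in order."""
    ids = []
    for piece in pieces:
        if piece["osmid"] not in ids:
            ids.append(piece["osmid"])
    return ids


def split_by_osm_id(pieces):
    """Option 2: one sub-segment per run of consecutive pieces sharing a way id."""
    return [list(run) for _, run in groupby(pieces, key=lambda p: p["osmid"])]


way_ids = osm_ids_for_segment(pieces)
sub_segments = split_by_osm_id(pieces)
```

With option 2 each sub-segment has a single value for `lanes`, which is the granularity benefit listed above, at the cost of the extra split point between way 101 and way 202.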

@j-t-t
Collaborator

j-t-t commented Mar 7, 2018

For intersection segments we will likely always want to have it represented in a way where each intersection corresponds to one or more osm nodes. This is because an intersection is by definition all the portions of ways that fall inside a bounding box of a set of close together intersection nodes.
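As a rough illustration of "a set of close together intersection nodes", the sketch below groups node ids whose coordinates lie within a distance threshold of each other, chaining through intermediate nodes. This is a plain distance-threshold grouping with union-find, not the buffer-based approach the project uses, and it assumes coordinates are already projected to a planar system in metres. The node ids, coordinates and `max_dist` value are invented.

```python
from math import hypot


def group_close_nodes(nodes, max_dist):
    """Group node ids that are within max_dist of each other, transitively.

    nodes maps a node id to an (x, y) pair in projected units.
    """
    ids = list(nodes)
    parent = {n: n for n in ids}

    def find(n):
        # Follow parent links to the group root, shortening the path as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            (ax, ay), (bx, by) = nodes[a], nodes[b]
            if hypot(ax - bx, ay - by) <= max_dist:
                parent[find(a)] = find(b)

    groups = {}
    for n in ids:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical osm node ids with projected coordinates.
example_nodes = {1: (0.0, 0.0), 2: (5.0, 0.0), 3: (100.0, 100.0)}
intersections = group_close_nodes(example_nodes, max_dist=10.0)
```

Each resulting group would then correspond to one intersection that references one or more osm node ids.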

@alicefeng
Collaborator

Here are the features I'm currently seeing that have different names for the same thing:

  • SPEEDLIMIT vs osm_speed
  • F_F_Class vs hwy_type (not identical but seem to be in the same vein)
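One way to reconcile these names is a simple lookup that renames source-specific keys to a shared vocabulary. The sketch below is only an illustration: the canonical names `speed_limit` and `road_class` are invented, and treating F_F_Class and hwy_type as the same feature is an assumption, given that they are not identical.

```python
# Hypothetical mapping from source-specific names to a shared vocabulary.
CANONICAL_NAMES = {
    "SPEEDLIMIT": "speed_limit",
    "osm_speed": "speed_limit",
    "F_F_Class": "road_class",
    "hwy_type": "road_class",
}


def to_canonical(features, names=CANONICAL_NAMES):
    """Rename known keys; leave unknown (e.g. city-specific) keys untouched.

    Raises ValueError if two keys in the same record map to one canonical name,
    since choosing between them is a separate precedence decision.
    """
    out = {}
    for key, value in features.items():
        new_key = names.get(key, key)
        if new_key in out:
            raise ValueError(f"two source features map to {new_key!r}")
        out[new_key] = value
    return out


# Made-up feature values for illustration.
city_record = to_canonical({"SPEEDLIMIT": 25, "lanes": 2})
osm_record = to_canonical({"osm_speed": 30, "hwy_type": "residential"})
```

Unknown keys pass through unchanged, so a city can still carry extra features that the shared vocabulary does not cover.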

@terryf82
Collaborator Author

As discussed in the channel - if we're going to implement a segments data standard to make viz development much simpler and more consistent, it makes sense for it to be done when the segment data is being created, which is quite early on in the pipeline process (data_generation).

This phase retrieves OSM segments, with optional overloading taking place if a city-supplied map is available (@j-t-t have I got this right?). It writes the segment data to file as part of make_canon_dataset, which in turn is used to train the model and generate predictions.

I'm happy to have a go at standardizing the segments and following the pipeline through to update any references/uses of the current feature names, but I wanted to check first whether those who actually wrote these scripts (@bpben @j-t-t @andhint @alicefeng) had any concerns about this issue or wanted to handle specific parts? Thanks.

@j-t-t
Collaborator

j-t-t commented Aug 28, 2018

This is an interesting question. The reason we did the Boston-specific features was that the model performed better with them than with the osm features. I understand the desire to standardize, certainly from a viz perspective, but also so that cities can share information. But I do think there's value in cities being able to use their own features, and that's certainly something that we told Boston we could do when we met with them in July.

So if your plan is just to map the features that we know about (speed limit and F_F_Class) to the osm features, that seems fine. But I don't think we should drop existing Boston features, or prevent other cities from adding their features.

There's not really any overloading happening. Right now, the only reason Boston doesn't use osm features is that we specify which features to use in config_boston.yml. We could use both easily enough. If we think that Boston's speed limit/F_F_Class info is more useful or informative than osm's info, we might want to switch to overriding that, but it seems worth looking into before doing so.

If we do choose to override osm data, we should probably have this be something configurable, where cities can specify not just which features to use, but which features override other features.
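A minimal sketch of what such a configurable override could look like, assuming the config provides a mapping from a city feature name to the osm feature it should replace. The `overrides` key, the function name and the feature values are hypothetical and do not reflect the actual layout of config_boston.yml.

```python
def resolve_features(osm, city, overrides):
    """Merge osm and city feature dicts for one segment.

    overrides maps a city feature name to the osm feature name it replaces.
    City features without an override entry are kept alongside the osm ones.
    """
    merged = dict(osm)
    for name, value in city.items():
        target = overrides.get(name)
        if target is None:
            merged[name] = value  # no override configured: keep both
        elif value is not None:
            merged[target] = value  # city value replaces the osm value
        # else: the city value is missing, so the osm value is left as-is
    return merged


# Hypothetical config fragment and made-up feature values.
config = {"overrides": {"SPEEDLIMIT": "osm_speed"}}
merged = resolve_features(
    osm={"osm_speed": 30, "lanes": 2},
    city={"SPEEDLIMIT": 25, "sidewalk": True},
    overrides=config["overrides"],
)
```

With an empty `overrides` mapping this reduces to using both sets of features side by side.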

@terryf82
Collaborator Author

I agree on having a configurable setup where each city can optionally provide its own features and, where necessary, override OSM features.

I like the idea of saying to cities "bring your own features" but I wonder if actual use of those features needs to wait until a release after 2.0. In the short term at least, when generating predictions I can't see that we're going to make use of features beyond a fairly common set - speed, signals, lighting, lanes etc (@bpben please let us know if this isn't the case). These are all likely to be available via OSM, but if they aren't for a city and the city has them, we can use those. Either way, we can map their feature names to a common vocabulary, which has immediate pay-off for viz development and should only require minimal work to integrate into the existing modelling work (the longer we leave it the worse this could get).

@alicefeng
Collaborator

Oh definitely cities should be able to add in whatever custom data they have. But I think that should extend the list of features rather than, say, drop all of the osm features and only use city-provided data. If cities can entirely pick and choose what they want in their model that's going to limit how much the viz can display in a human-friendly manner.

At any rate, these are the segment features the viz is currently using:

  • prediction
  • display_name
  • segment_id
  • center_x, center_y
  • SPEEDLIMIT/osm_speed

@terryf82
Collaborator Author

Great, I think we have the beginnings of a segment standard then.

I'll try and get something together before the next meeting, where we can discuss this further and get Ben's input too.

@j-t-t
Collaborator

j-t-t commented Aug 28, 2018

Okay, as far as I see it there are two straightforward things to do:

  • Just use osm speed limit instead of Boston's. Super simple, only requires a change to the config_boston.yml
  • Or, populate the resulting merged data with the osm speed. It never gets dropped from the maps; it's just not added to the set of features the prediction uses. So the postprocessing step could merge in osm_speed.

@bpben
Collaborator

bpben commented Aug 29, 2018

Echoing @j-t-t and @alicefeng on this: We definitely need to allow cities to add their own custom features. I think that's what @j-t-t is working on with the point based feature additions. I think it would also help me and other people working on the modeling to test out new features.

Also: I don't see a problem with including both in the model. Generally, features that carry the same information are an issue for parametric models (e.g. logistic regression), but our logistic model testing at the moment doesn't really take that into account. You could imagine a case where a segment's speed according to Boston and its speed according to OSM both provide some kind of interesting information for the model.

So, adding a third option to @j-t-t's list: just include everything. And maybe do smarter feature selection.

@bpben
Collaborator

bpben commented Aug 21, 2023

#318 should address this by allowing customization of features based on points, and we have other capabilities for pulling in other maps. I do still think we need some standardization here, but I will remove the help wanted label, as it's more a job for a core group of volunteers.

@bpben bpben removed the help wanted label Aug 21, 2023

4 participants