Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatically handle entities irrespective of IsEntityInMessage #114

Open
jamespfennell opened this issue Aug 2, 2023 · 8 comments
Open

Comments

@jamespfennell
Copy link
Owner

jamespfennell commented Aug 2, 2023

Every GTFS realtime trip is optionally associated to a vehicle (and vice versa). Naively, this relationship would be represented by a nullable foreign key constraint, say vehicle.trip_pk. However in #110 when developing the first iteration of the vehicle feature, @cedarbaum discovered that this is non-trivial because of how the underlying GTFS library reports trips and vehicles.

There is a problem in the case (like the NYC bus) when trips and vehicles are reported in different GTFS realtime feeds. The problem is that in the trips feed the GTFS library adds a vehicle "stub" (a small vehicle message tagged with IsEntityInMessage: false) and in the vehicles feed adds a similar trips stub. When parsing the trips feed Transiter tries to update the vehicle using the stub which is definitely wrong because the vehicle data is in the other feed. The stub only exists to communicate the trip-vehicle relationship.

In #110, @cedarbaum solved this by adding a only_process_full_entities field to the feed configuration in which case IsEntityInMessage: false entities are skipped.

I think though we can probably handle this case without explicit configuration. One idea I had for solving this is to persist the IsEntityInMessage field in the database for both trips and vehicles. Then in the GTFS realtime updater, we would skip updating a trip if all of the following conditions hold:

  • The version of the trip in the GTFS realtime feed has IsEntityInMessage: false
  • The current version of the trip in the database has IsEntityInMessage: true
  • The current version of the trip comes from a different feed.

One way to think about this is that when a real trip exists (i.e. IsEntityInMessage: true) there is a unique feed that owns that trip. Other feeds should not update that trip. We would do a similar thing for vehicles.

@jamespfennell jamespfennell changed the title Foriegn key vehicle.trip into the trip table Foreign key vehicle.trip into the trip table Aug 2, 2023
@jamespfennell jamespfennell changed the title Foreign key vehicle.trip into the trip table Automatically handle entities irrespective of IsEntityInMessage Aug 5, 2023
@jamespfennell
Copy link
Owner Author

Updated in light of the submitted version of #110.

@cedarbaum
Copy link
Collaborator

One additional complication for this work is that multiple feeds may try to insert the same entity concurrently, violating unique key constraints. For example, both the trip and vehicle feed start updates, both see that "vehicle 1" doesn't exist, and then both try to add it, guaranteeing one transaction fails and rolls back.

I think to fix this we would need some stronger consistency guarantees within the update transactions. The most granular mechanism I can think of would be a per-system write lock for the trip and vehicle tables (maybe using something like advisory locks).

Alternatively, updates could be scheduled in such a way that they don't overlap within a system.

Curious what your thoughts on above are!

@jamespfennell
Copy link
Owner Author

Yeah that's definitely an issue. Also, I tried to write some code to implement this ticket I think it will be complex...

I've been thinking, should we just skip all entities that have IsEntityInMessage: false? Basically what you did for the NYC buses with a configuration value, but just hardcoded in the code.

The original motivation for persisting IsEntityInMessage: false vehicles was that I think I've observed for the NYC subway there are sometimes trip updates that have a vehicle descriptor attached but no vehicle position entity. In this case you get a vehicle with IsEntityInMessage: false and my idea was that it would be still nice to persist that vehicle data. However I think the complexity is really high and on balance it's likely not worth it.

@jamespfennell
Copy link
Owner Author

@cedarbaum
Copy link
Collaborator

This sounds good to me, though I suppose we could also invert the option to be "opt in" to cover the subway vehicle case you mentioned (e.g., persistEntityReferences or something similar).

Also, to avoid leaving the deprecated option in the proto definition, we could change Feed config parsing to allow unknown fields and simply remove it. The trade-off here is easier deprecation of short-lived features such as this vs. allowing typos to slip through potentially. Let me know your thoughts on this; I have a branch with the proposed change here: cedarbaum@196758d

@jamespfennell
Copy link
Owner Author

Yeah the inverted option does sound good. It could come with a warning that vehicles and trips must be in the same feed, to avoid the races you mentioned. In this way the implementation would be simpler. For the moment I'm happy to just submit the branch I linked to, and if we want later we could add such an opt-in option.

Changing feed config parsing sounds great!! I didn't know we could do that!

jamespfennell added a commit that referenced this issue Aug 23, 2023
This essentially makes the configuration value always true. See the
issue for context.
@cedarbaum
Copy link
Collaborator

Awesome, just sent a PR for the feed config parsing change: https://github.com/jamespfennell/transiter/pull/123/files

Also I think updateTrips still needs to be modified to ignore partial entities by default. If we're OK to go ahead and just delete the old config option entirely, I can take care of fixing that as well.

@jamespfennell
Copy link
Owner Author

Uhhhh right we do need to handle trips! That would be great.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants