A simple GTFS library.
Benefits:
- Fast parser
- Reads directly from zip archives
- Entity generators that read tables line by line
- Entity graph structure
- Built-in basic validator
Installation using pip:
pip install mzgtfs
Alternatively, download from PyPI or clone this repository, and install using setup.py:
python ./setup.py install
The dependency unicodecsv will be automatically installed using the above methods.
The mzgtfs.feed.Feed class is the main entry point for reading and writing schedule data. A feed can be read from either a GTFS zip file or a directory of CSV files. The validation and write methods are discussed later in this document.
Feed method | Description |
---|---|
Feed(filename=None, path=None) | Open a feed, given a GTFS zip file or directory |
preload() | Load the entire feed and entity relationships |
read(table) | Return a list of entities from a table; e.g. feed.read('stops') |
iterread(table) | Entity generator |
write(filename, entities, sortkey=None, columns=None) | Write a CSV file |
make_zip(filename, files=None, path=None, clone=None) | Create a GTFS zip archive |
validate() | Validate feed |
validate_feedvalidator() | Validate using external feedvalidator.py |
The GTFS CSV files are mapped to the following Entity classes:
Feed method | GTFS file | Entity class |
---|---|---|
agencies(), agency(agency_id) | agency.txt | Agency |
routes(), route(route_id) | routes.txt | Route |
trips(), trip(trip_id) | trips.txt | Trip |
stops(), stop(stop_id) | stops.txt | Stop |
stop_times() | stop_times.txt | StopTime |
shape_line(shape_id) | shapes.txt | ShapeLine |
service_periods(), service_period(service_id) | calendar.txt | ServicePeriod |
service_exceptions() | calendar_dates.txt | ServiceDate |
fares(), fare(fare_id) | fare_attributes.txt | FareAttribute |
fare_rules() | fare_rules.txt | FareRule |
transfers() | transfers.txt | Transfer |
frequencies() | frequencies.txt | Frequency |
feed_infos() | feed_info.txt | FeedInfo |
>>> import mzgtfs.feed
>>> gtfs_feed = mzgtfs.feed.Feed(filename='current.zip') # alt., path=<dir>
>>> gtfs_feed.routes()
[<Route BFC>, <Route CITY>, <Route STBA>, <Route AB>, <Route AAMV>]
>>> gtfs_feed.stops()
[<Stop NANAA>, <Stop BULLFROG>, <Stop FUR_CREEK_RES>, <Stop BEATTY_AIRPORT>, <Stop EMSI>, <Stop DADAN>, <Stop NADAV>, <Stop STAGECOACH>, <Stop AMV>]
>>> gtfs_feed.stop('NANAA').json()
{'name': u'North Ave / N A Ave', ... }
Feeds are built on relations between entities; for instance, each agency has a number of routes, these routes have trips, and so on. A convenient way to work with a feed is the preload() method, which loads the entire feed and constructs a graph of entities. This provides quick access, such as finding all of the routes and stops associated with an agency.
>>> gtfs_feed.preload() # Load all tables and create relationships
>>> gtfs_feed.agencies()
[<Agency DTA>, <Agency ATD>]
>>> dta = gtfs_feed.agency('DTA')
>>> len(dta.routes())
5
>>> len(dta.stops())
9
>>> dta.routes()
set([<Route STBA>, <Route CITY>, <Route AB>, <Route AAMV>, <Route BFC>])
>>> dta.route('CITY').stops()
set([<Stop EMSI>, <Stop DADAN>, <Stop NANAA>, <Stop NADAV>, <Stop STAGECOACH>])
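The relationships that preload() builds can be pictured as a parent/child graph. The following is a minimal standard-library sketch of that idea, not the mzgtfs implementation: the Node class, its fields, the descendants() helper, and the trip ID are hypothetical, and only the names add_child, children, and parents echo the documented Entity methods. Linking trips directly to stops is a simplification; in GTFS they are joined through stop_times.txt.

```python
class Node(object):
    """Hypothetical stand-in for an entity with parent/child links."""

    def __init__(self, kind, entity_id):
        self.kind = kind
        self.entity_id = entity_id
        self._children = set()
        self._parents = set()

    def add_child(self, child):
        # Record the link in both directions so either side can be traversed.
        self._children.add(child)
        child._parents.add(self)

    def children(self):
        return set(self._children)

    def parents(self):
        return set(self._parents)

    def descendants(self, kind):
        """Collect every node of the given kind reachable through children."""
        found, seen, stack = set(), set(), list(self._children)
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if node.kind == kind:
                found.add(node)
            stack.extend(node._children)
        return found

    def __repr__(self):
        return '<%s %s>' % (self.kind, self.entity_id)


# Build agency -> route -> trip -> stop links (assumed, simplified hierarchy).
agency = Node('Agency', 'DTA')
city = Node('Route', 'CITY')
trip = Node('Trip', 'CITY1')  # hypothetical trip ID
stop_a = Node('Stop', 'EMSI')
stop_b = Node('Stop', 'DADAN')

agency.add_child(city)
city.add_child(trip)
trip.add_child(stop_a)
trip.add_child(stop_b)

# Walk down from the agency to find every stop it serves.
stop_ids = sorted(stop.entity_id for stop in agency.descendants('Stop'))
print(stop_ids)
```

Storing links in both directions costs a little memory but lets a lookup start from either end, which is what makes queries such as "all stops for this agency" cheap once the graph is built.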
Each of the access methods in the above table will read the CSV file and cache the resulting entities. If you want to read a table line-by-line with lower overhead, you can use iterread(table). This is especially useful with stop_times.txt, which may have millions of rows.
>>> gtfs_feed.iterread('routes')
<generator object iterread at 0x101cd86e0>
>>> for route in gtfs_feed.iterread('routes'): print route
<Route AB>
<Route BFC>
<Route STBA>
<Route CITY>
<Route AAMV>
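To illustrate why a generator keeps overhead low, here is a rough Python 3 sketch that streams rows out of a zip archive using only the standard library. It does not use mzgtfs, and is not how iterread is implemented. The iter_rows function and the demo archive are hypothetical.

```python
import csv
import io
import os
import tempfile
import zipfile


def iter_rows(zip_path, table):
    """Yield one dict per CSV row of <table>.txt inside a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(table + '.txt') as raw:
            # 'utf-8-sig' also strips the byte-order mark some feeds include.
            text = io.TextIOWrapper(raw, encoding='utf-8-sig', newline='')
            for row in csv.DictReader(text):
                yield row


# Build a tiny throwaway archive so the sketch runs on its own.
demo_zip = os.path.join(tempfile.mkdtemp(), 'demo.zip')
with zipfile.ZipFile(demo_zip, 'w') as archive:
    archive.writestr('routes.txt', 'route_id,route_short_name\nAB,10\nBFC,20\n')

rows = iter_rows(demo_zip, 'routes')
first = next(rows)  # Only the first row has been parsed at this point.
rest = [row['route_id'] for row in rows]
print(first['route_id'], rest)
```

Because each row is produced on demand, memory use stays roughly constant regardless of table size, at the cost of re-reading the file on every pass.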
The base Entity class provides the following methods.
Entity method | Description |
---|---|
get(key, default=None) | Get a GTFS attribute |
entity[key] | Get a GTFS attribute |
len(entity) | Number of attributes |
key in entity | Attribute exists in entity |
keys() | List attributes |
items() | Attribute keys, values |
set(key, value) | Set an attribute |
id() | GTFS entity ID, e.g. agency_id, stop_id, etc. |
name() | A reasonable entity name or description |
point() | A point geometry, if one exists |
bbox() | Entity bounding box |
geometry() | A GeoJSON geometry |
children() | Entity children (e.g. agency -> routes) |
add_child(child) | Add a child entity |
parents() | Entity parents (e.g. route -> agencies) |
add_parent(parent) | Add a parent entity |
validate() | Validate entity; you may pass in a reporter |
validate_feed() | Validate entity relationships |
json() | JSON representation |
Entity.from_json(data, feed) | Class method; create Entity from JSON |
Entity.from_row(data, feed) | Class method; create Entity from CSV row |
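The attribute accessors in the table behave much like a dictionary. The sketch below shows one way such an interface could be put together. SketchEntity is a hypothetical class written for illustration, not the mzgtfs Entity, and its from_row omits the feed argument that the documented class method takes.

```python
class SketchEntity(object):
    """Hypothetical dict-backed entity; not the mzgtfs Entity class."""

    def __init__(self, **attrs):
        self.data = dict(attrs)

    @classmethod
    def from_row(cls, row):
        # A CSV row arrives as a mapping of column name -> string value.
        return cls(**row)

    def get(self, key, default=None):
        return self.data.get(key, default)

    def set(self, key, value):
        self.data[key] = value

    def keys(self):
        return list(self.data.keys())

    def items(self):
        return list(self.data.items())

    def __getitem__(self, key):  # entity[key]
        return self.data[key]

    def __len__(self):  # len(entity)
        return len(self.data)

    def __contains__(self, key):  # key in entity
        return key in self.data


stop = SketchEntity.from_row({'stop_id': 'NANAA', 'stop_name': 'North Ave / N A Ave'})
stop.set('zone_id', '1')
print(stop['stop_id'], len(stop), 'zone_id' in stop, stop.get('stop_url'))
```

In this sketch, get() returns a default for optional GTFS columns that are absent, while entity[key] raises KeyError, mirroring the usual dictionary distinction.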
This library contains a basic GTFS validator. It validates required and optional attributes and their values, foreign keys, and requirements such as stop sequences.
Additionally, a wrapper to Google's Transitfeed feedvalidator.py is provided, if it is available on your system. This provides additional checks, as well as warnings for common feed problems such as date ranges, stop spacing, bus speeds, etc. However, this is currently only supported on zipped feeds; if you have made any changes, you will have to write out a zip file first.
>>> import mzgtfs.feed
>>> import mzgtfs.validation
>>> report = mzgtfs.validation.ValidationReport()
>>> gtfs_feed = mzgtfs.feed.Feed('current.zip')
>>> gtfs_feed.validate(validator=report)
>>> gtfs_feed.validate_feedvalidator(validator=report)
>>> report.report()
Validation report:
<Feed .//mzgtfs/examples/sample-feed.zip>: Errors reported by feedvalidator.py; see report.html for details
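For a sense of what the built-in checks involve, the following sketch runs three simplified checks (a required attribute, a foreign key, and stop sequence ordering) over a few stop_times rows. It is an illustration only: SketchReport and check_stop_times are hypothetical names, and the actual rules and messages in mzgtfs may differ.

```python
class SketchReport(object):
    """Hypothetical error collector; mzgtfs ships its own ValidationReport."""

    def __init__(self):
        self.errors = []

    def error(self, message):
        self.errors.append(message)


def check_stop_times(stop_times, known_stop_ids, report):
    """Run three illustrative checks over the stop_times rows of one trip."""
    previous = None
    for row in stop_times:
        stop_id = row.get('stop_id')
        if not stop_id:
            # Required attribute: stop_id must be present and non-empty.
            report.error('missing stop_id')
        elif stop_id not in known_stop_ids:
            # Foreign key: stop_id must refer to a row in stops.txt.
            report.error('unknown stop_id: %s' % stop_id)
        # Ordering: stop_sequence must increase along the trip.
        sequence = int(row['stop_sequence'])
        if previous is not None and sequence <= previous:
            report.error('stop_sequence %d does not increase' % sequence)
        previous = sequence


report = SketchReport()
check_stop_times(
    [
        {'stop_id': 'NANAA', 'stop_sequence': '1'},
        {'stop_id': 'NOWHERE', 'stop_sequence': '2'},
        {'stop_id': 'EMSI', 'stop_sequence': '2'},
    ],
    known_stop_ids={'NANAA', 'EMSI'},
    report=report,
)
for message in report.errors:
    print(message)
```

Collecting messages in a report object, rather than raising on the first failure, lets a single pass surface every problem in the feed.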
Writing out GTFS CSV files and creating new zip archives is also supported.
>>> import mzgtfs.feed
>>> gtfs_feed = mzgtfs.feed.Feed('original.zip')
>>> for stop in gtfs_feed.stops(): stop.set('zone_id', '1')
>>> # Write out a stops.txt table with our updated stops, sorting on 'stop_id'
>>> gtfs_feed.write('stops.txt', gtfs_feed.stops(), sortkey='stop_id')
>>> # Create "new.zip", merging our stops table and the original feed.
>>> gtfs_feed.make_zip('new.zip', files=['stops.txt'], clone='original.zip')
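The write-then-merge workflow above can be approximated with the standard library alone. The Python 3 sketch below is an assumption about the general technique, not the code behind write() or make_zip(). The write_table and merge_zip helpers are hypothetical, as is the throwaway demo archive.

```python
import csv
import io
import os
import tempfile
import zipfile


def write_table(filename, rows, sortkey, columns):
    """Write dict rows to a CSV file, sorted on one column."""
    with io.open(filename, 'w', encoding='utf-8', newline='') as out:
        writer = csv.DictWriter(out, fieldnames=columns)
        writer.writeheader()
        for row in sorted(rows, key=lambda r: r[sortkey]):
            writer.writerow(row)


def merge_zip(new_zip, files, clone):
    """Copy an archive, substituting the given files for same-named members."""
    replaced = set(os.path.basename(f) for f in files)
    with zipfile.ZipFile(clone) as src:
        with zipfile.ZipFile(new_zip, 'w', zipfile.ZIP_DEFLATED) as dst:
            for name in src.namelist():
                if name not in replaced:
                    dst.writestr(name, src.read(name))
            for f in files:
                dst.write(f, os.path.basename(f))


# Demo on a throwaway archive.
workdir = tempfile.mkdtemp()
original = os.path.join(workdir, 'original.zip')
with zipfile.ZipFile(original, 'w') as archive:
    archive.writestr('agency.txt', 'agency_id\nDTA\n')
    archive.writestr('stops.txt', 'stop_id\nNANAA\nEMSI\n')

stops_txt = os.path.join(workdir, 'stops.txt')
write_table(
    stops_txt,
    [{'stop_id': 'NANAA', 'zone_id': '1'}, {'stop_id': 'EMSI', 'zone_id': '1'}],
    sortkey='stop_id',
    columns=['stop_id', 'zone_id'],
)
merged = os.path.join(workdir, 'new.zip')
merge_zip(merged, files=[stops_txt], clone=original)

with zipfile.ZipFile(merged) as archive:
    names = sorted(archive.namelist())
    stops_csv = archive.read('stops.txt').decode('utf-8')
print(names)
```

Skipping the replaced members while copying avoids ending up with two stops.txt entries in the new archive, which the zip format permits but most readers handle poorly.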
The library's test suite uses Python unittest. To run the test suite:
python -m unittest discover
Please open a GitHub issue with as much of the following information as you're able to specify, or contact us for assistance.
Transitland is sponsored by Mapzen. Contact us with your questions, comments, or suggestions: [email protected].