My local bus company, Nottingham City Transport (NCTX), doesn't have an API for real time bus departures and I couldn't find any other source of this data, so I decided to make my own using Cloudflare Workers and a screen scraping approach.
If you build a front end or interface to this, I'd love to see it. You can get hold of me here. I also wrote about this project on my website.
To run this locally, you'll need:
- Git command line tools - to clone the project.
- Node.js - use the latest long term stable (LTS) version.
- Cloudflare Wrangler - a tool for developing and deploying Cloudflare Workers (requires a free Cloudflare account)
- A web browser, anything will do but I like Google Chrome.
First, get the code:
$ git clone https://github.com/simonprickett/nctx-stop-api.git
$ cd nctx-stop-api
Next, install Wrangler globally:
$ npm install -g wrangler
Install the project dependencies:
$ npm install
Now, you're ready to start a local copy of the worker:
$ wrangler dev
⛅️ wrangler 2.13.0
--------------------
⬣ Listening at http://0.0.0.0:8787
- http://127.0.0.1:8787
- http://192.168.4.22:8787
The first time that you do this, you'll be prompted to log in to Cloudflare and authorise Wrangler. Follow the on screen instructions and prompts.
Test the worker locally by visiting:
http://localhost:8787/?stopId=3390FO07
When you're ready to publish the worker to the world and give it a public URL that's part of your Cloudflare account, use Wrangler:
$ wrangler publish
⛅️ wrangler 2.13.0
--------------------
Total Upload: 8.08 KiB / gzip: 2.13 KiB
Uploaded nctx (2.56 sec)
Published nctx (1.63 sec)
https://nctx.<your cloudflare workers domain>.workers.dev
Once deployed, your worker will be accessible on the internet at the URL that Wrangler outputs at the end of the publishing process. Note that this is an https URL - Cloudflare takes care of SSL for you.
This API works at the bus stop level; there are no endpoints to get a list of routes or stops. To make it work you'll need a bus stop ID. You can get one of these from the Nottingham City Transport website like so:
- Go to the Nottingham City Transport home page.
- Enter a location into the "Live Departures" search box (example locations: "Sherwood", "Gotham", "Victoria Centre"), or click "find your stop on the map".
- A map appears showing bus stops near your location - pick one and click on it.
- A pop up appears, click "Departures".
- You should now be looking at the live departure board for a stop. The stop ID is the final part of the page URL, for example given the URL https://www.nctx.co.uk/stops/3390J1, the stop ID is 3390J1.
- Make a note of your stop ID and use it in the examples below.
The following examples all use stop ID 3390FO07 ("Forest Recreation Ground"), and route numbers and line colours that pass through that stop.
All examples are GET requests, so you can just use a browser to try them out. You could also use Postman. These examples assume you're running the worker code locally, just swap the URL to your production one if you've deployed it and want to run it in production.
To get all the departures for a given stop ID go to the following URL:
http://localhost:8787/?stopId=3390FO07
This returns a JSON response that looks like this:
{
  "stopId": "3390FO07",
  "stopName": "Forest Recreation Ground",
  "departures": [
    {
      "lineColour": "#FED100",
      "line": "yellow",
      "routeNumber": "70",
      "destination": "City, Victoria Centre T3",
      "expected": "2 mins",
      "expectedMins": 2,
      "isRealTime": true
    },
    {
      "lineColour": "#935E3A",
      "line": "brown",
      "routeNumber": "16",
      "destination": "City, Victoria Centre T2",
      "expected": "3 mins",
      "expectedMins": 3,
      "isRealTime": true
    },
    {
      "lineColour": "#522398",
      "line": "purple",
      "routeNumber": "88",
      "destination": "City, Parliament St P4",
      "expected": "5 mins",
      "expectedMins": 5,
      "isRealTime": true
    }
  ]
}
The stopId field contains the ID of the stop that you provided. stopName contains the full name for that stop. The remainder of the response is contained in the departures array. Each departure has the following data fields:
- lineColour: a string containing the HTML colour code for the line that the bus is on. The buses run on colour coded lines, each line may contain up to three or four route numbers and all buses on the same line colour head in roughly the same direction.
- line: a string containing the name of the line that the bus is on. This is lowercase. See later in this document for a list of possible values.
- routeNumber: a string containing the route number. It's a string not a number because some routes have letters in them e.g. N1, 59A, 1C, 69X.
- destination: where the bus route terminates / where the bus is headed to. This is a string.
- expected: when the bus is expected to arrive at the stop. This is a string value taken from the NCTX page: Due when the bus is due now, otherwise either <number> mins or <hh>:<mm> with the hours in 24 hour format.
- expectedMins: the number of minutes until the bus is expected to arrive at the stop. This will be an integer number, and 0 if the bus is due at the stop now.
- isRealTime: a boolean that will be true if this departure is a real time estimate (the bus has tracking on it) or false otherwise... the bus either doesn't have tracking or hasn't started on the route yet, so timetable information is shown instead.
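If you're building a front end on top of this, the fields above are enough to produce a line of display text per departure. Here's a minimal sketch of how that might look. The describeDeparture function is a hypothetical helper of mine for illustration, it isn't part of the worker:

```javascript
// Hypothetical front end helper (not part of the worker): turns one departure
// object from the JSON response into a single line of display text.
function describeDeparture(departure) {
  // expectedMins is 0 when the bus is due at the stop now.
  const when =
    departure.expectedMins === 0
      ? 'due now'
      : `in ${departure.expectedMins} min`
  // isRealTime tells us whether this is a live estimate or a timetable time.
  const source = departure.isRealTime ? 'live' : 'timetable'
  return `${departure.routeNumber} (${departure.line}) to ${departure.destination}: ${when} (${source})`
}

const exampleDeparture = {
  lineColour: '#FED100',
  line: 'yellow',
  routeNumber: '70',
  destination: 'City, Victoria Centre T3',
  expected: '2 mins',
  expectedMins: 2,
  isRealTime: true,
}

console.log(describeDeparture(exampleDeparture))
```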
There are various ways in which you can filter and limit the data returned. These are all specified using extra parameters on the request, and can be combined together in a single request:
- line - to filter by a specific line colour using the line's name e.g. &line=yellow. Valid values for line are (note these are case sensitive):
  - brown
  - green
  - red
  - pink
  - turquoise
  - orange
  - skyblue
  - lilac
  - yellow
  - purple
  - navy
  - grey
  - blue
  - lime
- lineColour - to filter by a specific line colour using the line's HTML colour code e.g. &lineColour=#3FCFD5. When you put this into a URL, write the # as %23 (so &lineColour=%233FCFD5), as everything after a literal # is treated as a page fragment and isn't sent to the server. Valid values for lineColour are (note these are case sensitive):
  - #935E3A (brown)
  - #007A4D (green)
  - #CD202C (red)
  - #DA487E (pink)
  - #3FCFD5 (turquoise)
  - #E37222 (orange)
  - #6AADE4 (skyblue)
  - #C1AFE5 (lilac)
  - #FED100 (yellow)
  - #522398 (purple)
  - #002663 (navy)
  - #B5B6B3 (grey)
  - #00A1DE (blue)
  - #92D400 (lime)
- routeNumber - to filter by a specific route number. This will also return variants of that route number, for example &routeNumber=69 will return 69, 69A, 69X etc. &routeNumber=69X will only return 69X.
- realTimeOnly - set to true to return only departures that have real time estimates (where the bus is reporting its live location). Example: &realTimeOnly=true. Note: setting realTimeOnly to any value whatsoever turns on this filter.
- maxWaitTime - use to filter departures that are due in the next so many minutes. Example: &maxWaitTime=10.
- maxResults - only return the first so many results (or fewer if there aren't that many matches). To get the first 5: &maxResults=5.
Example showing how to combine these... let's get up to 4 yellow line departures in the next 60 mins:
http://localhost:8787/?stopId=3390FO07&line=yellow&maxWaitTime=60&maxResults=4
The order of the arguments doesn't matter.
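If you're calling the API from code rather than typing URLs by hand, you could build the request URL with the standard URL and URLSearchParams classes. This is only a sketch: buildDeparturesUrl is a hypothetical helper name, not something the worker provides. A side benefit is that URLSearchParams encodes the # in a lineColour value as %23 for you:

```javascript
// Hypothetical client side helper (not part of the worker): builds a request
// URL from a base URL, a stop ID and an optional object of filter parameters.
function buildDeparturesUrl(baseUrl, stopId, filters = {}) {
  const url = new URL(baseUrl)
  url.searchParams.set('stopId', stopId)
  for (const [name, value] of Object.entries(filters)) {
    // Skip filters that weren't given a value.
    if (value !== undefined && value !== null) {
      url.searchParams.set(name, String(value))
    }
  }
  return url.toString()
}

// Up to 4 yellow line departures in the next 60 mins.
console.log(
  buildDeparturesUrl('http://localhost:8787/', '3390FO07', {
    line: 'yellow',
    maxWaitTime: 60,
    maxResults: 4,
  }),
)
```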
The worker can return data in two different formats...
JSON is the default response format, which is described earlier in this document. There's no need to do this but you can set the format request parameter to json if you like:
http://localhost:8787/?stopId=3390FO07&maxResults=3&format=json
The response looks like this:
{
  "stopId": "3390FO07",
  "stopName": "Forest Recreation Ground",
  "departures": [
    {
      "lineColour": "#FED100",
      "line": "yellow",
      "routeNumber": "69",
      "destination": "City, Victoria Centre T4",
      "expected": "1 min",
      "expectedMins": 1,
      "isRealTime": true
    },
    {
      "lineColour": "#935E3A",
      "line": "brown",
      "routeNumber": "15",
      "destination": "City, Victoria Centre T2",
      "expected": "2 mins",
      "expectedMins": 2,
      "isRealTime": true
    },
    {
      "lineColour": "#FED100",
      "line": "yellow",
      "routeNumber": "68",
      "destination": "City, Victoria Centre T4",
      "expected": "4 mins",
      "expectedMins": 4,
      "isRealTime": true
    }
  ]
}
If you opt to use the fields request parameter, only the fields you ask for will be returned:
http://localhost:8787/?stopId=3390FO07&format=json&maxResults=3&fields=line,routeNumber,expected
returns:
{
  "stopId": "3390FO07",
  "stopName": "Forest Recreation Ground",
  "departures": [
    {
      "line": "yellow",
      "routeNumber": "69",
      "expected": "1 min"
    },
    {
      "line": "brown",
      "routeNumber": "15",
      "expected": "2 mins"
    },
    {
      "line": "yellow",
      "routeNumber": "68",
      "expected": "5 mins"
    }
  ]
}
The worker can also return delimited string responses. You might want to use these when processing the response on a device with limited capabilities, where a JSON parser might not be viable. To get a string response set the format request parameter to string:
http://localhost:8787/?stopId=3390FO07&format=string&maxResults=3
The response format looks like this:
3390FO07|Forest Recreation Ground|#FED100^yellow^68^City, Victoria Centre T4^1 min^1^true|^#92D400^lime^56^City, Parliament St P2^4 mins^4^true|^#522398^purple^89^City, Parliament St P5^5 mins^5^true
The following fields are returned, separated by | characters:
- The stop ID.
- The stop name.
- Each departure.
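Within each departure, fields are separated by ^ characters. Note that every departure after the first begins with an extra ^, as the sample above shows. If you choose to filter which fields are returned using the fields request parameter, only the requested fields appear; the others are left out entirely rather than being returned as blank values. For example: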
http://localhost:8787/?stopId=3390FO07&format=string&maxResults=3&fields=line,routeNumber,expected
Returns:
3390FO07|Forest Recreation Ground|yellow^69^1 min|^brown^15^2 mins|^yellow^68^4 mins
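Here's a rough sketch of how a client might turn a string response back into objects. parseStringResponse is a hypothetical helper of mine, not part of the worker. It assumes you pass in the field names in the same order that you listed them in the fields parameter, that no value contains a | or ^ character, and it leaves every value as a string:

```javascript
// Hypothetical client side helper (not part of the worker): parses the
// delimited string format into an object shaped like the JSON response.
// fieldNames must be in the same order as the values in each departure.
function parseStringResponse(body, fieldNames) {
  const [stopId, stopName, ...rawDepartures] = body.split('|')
  const departures = rawDepartures
    .filter(raw => raw.length > 0)
    .map(raw => {
      // Departures after the first arrive with a leading '^', drop it if present.
      const values = (raw.startsWith('^') ? raw.substring(1) : raw).split('^')
      const departure = {}
      fieldNames.forEach((name, index) => {
        departure[name] = values[index]
      })
      return departure
    })
  return { stopId, stopName, departures }
}

const sampleBody =
  '3390FO07|Forest Recreation Ground|yellow^69^1 min|^brown^15^2 mins|^yellow^68^4 mins'
const parsed = parseStringResponse(sampleBody, ['line', 'routeNumber', 'expected'])
console.log(parsed.departures)
```

Values such as expectedMins and isRealTime come back as strings here, so convert them yourself if you ask for those fields.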
This project is implemented as a Cloudflare Worker, code that runs and scales in a serverless execution environment across the Cloudflare network. Workers can be written in a few different languages, I chose JavaScript. All of the code lives in a single file, index.js.
Workers generally consist of an event listener and an event handler (see docs). The event listener listens for fetch events (such an event occurs when someone requests the URL that the worker is deployed at). It then calls the event handler whose job is to take the Request object for this call (see docs) and build an appropriate Response object (docs here) then return it to the client.
All of the code to query the NCTX website, gather the bus departure data, filter and return it in the requested format happens in the handleRequest function.
The first thing that the code has to do is check that a stop ID was provided. It does this by looking for a URL parameter named stopId and responding with a bad request error if one isn't provided, or the request type was anything other than a GET:
const url = new URL(request.url)
const stopId = url.searchParams.get('stopId')
if (request.method !== 'GET' || !stopId) {
  return new Response(BAD_REQUEST_TEXT, {
    status: BAD_REQUEST_CODE,
    headers: CORS_HEADERS,
  })
}
If a stop ID was provided, we'll get the source HTML for that stop's page from NCTX:
const stopUrl = `https://nctx.co.uk/stops/${stopId}`
const stopPage = await fetch(stopUrl)
You can check out what a stop page looks like here, which is the page for stop "3390FO07" (Forest Recreation Ground).
The HTML page source has been fetched into a variable called stopPage. What we need to do now is parse through it and find the data for each departure from the stop. Cloudflare provides a HTML Rewriter as part of the Workers API - it parses the HTML for us, firing listener functions whenever selector expressions that we are looking for are found.
From inspecting the HTML page source from NCTX, we can determine which selectors will match for each element containing a data item that we're interested in. For example, here let's find where a bus that's due to pass by the stop is headed to, which is contained in a paragraph with a CSS class named single-visit__description:
const htmlRewriter = await new HTMLRewriter()
  .on('p.single-visit__description', {
    text(text) {
      if (text.text.length > 0) {
        currentDeparture.destination = text.text.trim()
      }
    },
  })
  // functions for other matches...
  .transform(stopPage) // run the rewriter
  .text()
When a match for such a paragraph tag is found, we provide a handler for text chunks and store the text found, trimming any whitespace from it.
The code contains several functions that fire when different selectors are found. These each get a single piece of data about a bus departure and store it in an object named currentDeparture.
The last data item found for each departure is either the real time estimate of when the bus will arrive at the stop, or a timetable estimate for buses that don't have real time tracking, or which haven't started on the journey yet. When one of these items is found, the code pushes the currentDeparture object into an array named departures, and starts again with the next departure. In this way, we build up an array of objects describing upcoming departures from the stop.
Each of the functions that runs when a selector match is found has to do some level of cleanup or formatting on the data to make it more useful in an API response. The most common change is to trim whitespace off the start and end of strings, which is generally done like this:
const trimmedText = text.text.trim()
Where text is a text chunk returned by the HTML rewriter and text.text is the string value found.
Some data is checked against lookup objects to get the value that goes into the API response. For example, there's no line name in the HTML, but we can work it out based on an HTML colour code in the source:
// Maps line colour codes to line names.
const LINE_NAME_LOOKUP = {
  '#935E3A': 'brown',
  '#007A4D': 'green',
  '#CD202C': 'red',
  '#DA487E': 'pink',
  '#3FCFD5': 'turquoise',
  '#E37222': 'orange',
  '#6AADE4': 'skyblue',
  '#C1AFE5': 'lilac',
  '#FED100': 'yellow',
  '#522398': 'purple',
  '#002663': 'navy',
  '#B5B6B3': 'grey',
  '#00A1DE': 'blue',
  '#92D400': 'lime',
}
...
.on('div.single-visit__highlight', {
  element(elem) {
    // Pull this out of the style attribute whose value looks like: background-color:#92D400;
    const styleAttr = elem.getAttribute('style')
    const routeColour = styleAttr.substring(
      'background-color:'.length,
      styleAttr.length - 1,
    )
    currentDeparture.lineColour = routeColour
    currentDeparture.line = LINE_NAME_LOOKUP[routeColour]
  },
})
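The substring approach above relies on the style attribute looking exactly like background-color:#92D400; with nothing else in it. If the NCTX markup ever gains extra whitespace or other style declarations, something more tolerant might be needed. Here's a sketch of one way that could be done, this is my own illustration rather than code from the worker, and colourFromStyle is a hypothetical name:

```javascript
// Hypothetical alternative (not the worker's code): finds the background-color
// declaration in a style attribute value, tolerating extra whitespace, other
// declarations and a missing trailing semicolon.
function colourFromStyle(styleAttr) {
  if (!styleAttr) {
    return null
  }
  for (const declaration of styleAttr.split(';')) {
    const [property, value] = declaration.split(':')
    if (
      property &&
      value &&
      property.trim().toLowerCase() === 'background-color'
    ) {
      // Upper case the result so it matches the keys in LINE_NAME_LOOKUP.
      return value.trim().toUpperCase()
    }
  }
  return null
}

console.log(colourFromStyle('background-color:#92D400;'))
```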
Another data item that requires noteworthy formatting is the number of minutes until the bus is due to arrive at the stop. In the source HTML, this can have a number of formats. For buses with live tracking:
- "Due" - means the bus is due in 0 mins.
- "2 mins" - need to extract the 2 and turn it into an integer for the response.
These scenarios are handled here:
.on('div.single-visit__time--expected', {
  // Bus has live tracking, value will be "Due" or a number of minutes e.g. "2 mins".
  text(text) {
    if (text.text.length > 0) {
      const trimmedText = text.text.trim()
      currentDeparture.expected = trimmedText
      // When due, the bus is expected in 0 minutes.
      if (trimmedText.toLowerCase() === 'due') {
        currentDeparture.expectedMins = 0
      } else {
        // Parse out the number of minutes.
        currentDeparture.expectedMins = parseInt(
          trimmedText.split(' ')[0],
          10,
        )
      }
      currentDeparture.isRealTime = true
      departures.push(currentDeparture)
      currentDeparture = {}
    }
  },
})
For buses without live tracking, we also have to deal with times in 24hr format:
- "Due" - means the bus is due in 0 mins.
- "22:23" - 24 hour clock format for when a real time estimate isn't available. This has to be turned into the number of minutes between the present time, and the time in the HTML... which may be for early the following morning as the buses run beyond midnight. This involves some annoying mental gymnastics with JavaScript dates and is handled like so:
// Used when getting the current UK time... see
// https://stackoverflow.com/questions/25050034/get-iso-8601-using-intl-datetimeformat
const INTL_DATE_TIME_FORMAT_OPTIONS = {
  timeZone: 'Europe/London',
  year: 'numeric',
  month: '2-digit',
  day: '2-digit',
  hour: '2-digit',
  minute: '2-digit',
  second: '2-digit',
  hour12: false,
  timeZoneName: 'short',
}
// Use a locale that has adopted ISO 8601 as there is no locale for
// that directly so using Sweden here...
const INTL_DATE_TIME_FORMAT_LOCALE = 'sv-SE'
.on('div.single-visit__time--aimed', {
  // Bus does not have live tracking, value will be "Due" or a clock time e.g. "22:30"
  // Sometimes though it's a number of minutes e.g. "59 mins".
  text(text) {
    if (text.text.length > 0) {
      const trimmedText = text.text.trim()
      currentDeparture.expected = trimmedText
      // When due, the bus is expected in 0 minutes.
      if (trimmedText.toLowerCase() === 'due') {
        currentDeparture.expectedMins = 0
      } else {
        // Calculate number of minutes in the future that the value of trimmedText
        // represents (value is a clock time e.g. 22:30) and store in expectedMins.
        // careful too as 00:10 could be today or tomorrow...
        if (trimmedText.indexOf(':') !== -1) {
          // This time is in the "hh:mm" 24hr format.
          const ukNow = new Date(new Intl.DateTimeFormat(INTL_DATE_TIME_FORMAT_LOCALE, INTL_DATE_TIME_FORMAT_OPTIONS).format(new Date()))
          const departureDate = new Date(new Intl.DateTimeFormat(INTL_DATE_TIME_FORMAT_LOCALE, INTL_DATE_TIME_FORMAT_OPTIONS).format(new Date()))
          // Zero these out for better comparisons at the minute level.
          ukNow.setSeconds(0)
          ukNow.setMilliseconds(0)
          departureDate.setSeconds(0)
          departureDate.setMilliseconds(0)
          const [ departureHours, departureMins ] = trimmedText.split(':')
          const departureHoursInt = parseInt(departureHours, 10)
          const departureMinsInt = parseInt(departureMins, 10)
          departureDate.setHours(departureHoursInt)
          departureDate.setMinutes(departureMinsInt)
          if (ukNow.getHours() > departureHoursInt) {
            // The departure is tomorrow e.g. it's now 23:00 and the departure is 00:20.
            departureDate.setDate(departureDate.getDate() + 1)
          }
          const millis = departureDate - ukNow
          const minsToDeparture = (millis/1000)/60
          currentDeparture.expectedMins = minsToDeparture
        } else {
          // This time is in the "59 mins" format.
          currentDeparture.expectedMins = parseInt(
            trimmedText.split(' ')[0],
            10
          )
        }
      }
      currentDeparture.isRealTime = false
      departures.push(currentDeparture)
      currentDeparture = {}
    }
  },
})
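To make the "today or tomorrow" arithmetic easier to follow, here's a simplified sketch of the same idea that works in minutes of the day rather than with Date objects. It isn't what the worker does: minutesUntilClockTime is a hypothetical function, it takes the current UK hour and minute as arguments instead of working them out, it treats any departure time earlier than now as being tomorrow, and it ignores the days when the clocks change:

```javascript
// Hypothetical simplification (not the worker's code): works out how many
// minutes there are between "now" and a 24 hour "hh:mm" clock time, rolling
// over to tomorrow if the clock time is earlier in the day than now.
const MINUTES_PER_DAY = 24 * 60

function minutesUntilClockTime(clockTime, nowHours, nowMinutes) {
  const [hours, mins] = clockTime.split(':').map(part => parseInt(part, 10))
  const departureMinuteOfDay = hours * 60 + mins
  const nowMinuteOfDay = nowHours * 60 + nowMinutes
  // Adding a day's worth of minutes before taking the remainder keeps the
  // result positive when the departure is after midnight.
  return (
    (departureMinuteOfDay - nowMinuteOfDay + MINUTES_PER_DAY) % MINUTES_PER_DAY
  )
}

// It's 23:00 and the departure is at 00:20 tomorrow.
console.log(minutesUntilClockTime('00:20', 23, 0))
```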
As we saw earlier in this document, there are several ways that the response can be filtered using request parameters. For example, we may only need to return buses operating on a given line colour or only the first 5 results. As we saw, these filter parameters can be added together so we need to be sure to apply each one that was specified on the request before returning our response.
These filters are implemented as a series of code blocks, each of which checks for the presence of a request parameter then removes departure objects from the departures array that don't match the filter criteria.
The route number filter is an interesting example, as some routes have different variants that are still the same route number, but may not travel the entire length of the route or stop at all of the stops. These variants end in a letter - X for example often indicating "express". I decided that, for example, filtering for route 69 should also return route 69A, 69C, 69X so had to implement some logic for that as follows:
const NUMBER_CHARS_LOOKUP = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
// Filter by route if needed... route 69 includes 69A, 69X etc but not 169 or 690.
const routeToFilter = url.searchParams.get('routeNumber')
if (routeToFilter) {
  results.departures = results.departures.filter(departure => {
    const lastChar = departure.routeNumber.substring(
      departure.routeNumber.length - 1,
    )
    // Route number either needs to match exactly, or start with the provided route number and
    // not end in a number... so if we're looking for route 58 this should return route 58,
    // 58A, 58X but not 590. This also allows us to be more specific and look for 58X.
    // You could probably use a regular expression here but I find they introduce more issues
    // than they solve, so I avoid them :)
    return (
      departure.routeNumber === routeToFilter ||
      (departure.routeNumber.startsWith(routeToFilter) &&
        !NUMBER_CHARS_LOOKUP.includes(lastChar))
    )
  })
}
The other filters follow similar patterns - use the JavaScript array filter function to run logic against each member of departures to determine whether to keep it or not.
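The route matching rule can also be pulled out into a standalone function, which makes it easier to try against different route numbers. The sketch below is a slightly stricter variation of mine rather than the worker's code: instead of checking only the last character, it checks that nothing after the requested route number is a digit, so a filter of 6 wouldn't match 69A. routeMatches is a hypothetical name:

```javascript
// Hypothetical stricter variation (not the worker's code): a route matches if
// it equals the filter, or starts with the filter and is followed only by
// non-digit characters e.g. 69 matches 69, 69A and 69X but not 690 or 169.
const DIGITS = '0123456789'

function routeMatches(routeNumber, routeToFilter) {
  if (routeNumber === routeToFilter) {
    return true
  }
  if (!routeNumber.startsWith(routeToFilter)) {
    return false
  }
  const suffix = routeNumber.substring(routeToFilter.length)
  for (const char of suffix) {
    if (DIGITS.includes(char)) {
      return false
    }
  }
  return true
}

const routes = ['69', '69A', '69X', '690', '169']
console.log(routes.filter(route => routeMatches(route, '69')))
```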
If the fields request parameter was provided on the request, we need to return only a specified subset of the data fields.
fields is expected to be a comma separated list of data field names, so we get those using split, then set the results.departures array to the result of mapping over its current value, returning departure objects that only contain the requested fields:
if (url.searchParams.get('fields')) {
  const fieldsToReturn = url.searchParams.get('fields').split(',')
  if (fieldsToReturn.length > 0) {
    results.departures = results.departures.map(departure => {
      const newDeparture = {}
      for (const fieldName of fieldsToReturn) {
        newDeparture[fieldName] = departure[fieldName]
      }
      return newDeparture
    })
  }
}
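One thing to be aware of with the code above is that asking for a field name that doesn't exist sets that field to undefined in the new object. Here's a sketch of a variation that skips unknown names and tolerates spaces around the commas. This is my own illustration, not the worker's code, and pickFields is a hypothetical name:

```javascript
// Hypothetical variation (not the worker's code): returns a copy of a
// departure containing only the requested fields, ignoring any requested
// field names that the departure doesn't have.
function pickFields(departure, fieldsParam) {
  const picked = {}
  for (const rawName of fieldsParam.split(',')) {
    const fieldName = rawName.trim()
    if (Object.prototype.hasOwnProperty.call(departure, fieldName)) {
      picked[fieldName] = departure[fieldName]
    }
  }
  return picked
}

const fullDeparture = {
  lineColour: '#FED100',
  line: 'yellow',
  routeNumber: '69',
  destination: 'City, Victoria Centre T4',
  expected: '1 min',
  expectedMins: 1,
  isRealTime: true,
}

console.log(pickFields(fullDeparture, 'line,routeNumber,expected'))
```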
The code that returns the response to the caller first determines if a JSON or String response was requested...
For a JSON response (the default), we create a new Response object, returning formatted JSON and setting the content-type header appropriately:
const responseFormat = url.searchParams.get('format')
if (!responseFormat || responseFormat === 'json') {
  return new Response(JSON.stringify(results, null, 2), {
    headers: {
      'content-type': 'application/json;charset=UTF-8',
      ...CORS_HEADERS,
    },
  })
}
For a String response (the value of the request parameter format is set to string), we need to output the stop ID and stop name first, separated by |, then output each departure's data with a ^ separating each field for that departure and | separating each departure. Note there's also some code to make sure we don't leave a trailing delimiter after the last field:
let stringResults = `${results.stopId}|${results.stopName}`
let stringDepartures = ''
for (const departure of results.departures) {
  for (const val of Object.values(departure)) {
    stringDepartures = `${stringDepartures}${
      stringDepartures.length > 0 ? '^' : ''
    }${val}`
  }
  stringDepartures = `${stringDepartures}|`
}
stringResults = `${stringResults}|${
  stringDepartures.length > 0
    ? stringDepartures.substring(0, stringDepartures.length - 1)
    : ''
}`
return new Response(stringResults, { headers: CORS_HEADERS })
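For comparison, here's a sketch of how the same kind of delimited output could be built with the array join function. This is an alternative of mine and not the worker's code, and its output is deliberately a little different from the samples earlier in this document: there's no extra ^ at the start of the second and later departures, and no trailing | when there are no departures. toDelimitedString is a hypothetical name:

```javascript
// Hypothetical alternative (not the worker's code): joins each departure's
// values with '^', then joins the stop ID, stop name and departures with '|'.
function toDelimitedString(results) {
  const departures = results.departures.map(departure =>
    Object.values(departure).join('^'),
  )
  return [results.stopId, results.stopName, ...departures].join('|')
}

const exampleResults = {
  stopId: '3390FO07',
  stopName: 'Forest Recreation Ground',
  departures: [
    { line: 'yellow', routeNumber: '69', expected: '1 min' },
    { line: 'brown', routeNumber: '15', expected: '2 mins' },
  ],
}

console.log(toDelimitedString(exampleResults))
```

Any client written against the worker's current output would need updating if the format changed like this, so treat it as an illustration of the technique only.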
I wanted the API to be callable from anywhere, including JavaScript embedded in web pages. In order to allow that, I had to enable Cross Origin Resource Sharing or CORS. As we're only handling "simple" GET requests here, we don't need to worry about the CORS pre-flight OPTIONS request scenario. This means that enabling CORS is as simple as ensuring that the correct extra headers are returned with each response.
Here are the headers I'm sending back as I want to allow the API to be called from anywhere:
const CORS_HEADERS = {
  'Access-Control-Allow-Origin': '*',
  'Access-Control-Allow-Methods': 'GET',
}
And here's an example of how to add them to the response that the Cloudflare Worker sends back to the client:
return new Response(JSON.stringify(results, null, 2), {
  headers: {
    'content-type': 'application/json;charset=UTF-8',
    ...CORS_HEADERS,
  },
})