Determine if old OSM data on cycling infrastructure could help with uplift modelling
Output: A CSV with LAD, a total length in 2011, and a total length in 2020
Setup:
- At least 100GB disk or so
- Connection that can reasonably download 40GB
- npm, mapshaper, osmium
Geofabrik has old osm.pbfs, if you click on "raw directory index". But the archives for England (https://download.geofabrik.de/europe/united-kingdom/england.html) and for Europe only go back to 2014.
From https://wiki.openstreetmap.org/wiki/Planet.osm/full, I found the oldest pbf dump at https://planet.openstreetmap.org/pbf/full-history/history-221219.osm.pbf, which is 112GB. But then thankfully I found https://planet.osm.org/planet/full-history/ and went for a 2013 history dump https://planet.osm.org/planet/full-history/2013/history_2013-02-05_1701.osm.bz2, only 40GB (15 minutes to download with gigabit fiber)!
Following https://osmcode.org/osmium-tool/manual.html#working-with-history-files, we can turn a history dump into a regular dump as of a given timestamp:
osmium time-filter history_2013-02-05_1701.osm.bz2 2011-01-01T00:00:00Z -o 2011.osm.pbf
That took about 2 hours, and the result is down to 10 GB. Then we can clip to a bounding box of England (cheers bboxfinder.com), using the fastest extraction strategy and removing unneeded metadata:
osmium extract -b -6.020508,49.696062,2.329102,55.949200 2011.osm.pbf -o england_2011.pbf -f pbf,add_metadata=false -s simple
That took just 30 seconds, with output just 175 MB. The equivalent extract today is about 1.2 GB, so that's a quick sense of how sparse OSM data was in 2011!
(TODO: Is it faster to clip first, then time-filter?)
2020 is recent enough that Geofabrik works: http://download.geofabrik.de/europe/united-kingdom/england-200101.osm.pbf. The filename suffix is presumably a YYMMDD date (2020-01-01), and osmium fileinfo -e england-200101.osm.pbf confirms the timestamps of the changes in the file.
And likewise, 2016 is http://download.geofabrik.de/europe/united-kingdom/england-160101.osm.pbf
There's no single tag for cycling infra in OSM. Ohsome references a thorough query from this paper. osmium can't express a filter that complicated, so we do it ourselves in JS.
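To give a feel for what such a filter involves, here is a minimal sketch of a tag predicate in JS. It is a simplified stand-in, not the query from the paper: the tag list, `isCyclingInfra`, and `filterGeojsonSeq` are assumptions made for illustration, and it assumes each feature carries its OSM tags as GeoJSON properties, one feature per line.

```javascript
// Simplified, hypothetical rules. NOT the full query referenced by Ohsome.
const CYCLEWAY_KEYS = ["cycleway", "cycleway:left", "cycleway:right", "cycleway:both"];
const CYCLEWAY_VALUES = new Set(["lane", "track", "opposite_lane", "opposite_track"]);

// Decide whether a way's tags describe cycling infrastructure.
function isCyclingInfra(tags) {
  // Dedicated cycleways
  if (tags.highway === "cycleway") return true;
  // Paths or footways explicitly designated for bicycles
  if ((tags.highway === "path" || tags.highway === "footway") && tags.bicycle === "designated") {
    return true;
  }
  // Lanes or tracks attached to a road
  return CYCLEWAY_KEYS.some((key) => CYCLEWAY_VALUES.has(tags[key]));
}

// Keep only the matching features from newline-delimited GeoJSON text.
function filterGeojsonSeq(text) {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line))
    .filter((feature) => isCyclingInfra(feature.properties || {}));
}

// Usage with two made-up features
const sample = [
  '{"type":"Feature","properties":{"highway":"cycleway"},"geometry":{"type":"LineString","coordinates":[[0,0],[0,1]]}}',
  '{"type":"Feature","properties":{"highway":"residential"},"geometry":{"type":"LineString","coordinates":[[0,0],[1,0]]}}',
].join("\n");
console.log(filterGeojsonSeq(sample).length);
```

Keeping the predicate as a plain function of the tags makes it easy to swap in stricter or looser rules and compare the resulting totals.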
I put england.osm.pbf in a 2011 and a 2020 directory and repeated the next steps for both.
cd 2011
osmium tags-filter england.osm.pbf w/highway -o highways.osm.pbf
osmium export highways.osm.pbf --config ../osmium_with_ids.cfg --geometry-type=linestring -f geojsonseq -x print_record_separator=false -o highways.geojson
npm run filter `pwd`/highways.geojson 2> cycleways.geojson
# For convenient use in QGIS, convert to geopackage
# Manually remove the trailing comma after the last feature, making it valid JSON
ogr2ogr -f GPKG cycleways.gpkg cycleways.geojson
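The manual trailing-comma fix could probably be avoided by joining the features with commas instead of appending a comma after each one. A minimal sketch, assuming the features are available as already-serialized JSON strings (`toFeatureCollection` is a hypothetical helper, not part of the existing filter script):

```javascript
// Wrap serialized GeoJSON features in a FeatureCollection.
// join() only puts separators between items, so there is no trailing comma.
function toFeatureCollection(featureLines) {
  const features = featureLines.filter((line) => line.trim() !== "");
  return '{"type":"FeatureCollection","features":[\n' + features.join(",\n") + "\n]}";
}

// Usage with two made-up features and a blank line
const lines = [
  '{"type":"Feature","properties":{"highway":"cycleway"},"geometry":{"type":"LineString","coordinates":[[0,0],[0,1]]}}',
  "",
  '{"type":"Feature","properties":{"cycleway":"lane"},"geometry":{"type":"LineString","coordinates":[[0,0],[1,0]]}}',
];
const collection = JSON.parse(toFeatureCollection(lines));
console.log(collection.features.length);
```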
Download the 2011 LAD boundaries as GeoJSON from https://geoportal.statistics.gov.uk/datasets/cf3af807271246f4a8865e30f308fc21_0/explore, and convert them to WGS84:
ogr2ogr lads_2011.geojson -t_srs EPSG:4326 ~/Downloads/Local_Authority_Districts_December_2011_FEB_EW_2022_-7538937248730119669.geojson -sql 'SELECT * FROM "Local_Authority_Districts_December_2011_FEB_EW_2022_-7538937248730119669"'
We want to take the England-wide cycleway GeoJSON file and split it into one file per LAD. First we add the lad11cd property to each LineString using mapshaper:
mapshaper-xl cycleways.geojson -divide ../lads_2011.geojson -o cycleways_grouped.geojson
Then we split into a bunch of files:
mkdir split; cd split; mapshaper-xl -i ../cycleways_grouped.geojson -split lad11cd -o format=geojson
# Leftover out-of-bounds stuff in Scotland
rm -f null.json
# Actually remove everything from Wales (W* codes), since it'll only be in the imperfect 2011 bounding-box clip
rm -fv W*.json
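For reference, the split and cleanup steps amount to grouping features by their lad11cd property and discarding the groups we don't want. This is a rough JS equivalent, not what mapshaper does internally; `groupByLad` and the example codes are illustrative assumptions.

```javascript
// Group features by LAD code, skipping the ones the rm commands would delete.
function groupByLad(features) {
  const groups = new Map();
  for (const feature of features) {
    const code = feature.properties ? feature.properties.lad11cd : undefined;
    // No code: the line fell outside every LAD polygon (null.json).
    // Codes starting with W are the W*.json files removed above.
    if (code == null || String(code).startsWith("W")) continue;
    if (!groups.has(code)) groups.set(code, []);
    groups.get(code).push(feature);
  }
  return groups;
}

// Usage with made-up features (geometry omitted for brevity)
const exampleFeatures = [
  { type: "Feature", properties: { lad11cd: "E06000001" }, geometry: null },
  { type: "Feature", properties: { lad11cd: "E06000001" }, geometry: null },
  { type: "Feature", properties: { lad11cd: "W06000001" }, geometry: null },
  { type: "Feature", properties: { lad11cd: null }, geometry: null },
];
const grouped = groupByLad(exampleFeatures);
console.log([...grouped.keys()]);
```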
Now for each of those split files, we want to sum the length of all the LineStrings inside.
# Back in the main directory
npm run sum 2> cycleway_lengths_by_lad.csv
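The length calculation itself can be sketched as follows. This is an illustration under assumptions rather than the actual sum script: it uses the haversine formula on WGS84 coordinates with a mean Earth radius, and all function names are made up here.

```javascript
const EARTH_RADIUS_M = 6371008.8; // mean Earth radius in metres

// Great-circle distance between two [lon, lat] points, in metres.
function haversineMeters([lon1, lat1], [lon2, lat2]) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
}

// Length of one line, given its array of [lon, lat] coordinates.
function lineLengthMeters(coords) {
  let total = 0;
  for (let i = 1; i < coords.length; i++) {
    total += haversineMeters(coords[i - 1], coords[i]);
  }
  return total;
}

// Total length of every LineString and MultiLineString in a FeatureCollection.
function totalLengthMeters(featureCollection) {
  let total = 0;
  for (const feature of featureCollection.features) {
    const geometry = feature.geometry;
    if (!geometry) continue;
    if (geometry.type === "LineString") {
      total += lineLengthMeters(geometry.coordinates);
    } else if (geometry.type === "MultiLineString") {
      for (const part of geometry.coordinates) total += lineLengthMeters(part);
    }
  }
  return total;
}

// Usage: one degree of latitude along a meridian, twice over
const exampleCollection = {
  type: "FeatureCollection",
  features: [
    { type: "Feature", properties: {}, geometry: { type: "LineString", coordinates: [[0, 0], [0, 1]] } },
    { type: "Feature", properties: {}, geometry: { type: "MultiLineString", coordinates: [[[0, 0], [0, 1]]] } },
  ],
};
console.log(totalLengthMeters(exampleCollection).toFixed(1));
```

Haversine treats the Earth as a sphere, which should be accurate enough for comparing totals between years. Projecting to British National Grid first would be an alternative.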
Something like https://hex.ohsome.org/#/cycleways_w/2011-06-01T00:00:00Z/8/52.07429015262514/-0.6955267371189224 might just work
- Using Ohsome Quality Analyst?
- Or checking for a sample of schemes known to have been built between 2011 and 2020